docs: add Android LLM runner page and HuggingFace #19611
Conversation
Mirrors the structure of `run-on-ios.md` for the `executorch-android` AAR. Covers `LlmModule`, the `LlmModuleConfig` builder, `LlmGenerationConfig`, `LlmCallback` (`onResult`/`onStats`/`onError`), `load`/`stop`/`resetContext`, and multimodal prefill (images via `int[]`/`ByteBuffer`/`float[]`, `prefillNormalizedImage`, `prefillAudio`, `prefillRawAudio`).
Slots the new Android page between the Qualcomm guide and run-on-ios so it appears in the LLM section sidebar.
…ch#8790) Replaces github.com/meta-pytorch/executorch-examples links for Android and iOS with the in-docs run-on-android.md and run-on-ios.md pages so the Running section stays inside the docs.
Surfaces the Hugging Face export path from the main Model Export and Lowering page via a tip admonition under Model Preparation, pointing users to llm/export-llm-optimum.md before the manual export walkthrough.
…ytorch#8790) Drops the stale Export Methods / Method 1 framing (only the CLI method is documented) and promotes the now-orphaned h4 headings up one level. Updates the Running on Device section to link the new in-docs Android page and existing iOS page, with the LlamaDemo and etLLM sample apps preserved inline.
Pull request overview
Adds first-party documentation for running LLMs on Android via the executorch-android AAR and improves the Hugging Face (Optimum ExecuTorch) discovery flow across the LLM docs.
Changes:
- Added a new Android LLM runner guide (`run-on-android.md`) documenting `LlmModule`, `LlmModuleConfig`, `LlmGenerationConfig`, callbacks, and multimodal prefill APIs.
- Updated LLM docs navigation to include the Android guide and adjusted "Getting Started" running links to point to in-doc pages.
- Improved Hugging Face export guidance by adding an Optimum tip in the export docs and cleaning up the Optimum export page (headings + device-running links).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/source/using-executorch-export.md | Adds a tip directing Hugging Face Hub users to the Optimum ExecuTorch export flow. |
| docs/source/llm/working-with-llms.md | Inserts run-on-android into the LLM toctree. |
| docs/source/llm/run-on-android.md | New Android Java runtime guide for LlmModule + configs + multimodal prefill APIs. |
| docs/source/llm/getting-started.md | Updates “Running” links to point to in-doc Android/iOS pages. |
| docs/source/llm/export-llm-optimum.md | Renames/restructures CLI export section and updates “Running on device” links to new docs pages. |
Comments suppressed due to low confidence (2)
**docs/source/llm/run-on-android.md:113**
* The `LlmGenerationConfig` docs list `maxNewTokens` and `warming` as supported generation parameters, but `LlmModule.generate(String, LlmGenerationConfig, ...)` currently only reads `seqLen`, `echo`, `temperature`, `numBos`, and `numEos` (it never uses `maxNewTokens`/`warming`). This makes the documentation misleading because setting those fields has no effect. Consider either updating the doc to call out which fields are currently honored, or updating the Java binding/native call to plumb `maxNewTokens`/`warming` through if supported by the underlying runner.
For full control over generation parameters, use `LlmGenerationConfig`:

```java
LlmGenerationConfig genConfig = LlmGenerationConfig.create()
    .seqLen(2048)
    .temperature(0.8f)
    .echo(false)
    .build();

module.generate("Once upon a time", genConfig, callback);
```

`LlmGenerationConfig` exposes `echo`, `maxNewTokens`, `seqLen`, `temperature`, `numBos`, `numEos`, and `warming`. Defaults match the C++ `GenerationConfig` documented in Running LLMs with C++.
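As a back-of-envelope illustration of how a `seqLen`-style context budget bounds generation length, the snippet below uses a hypothetical helper (`newTokenBudget` is not part of the `executorch-android` API, and the real binding's accounting may differ):

```java
public class SeqLenBudget {
    // Hypothetical helper: tokens left for generation once the prompt
    // has been counted against a seqLen-style context budget.
    static int newTokenBudget(int seqLen, int promptTokens) {
        return Math.max(0, seqLen - promptTokens);
    }

    public static void main(String[] args) {
        System.out.println(newTokenBudget(2048, 512));  // 1536 new tokens fit
        System.out.println(newTokenBudget(2048, 2048)); // 0 -- prompt fills the context
    }
}
```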
**docs/source/llm/run-on-android.md:164**
* In the normalized-image `ByteBuffer` example, after writing floats into `floatBuffer` the buffer position will typically be at the end, so calling `prefillNormalizedImage(floatBuffer, ...)` will fail validation due to insufficient `remaining()` bytes. The example should reset the position (e.g., `flip()`/`rewind()`) after filling the buffer, similar to the raw-byte example above.
```java
ByteBuffer floatBuffer = ByteBuffer
    .allocateDirect(3 * 336 * 336 * Float.BYTES)
    .order(ByteOrder.nativeOrder());
// fill floatBuffer with normalized values, then:
module.prefillNormalizedImage(floatBuffer, 336, 336, 3);
```
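The position bookkeeping this comment describes can be checked with plain `java.nio`, with no ExecuTorch dependency (the buffer size mirrors the 3×336×336-float example above):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferPositionDemo {
    public static void main(String[] args) {
        int capacity = 3 * 336 * 336 * Float.BYTES; // same layout as the doc example
        ByteBuffer floatBuffer = ByteBuffer
                .allocateDirect(capacity)
                .order(ByteOrder.nativeOrder());

        // Filling the buffer advances its position to the end...
        for (int i = 0; i < capacity / Float.BYTES; i++) {
            floatBuffer.putFloat(0.5f);
        }
        System.out.println(floatBuffer.remaining()); // 0 -- a remaining()-based check sees no data

        // ...so rewind (or flip) before handing it to a consumer.
        floatBuffer.rewind();
        System.out.println(floatBuffer.remaining()); // 1354752 -- full payload visible again
    }
}
```

This is why the review asks the normalized-image example to reset the position, as the raw-byte example already does.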
```java
LlmModule module = new LlmModule(config);
```

Available load modes are `LOAD_MODE_FILE`, `LOAD_MODE_MMAP` (default), `LOAD_MODE_MMAP_USE_MLOCK`, and `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS`. Available model types are `MODEL_TYPE_TEXT`, `MODEL_TYPE_TEXT_VISION`, and `MODEL_TYPE_MULTIMODAL`.
Summary

- New `docs/source/llm/run-on-android.md`, a Java reference for the `executorch-android` AAR runner. Same shape as `run-on-ios.md`. Covers `LlmModule`, the `LlmModuleConfig` builder, `LlmGenerationConfig`, the `LlmCallback` methods, `load`/`stop`/`resetContext`, and the image/audio prefill variants. Points at LlamaDemo.
- Added `run-on-android` to the LLM toctree in `working-with-llms.md`, sitting between the Qualcomm page and iOS.
- In `getting-started.md`, swapped the two GitHub example links for the in-docs Android and iOS pages so users stay in the docs.
- Added a tip admonition to `using-executorch-export.md` under Model Preparation, sending HF Hub users to `export-llm-optimum.md` before the manual flow.
- Cleaned up `export-llm-optimum.md`. Removed the leftover "Method 1" framing since only the CLI path is documented, bumped the orphaned subheadings up a level, and pointed the Running on Device links at the new Android page and the existing iOS page (sample apps kept inline).

Fixes #8790
cc @mergennachin @AlannaBurke @larryliu0820 @cccclai @helunwencser @jackzhxng @byjlw