docs: add Android LLM runner page and HuggingFace #19611
Conversation
Mirrors the structure of `run-on-ios.md` for the `executorch-android` AAR. Covers `LlmModule`, the `LlmModuleConfig` builder, `LlmGenerationConfig`, `LlmCallback` (`onResult`/`onStats`/`onError`), `load`/`stop`/`resetContext`, and multimodal prefill (images via `int[]`/`ByteBuffer`/`float[]`, `prefillNormalizedImage`, `prefillAudio`, `prefillRawAudio`).
Slots the new Android page between the Qualcomm guide and run-on-ios so it appears in the LLM section sidebar.
…ch#8790) Replaces github.com/meta-pytorch/executorch-examples links for Android and iOS with the in-docs run-on-android.md and run-on-ios.md pages so the Running section stays inside the docs.
Surfaces the Hugging Face export path from the main Model Export and Lowering page via a tip admonition under Model Preparation, pointing users to llm/export-llm-optimum.md before the manual export walkthrough.
…ytorch#8790) Drops the stale Export Methods / Method 1 framing (only the CLI method is documented) and promotes the now-orphaned h4 headings up one level. Updates the Running on Device section to link the new in-docs Android page and existing iOS page, with the LlamaDemo and etLLM sample apps preserved inline.
Pull request overview
Adds first-party documentation for running LLMs on Android via the executorch-android AAR and improves the Hugging Face (Optimum ExecuTorch) discovery flow across the LLM docs.
Changes:
- Added a new Android LLM runner guide (`run-on-android.md`) documenting `LlmModule`, `LlmModuleConfig`, `LlmGenerationConfig`, callbacks, and multimodal prefill APIs.
- Updated LLM docs navigation to include the Android guide and adjusted "Getting Started" running links to point to in-doc pages.
- Improved Hugging Face export guidance by adding an Optimum tip in the export docs and cleaning up the Optimum export page (headings + device-running links).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/source/using-executorch-export.md | Adds a tip directing Hugging Face Hub users to the Optimum ExecuTorch export flow. |
| docs/source/llm/working-with-llms.md | Inserts run-on-android into the LLM toctree. |
| docs/source/llm/run-on-android.md | New Android Java runtime guide for LlmModule + configs + multimodal prefill APIs. |
| docs/source/llm/getting-started.md | Updates “Running” links to point to in-doc Android/iOS pages. |
| docs/source/llm/export-llm-optimum.md | Renames/restructures CLI export section and updates “Running on device” links to new docs pages. |
Comments suppressed due to low confidence (2)
**docs/source/llm/run-on-android.md:113**
* The `LlmGenerationConfig` docs list `maxNewTokens` and `warming` as supported generation parameters, but `LlmModule.generate(String, LlmGenerationConfig, ...)` currently only reads `seqLen`, `echo`, `temperature`, `numBos`, and `numEos` (it never uses `maxNewTokens`/`warming`). This makes the documentation misleading because setting those fields has no effect. Consider either updating the doc to call out which fields are currently honored, or updating the Java binding/native call to plumb `maxNewTokens`/`warming` through if supported by the underlying runner.
For full control over generation parameters, use `LlmGenerationConfig`:

```java
LlmGenerationConfig genConfig = LlmGenerationConfig.create()
    .seqLen(2048)
    .temperature(0.8f)
    .echo(false)
    .build();

module.generate("Once upon a time", genConfig, callback);
```

`LlmGenerationConfig` exposes `echo`, `maxNewTokens`, `seqLen`, `temperature`, `numBos`, `numEos`, and `warming`. Defaults match the C++ `GenerationConfig` documented in Running LLMs with C++.
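As a back-of-envelope illustration of how a `seqLen`-style context budget bounds generation length, the snippet below uses a hypothetical helper (`newTokenBudget` is not part of the `executorch-android` API, and the real binding's accounting may differ):

```java
public class SeqLenBudget {
    // Hypothetical helper: tokens left for generation once the prompt
    // has been counted against a seqLen-style context budget.
    static int newTokenBudget(int seqLen, int promptTokens) {
        return Math.max(0, seqLen - promptTokens);
    }

    public static void main(String[] args) {
        System.out.println(newTokenBudget(2048, 512));  // 1536 new tokens fit
        System.out.println(newTokenBudget(2048, 2048)); // 0 -- prompt fills the context
    }
}
```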
**docs/source/llm/run-on-android.md:164**
* In the normalized-image `ByteBuffer` example, after writing floats into `floatBuffer` the buffer position will typically be at the end, so calling `prefillNormalizedImage(floatBuffer, ...)` will fail validation due to insufficient `remaining()` bytes. The example should reset the position (e.g., `flip()`/`rewind()`) after filling the buffer, similar to the raw-byte example above.
```java
ByteBuffer floatBuffer = ByteBuffer
    .allocateDirect(3 * 336 * 336 * Float.BYTES)
    .order(ByteOrder.nativeOrder());
// fill floatBuffer with normalized values, then:
module.prefillNormalizedImage(floatBuffer, 336, 336, 3);
```
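The position bookkeeping this comment describes can be checked with plain `java.nio`, with no ExecuTorch dependency (the buffer size mirrors the 3×336×336-float example above):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferPositionDemo {
    public static void main(String[] args) {
        int capacity = 3 * 336 * 336 * Float.BYTES; // same layout as the doc example
        ByteBuffer floatBuffer = ByteBuffer
                .allocateDirect(capacity)
                .order(ByteOrder.nativeOrder());

        // Filling the buffer advances its position to the end...
        for (int i = 0; i < capacity / Float.BYTES; i++) {
            floatBuffer.putFloat(0.5f);
        }
        System.out.println(floatBuffer.remaining()); // 0 -- a remaining()-based check sees no data

        // ...so rewind (or flip) before handing it to a consumer.
        floatBuffer.rewind();
        System.out.println(floatBuffer.remaining()); // 1354752 -- full payload visible again
    }
}
```

This is why the review asks the normalized-image example to reset the position, as the raw-byte example already does.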
```java
LlmModule module = new LlmModule(config);
```

Available load modes are `LOAD_MODE_FILE`, `LOAD_MODE_MMAP` (default), `LOAD_MODE_MMAP_USE_MLOCK`, and `LOAD_MODE_MMAP_USE_MLOCK_IGNORE_ERRORS`. Available model types are `MODEL_TYPE_TEXT`, `MODEL_TYPE_TEXT_VISION`, and `MODEL_TYPE_MULTIMODAL`.
Summary

- New `docs/source/llm/run-on-android.md`, a Java reference for the `executorch-android` AAR runner. Same shape as `run-on-ios.md`. Covers `LlmModule`, the `LlmModuleConfig` builder, `LlmGenerationConfig`, the `LlmCallback` methods, `load`/`stop`/`resetContext`, and the image/audio prefill variants. Points at LlamaDemo.
- Added `run-on-android` to the LLM toctree in `working-with-llms.md`, sitting between the Qualcomm page and iOS.
- In `getting-started.md`, swapped the two GitHub example links for the in-docs Android and iOS pages so users stay in the docs.
- Added a tip admonition to `using-executorch-export.md` under Model Preparation, sending HF Hub users to `export-llm-optimum.md` before the manual flow.
- Cleaned up `export-llm-optimum.md`. Removed the leftover "Method 1" framing since only the CLI path is documented, bumped the orphaned subheadings up a level, and pointed the Running on Device links at the new Android page and the existing iOS page (sample apps kept inline).

Fixes #8790
cc @mergennachin @AlannaBurke @larryliu0820 @cccclai @helunwencser @jackzhxng @byjlw