Qualcomm AI Engine Direct - change the llama tutorial to static llama version #14887
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14887
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit 3c15e00 with merge base 400b2a5. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: qualcomm"
Hi @cccclai, we’re considering updating the Llama 8B tutorial to Llama 3B Instruct with the static llama version, since we’re currently validating the static Llama 3B Instruct setup. Thanks!! cc: @haowhsu-quic
# The Llama3_2 model enabled here should be the instruct variant; however,
# Llama's tokenizer does not provide a utility to apply the chat template.
instruct_model = False

num_sharding = 1
Should num_sharding here be 4, according to the steps above?
It's for the 8B model only.
I mistakenly pasted the config for the 1B model. It should be for the 3B model instead. I’ll update the config accordingly.
Updated the config.
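Regarding the `instruct_model = False` comment in the excerpt above: since the tokenizer offers no chat-template utility, the instruct prompt format would have to be applied by hand. A minimal sketch of that wrapping for the published Llama 3 instruct prompt format (the helper name is illustrative and is not provided by the tutorial or this PR):

```python
# Sketch: manually wrap a prompt in the Llama 3 instruct chat template.
# The special tokens follow the published Llama 3 prompt format; this helper
# is illustrative and not something the tutorial or this PR provides.
def apply_llama3_chat_template(user_prompt: str, system_prompt: str = "") -> str:
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
        )
    parts.append(
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_prompt}<|eot_id|>"
    )
    # Leave the assistant header open so generation continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(apply_llama3_chat_template("What is the capital of France?"))
```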
docs/source/llm/build-run-llama3-qualcomm-ai-engine-direct-backend.md
If you encounter any issues while reproducing the tutorial, please file a GitHub issue on the ExecuTorch repo and use the `#qcom_aisw` tag.
Add a link to https://github.com/pytorch/executorch/issues
3. SeqMSE Quantization: optimizes the parameter encodings of each layer of a model individually to minimize the difference between the layer’s original and quantized outputs. SeqMSE uses a search-based approach with `seq_mse_candidates` = 1000. (Implementation details: [SeqMSE pass](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/_passes/seq_mse.py))
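The description above amounts to a per-layer search over candidate quantization scales. A toy NumPy sketch of that idea (the names and the uniform candidate grid are illustrative assumptions; the actual pass linked above operates on the exported graph, not raw arrays):

```python
import numpy as np

# Toy SeqMSE-style search: for one layer, try many candidate quantization
# scales and keep the one whose quantized output is closest (in MSE) to the
# float output. Illustrative stand-in for the linked SeqMSE pass.
def seq_mse_search(weight, activations, seq_mse_candidates=1000, bits=8):
    qmax = 2 ** (bits - 1) - 1
    float_out = activations @ weight
    max_abs = np.abs(weight).max()
    best_scale, best_mse = None, np.inf
    for i in range(1, seq_mse_candidates + 1):
        scale = max_abs * i / (seq_mse_candidates * qmax)
        # Quantize-dequantize the weights with this candidate scale.
        q_weight = np.clip(np.round(weight / scale), -qmax - 1, qmax) * scale
        mse = np.mean((activations @ q_weight - float_out) ** 2)
        if mse < best_mse:
            best_scale, best_mse = scale, mse
    return best_scale, best_mse


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
x = rng.normal(size=(8, 64))
scale, mse = seq_mse_search(w, x)
print(f"best scale={scale:.6f}, mse={mse:.6e}")
```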
4. Model Sharding: Set `num_sharding` = 4 to shard the model into sub-parts. This helps reduce memory pressure and improve performance during on-device inference.
qualify this comment to suggest # of shards might be different depending on the model size
Thanks for the suggestion! I’ll add the comment.
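To make the sharding knob concrete, here is a toy illustration of splitting a stack of decoder layers into `num_sharding` contiguous groups, each of which would be lowered as its own sub-model (the layer count and names are illustrative; the real partitioning is done inside the Qualcomm backend):

```python
# Toy illustration of model sharding: split decoder layers into contiguous
# groups; each group would be compiled and executed as a separate sub-model.
def shard_layers(layers, num_sharding):
    per_shard = -(-len(layers) // num_sharding)  # ceiling division
    return [layers[i:i + per_shard] for i in range(0, len(layers), per_shard)]


layers = [f"decoder_layer_{i}" for i in range(28)]  # illustrative layer count
for idx, shard in enumerate(shard_layers(layers, num_sharding=4)):
    print(f"shard {idx}: {shard[0]} .. {shard[-1]} ({len(shard)} layers)")
```

As the review thread notes, the shard count depends on the model size: the smaller configs above use `num_sharding = 1`, while the 8B tutorial uses 4.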
artifact/
└── llama_qnn.pte

**3.3 Upload model, tokenizer and llama runner binary to phone**
Why were these instructions removed?
Because the upload step is already included in the script.
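For readers who want to perform the upload manually anyway, it boils down to pushing the artifacts to the device, e.g. via `adb push`. A minimal sketch (the device directory and file list are illustrative assumptions, not taken from the script):

```python
import subprocess

# Manual equivalent of the upload step the script already performs: push the
# compiled model and tokenizer to the device. Paths are illustrative.
DEVICE_DIR = "/data/local/tmp/llama"  # assumed working directory on device

for local_path in ["artifact/llama_qnn.pte", "tokenizer.model"]:
    subprocess.run(["adb", "push", local_path, DEVICE_DIR], check=True)
```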
@pytorchbot cherry-pick --onto release/1.0 -c docs
Qualcomm AI Engine Direct - change the llama tutorial to static llama version (#14887)

### Summary
Change the llama tutorial to the static llama version.

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @cbilgin

(cherry picked from commit bdc526b)
Cherry picking #14887: the cherry-pick PR is at #14949. The following tracker issues are updated:
Summary
Change the llama tutorial to the static llama version.
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @cbilgin