Update QCOM llama hardware support #15965

cccclai · 2025-11-24T19:39:50Z

Added hardware support details and memory limit error handling instructions.

It seems like many users try to run the llama flow on hardware other than phones. Let's document it first.

pytorch-bot · 2025-11-24T19:39:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15965

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

Dr CI is temporarily not working due to API firewall

✅ No Failures

As of commit 8fe731e with merge base 12d17ef:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-11-24T19:40:32Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

shewu-quic

Thanks for making the documentation better!

shewu-quic · 2025-11-25T01:07:08Z

examples/qualcomm/oss_scripts/llama/README.md

+
+We’ve validated this flow on the **Samsung Galaxy S23**, **Samsung Galaxy S24**, and **OnePlus 12**.  
+Support on other hardware depends on the **HTP architecture (HtpArch)**.  
+The **16a4w_block** format and **weight sharing between prefill and decode** are supported on **V73 and newer**.


The minimum requirement for LPBQ is V69.
The minimum requirement for weight sharing is V73.
The minimum requirement for 16-bit activation and 16-bit weight matmul (16-bit KV cache) is V73.
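As a sketch only, the minimum HTP architecture versions from the comment above can be captured in a small lookup table to check which features a target device supports. This is not part of ExecuTorch or the QNN SDK: `MIN_HTP_ARCH`, its feature keys, and `supported_features` are hypothetical names, and the integer HtpArch value of the target SoC (e.g. 73 for V73) is assumed to be known.

```python
# Hypothetical helper, not an ExecuTorch or QNN SDK API.
# Minimum HtpArch versions as stated in the review comment.
MIN_HTP_ARCH = {
    "lpbq": 69,            # LPBQ
    "weight_sharing": 73,  # weight sharing between prefill and decode
    "16a16w_matmul": 73,   # 16-bit activation x 16-bit weight matmul (16-bit KV cache)
}


def supported_features(htp_arch: int) -> list[str]:
    """Return the features whose minimum HtpArch is <= htp_arch, sorted by name."""
    return sorted(
        feature for feature, min_arch in MIN_HTP_ARCH.items() if htp_arch >= min_arch
    )


if __name__ == "__main__":
    for arch in (68, 69, 73, 75):
        print(f"V{arch}: {supported_features(arch)}")
```

For example, a V69 device would only qualify for LPBQ under these thresholds, so a V69 target would likely need a recipe without weight sharing or 16-bit KV.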

Updated. What do you think?

shewu-quic · 2025-11-25T01:08:27Z

examples/qualcomm/oss_scripts/llama/README.md

+For older devices, you may need to **retune the quantization recipe**. A good starting point is:
+
+- Use **16a4w**
+- Optionally apply **SpinQuant** for better stability and accuracy


Apply 16a8w to some layers to get better accuracy.

shewu-quic

LGTM. Thanks!

meta-cla bot added the CLA Signed label Nov 24, 2025

cccclai mentioned this pull request Nov 24, 2025

qnn_llama_runner on SA8295 outputs repetitive “sp” with Qwen3-1.7B after ExecuTorch export #15954

Closed

cccclai requested review from chenweng-quic, haowhsu-quic, shewu-quic and winskuo-quic November 24, 2025 19:40

shewu-quic reviewed Nov 25, 2025

shewu-quic approved these changes Nov 26, 2025

Gasoonjia approved these changes Nov 26, 2025

cccclai added 4 commits December 1, 2025 11:27

Update QCOM llama hardware support

18d9c69

Added hardware support details and memory limit error handling instructions.

Update README.md

66859c7

Update README with correct LPBQ version requirement

6773d3b

Add Samsung Galaxy S25 to hardware support list

8fe731e

Updated hardware support section to include Samsung Galaxy S25.

cccclai force-pushed the cccclai-patch-12 branch from a70d000 to 8fe731e December 1, 2025 19:27

cccclai merged commit 60a2bd6 into main Dec 1, 2025
165 checks passed

cccclai deleted the cccclai-patch-12 branch December 1, 2025 21:48
