Qualcomm AI Engine Direct - PTQ for llama3.2 1b/3b #12700
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12700
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 8 Pending, 1 Unrelated Failure as of commit fd7ee71 with merge base a7eefd0.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 7481013 to 3f2cad4
@pytorchbot label "release notes: qualcomm"
Hi @cccclai, this PR introduces SeqMSE for PTQ of llama3.2 1b/3b instruct and qwen3 0.6b. Please have a look, thank you.
@haowhsu-quic can you rebase? I'm having issues importing. Also feel free to tag me more often this week; I'm trying to get PRs landed ASAP.
- add ptq recipe for llama3.2 1b/3b
- add seq_mse support for helping quantizing 1b model
- complement qnn_llama_runner for smollm2
Force-pushed 429d4d9 to e5cd418
Hi @cccclai, please help merge this when you're available. Thank you.
```python
def __init__(self):
    super(RemoveSeqMse, self).__init__()
```
can't we just omit this?
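For context on the reviewer's suggestion: in Python 3, `super(RemoveSeqMse, self).__init__()` is equivalent to the shorter `super().__init__()`, and when the override does nothing beyond calling the base constructor, the whole `__init__` can be omitted, since the base-class `__init__` runs by default. A minimal sketch (the `ExportPass` base class here is a stand-in assumption, not the PR's actual base class):

```python
# Base class stand-in (assumption): its __init__ sets some state.
class ExportPass:
    def __init__(self):
        self.initialized = True

# No __init__ override needed: ExportPass.__init__ still runs
# automatically when RemoveSeqMse() is constructed.
class RemoveSeqMse(ExportPass):
    pass
```

This keeps behavior identical while removing the redundant boilerplate the reviewer flagged.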
### Summary
- add ptq recipe for llama3.2 1b/3b
- add seq_mse support for helping quantizing 1b model
- complement qnn_llama_runner for smollm2

### Test Plan
```bash
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -H $HOST -s $SN -m SM8750 --temperature 0 --model_mode hybrid --prefill_ar_len 128 --max_seq_len 1024 --decoder_model llama3_2-1b_instruct --prompt "I would like to learn python, could you teach me with a simple example?" --artifact ./llama_artifact --tasks wikitext --limit 1 --compile_only --params ../.llama/checkpoints/Llama3.2-1B-Instruct/params.json --tokenizer_model ../.llama/checkpoints/Llama3.2-1B-Instruct/tokenizer.model --checkpoint ../.llama/checkpoints/Llama3.2-1B-Instruct/consolidated.00.pth
```
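The seq_mse item in the summary refers to a sequential MSE-style search for quantization parameters. As a rough illustration only (not the PR's actual implementation; function and parameter names here are invented for the sketch), the core idea is to try several candidate clipping scales for a weight tensor, fake-quantize with each, and keep the scale that minimizes the MSE of the layer output against the floating-point reference:

```python
import torch

def seq_mse_search(weight, x, num_candidates=20, n_bits=8):
    """Pick a per-tensor weight scale minimizing output MSE.

    weight: (out_features, in_features); x: (batch, in_features).
    Illustrative sketch, not the ExecuTorch API.
    """
    qmax = 2 ** (n_bits - 1) - 1
    fp_out = x @ weight.t()          # floating-point reference output
    max_abs = weight.abs().max()
    best_scale, best_err = None, float("inf")
    for i in range(1, num_candidates + 1):
        # Candidate clipping range: a fraction of the full abs-max range.
        scale = (max_abs * i / num_candidates) / qmax
        q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
        deq = q * scale              # fake-quantized weight
        err = torch.mean((x @ deq.t() - fp_out) ** 2).item()
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

Because the plain abs-max scale is one of the candidates, the searched result is never worse than naive abs-max quantization on the calibration batch, which is why this kind of search helps the harder-to-quantize 1b model.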