Qualcomm AI Engine Direct - GA Static Smollm3 3B #14149
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14149. Note: links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (9 unrelated failures) As of commit cd3b2f4 with merge base 6ed10e5:
- FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
- BROKEN TRUNK: the following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cccclai, this is the PR for the static version of SmolLM3-3B. Please have a look! cc: @haowhsu-quic
Force-pushed from ac9394f to 953112e.
@pytorchbot label "release notes: qualcomm"
There are still lint errors and a merge conflict.
`from torchtune.models.convert_weights import get_mapped_key`
`from torchtune.training import FullModelHFCheckpointer`
Can we use this instead? I'd like to move away from TorchTune if possible
Sounds good! I’ll make the changes, thanks!!
Hi @jackzhxng, I’ve made the changes. Could you please review them again and let me know if they now meet your expectations? Thanks!
cc: @cccclai
Force-pushed from 3a71f1f to cd3b2f4.
`if args.decoder_model == "smollm3-3b":`
`    from transformers import AutoConfig`
`    kv_config.apply_rope_layers = AutoConfig.from_pretrained(`
I don't feel too strongly about this, but if you want to avoid having a transformers dep, what if we just did this instead: add the field to model_args, and in your 3b_config.json set `"no_rope_layer_interval": 4`.
Thanks for the suggestion, I like this change!
I’ve updated the config to include `"no_rope_layer_interval": 4`.
Appreciate the tip!
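As a rough sketch of how the export script could then derive the per-layer RoPE mask without a transformers dependency, assuming every `no_rope_layer_interval`-th layer skips RoPE (the NoPE scheme SmolLM3 uses); the function name and values here are hypothetical:

```python
# Rough sketch, not the PR's exact code: build the per-layer RoPE mask from
# the interval in the params file; every Nth layer is a NoPE layer.
def rope_layer_mask(n_layers: int, no_rope_layer_interval: int | None) -> list:
    """True means the layer applies RoPE; False marks a NoPE layer."""
    if not no_rope_layer_interval:
        return [True] * n_layers
    return [(i + 1) % no_rope_layer_interval != 0 for i in range(n_layers)]

mask = rope_layer_mask(n_layers=36, no_rope_layer_interval=4)
print(mask[:8])  # [True, True, True, False, True, True, True, False]
```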
I guess there is a merge conflict again. Can you resolve it?
Since #12700 is merged, is this the last PR we need to merge?
Summary:
- e2e script for GA Static SmolLM3-3B
- perf: 16a4w block quant token rate in kv mode: ~30 tokens/sec (SM8750)
- acc: PPL fp: 8.345 -> htp: 8.976 on the wikitext dataset
- add model params file & model weight converter
Force-pushed from cd3b2f4 to 4f3d12e.
I've rebased the PR! Thanks!
@haowhsu-quic
Okay, ping me when you have it out.
Summary:
- e2e script for GA Static SmolLM3-3B
- perf: 16a4w block quant token rate in kv mode: ~30 tokens/sec (SM8750)
- acc: PPL fp: 8.345 -> htp: 8.976 on the wikitext dataset
- add model params file & model weight converter

### Test plan
```bash
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --decoder_model smollm3-3b --model_mode kv --max_seq_len 1024 --prompt "I would like to learn python, could you teach me with a simple example?" --tasks wikitext --limit 1
```