@DannyYuyang-quic DannyYuyang-quic commented Sep 10, 2025

Summary:

  • e2e script for GA Static SmolLM3-3B
  • perf: token rate with 16a4w block quantization in kv mode: ~30 tokens/sec (SM8750)
  • acc: PPL on the wikitext dataset: fp 8.345 -> HTP 8.976
  • add model params file & model weight converter
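
To put the two figures in the summary into perspective, here is a minimal sketch that derives the relative perplexity increase and the per-token latency from them. Only the three constants come from the summary; the helper names `relative_increase` and `ms_per_token` are hypothetical and exist purely for illustration.

```python
# Figures quoted in the PR summary.
FP_PPL = 8.345          # floating-point reference perplexity on wikitext
HTP_PPL = 8.976         # perplexity of the 16a4w quantized model on HTP
TOKENS_PER_SEC = 30.0   # approximate kv-mode token rate on SM8750


def relative_increase(baseline: float, measured: float) -> float:
    """Return the fractional change of `measured` relative to `baseline`."""
    return (measured - baseline) / baseline


def ms_per_token(tokens_per_sec: float) -> float:
    """Convert a token rate into average milliseconds per generated token."""
    return 1000.0 / tokens_per_sec


if __name__ == "__main__":
    print(f"PPL increase: {relative_increase(FP_PPL, HTP_PPL):.1%}")
    print(f"Latency: {ms_per_token(TOKENS_PER_SEC):.1f} ms/token")
```

With the numbers above this works out to a perplexity increase of roughly 7.6% and about 33 ms per token.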

Test plan

python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --decoder_model smollm3-3b --model_mode kv --max_seq_len 1024 --prompt "I would like to learn python, could you teach me with a simple example?" --tasks wikitext --limit 1
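
For scripted runs, the test-plan command can be assembled as an argument list instead of a single shell string. The sketch below only reuses the flags and values shown in the command above; the helper name `build_test_plan_cmd` and the placeholder device serial are hypothetical, and passing `SM8750` as the SoC model is an assumption based on the chip named in the summary.

```python
import shlex

# Prompt text taken verbatim from the test plan.
PROMPT = "I would like to learn python, could you teach me with a simple example?"


def build_test_plan_cmd(serial_num: str, soc_model: str,
                        build_dir: str = "build-android",
                        max_seq_len: int = 1024, limit: int = 1) -> list:
    """Return the test-plan command as an argv list (hypothetical helper)."""
    return [
        "python", "examples/qualcomm/oss_scripts/llama/llama.py",
        "-b", build_dir,
        "-s", serial_num,
        "-m", soc_model,
        "--decoder_model", "smollm3-3b",
        "--model_mode", "kv",
        "--max_seq_len", str(max_seq_len),
        "--prompt", PROMPT,
        "--tasks", "wikitext",
        "--limit", str(limit),
    ]


if __name__ == "__main__":
    # "<device-serial>" is a placeholder; SM8750 is assumed from the summary.
    cmd = build_test_plan_cmd("<device-serial>", "SM8750")
    print(shlex.join(cmd))  # shell-safe rendering for logging or copy-paste
```

Keeping the prompt as one list element avoids quoting problems when the list is later handed to `subprocess.run`.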

pytorch-bot bot commented Sep 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14149

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (9 Unrelated Failures)

As of commit cd3b2f4 with merge base 6ed10e5:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the "CLA Signed" label Sep 10, 2025
@DannyYuyang-quic
Contributor Author

Hi @cccclai, this is the PR for the static version of SmolLM3-3B. Please have a look!
Thanks!!

cc: @haowhsu-quic

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/GA_static_Smollm3 branch from ac9394f to 953112e Compare September 10, 2025 14:39
@DannyYuyang-quic
Contributor Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the "release notes: qualcomm" label Sep 10, 2025
@cccclai
Contributor

cccclai commented Sep 10, 2025

There are still a lint error and a merge conflict.


from torchtune.models.convert_weights import get_mapped_key

from torchtune.training import FullModelHFCheckpointer

Contributor Author

Sounds good! I’ll make the changes, thanks!!

Contributor Author

@DannyYuyang-quic DannyYuyang-quic Sep 11, 2025

Hi @jackzhxng, I’ve made the changes. Could you please review them again and let me know if they now meet your expectations? Thanks!

cc: @cccclai

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/GA_static_Smollm3 branch 2 times, most recently from 3a71f1f to cd3b2f4 Compare September 11, 2025 09:01
if args.decoder_model == "smollm3-3b":
    from transformers import AutoConfig

    kv_config.apply_rope_layers = AutoConfig.from_pretrained(

Contributor

I don't feel too strongly about this but if you want to avoid having a transformers dep, what if we just did this instead - add to model_args:

https://github.com/pytorch/executorch/pull/13011/files#diff-a6c94385261aa94cd40e29fa9795a2a33b53bd42aebf2ef14886cda733bfd021R43

And in your 3b_config.json
"no_rope_layer_interval": 4

Contributor Author

Thanks for the suggestion, I like this change!
I’ve updated the config to include "no_rope_layer_interval": 4.
Appreciate the tip!
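
As a rough illustration of how a single `"no_rope_layer_interval": 4` entry could replace the lookup through `transformers`, the sketch below expands the interval into a per-layer flag list. It assumes that every fourth layer (counting from 1) skips RoPE, which matches my understanding of the SmolLM3 Hugging Face config convention. The function name `rope_layer_flags` is hypothetical, and the PR's actual implementation is not shown in this thread.

```python
from typing import List


def rope_layer_flags(num_layers: int, no_rope_layer_interval: int) -> List[bool]:
    """Return one flag per decoder layer: True if the layer applies RoPE.

    Assumption: layers whose 1-based index is a multiple of
    `no_rope_layer_interval` skip RoPE; all other layers apply it.
    """
    if no_rope_layer_interval <= 0:
        raise ValueError("no_rope_layer_interval must be a positive integer")
    return [(idx + 1) % no_rope_layer_interval != 0 for idx in range(num_layers)]


if __name__ == "__main__":
    # With an interval of 4, layers 4 and 8 (1-based) skip RoPE.
    print(rope_layer_flags(8, 4))
    # → [True, True, True, False, True, True, True, False]
```

Deriving the flags from one integer keeps the model params file self-contained, which is the point of the suggestion above.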

@cccclai
Contributor

cccclai commented Sep 11, 2025

I guess there is a merge conflict again. Can you resolve it?

@cccclai
Contributor

cccclai commented Sep 11, 2025

Since #12700 is merged, is it the last PR we need to merge?

@cccclai cccclai added this to the 1.0.0 milestone Sep 11, 2025
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/GA_static_Smollm3 branch from cd3b2f4 to 4f3d12e Compare September 11, 2025 17:00
@DannyYuyang-quic
Contributor Author

> I guess there is a merge conflict again. Can you resolve it?

I've rebased the PR! Thanks!

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D82231239.

@DannyYuyang-quic
Contributor Author

> Since #12700 is merged, is it the last PR we need to merge?

@haowhsu-quic
Do we still need to wait for Gemma1 or any others~?

@cccclai
Contributor

cccclai commented Sep 11, 2025

> > Since #12700 is merged, is it the last PR we need to merge?
>
> @haowhsu-quic Do we still need to wait for Gemma1 or any others~?

okay ping me when you have it out

@cccclai
Contributor

cccclai commented Sep 11, 2025

or add the 1.0.0 milestone so we can track them

@cccclai cccclai merged commit 9af908d into pytorch:main Sep 11, 2025
301 of 315 checks passed
StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
Labels
ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: qualcomm Changes to the Qualcomm backend delegate