haowhsu-quic (Collaborator) commented Jul 22, 2025

Summary

  • add a PTQ recipe for llama3.2 1b/3b
  • add seq_mse support to help quantize the 1b model
  • complement qnn_llama_runner for smollm2
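
For context on the seq_mse item: SeqMSE-style calibration, as it is generally described, chooses weight quantization parameters by minimizing the error of a layer's output on calibration data instead of taking the plain min/max weight range. The following is a minimal pure-Python sketch of that idea for a single output channel. It is not the implementation added in this PR; the function names, the 4-bit setting, the candidate grid, and the random data are all assumptions made for illustration.

```python
import random


def fake_quantize(weights, scale, num_bits=4):
    """Symmetric fake quantization: snap to the integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]


def linear(weights, x):
    """Output of one channel: dot product of its weights with one input vector."""
    return sum(w * xi for w, xi in zip(weights, x))


def output_mse(weights, q_weights, calib_inputs):
    """Mean squared error between float and quantized outputs over the calibration set."""
    errs = [(linear(weights, x) - linear(q_weights, x)) ** 2 for x in calib_inputs]
    return sum(errs) / len(errs)


def search_scale(weights, calib_inputs, num_bits=4, num_candidates=20):
    """Grid-search the clipping range and keep the scale with the lowest output MSE."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    best_scale, best_err = None, float("inf")
    for i in range(1, num_candidates + 1):
        # The last candidate (i == num_candidates) is exactly the min/max scale.
        scale = max_abs * (i / num_candidates) / qmax
        err = output_mse(weights, fake_quantize(weights, scale, num_bits), calib_inputs)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err


# Hypothetical data: mostly small weights plus one outlier that stretches the min/max range.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(64)] + [8.0]
calib_inputs = [[random.gauss(0.0, 1.0) for _ in range(65)] for _ in range(32)]

qmax = 2 ** (4 - 1) - 1
minmax_scale = max(abs(w) for w in weights) / qmax
minmax_err = output_mse(weights, fake_quantize(weights, minmax_scale), calib_inputs)
best_scale, best_err = search_scale(weights, calib_inputs)
print(f"min/max scale error: {minmax_err:.4f}, searched scale error: {best_err:.4f}")
```

Because the min/max scale is one of the candidates, the searched scale can never do worse on the calibration set; how much it helps depends on the weight distribution and the bit width.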

Test Plan

python examples/qualcomm/oss_scripts/llama/llama.py \
  -b build-android -H $HOST -s $SN -m SM8750 \
  --temperature 0 --model_mode hybrid \
  --prefill_ar_len 128 --max_seq_len 1024 \
  --decoder_model llama3_2-1b_instruct \
  --prompt "I would like to learn python, could you teach me with a simple example?" \
  --artifact ./llama_artifact \
  --tasks wikitext --limit 1 --compile_only \
  --params ../.llama/checkpoints/Llama3.2-1B-Instruct/params.json \
  --tokenizer_model ../.llama/checkpoints/Llama3.2-1B-Instruct/tokenizer.model \
  --checkpoint ../.llama/checkpoints/Llama3.2-1B-Instruct/consolidated.00.pth
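
When scripting the test command, one option is to build it as an argument list so the quoted prompt and the repeated checkpoint directory are easier to manage. The sketch below only assembles and prints the command. Every flag and path is copied from the Test Plan, while `ckpt_dir`, `argv`, and the environment-variable fallbacks are illustrative assumptions.

```python
import os
import shlex

# Shared checkpoint directory taken from the Test Plan command.
ckpt_dir = "../.llama/checkpoints/Llama3.2-1B-Instruct"

argv = [
    "python", "examples/qualcomm/oss_scripts/llama/llama.py",
    "-b", "build-android",
    # HOST and SN come from the environment; the literal text is kept when they are unset.
    "-H", os.environ.get("HOST", "$HOST"),
    "-s", os.environ.get("SN", "$SN"),
    "-m", "SM8750",
    "--temperature", "0",
    "--model_mode", "hybrid",
    "--prefill_ar_len", "128",
    "--max_seq_len", "1024",
    "--decoder_model", "llama3_2-1b_instruct",
    "--prompt", "I would like to learn python, could you teach me with a simple example?",
    "--artifact", "./llama_artifact",
    "--tasks", "wikitext",
    "--limit", "1",
    "--compile_only",
    "--params", f"{ckpt_dir}/params.json",
    "--tokenizer_model", f"{ckpt_dir}/tokenizer.model",
    "--checkpoint", f"{ckpt_dir}/consolidated.00.pth",
]

# shlex.join quotes each argument so the printed line is safe to paste into a shell.
command_line = shlex.join(argv)
print(command_line)
```

Passing `argv` to `subprocess.run` would execute it without going through a shell, which avoids quoting problems with the prompt text.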

pytorch-bot bot commented Jul 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12700

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 8 Pending, 1 Unrelated Failure

As of commit fd7ee71 with merge base a7eefd0 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jul 22, 2025
@haowhsu-quic changed the title from "QC SeqMSE Draft" to "Qualcomm AI Engine Direct - PTQ for llama3.2 1b/3b" Sep 5, 2025
haowhsu-quic (Collaborator, Author) commented

@pytorchbot label "release notes: qualcomm"

@pytorch-bot added the "release notes: qualcomm" label (Changes to the Qualcomm backend delegate) Sep 5, 2025
@haowhsu-quic marked this pull request as ready for review September 5, 2025 05:41
haowhsu-quic (Collaborator, Author) commented

Hi @cccclai, this PR introduces SeqMSE for PTQ of llama3.2_1/3b_instruct and qwen3 0.6b. Please have a look, thank you.

facebook-github-bot (Contributor) commented

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D81791793.

cccclai (Contributor) commented Sep 8, 2025

@haowhsu-quic can you rebase? I'm having issues importing it. Also, feel free to tag me more often this week; I'm trying to get PRs landed ASAP.

haowhsu-quic (Collaborator, Author) commented

Hi @cccclai, please help merge this when you're available. Thank you.

@cccclai merged commit 2e44184 into pytorch:main Sep 11, 2025
117 of 125 checks passed
Comment on lines +185 to +186
    def __init__(self):
        super(RemoveSeqMse, self).__init__()
Contributor

can't we just omit this?
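
To illustrate the reviewer's point: in Python, a subclass `__init__` that does nothing except call the parent's `__init__` is redundant, because attribute lookup falls back to the parent's constructor when the subclass defines none. The sketch below uses a hypothetical `ExportPassStub` base class and two hypothetical subclasses; it does not reproduce the real `RemoveSeqMse` pass or its actual base class.

```python
class ExportPassStub:
    """Hypothetical stand-in for the pass base class, not the real ExecuTorch type."""

    def __init__(self):
        self.call_count = 0


class RemoveSeqMseExplicit(ExportPassStub):
    def __init__(self):
        # Only forwards to the parent, so it adds no behavior.
        super(RemoveSeqMseExplicit, self).__init__()


class RemoveSeqMseImplicit(ExportPassStub):
    # No __init__ defined: Python resolves to ExportPassStub.__init__ automatically.
    pass


explicit = RemoveSeqMseExplicit()
implicit = RemoveSeqMseImplicit()
print(explicit.call_count == implicit.call_count)
```

The explicit form is only needed once the subclass takes extra arguments or sets up its own state.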

StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025