Qualcomm AI Engine Direct - PTQ for llama3.2 1b/3b #12700
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12700
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 8 Pending, 1 Unrelated Failure as of commit fd7ee71 with merge base a7eefd0.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 7481013 to 3f2cad4
@pytorchbot label "release notes: qualcomm"
Hi @cccclai, this PR introduces SeqMSE for PTQ of llama3.2 1b/3b instruct and qwen3 0.6b. Please have a look, thank you.
@haowhsu-quic can you rebase? I'm having issues importing. Also feel free to tag me more often this week; I'm trying to get PRs landed ASAP.
- add ptq recipe for llama3.2 1b/3b
- add seq_mse support for helping quantizing 1b model
- complement qnn_llama_runner for smollm2
Force-pushed 429d4d9 to e5cd418
Hi @cccclai, please help merge this when you're available. Thank you.
```python
def __init__(self):
    super(RemoveSeqMse, self).__init__()
```
can't we just omit this?
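For context on the reviewer's suggestion: in Python 3, `super(RemoveSeqMse, self).__init__()` is equivalent to the shorter `super().__init__()`, and when the override does nothing beyond calling the base constructor, the whole `__init__` can be omitted, since the base-class `__init__` runs by default. A minimal sketch (the `ExportPass` base class here is a stand-in assumption, not the PR's actual base class):

```python
# Base class stand-in (assumption): its __init__ sets some state.
class ExportPass:
    def __init__(self):
        self.initialized = True

# No __init__ override needed: ExportPass.__init__ still runs
# automatically when RemoveSeqMse() is constructed.
class RemoveSeqMse(ExportPass):
    pass
```

This keeps behavior identical while removing the redundant boilerplate the reviewer flagged.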
### Summary
- add ptq recipe for llama3.2 1b/3b
- add seq_mse support for helping quantizing 1b model
- complement qnn_llama_runner for smollm2

### Test Plan
```bash
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -H $HOST -s $SN -m SM8750 --temperature 0 --model_mode hybrid --prefill_ar_len 128 --max_seq_len 1024 --decoder_model llama3_2-1b_instruct --prompt "I would like to learn python, could you teach me with a simple example?" --artifact ./llama_artifact --tasks wikitext --limit 1 --compile_only --params ../.llama/checkpoints/Llama3.2-1B-Instruct/params.json --tokenizer_model ../.llama/checkpoints/Llama3.2-1B-Instruct/tokenizer.model --checkpoint ../.llama/checkpoints/Llama3.2-1B-Instruct/consolidated.00.pth
```
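The seq_mse item in the summary refers to a sequential MSE-style search for quantization parameters. As a rough illustration only (not the PR's actual implementation; function and parameter names here are invented for the sketch), the core idea is to try several candidate clipping scales for a weight tensor, fake-quantize with each, and keep the scale that minimizes the MSE of the layer output against the floating-point reference:

```python
import torch

def seq_mse_search(weight, x, num_candidates=20, n_bits=8):
    """Pick a per-tensor weight scale minimizing output MSE.

    weight: (out_features, in_features); x: (batch, in_features).
    Illustrative sketch, not the ExecuTorch API.
    """
    qmax = 2 ** (n_bits - 1) - 1
    fp_out = x @ weight.t()          # floating-point reference output
    max_abs = weight.abs().max()
    best_scale, best_err = None, float("inf")
    for i in range(1, num_candidates + 1):
        # Candidate clipping range: a fraction of the full abs-max range.
        scale = (max_abs * i / num_candidates) / qmax
        q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
        deq = q * scale              # fake-quantized weight
        err = torch.mean((x @ deq.t() - fp_out) ** 2).item()
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

Because the plain abs-max scale is one of the candidates, the searched result is never worse than naive abs-max quantization on the calibration batch, which is why this kind of search helps the harder-to-quantize 1b model.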