Qualcomm AI Engine Direct - Reland GA Static QWEN2.5 0.5B #12582

winskuo-quic · 2025-07-17T06:19:53Z

Summary

The previous PR was merged unintentionally: #12054
On top of the previous PR, this also adds a new commit addressing code review.

Test plan

[PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable.

pytorch-bot · 2025-07-17T06:19:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12582

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 7272556 with merge base e81727a:

NEW FAILURES - The following jobs have failed:

pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t b447f32418db3f8038ace35b610a8480d5b73e1d20759612dabb9b337d94221f /exec failed with exit code 5
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 9b6221ea82622ef2d6da4d8ef6519b204cd5d07cd4abef44040a6c19a0ec9838 /exec failed with exit code 5

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 5
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 5

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-07-17T06:20:31Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

winskuo-quic · 2025-07-17T12:03:52Z

Hi @cccclai,
This PR relands the reverted PR: #12054
I also added a new commit on top of the original PR that addresses the issues discussed during code review:

Contributing to the CPU config instead of converting the config ourselves, so that all backends can use the same config. I created a new .json file for qwen2.5 0.5b, since the codebase currently only has a qwen2.5 1.5b config.
In the previous PR, we mentioned that the tokenizer hit a runtime error on device. That issue is resolved; however, another issue related to the normalizer shows up. I believe it is fixed in the tokenizer's mainline (Support NFC Normalizer, meta-pytorch/tokenizers#104). However, since ExecuTorch is still using a tokenizer version without this patch, it still runs into the error. As a workaround, I have popped the normalizer key for now, and I can remove the workaround once ExecuTorch picks up a newer tokenizer version that includes the fix.
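
For readers unfamiliar with the workaround, here is a minimal sketch of what "popping the normalizer key" could look like. It assumes the tokenizer is stored as a Hugging Face style `tokenizer.json` with a top-level `"normalizer"` entry; the function name `strip_normalizer` and the file names are hypothetical and are not taken from this PR's code.

```python
import json
from pathlib import Path


def strip_normalizer(tokenizer_json: Path, output_json: Path) -> bool:
    """Copy a tokenizer.json-style file, dropping its top-level "normalizer" key.

    Returns True if the key was present in the input, False otherwise.
    Assumption: the normalizer lives under a top-level "normalizer" key.
    """
    config = json.loads(tokenizer_json.read_text(encoding="utf-8"))
    had_normalizer = "normalizer" in config
    # pop() with a default avoids a KeyError when the key is already absent.
    config.pop("normalizer", None)
    output_json.write_text(
        json.dumps(config, ensure_ascii=False), encoding="utf-8"
    )
    return had_normalizer
```

Called as `strip_normalizer(Path("tokenizer.json"), Path("tokenizer_no_normalizer.json"))`, this leaves every other field untouched. Note that dropping the normalizer means input text is no longer normalized before tokenization, which is presumably why the comment above describes it as temporary.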

cccclai · 2025-07-22T20:12:21Z

examples/models/qwen2_5/config/0_5b_config.json

+  "use_scaled_rope": false,
+  "vocab_size": 151936,
+  "use_hf_rope": true,
+  "attention_qkv_bias": true


cccclai · 2025-07-22T20:14:38Z

examples/qualcomm/oss_scripts/llama/llama.py

-        state_dict[f"layers.{layer_i}.attention.wk.weight"] = permute(
-            state_dict[f"layers.{layer_i}.attention.wk.weight"], n_kv_heads
+    else:
+        state_dict = torch.load(


Does it mean we support both huggingface model and static llama?

winskuo-quic · Jul 23, 2025

Thanks for reviewing the PR.
I think the use of Hugging Face weights in static llama was already there before this PR.
For Qwen, the code just goes into the if statement. Static Llama's behavior is not changed.

facebook-github-bot · 2025-07-23T03:15:13Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D78788475.

cccclai · 2025-07-23T17:17:16Z

Hey can you rebase?


facebook-github-bot · 2025-07-24T17:31:59Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D78788475.

cccclai

Looks good, thank you!


pytorch-bot bot added the ci-no-td label Jul 17, 2025

meta-cla bot added the CLA Signed label Jul 17, 2025

winskuo-quic marked this pull request as ready for review July 17, 2025 11:54

winskuo-quic requested review from larryliu0820, kirklandsign, cccclai, lucylq and jackzhxng as code owners July 17, 2025 11:54

cccclai reviewed Jul 22, 2025


winskuo-quic added 2 commits July 24, 2025 09:06

Revert "Revert "Qualcomm AI Engine Direct - GA Static QWEN2.5 0.5B" (pytorch#12506)"

This reverts commit e9088ee.

85ee963

Code Review

7272556

winskuo-quic force-pushed the dev1/winskuo/reland_qwen2_0.5b branch from 84bdb3e to 7272556 July 24, 2025 04:44

cccclai approved these changes Jul 24, 2025


cccclai merged commit f592d85 into pytorch:main Jul 24, 2025
96 of 100 checks passed

cccclai mentioned this pull request Jul 27, 2025

Forward fix for qnn runner buck target #12886

Merged

cccclai added a commit to cccclai/executorch-1 that referenced this pull request Jul 27, 2025

Forward fix for qnn runner buck target

2a612a0

Summary: Forward fix for buck build with pytorch#12582 Differential Revision: D79056925
