Qualcomm AI Engine Direct - Reland GA Static QWEN2.5 0.5B #12582

Merged
merged 2 commits into from
Jul 24, 2025

Conversation

winskuo-quic
Collaborator

Summary

The previous PR was merged unintentionally: #12054
On top of the previous PR, this PR also adds a new commit addressing code review.

Test plan

[PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable.

@pytorch-bot pytorch-bot bot added the ci-no-td label Jul 17, 2025
pytorch-bot bot commented Jul 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12582

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 7272556 with merge base e81727a (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 17, 2025
This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@winskuo-quic winskuo-quic marked this pull request as ready for review July 17, 2025 11:54
@winskuo-quic
Collaborator Author

Hi @cccclai,
This PR relands the reverted PR: #12054
I also added a new commit on top of the original PR that addresses the issues discussed during code review:

  1. Contributing to the CPU config instead of converting the config ourselves, so that all backends can use the same config. I have created a new .json file for Qwen2.5 0.5B, since the codebase currently only has a Qwen2.5 1.5B config.
  2. In the previous PR, we mentioned that the tokenizer hit a runtime error on device. That issue is resolved; however, another issue related to the normalizer has shown up. I believe it is fixed in the tokenizer's mainline: Support NFC Normalizer meta-pytorch/tokenizers#104. However, since ExecuTorch still uses a tokenizer version without this patch, it still runs into the error. As a workaround, I pop the normalizer key for now, and I can remove the workaround once ExecuTorch uses a newer tokenizer version that includes the fix.
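
The normalizer workaround in item 2 could look roughly like the sketch below. It is not the code from this PR: the function name `strip_normalizer` and the file paths are hypothetical, and it assumes the tokenizer is stored as a Hugging Face style `tokenizer.json` with a top-level `"normalizer"` entry.

```python
import json
import os
import tempfile


def strip_normalizer(src_path, dst_path):
    """Copy a tokenizer JSON file, dropping the top-level "normalizer" entry.

    Hypothetical helper for illustration; returns the removed entry
    (or None if the file had no "normalizer" key).
    """
    with open(src_path, "r", encoding="utf-8") as f:
        tokenizer_config = json.load(f)

    # pop() with a default is a no-op when the key is absent.
    removed = tokenizer_config.pop("normalizer", None)

    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(tokenizer_config, f, ensure_ascii=False)
    return removed
```

Writing to a separate output path keeps the original tokenizer file intact, so the workaround can be dropped later by pointing back at the unmodified file.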

"use_scaled_rope": false,
"vocab_size": 151936,
"use_hf_rope": true,
"attention_qkv_bias": true
Contributor

Ah nice

state_dict[f"layers.{layer_i}.attention.wk.weight"] = permute(
state_dict[f"layers.{layer_i}.attention.wk.weight"], n_kv_heads
else:
state_dict = torch.load(
Contributor

Does it mean we support both huggingface model and static llama?

Collaborator Author

Thanks for reviewing the PR.
I think static llama's use of Hugging Face weights predates this PR.
For Qwen, it just goes into the if statement. Static Llama's behavior is unchanged.
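
For readers unfamiliar with the `permute` call in the snippet under review: the excerpt does not show its body, so the following is only a sketch under the assumption that it follows the common Llama checkpoint-conversion pattern of reordering the rows of the q/k projection weights between an interleaved and a half-split rotary layout, one head at a time. The function name `interleaved_to_half_split` is hypothetical, plain Python lists stand in for tensor rows, and which direction the PR's `permute` applies is not confirmed here.

```python
def interleaved_to_half_split(rows, n_heads):
    """Reorder rows per head from (x0, y0, x1, y1, ...) to (x0, x1, ..., y0, y1, ...).

    `rows` stands in for the rows of a projection weight matrix; it is
    assumed that len(rows) is divisible by n_heads and that each head
    has an even number of rows.
    """
    head_dim = len(rows) // n_heads
    half = head_dim // 2
    out = []
    for h in range(n_heads):
        base = h * head_dim
        # First all even-offset rows of this head, then all odd-offset rows.
        for j in range(2):
            for i in range(half):
                out.append(rows[base + 2 * i + j])
    return out


print(interleaved_to_half_split(["x0", "y0", "x1", "y1"], n_heads=1))
# prints ['x0', 'x1', 'y0', 'y1']
```

Passing the number of key/value heads (as `n_kv_heads` is passed for `wk` in the snippet) matters because the reordering is applied within each head independently.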

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D78788475.

@cccclai
Contributor

cccclai commented Jul 23, 2025

Hey can you rebase?

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/reland_qwen2_0.5b branch from 84bdb3e to 7272556 Compare July 24, 2025 04:44
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D78788475.

Contributor

@cccclai cccclai left a comment

Looks good, thank you!

@cccclai cccclai merged commit f592d85 into pytorch:main Jul 24, 2025
96 of 100 checks passed
Conarnar pushed a commit to Conarnar/executorch that referenced this pull request Jul 25, 2025
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Jul 27, 2025
Summary: Forward fix for buck build with pytorch#12582

Differential Revision: D79056925