Qualcomm AI Engine Direct - Fix mem_handle register twice issue #13410
Conversation
Summary:
- Insert the registered handle into the `pre_registered_handles_` map to avoid registering the same `data_ptr` multiple times.

Background:
When running llama in lookahead mode, the same AR-N model is used for both the prompt processor and the token generator. The inputs and outputs are the same, and the KV cache is shared between both components. This causes a "register twice" error from QNN when a shared buffer (Smart Mask) is used.
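A minimal sketch of the idea behind the fix, assuming a simplified registry class (`SharedBufferRegistry`, `GetOrRegister`, and `RegisterWithQnn` are illustrative names; only `pre_registered_handles_` and keying by the buffer's data pointer come from this PR):

```cpp
#include <unordered_map>

// Placeholder for the QNN SDK's opaque memory-handle type; the real backend
// uses the corresponding type from the QNN headers.
using Qnn_MemHandle_t = void*;

class SharedBufferRegistry {
 public:
  // Returns the cached handle if data_ptr was already registered with QNN;
  // otherwise registers it once and caches the result. This mirrors the fix:
  // a shared buffer (e.g. a Smart Mask KV cache used by both the prompt
  // processor and the token generator) is registered only the first time
  // it is seen, instead of once per consumer.
  Qnn_MemHandle_t GetOrRegister(void* data_ptr) {
    auto it = pre_registered_handles_.find(data_ptr);
    if (it != pre_registered_handles_.end()) {
      return it->second;  // Already registered: reuse, do not register twice.
    }
    Qnn_MemHandle_t handle = RegisterWithQnn(data_ptr);
    pre_registered_handles_.emplace(data_ptr, handle);
    return handle;
  }

 private:
  // Stub standing in for the actual registration call into the QNN SDK.
  Qnn_MemHandle_t RegisterWithQnn(void* data_ptr) { return data_ptr; }

  // Map named in the PR summary, keyed by the buffer's data pointer.
  std::unordered_map<void*, Qnn_MemHandle_t> pre_registered_handles_;
};
```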
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13410
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures)
As of commit 9473642 with merge base 30a6f5e:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a `release notes:` label.
@shewu-quic Thanks for putting this up. However, the situation you described (using the same graph for two purposes) is not what we are doing: we have multiple graphs and externally allocated ION buffers that we are trying to use as inputs for those graphs. The double registration error we are seeing comes from …
Seems like there is some ongoing discussion; what do you think regarding this particular PR?
I believe this PR is ready for review and merge. The issue it addresses is different from the one that @sxu encountered.
cc: @sxu, @haowhsu-quic