Qualcomm AI Engine Direct - CI for QNN Static Stories Llama #7884
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7884
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Pending as of commit 9d73dd8 with merge base eacbeb7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thank you for adding the test!
.ci/scripts/setup-stories-llama.sh
Outdated
set -ex

# Download and prepare stories llama model artifacts
prepare_model_artifacts() {
Should we use this?
executorch/.ci/scripts/utils.sh, line 145 in 1bf20e3: download_stories_model_artifacts() {
Thanks for the recommendation. I will reuse this and remove setup-stories-llama.sh.
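For illustration, the CI step could reuse the shared helper roughly like this. This is only a sketch: it assumes the helper can simply be sourced and called from the repository root, and the exact artifact file names it produces depend on its implementation in .ci/scripts/utils.sh.

```bash
# Sketch only: reuse the shared helper instead of a dedicated setup-stories-llama.sh.
# Assumes this runs from the executorch repository root.
source .ci/scripts/utils.sh

# Download the stories110M checkpoint, tokenizer, and params into the current directory
# (exact file names depend on the helper's implementation).
download_stories_model_artifacts
```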
.github/workflows/pull.yml
Outdated
PYTHON_EXECUTABLE=python bash -x .ci/scripts/setup-stories-llama.sh

# Test static llama stories110m
PYTHON_EXECUTABLE=python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleScript.test_stories_single_llama --model SM8650 --build_folder build-android/ --executorch_root . --artifact_dir . --compile_only
For the accuracy test, you can probably borrow some logic from here:
executorch/.ci/scripts/test_llama.sh, line 4 in 1bf20e3
I will try to add an accuracy test to this PR.
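For context, the kind of accuracy logic that could be borrowed is roughly: run the exported model on a fixed prompt and compare the generated text against an expected result. Below is a rough sketch; the runner binary, its flags, the prompt, and the expected prefix are all placeholders, not values taken from test_llama.sh.

```bash
# Hypothetical output-match accuracy check (placeholders throughout).
PROMPT="Once upon a time"
EXPECTED_PREFIX="Once upon a time,"   # placeholder for the expected generation

# The runner binary and its flags below are illustrative, not the actual CLI.
RESULT=$(./llama_runner \
  --model_path stories110m_qnn.pte \
  --tokenizer_path tokenizer.model \
  --prompt "$PROMPT")

if [[ "$RESULT" == "$EXPECTED_PREFIX"* ]]; then
  echo "Accuracy check passed"
else
  echo "Unexpected output: $RESULT"
  exit 1
fi
```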
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Somehow the CI isn't triggered; can you push a new commit?
Force-pushed from ed50f6a to 526a0d8.
Commit message: Qualcomm AI Engine Direct - CI for QNN Static Stories Llama (#7884). Summary: QNN Static Stories 110M Llama compile-only CI. Verify PTE size for hybrid mode to ensure the PTE size is reduced when weight sharing is enabled. TODO: runtime inference speed test and accuracy test. Pull Request resolved: pytorch#7884. Differential Revision: D68635095. Pulled By: cccclai
Hi @cccclai, Thanks
I feel like making it step by step is fine. That way we can get more and more coverage over time.
Force-pushed from e74a418 to fd5f7ac.
Force-pushed from 20c42a9 to e4fdae5.
Force-pushed from 9595ff5 to d565402.
Force-pushed from 62c5ceb to 9d73dd8.
Hi @cccclai,
This is awesome, thank you so much!
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary
QNN Static Stories 110M Llama compile-only CI.
Verify the PTE size for hybrid mode to ensure it is reduced when weight sharing is enabled.
TODO: runtime inference speed test.
cc @cccclai @shewu-quic
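To make the PTE size check in the summary concrete, here is a minimal sketch of the kind of comparison it describes. The artifact file names are hypothetical, and the actual check is presumably implemented inside TestExampleScript.test_stories_single_llama rather than in a shell script.

```bash
# Hypothetical sketch: verify that enabling weight sharing shrinks the hybrid-mode PTE.
hybrid_pte="stories110m_hybrid.pte"            # placeholder artifact name
no_sharing_pte="stories110m_no_sharing.pte"    # placeholder artifact name

# stat -c%s prints file size in bytes (GNU coreutils, as on the Linux CI runners).
hybrid_size=$(stat -c%s "$hybrid_pte")
no_sharing_size=$(stat -c%s "$no_sharing_pte")

# With weight sharing enabled, the hybrid-mode PTE should be smaller.
if [ "$hybrid_size" -ge "$no_sharing_size" ]; then
  echo "Weight sharing did not reduce PTE size: $hybrid_size >= $no_sharing_size bytes"
  exit 1
fi
echo "PTE size reduced with weight sharing: $hybrid_size < $no_sharing_size bytes"
```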