Qualcomm AI Engine Direct - CI for QNN Static Stories Llama #7884
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7884
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Pending as of commit 9d73dd8 with merge base eacbeb7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thank you for adding the test!
.ci/scripts/setup-stories-llama.sh
Outdated
set -ex

# Download and prepare stories llama model artifacts
prepare_model_artifacts() {
Should we use this?
executorch/.ci/scripts/utils.sh, line 145 in 1bf20e3: download_stories_model_artifacts() {
Thanks for the recommendation. I will reuse this and remove setup-stories-llama.sh.
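For illustration, the CI step could reuse the shared helper roughly like this. This is only a sketch: it assumes the helper can simply be sourced and called from the repository root, and the exact artifact file names it produces depend on its implementation in .ci/scripts/utils.sh.

```bash
# Sketch only: reuse the shared helper instead of a dedicated setup-stories-llama.sh.
# Assumes this runs from the executorch repository root.
source .ci/scripts/utils.sh

# Download the stories110M checkpoint, tokenizer, and params into the current directory
# (exact file names depend on the helper's implementation).
download_stories_model_artifacts
```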
.github/workflows/pull.yml
Outdated
PYTHON_EXECUTABLE=python bash -x .ci/scripts/setup-stories-llama.sh

# Test static llama stories110m
PYTHON_EXECUTABLE=python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleScript.test_stories_single_llama --model SM8650 --build_folder build-android/ --executorch_root . --artifact_dir . --compile_only
For the accuracy test, you can probably borrow some logic from here:
executorch/.ci/scripts/test_llama.sh, line 4 in 1bf20e3
I will try to add an accuracy test to this PR.
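For context, the kind of accuracy logic that could be borrowed is roughly: run the exported model on a fixed prompt and compare the generated text against an expected result. Below is a rough sketch; the runner binary, its flags, the prompt, and the expected prefix are all placeholders, not values taken from test_llama.sh.

```bash
# Hypothetical output-match accuracy check (placeholders throughout).
PROMPT="Once upon a time"
EXPECTED_PREFIX="Once upon a time,"   # placeholder for the expected generation

# The runner binary and its flags below are illustrative, not the actual CLI.
RESULT=$(./llama_runner \
  --model_path stories110m_qnn.pte \
  --tokenizer_path tokenizer.model \
  --prompt "$PROMPT")

if [[ "$RESULT" == "$EXPECTED_PREFIX"* ]]; then
  echo "Accuracy check passed"
else
  echo "Unexpected output: $RESULT"
  exit 1
fi
```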
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Somehow the CI isn't triggered; can you push a new commit?
Force-pushed from ed50f6a to 526a0d8.
Commit message: Qualcomm AI Engine Direct - CI for QNN Static Stories Llama (#7884). Summary: QNN Static Stories 110M Llama compile-only CI. Verify PTE size for hybrid mode to ensure the PTE size is reduced when weight sharing is enabled. TODO: runtime inference speed test and accuracy test. Pull Request resolved: pytorch#7884. Differential Revision: D68635095. Pulled By: cccclai
Hi @cccclai, Thanks
I feel like making it step by step is fine. That way we can get more and more coverage over time.
Force-pushed from e74a418 to fd5f7ac.
Force-pushed from 20c42a9 to e4fdae5.
Force-pushed from 9595ff5 to d565402.
Force-pushed from 62c5ceb to 9d73dd8.
Hi @cccclai,
This is awesome, thank you so much!
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary
QNN Static Stories 110M Llama compile-only CI.
Verify the PTE size for hybrid mode to ensure it is reduced when weight sharing is enabled.
TODO: runtime inference speed test.
cc @cccclai @shewu-quic
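To make the PTE size check in the summary concrete, here is a minimal sketch of the kind of comparison it describes. The artifact file names are hypothetical, and the actual check is presumably implemented inside TestExampleScript.test_stories_single_llama rather than in a shell script.

```bash
# Hypothetical sketch: verify that enabling weight sharing shrinks the hybrid-mode PTE.
hybrid_pte="stories110m_hybrid.pte"            # placeholder artifact name
no_sharing_pte="stories110m_no_sharing.pte"    # placeholder artifact name

# stat -c%s prints file size in bytes (GNU coreutils, as on the Linux CI runners).
hybrid_size=$(stat -c%s "$hybrid_pte")
no_sharing_size=$(stat -c%s "$no_sharing_pte")

# With weight sharing enabled, the hybrid-mode PTE should be smaller.
if [ "$hybrid_size" -ge "$no_sharing_size" ]; then
  echo "Weight sharing did not reduce PTE size: $hybrid_size >= $no_sharing_size bytes"
  exit 1
fi
echo "PTE size reduced with weight sharing: $hybrid_size < $no_sharing_size bytes"
```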