@winskuo-quic
Collaborator

@winskuo-quic winskuo-quic commented Feb 17, 2025

Summary

  • Isolate the LLM tests into their own class in test_qnn_delegate.py
  • This PR should not affect ExecuTorch's CI. It is mainly for internal CI that checks pte size, accuracy, and inference speed by running stories110m and Llama 3.2 1B

cc @cccclai @shewu-quic @cbilgin @mergennachin @byjlw

@pytorch-bot

pytorch-bot bot commented Feb 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8512

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job

As of commit cdf332b with merge base 433e30b (image):

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 17, 2025
@winskuo-quic winskuo-quic changed the title Qualcomm AI Engine Direct - CI For LLama Qualcomm AI Engine Direct - CI For Llama Feb 17, 2025
@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/add_1B_llama_UT branch from 940ee4a to cdf332b Compare February 18, 2025 08:24
@winskuo-quic
Collaborator Author

Hi @cccclai,
This PR is mainly for our internal CI, which runs inference speed tests on Stories Llama and Llama 3.2 1B.
ExecuTorch's CI will still work as usual, testing pte size and accuracy using Stories Llama.
Please have a look.
Thanks.

@winskuo-quic winskuo-quic marked this pull request as ready for review February 18, 2025 09:36
@swolchok swolchok requested a review from cccclai February 18, 2025 18:39
@swolchok swolchok added partner: qualcomm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm module: qnn Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/ triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Feb 18, 2025
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@cccclai cccclai left a comment

Looks good!


// For now, we just print the total inference time for CI, can save more info
// in future if needed.
std::ofstream outfile("outputs/inference_speed.txt");
Contributor

It seems to write to this path by default. If users don't have this path, I assume it will fail? Can we make this runner more generic so it can be reused directly?

Collaborator Author

Yes, I think it will fail when the executable is run directly without llama.py. I will open a separate PR to make this more flexible for users who are not using the Python script.
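
For illustration only, one way the concern above could be addressed is to create the output directory before opening the file and to report failure to the caller. This is a minimal sketch, not the change made in this PR or the follow-up: the helper name `write_inference_speed`, its parameters, and the choice to write a single number are all assumptions. It needs C++17 for `std::filesystem`.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

// Hypothetical helper (not part of this PR): writes the total inference time
// to `path`, creating the parent directory first if it does not exist.
// Returns false on any failure instead of silently dropping the result.
bool write_inference_speed(const std::filesystem::path& path, double seconds) {
  std::error_code ec;
  const std::filesystem::path parent = path.parent_path();
  if (!parent.empty()) {
    // No error is reported if the directory already exists.
    std::filesystem::create_directories(parent, ec);
    if (ec) {
      return false;
    }
  }

  std::ofstream outfile(path);
  if (!outfile.is_open()) {
    return false;
  }

  outfile << seconds << '\n';
  outfile.flush();  // Surface write errors before reporting success.
  return outfile.good();
}
```

A runner could take the path from a command-line flag and fall back to `outputs/inference_speed.txt`, so the executable works both with and without llama.py; that flag is likewise an assumption here.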

@cccclai cccclai added module: user experience Issues related to reducing friction for users release notes: qualcomm Changes to the Qualcomm backend delegate labels Feb 19, 2025
@cccclai cccclai merged commit f0ef51c into pytorch:main Feb 19, 2025
76 of 81 checks passed
4 participants