Qualcomm AI Engine Direct - Add the tutorial to deploy llama3 8B Instruct #5335
Conversation
✅ No failures as of commit 2876d16 with merge base 9256b4a.
Hi @cccclai, this PR adds a document showing how to export and run Llama 3 8B Instruct. Thanks :)
cccclai left a comment:
Thanks!
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
- Follow [the README for executorch llama](https://github.com/pytorch/executorch/tree/main/examples/models/llama2) to learn how to run a llama model on mobile via ExecuTorch.
- A Qualcomm device with 16GB RAM
  - We are continuing to optimize our memory usage to ensure compatibility with lower-memory devices.
- The version of the [Qualcomm AI Engine Direct SDK](https://developer.qualcomm.com/software/qualcomm-ai-engine-direct-sdk) is 2.25.0 or above.
Might want to point to the 2.26 version, or convolutions in your case (converted from linear with no bias) will fail to lower.
Thanks for your reminder!
- [QNN 2.25.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.25.0.240728.zip)
- [QNN 2.24.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.24.0.240626.zip)
- [QNN 2.23.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.23.0.24.06.24.zip)
- Note that the convolution op might fail for QNN 2.25.0.
Does it mean 2.26 is the preferred version?
Yes, I think it is the preferred version, because we found the convolution failure in QNN 2.25.
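For readers following along, here is a minimal sanity-check sketch for the installed SDK version. It is not part of the tutorial; the `QNN_SDK_ROOT` environment variable, the `sdk.yaml` filename, and its `version:` field are assumptions about the SDK layout, so adjust it to your install.

```python
import os
import re

# Hypothetical sanity check: warn if the QNN SDK pointed to by QNN_SDK_ROOT looks
# older than 2.26. The sdk.yaml filename and its "version:" field are assumptions
# about the SDK layout; adjust to your install.
sdk_root = os.environ.get("QNN_SDK_ROOT", "")
version = None
yaml_path = os.path.join(sdk_root, "sdk.yaml")
if os.path.isfile(yaml_path):
    with open(yaml_path) as f:
        match = re.search(r"version:\s*([\d.]+)", f.read())
        version = match.group(1) if match else None

if version is None:
    print("Could not determine QNN SDK version; expecting 2.26 or newer.")
elif tuple(int(x) for x in version.split(".")[:2]) < (2, 26):
    print(f"Warning: QNN SDK {version} detected; 2.26+ is recommended (convolution fails on 2.25).")
else:
    print(f"QNN SDK {version} detected; OK.")
```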
## What is coming?

- [llama2 and llama3](https://github.com/pytorch/executorch/pull/4030). Note that at the time of writing, we still suffer from quantization issues in the llama2-7B and llama3-8B cases; only storiesllama works well.
- Improve the performance for llama3-8B-Instruct and support bert-mode.
bert-mode isn't a common term; batch prefill is probably a better name.
Thanks for your advice.
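To illustrate what batch prefill means here, the sketch below contrasts it with the token-by-token decode loop. It is illustrative only and not the ExecuTorch runner API; `model` and its `(tokens, kv_cache)` call signature are hypothetical placeholders.

```python
import torch

# Illustrative sketch only (not the ExecuTorch runner API): "batch prefill" runs
# one forward pass over the whole prompt to fill the KV cache, instead of feeding
# prompt tokens one at a time. `model` and its call signature are hypothetical.
def generate(model, prompt_tokens: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # Batch prefill: a single forward pass over all prompt tokens at once.
    logits, kv_cache = model(prompt_tokens, kv_cache=None)
    next_token = logits[:, -1].argmax(dim=-1, keepdim=True)

    generated = [next_token]
    for _ in range(max_new_tokens - 1):
        # Decode: one token per forward pass, reusing the KV cache.
        logits, kv_cache = model(next_token, kv_cache=kv_cache)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_token)
    return torch.cat(generated, dim=1)
```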
I'm hitting a missing rms norm op on the main branch. Does it mean we fail to lower it somehow?
Could you please check whether there is any error about rms norm op validation? It should work with QNN 2.26.
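As a generic way to spot this, a minimal sketch that lists the operators left un-delegated after lowering is shown below. It assumes `edge_program` is the `EdgeProgramManager` produced by the usual ExecuTorch export flow after partitioning; the helper name is illustrative and not part of the tutorial.

```python
from executorch.exir import EdgeProgramManager

def list_non_delegated_ops(edge_program: EdgeProgramManager) -> list[str]:
    """Return operator targets left on CPU (not lowered to the QNN delegate).

    Minimal sketch, assuming `edge_program` comes from the usual ExecuTorch
    export flow after partitioning; the function name is illustrative.
    """
    graph = edge_program.exported_program().graph
    return sorted(
        {
            str(node.target)
            for node in graph.nodes
            if node.op == "call_function"
            and "executorch_call_delegate" not in str(node.target)
        }
    )
```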
I actually had the same problem with QNN 2.23, but 2.25 doesn't have this issue. However, for 2.25, as I mentioned in the other thread, the export didn't actually quantize the model; instead it just upcast it to fp32.
If possible, could you try again with QNN 2.26?
Confirmed QNN 2.26 works and the model exported at the expected size.