
Conversation

@shewu-quic
Collaborator

No description provided.

@pytorch-bot

pytorch-bot bot commented Sep 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5335

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2876d16 with merge base 9256b4a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Sep 13, 2024
@shewu-quic
Collaborator Author

Hi @cccclai,

This PR adds a document showing how to export and run Llama 3 8B Instruct.
Could you please take a look?

Thanks :)

Contributor

@cccclai left a comment

Thanks!

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

- Follow [the README for executorch llama](https://github.com/pytorch/executorch/tree/main/examples/models/llama2) to learn how to run a Llama model on mobile via ExecuTorch.
- A Qualcomm device with 16GB RAM
- We are continuing to optimize our memory usage to ensure compatibility with lower memory devices.
- [Qualcomm AI Engine Direct SDK](https://developer.qualcomm.com/software/qualcomm-ai-engine-direct-sdk) version 2.25.0 or above.
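
A quick sanity check before exporting (a minimal sketch; `QNN_SDK_ROOT` is the environment variable convention from the ExecuTorch QNN backend setup docs, and the check itself is illustrative only):

```python
import os
from pathlib import Path

# Assumption: QNN_SDK_ROOT points at the unpacked Qualcomm AI Engine
# Direct SDK, as in the ExecuTorch QNN backend setup instructions.
sdk_root = os.environ.get("QNN_SDK_ROOT")
if sdk_root is None:
    raise SystemExit("QNN_SDK_ROOT is not set; point it at the QNN SDK directory.")

sdk_path = Path(sdk_root)
if not sdk_path.is_dir():
    raise SystemExit(f"QNN_SDK_ROOT={sdk_root} does not exist.")

print(f"Using QNN SDK at {sdk_path}")
```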
Collaborator

You might want to point to the 2.26 version; otherwise convolutions in your case (converted from linear with no bias) will fail to lower.

Collaborator Author

Thanks for your reminder!

- [QNN 2.25.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.25.0.240728.zip)
- [QNN 2.24.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.24.0.240626.zip)
- [QNN 2.23.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.23.0.24.06.24.zip)
- Note that the convolution op might fail with QNN 2.25.0.
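
For reference, a minimal fetch-and-unpack sketch for one of the archives above (the URL is taken from the list; whether it is directly downloadable without first accepting Qualcomm's license in a browser is an assumption):

```python
import urllib.request
import zipfile
from pathlib import Path

# QNN 2.26 is the preferred version per the discussion below; the 2.25.0
# URL from the list above is used here purely as a placeholder.
url = "https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.25.0.240728.zip"
archive = Path("qnn_sdk.zip")

urllib.request.urlretrieve(url, archive)  # fetch the SDK archive
with zipfile.ZipFile(archive) as zf:
    zf.extractall("qnn_sdk")              # unpack next to the script

print("Unpacked to", Path("qnn_sdk").resolve())
```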
Contributor

Does it mean 2.26 is the preferred version?

Collaborator Author

Yes, I think it is the preferred version, because we found the convolution failure in QNN 2.25.

## What is coming?

- [llama2 and llama3](https://github.com/pytorch/executorch/pull/4030). Note that at the time of writing, we still suffer from a quantization issue with llama2-7B and llama3-8B; only storiesllama works well.
- Improve the performance of llama3-8B-Instruct and support bert-mode.
Contributor

bert-mode isn't a common term; batch prefill is probably a better name.

Collaborator Author

Thanks for your advice.

@cccclai
Contributor

cccclai commented Sep 13, 2024

I'm hitting a missing rms_norm out variant on the main branch.

  File "/data/users/chenlai/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
  File "/data/users/chenlai/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/data/users/chenlai/executorch/examples/models/llama2/export_llama_lib.py", line 397, in export_llama
    builder = _export_llama(modelname, args)
  File "/data/users/chenlai/executorch/examples/models/llama2/export_llama_lib.py", line 594, in _export_llama
    builder = builder.to_executorch()
  File "/data/users/chenlai/executorch/extension/llm/export/builder.py", line 382, in to_executorch
    self.export_program = self.edge_manager.to_executorch(
  File "/data/users/chenlai/executorch/exir/program/_program.py", line 1270, in to_executorch
    new_gm_res = p(new_gm)
  File "/home/chenlai/.conda/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/pass_base.py", line 41, in __call__
    res = self.call(graph_module)
  File "/data/users/chenlai/executorch/exir/passes/__init__.py", line 426, in call
    raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'aten::rms_norm'}

Does it mean we fail to lower it somehow?

@shewu-quic
Collaborator Author

shewu-quic commented Sep 13, 2024

> I'm hitting a missing rms_norm out variant on the main branch. […] Does it mean we fail to lower it somehow?

Could you please check whether there is any error from the rms_norm op validation?

It should work with QNN 2.26.
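
To make the failure mode concrete: if the QNN backend's op validation rejects rms_norm, the node is not delegated, survives as aten::rms_norm in the edge program, and to_executorch() then fails because that op has no out variant, exactly as in the traceback above. A minimal sketch for listing which aten ops remain in an exported graph (TinyNorm is a hypothetical stand-in for a Llama RMSNorm layer; the torch.export APIs are real):

```python
import torch
import torch.nn.functional as F

# Hypothetical module standing in for a Llama RMSNorm layer.
class TinyNorm(torch.nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return F.rms_norm(x, (x.shape[-1],), self.weight)

ep = torch.export.export(TinyNorm(), (torch.randn(2, 64),))

# List the aten ops left in the graph. Depending on the export path,
# rms_norm may appear as a single aten.rms_norm node (as in the traceback
# above) or already decomposed; any node the partitioner does not delegate
# must have an out variant by the time to_executorch() runs.
for node in ep.graph.nodes:
    if node.op == "call_function":
        print(node.target)
```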

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@WuhanMonkey
Contributor

WuhanMonkey commented Sep 13, 2024

> I'm hitting a missing rms_norm out variant on the main branch. […] Does it mean we fail to lower it somehow?

> Could you please check whether there is any error from the rms_norm op validation? It should work with QNN 2.26.

I actually had the same problem with QNN 2.23, but 2.25 doesn't have this issue. However, with 2.25, as I mentioned in the other thread, the export didn't actually quantize the model; it just upcast it to fp32.

@shewu-quic
Collaborator Author

> Could you please check whether there is any error from the rms_norm op validation? It should work with QNN 2.26.

> I actually had the same problem with QNN 2.23, but 2.25 doesn't have this issue. However, with 2.25, as I mentioned in the other thread, the export didn't actually quantize the model; it just upcast it to fp32.

If possible, could you try again with QNN 2.26?
Thanks!!

@facebook-github-bot
Contributor

@cccclai merged this pull request in fe53d41.

@WuhanMonkey
Contributor

> If possible, could you try again with QNN 2.26? Thanks!!

Confirmed QNN 2.26 works and the model exported at the expected size.
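
A rough size sanity check along these lines (hypothetical .pte file name; back-of-the-envelope arithmetic): an 8B-parameter model in fp32 weighs roughly 8e9 × 4 bytes ≈ 32 GB, while a 4-bit-quantized export should land around 4-5 GB, so the exported file size quickly reveals whether quantization took effect:

```python
import os

# Hypothetical output path from the export step.
pte_path = "llama3_8b_instruct_qnn.pte"

size_gb = os.path.getsize(pte_path) / 1e9
print(f"{pte_path}: {size_gb:.1f} GB")

# ~32 GB would indicate fp32 weights (the QNN 2.25 upcast symptom above);
# a 4-bit-quantized 8B model should be on the order of 4-5 GB instead.
if size_gb > 10:
    print("Model looks unquantized (fp32-sized).")
```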
