Skip to content

Conversation

jackzhxng
Copy link
Contributor

@jackzhxng jackzhxng commented Feb 21, 2025

Summary

Use to_edge_transform_and_lower in export_llama for XNNPack. As part of these changes, this also means that you cannot specify multiple backends in export_llama in the args, although I'm not sure if that is happening anywhere at the moment.

Closes #8621

Performance regression benchmarking for xnnpack (on android) vs. past 3 days:
Screenshot 2025-02-24 at 11 39 52 AM

These benchmark numbers also normally fluctuate a bit across runs and these differences are within the usual fluctuation ranges.

Test plan

See if CI passes

Copy link

pytorch-bot bot commented Feb 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8624

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 951d91e with merge base 77589c6 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 21, 2025
@jackzhxng jackzhxng changed the title Use to_edge_lower_and_transform for xnnpack Use to_edge_lower_and_transform for XNNPack Feb 21, 2025
@jackzhxng jackzhxng added topic: not user facing release notes: xnnpack Changes to the XNNPack backend delegate and removed topic: not user facing labels Feb 21, 2025
@mergennachin mergennachin self-requested a review February 21, 2025 20:43
@iseeyuan
Copy link
Contributor

Have you tested the performance of a model (like llama 3B), before and after? Asking because export_llama is used by different users to prepare for the .pte files in cpu. Please make sure there's no perf regress.

@jackzhxng
Copy link
Contributor Author

jackzhxng commented Feb 24, 2025

Running on demand perf benchmark here: https://github.com/pytorch/executorch/actions/runs/13505211116

)
if args.verbose:
print_delegation_info(builder.edge_manager.exported_program().graph_module)
if args.num_sharding > 0 and args.qnn:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code shouldn't be here right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah technically should remove since qnn. Will remove after benchmarking finishes

@jackzhxng jackzhxng temporarily deployed to upload-benchmark-results February 24, 2025 19:19 — with GitHub Actions Inactive
@jackzhxng jackzhxng temporarily deployed to upload-benchmark-results February 24, 2025 19:48 — with GitHub Actions Inactive
@jackzhxng
Copy link
Contributor Author

@iseeyuan please see the performance benchmark graph I posted in the pr description

@mergennachin
Copy link
Contributor

Can you run internal CI tests before merging? Otherwise looks good.

@facebook-github-bot
Copy link
Contributor

@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Feb 25, 2025
Summary:
Use `to_edge_transform_and_lower` in `export_llama` for XNNPack. As part of these changes, this also means that you cannot specify multiple backends in `export_llama` in the args, although I'm not sure if that is happening anywhere at the moment.

Closes #8621

Performance regression benchmarking for xnnpack (on android) vs. past 3 days:
<img width="1427" alt="Screenshot 2025-02-24 at 11 39 52 AM" src="https://github.com/user-attachments/assets/1640cf2c-a579-491f-8940-7ccfbe464903" />

These benchmark numbers also normally fluctuate a bit across runs and these differences are within the usual fluctuation ranges.


Test Plan: See if CI passes

Differential Revision: D70124742

Pulled By: jackzhxng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70124742

@facebook-github-bot
Copy link
Contributor

@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:
Use `to_edge_transform_and_lower` in `export_llama` for XNNPack. As part of these changes, this also means that you cannot specify multiple backends in `export_llama` in the args, although I'm not sure if that is happening anywhere at the moment.

Closes #8621

Performance regression benchmarking for xnnpack (on android) vs. past 3 days:
<img width="1427" alt="Screenshot 2025-02-24 at 11 39 52 AM" src="https://github.com/user-attachments/assets/1640cf2c-a579-491f-8940-7ccfbe464903" />

These benchmark numbers also normally fluctuate a bit across runs and these differences are within the usual fluctuation ranges.


Test Plan: See if CI passes

Differential Revision: D70124742

Pulled By: jackzhxng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70124742

@jackzhxng jackzhxng merged commit b5344c1 into main Feb 25, 2025
119 of 121 checks passed
@jackzhxng jackzhxng deleted the jz/export_llama_new_api branch February 25, 2025 09:58
@jackzhxng jackzhxng restored the jz/export_llama_new_api branch February 25, 2025 09:59
@jackzhxng jackzhxng temporarily deployed to upload-benchmark-results February 25, 2025 10:45 — with GitHub Actions Inactive
swolchok added a commit that referenced this pull request Feb 26, 2025
@swolchok swolchok mentioned this pull request Feb 26, 2025
swolchok added a commit that referenced this pull request Feb 26, 2025
* Revert "Switch to new ao quant api for 8da4w (#8501)"

This reverts commit f3fc096.

* Revert "Use to_edge_lower_and_transform for XNNPack (#8624)"

This reverts commit b5344c1.

#8624 caused concerning test failure internally -- out of bounds array access. #8501 depends on it per author
facebook-github-bot pushed a commit that referenced this pull request Feb 27, 2025
Summary:
Trying to bring back #8624 after it got reverted due to an internal test failing


Differential Revision: D70221944

Pulled By: jackzhxng
facebook-github-bot pushed a commit that referenced this pull request Feb 27, 2025
Differential Revision: D70221944

Pull Request resolved: #8717
iseeyuan pushed a commit that referenced this pull request Mar 14, 2025
* Revert "Switch to new ao quant api for 8da4w (#8501)"

This reverts commit f3fc096.

* Revert "Use to_edge_lower_and_transform for XNNPack (#8624)"

This reverts commit b5344c1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported release notes: xnnpack Changes to the XNNPack backend delegate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move export_llama to to_edge_transform_and_lower API

5 participants