
Conversation

YifanShenSZ
Collaborator

4-bit groupwise weight-only quantization is well supported in Core ML. Since torchao offers the same quantization, let's use torchao to quantize llama, then delegate to Core ML.
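For readers unfamiliar with the proposed flow, here is a minimal sketch (not this PR's actual diff) of how torchao-side 4-bit groupwise weight-only quantization followed by Core ML delegation through ExecuTorch can look. The API names used below (`quantize_`, `int4_weight_only`, `CoreMLPartitioner`, `to_edge`) are one plausible combination; exact names, the bfloat16/CUDA requirements of the default int4 kernels, and whether tensor subclasses must be unwrapped before export all vary across torchao/ExecuTorch versions.

```python
# Sketch only: 4-bit groupwise weight-only quantization with torchao,
# then lowering to the Core ML backend via ExecuTorch.
import torch
from torchao.quantization import quantize_, int4_weight_only

from executorch.exir import to_edge
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

# Stand-in module; in the PR the target is the llama transformer.
model = torch.nn.Sequential(torch.nn.Linear(4096, 4096)).to(torch.bfloat16).eval()

# 4-bit weight-only quantization with groupwise (group_size=32) scales.
quantize_(model, int4_weight_only(group_size=32))

# Depending on the torchao version, tensor-subclass weights may need
# unwrapping before torch.export:
# from torchao.utils import unwrap_tensor_subclass
# model = unwrap_tensor_subclass(model)

example_inputs = (torch.randn(1, 4096, dtype=torch.bfloat16),)
exported = torch.export.export(model, example_inputs)

# Delegate everything the Core ML backend supports to Core ML.
edge = to_edge(exported).to_backend(CoreMLPartitioner())
executorch_program = edge.to_executorch()

with open("llama_coreml_int4.pte", "wb") as f:
    f.write(executorch_program.buffer)
```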


pytorch-bot bot commented Aug 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4866

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 30 Pending

As of commit ac94cd9 with merge base d3da92d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 23, 2024
@digantdesai
Contributor

cc @kimishpatel

@digantdesai added the triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) and module: coreml (issues related to Apple's Core ML delegation and code under backends/apple/coreml/) labels on Aug 23, 2024
@YifanShenSZ
Collaborator Author

Turns out, we may prefer Core ML quantization at this moment. Landed the alternative in #5228; locally confirmed performance gain and accuracy preservation.
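For context, a hedged sketch of what the Core ML-side alternative can look like when expressed through coremltools' optimize API. This is illustrative only; the actual change lives in #5228, and int4/per-block support and the exact config names depend on the coremltools version.

```python
# Sketch only: 4-bit groupwise (block_size=32) weight-only quantization
# performed by coremltools on an already-converted Core ML model.
import coremltools as ct
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig,
    OptimizationConfig,
    linear_quantize_weights,
)

config = OptimizationConfig(
    global_config=OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int4",
        granularity="per_block",
        block_size=32,
    )
)

# Hypothetical usage on a previously converted model package:
# mlmodel = ct.models.MLModel("llama.mlpackage")
# compressed_mlmodel = linear_quantize_weights(mlmodel, config)
```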
