
Conversation

@YifanShenSZ (Collaborator) commented Sep 10, 2024

Motivation

Short term: TorchAO int4 quantization yields a float zero point, which CoreML does not yet support well. We need CoreML's own int4 quantization for now.

Intermediate term: until torch implements all CoreML-supported compressions (e.g. palettization, sparsification, joint compression...), it would be great to have a way to use and experiment with those CoreML quantizations.

Solution

In the CoreML preprocess step, we pass the CoreML quantization config as a compile spec.
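The compile-spec mechanism can be sketched as follows. This is a hedged illustration, not the backend's actual API: the key name `op_linear_quantizer_config`, the config fields, and the JSON serialization are all assumptions for the sketch; the real ExecuTorch CoreML backend defines its own compile-spec keys and encoding.

```python
import json

# Hypothetical sketch: serialize a CoreML quantization config into a
# (key, bytes) pair, the general shape in which ExecuTorch backends
# receive compile specs. Key name and fields are assumptions.
def make_quantize_compile_spec(mode: str, nbits: int) -> tuple[str, bytes]:
    config = {"quantization_mode": mode, "nbits": nbits}
    return ("op_linear_quantizer_config", json.dumps(config).encode("utf-8"))

# During preprocess, the backend would decode this spec and hand the
# config to coremltools when converting the partitioned graph.
key, value = make_quantize_compile_spec("linear_symmetric", 4)
print(key, json.loads(value))
```

Encoding the config as an opaque byte payload keeps the compile-spec interface stable while letting the backend evolve which quantization options it accepts.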

pytorch-bot bot commented Sep 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5228

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit 554382a with merge base 7e374d7:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 10, 2024
@YifanShenSZ YifanShenSZ force-pushed the coreml-quantization branch 2 times, most recently from 3bc734f to 455cea4 Compare September 10, 2024 19:14
@YifanShenSZ YifanShenSZ changed the title add coreml quantize Add CoreML Quantize Sep 10, 2024
@YifanShenSZ (Collaborator, Author) commented:

@cccclai 🙏

@YifanShenSZ YifanShenSZ marked this pull request as ready for review September 10, 2024 20:32
@cccclai (Contributor) commented Sep 10, 2024

Hey could you share the command to run the script? The arg list is getting longer and it's hard to guess...

@YifanShenSZ (Collaborator, Author) commented:

> Hey could you share the command to run the script? The arg list is getting longer and it's hard to guess...

Sure

python -m examples.models.llama2.export_llama -c Meta-Llama-3-8B/consolidated.00.pth -p Meta-Llama-3-8B/params.json --disable_dynamic_shape -kv --coreml --coreml-quantize b4w --coreml-enable-state

This is not the final command, though: we are also adding fused SDPA.

@facebook-github-bot (Contributor) commented:

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai (Contributor) commented Sep 11, 2024

Hey, could you rebase? We ran into a land race: the other PR touching the same file merged first...

@YifanShenSZ (Collaborator, Author) commented Sep 11, 2024

Rebased ✅

GitHub is not showing the conflict yet, though. Is the conflicting change Meta-internal only for now? (Do I need to wait until it gets exported?)

@facebook-github-bot (Contributor) commented:

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@cccclai merged this pull request in 4da3c5d.


Labels: ciflow/trunk · CLA Signed · Merged
