Introduce int8 quantization api (version 2) #3391
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3391.
Note: links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit e32508c with merge base d355d1f. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "quantize_"
@namgyu-youn I have confirmed internally that there are some infra issues right now, so the CI jobs didn't show up; let's just wait for that to be resolved.
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
@jerryzh168 Finally! CI started to run, but it is broken even though the local test passed on an NVIDIA A100. I assume the CI instance calls different kernels than Ampere, but I'm not sure what I should do; can you please help with this? Also, I didn't understand why the compiler is used here instead of the profiler.
Thanks for working on this @namgyu-youn! |
Summary:
Introduce a new tensor subclass API. The main features are:
- `Int8Tensor`: the main API, which handles quantization and dequantization operations.
- This API is integrated into the global variants (`Int8WeightOnlyConfig`, `Int8DynamicActivationInt8WeightConfig`) via the `version` field, and is not enabled as the default.

Related Issue/PR: #3241 (reland)
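For illustration, here is a minimal NumPy sketch of the symmetric per-row int8 quantize/dequantize round trip that such a tensor subclass performs under the hood. The function names are hypothetical for this sketch and are not the torchao API:

```python
import numpy as np

def int8_symmetric_quantize(w: np.ndarray):
    """Per-row symmetric int8 quantization: scale = max|row| / 127.

    Hypothetical helper for illustration; torchao's Int8Tensor wraps
    equivalent logic (plus dispatch) behind a tensor subclass.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and scales."""
    return q.astype(np.float32) * scale

# Round trip: the reconstruction error is bounded by half a quantization step.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = int8_symmetric_quantize(w)
w_hat = int8_dequantize(q, scale)
max_err = np.abs(w - w_hat).max()
```

The weight-only config variant applies this transform to weights ahead of time, while the dynamic-activation variant additionally quantizes activations at runtime with scales computed on the fly.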
Test plan: pytest -sv test/quantization/quantize_/workflows/int8/test_int8_tensor.py
PERF Test:
https://github.com/pytorch/ao/blob/main/tutorials/quantize_vit/run_vit_b_quant.py with a batch size of 32:
with torch.compile.

Future Plan: #3241 (review)