[quant] PerChannelFloatQParams support for quint4x2 dtype #45594
Conversation
Summary: Adds support for per-channel quantization using float qparams for the 4-bit dtype. We use the new dispatch mechanism and existing quantize/dequantize kernels to pack the 4-bit data depending on the bit_width. A 4-bit quantized tensor is half the size of an 8-bit quantized tensor.
Test Plan: python test/test_quantization.py TestQuantizedTensor.test_quantize_per_channel_sub_byte
[ghstack-poisoned]
Does dequantize() work with no changes?
No, it does require changes. I've made changes to the
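For context on why dequantize() needs changes for the packed layout, here is a minimal stdlib-only sketch of sub-byte dequantization: unpack two 4-bit codes per byte, then apply the per-channel float qparams. The names `unpack_4bit` and `dequantize_channel` are illustrative, not the actual PyTorch kernel code.

```python
def unpack_4bit(packed, n):
    """Unpack n 4-bit integer codes from packed bytes (low nibble first)."""
    codes = []
    for b in packed:
        codes.append(b & 0x0F)   # low nibble holds the even-index code
        codes.append(b >> 4)     # high nibble holds the odd-index code
    return codes[:n]             # drop any padding code from an odd-length row

def dequantize_channel(codes, scale, zero_point):
    """Map integer codes back to floats using float qparams for one channel."""
    return [(c - zero_point) * scale for c in codes]

# Two packed bytes carry four 4-bit codes: 0x10 -> (0, 1), 0x32 -> (2, 3).
vals = dequantize_channel(unpack_4bit(bytes([0x10, 0x32]), 4),
                          scale=0.5, zero_point=0.0)
# -> [0.0, 0.5, 1.0, 1.5]
```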
Codecov Report
@@              Coverage Diff               @@
##    gh/supriyar/190/base   #45594   +/-   ##
==============================================
  Coverage         68.50%   68.50%
==============================================
  Files               408      408
  Lines             52487    52487
==============================================
  Hits              35954    35954
  Misses            16533    16533

Continue to review full report at Codecov.
This pull request has been merged in 1a2d3b6.
Stack from ghstack:
Summary:
Adds support for per-channel quantization using float qparams for the 4-bit dtype.
We use the new dispatch mechanism and existing quantize/dequantize kernels to pack the
4-bit data depending on the bit_width.
A 4-bit quantized tensor is half the size of an 8-bit quantized tensor.
Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_quantize_per_channel_sub_byte
Differential Revision: D24025595
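The quantize-and-pack step described in the summary can be sketched in plain Python. This is a hedged illustration of the arithmetic only; `quantize_channel` and `pack_4bit` are hypothetical helper names, not the actual PyTorch kernels, which operate on tensors via the dispatch mechanism.

```python
def quantize_channel(values, scale, zero_point, bit_width=4):
    """Affine-quantize one channel's floats to codes in [0, 2**bit_width - 1]."""
    qmax = (1 << bit_width) - 1
    return [min(max(round(v / scale + zero_point), 0), qmax) for v in values]

def pack_4bit(codes):
    """Pack pairs of 4-bit codes into single bytes (low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad an odd-length row with a zero code
    return bytes(codes[i] | (codes[i + 1] << 4)
                 for i in range(0, len(codes), 2))

row = [0.0, 0.5, 1.0, 1.5]  # one channel of float data
packed = pack_4bit(quantize_channel(row, scale=0.5, zero_point=0.0))
# Four 4-bit codes occupy two bytes: half the storage of 8-bit quantization.
assert len(packed) == len(row) // 2
```

Per-channel here means each channel gets its own float `scale`/`zero_point`; the packing itself is the same for every channel.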