[quant] PerChannelFloatQParams support for quint4x2 dtype #45594
Conversation
Summary: Adds support for per-channel quantization using float qparams for the 4-bit dtype. We use the new dispatch mechanism and existing quantize/dequantize kernels to pack the 4-bit data depending on the bit_width. A 4-bit quantized tensor is half the size of an 8-bit quantized tensor.
Test Plan: python test/test_quantization.py TestQuantizedTensor.test_quantize_per_channel_sub_byte
[ghstack-poisoned]
Does dequantize() work with no changes?
No, it does require changes. I've made changes to the
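For context on why dequantize() needs changes for the packed layout, here is a minimal stdlib-only sketch of sub-byte dequantization: unpack two 4-bit codes per byte, then apply the per-channel float qparams. The names `unpack_4bit` and `dequantize_channel` are illustrative, not the actual PyTorch kernel code.

```python
def unpack_4bit(packed, n):
    """Unpack n 4-bit integer codes from packed bytes (low nibble first)."""
    codes = []
    for b in packed:
        codes.append(b & 0x0F)   # low nibble holds the even-index code
        codes.append(b >> 4)     # high nibble holds the odd-index code
    return codes[:n]             # drop any padding code from an odd-length row

def dequantize_channel(codes, scale, zero_point):
    """Map integer codes back to floats using float qparams for one channel."""
    return [(c - zero_point) * scale for c in codes]

# Two packed bytes carry four 4-bit codes: 0x10 -> (0, 1), 0x32 -> (2, 3).
vals = dequantize_channel(unpack_4bit(bytes([0x10, 0x32]), 4),
                          scale=0.5, zero_point=0.0)
# -> [0.0, 0.5, 1.0, 1.5]
```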
Codecov Report
@@              Coverage Diff               @@
##    gh/supriyar/190/base   #45594   +/-   ##
==============================================
  Coverage         68.50%   68.50%
==============================================
  Files               408      408
  Lines             52487    52487
==============================================
  Hits              35954    35954
  Misses            16533    16533

Continue to review full report at Codecov.
This pull request has been merged in 1a2d3b6.
Stack from ghstack:
Summary:
Adds support for per-channel quantization using float qparams for the 4-bit dtype.
We use the new dispatch mechanism and existing quantize/dequantize kernels to pack the
4-bit data depending on the bit_width.
A 4-bit quantized tensor is half the size of an 8-bit quantized tensor.
Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_quantize_per_channel_sub_byte
Differential Revision: D24025595
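The quantize-and-pack step described in the summary can be sketched in plain Python. This is a hedged illustration of the arithmetic only; `quantize_channel` and `pack_4bit` are hypothetical helper names, not the actual PyTorch kernels, which operate on tensors via the dispatch mechanism.

```python
def quantize_channel(values, scale, zero_point, bit_width=4):
    """Affine-quantize one channel's floats to codes in [0, 2**bit_width - 1]."""
    qmax = (1 << bit_width) - 1
    return [min(max(round(v / scale + zero_point), 0), qmax) for v in values]

def pack_4bit(codes):
    """Pack pairs of 4-bit codes into single bytes (low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad an odd-length row with a zero code
    return bytes(codes[i] | (codes[i + 1] << 4)
                 for i in range(0, len(codes), 2))

row = [0.0, 0.5, 1.0, 1.5]  # one channel of float data
packed = pack_4bit(quantize_channel(row, scale=0.5, zero_point=0.0))
# Four 4-bit codes occupy two bytes: half the storage of 8-bit quantization.
assert len(packed) == len(row) // 2
```

Per-channel here means each channel gets its own float `scale`/`zero_point`; the packing itself is the same for every channel.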