[quant] Add 4-bit embedding_bag prepack/unpack support using quint4x2 #45751
Conversation
Summary: Use the torch.quint4x2 dtype to create 4-bit packed tensors

Test Plan: `python test/test_quantization.py TestEmbeddingBagOps`

[ghstack-poisoned]
Codecov Report

```
@@            Coverage Diff             @@
##    gh/supriyar/191/base    #45751    +/- ##
=============================================
  Coverage           ?      68.20%
=============================================
  Files              ?         410
  Lines              ?       53245
  Branches           ?           0
=============================================
  Hits               ?       36314
  Misses             ?       16931
  Partials           ?           0
```

Continue to review the full report at Codecov.
```cpp
weight_data =
    reinterpret_cast<uint8_t*>(weight_contig.data_ptr<c10::quint8>());
```
I remember data_ptr does not do a type check, so maybe you can just do `weight_contig.data_ptr<uint8_t>()`
I don't think that is applicable to qint types. It throws the error "expected scalar type Byte but found QUInt8".
oh sorry, I meant this: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/templates/TensorBody.h#L354

so:

```cpp
static_cast<uint8_t*>(weight_contig.data_ptr())
```
```cpp
    zero_points.toType(c10::kFloat),
    0, // The output channel axis is 0
    device(c10::kCPU).dtype(c10::kQUInt4x2));
output_data =
```
same here
```cpp
at::parallel_for(
    0, embedding_rows, 1, [&](int32_t start_idx, int32_t end_idx) {
      for (int64_t row = start_idx; row < end_idx; ++row) {
        const uint8_t* input_row = weight_data + row * embedding_cols;
        std::uint8_t* output_row = output_data + row * output_columns;
        at::Half* output_row_scale_bias =
            reinterpret_cast<at::Half*>(output_row + embedding_cols);
        output_row_scale_bias[0] = weight_scales[row];
        output_row_scale_bias[1] = weight_bias[row];
        for (int64_t col = 0; col < embedding_cols; ++col) {
          // The weight values have already been packed, so here we just
          // store them in the output tensor.
          output_row[col] = input_row[col];
        }
      }
    });
```
optional: seems like this could be reused with the above section if the `at::Half` part is templatized? Definitely optional though, since the LOC is low.
Yes, I think templatizing it would add a similar number of LOC, so I'm planning to skip it for now.
This pull request has been merged in 5c283fa.
Stack from ghstack:
Summary:
Use the torch.quint4x2 dtype to create 4-bit packed tensors
Test Plan:
python test/test_quantization.py TestEmbeddingBagOps
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D24120997