[Quantization] Quantization API #309
Conversation
Thanks @Aalanli ! Good progress!
I left some comments on the minor issues.
python/hidet/graph/nn/linear.py
Outdated
class SymQuantLinearTransposed(Module):
    def __init__(self, weight: Tensor, bias: Optional[Tensor] = None, quant_type: str = 'int8'):
        super().__init__()
        self.in_features = weight.shape[0]
        self.out_features = weight.shape[1]
        qweight, scale = ops.symmetric_quantize(weight, quant_type=quant_type, dims=[-1])
        self.qweight = qweight
        self.scale = scale
        self.bias = bias

    def extra_str(self) -> str:
        return 'in_features={}, out_features={}'.format(self.in_features, self.out_features)

    def forward(self, x: Tensor) -> Tensor:
        x = ops.matmul(x, ops.symmetric_dequantize(ops.barrier(self.qweight), self.scale, dims=[-1]))
        if self.bias is not None:
            x = ops.add(x, self.bias)
        return x
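For readers following the review: `ops.symmetric_quantize` / `ops.symmetric_dequantize` with `dims=[-1]` are assumed here to implement the standard symmetric int8 scheme, where a per-axis absmax scale maps each slice onto the range ±127. A minimal NumPy sketch of that scheme (function names are illustrative, not the hidet API):

```python
import numpy as np

def symmetric_quantize(w: np.ndarray, axis: int = -1):
    # Per-axis scale: map the largest magnitude along `axis` to the int8 limit 127.
    scale = np.abs(w).max(axis=axis, keepdims=True) / 127.0
    qw = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return qw, scale

def symmetric_dequantize(qw: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Recover an approximation of the original weights.
    return qw.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
qw, scale = symmetric_quantize(w)
w_hat = symmetric_dequantize(qw, scale)
# Round-trip error is bounded by half a quantization step per element.
assert np.all(np.abs(w - w_hat) <= scale / 2 + 1e-6)
```

Symmetric quantization needs no zero-point, which is why the module above only stores `qweight` and `scale`.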
Maybe we can also put all the quantization nn layers into a sub-namespace like hidet.graph.nn.quantized (like torch people used torch.nn.quantized) or hidet.graph.nn.quant (what you used in ops).
Right, I think this module is currently not needed, since quantization is applied during a graph pass anyway. Also, the copying mechanisms won't work here when converting from torch.
@@ -15,7 +15,7 @@
from hidet.ir.compute import reduce
@xinli-git, could you help to have a look at the change of norm? Thanks!
In the future, let's try to unify the schedule template for different data types, which will reduce the complexity of maintenance.
Sorry, this change in norm is exactly the same as the earlier one; I needed to apply the same fix for some tests to pass.
Hi @Aalanli, I forgot one thing. It is recommended to put some of the code in the
Add extensible quantization API.
See examples/quantization/gpt2.py for usage example.
On gpt2 with the first 500 examples of the wikitext-2-raw-v1 test split:
original f32 ppl: 129.88427568662286
original f32 acc: [top-1: 0.291, top-5: 0.486, top-10: 0.561]
quantized f16 ppl: 131.41456528937462
quantized f16 acc: [top-1: 0.288, top-5: 0.482, top-10: 0.556]
quantized f16 -> int8 ppl: 131.11489348364347
quantized f16 -> int8 acc: [top-1: 0.284, top-5: 0.481, top-10: 0.554]
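For context on the numbers above: perplexity is the exponentiated mean per-token negative log-likelihood, so the f32 → int8 gap (129.88 → 131.11) corresponds to a shift of under 0.01 nats in mean per-token loss. A minimal sketch of the metric itself, independent of the hidet API:

```python
import math

def perplexity(nlls):
    """Perplexity = exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(nlls) / len(nlls))

# Sanity check: a constant per-token NLL of ln(p) gives perplexity p.
assert abs(perplexity([math.log(129.88)] * 4) - 129.88) < 1e-9
```

This is why the small quantized-vs-original ppl difference lines up with the nearly unchanged top-k accuracies.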
Currently supported: