Add LSQ quantizer #3503

Merged: 18 commits merged into microsoft:master on May 18, 2021
Conversation

@chenbohua3 (Contributor) commented Mar 31, 2021
This PR contains an implementation of the LSQ quantizer (Learned Step Size Quantization, ICLR 2020, see here). It uses gradients to update the quantization scales and achieves solid results in our production environment, especially at lower bit widths.

In the MNIST experiment it reaches about 99.20% top-1 accuracy. Results on ImageNet-1k are still in progress.
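
For readers new to the method, a one-paragraph summary of the paper: LSQ quantizes a value v with a learnable step size s as v_hat = round(clip(v / s, -Q_N, Q_P)) * s, where Q_N and Q_P are the numbers of negative and positive quantization levels. The gradient reaches s through the rounding via a straight-through estimator and is multiplied by a factor g = 1 / sqrt(N * Q_P) (N being the number of weights or features in the layer) to keep the step-size updates on the same order as the weight updates; s is initialized to 2 * mean(|v|) / sqrt(Q_P).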

@ghost commented Mar 31, 2021

CLA assistant check: all CLA requirements met.

@@ -146,7 +146,7 @@ def __init__(self, model, config_list, optimizer=None):
             types of nn.module you want to apply quantization, eg. 'Conv2d'
         """
         super().__init__(model, config_list, optimizer)
-        self.quant_grad = QATGrad
+        self.quant_grad = QATGrad.apply
Contributor:
Why do we have to move apply here instead of using it directly?

Contributor:
So it is for avoiding STE in the LSQ quantizer.

Contributor Author:
Yes, it is aimed at unifying the framework between quantizers with customized gradients and quantizers that rely on auto-grad. Also, using .apply is the way recommended by PyTorch (see here).
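
For context, a minimal sketch of the pattern under discussion (the class and values below are illustrative, not NNI's actual QATGrad): a torch.autograd.Function is invoked through .apply rather than being called or instantiated directly, so storing the .apply callable lets the wrapper treat a custom-gradient Function and a plain autograd-friendly callable the same way.

    import torch

    class _STEQuant(torch.autograd.Function):
        # Illustrative straight-through-estimator Function (not the real QATGrad).
        @staticmethod
        def forward(ctx, x, scale):
            return torch.round(x / scale) * scale

        @staticmethod
        def backward(ctx, grad_output):
            # STE: pass the gradient straight through to x; no gradient for scale.
            return grad_output, None

    x = torch.randn(4, requires_grad=True)
    y = _STEQuant.apply(x, torch.tensor(0.1))   # .apply is the supported entry point
    y.sum().backward()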



class LsqQuantizer(Quantizer):
@linbinskn (Contributor) commented Apr 9, 2021:
Please add a docstring like the other quantizers have, especially for the parameters and return values.

Contributor Author:
done

if "weight" in config.get("quant_types", []):
# todo: support per-channel quantization for weight since TensorRT supports it for conv weight
q_bit = get_bits_length(config, "weight")
Contributor:
In the current implementation, does LsqQuantizer only support a single bit width? Can we support mixed precision right now?

Contributor Author:
Mixed-precision quantization seems to be supported by this implementation already, since each layer has its own q_bit. We can achieve mixed quantization through specific settings in config_list, like:

configure_list = [{
        'quant_types': ['weight'],
        'quant_bits': 8,
        'op_types': ['Conv2d'],
        'op_names': ['features.3']
    }, {
        'quant_types': ['weight'],
        'quant_bits': 7,
        'op_types': ['Conv2d'],
        'op_names': ['features.6']
    }]

# todo: in the origin paper, the initial value of activation is calculated from first input batch
if "output" in config.get("quant_types", []):
q_bit = get_bits_length(config, "output")
Contributor:
Same question as for the weight bit width.

Contributor Author:
Same as for the weights.

def quantize(self, x, scale, zero_point, qmin, qmax):
grad_scale_factor = 1.0 / ((qmax * x.numel()) ** 0.5)
scale = self.grad_scale(scale, grad_scale_factor)
Contributor:
A little confused about the naming of this value and function. Can we polish the naming here or in the grad_scale function? For instance, change the second parameter name 'scale' to 'scale_factor'.

Contributor Author:
The names of the functions and variables are the same as those defined in the paper.
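
For readers who have not read the paper, here is a rough, self-contained sketch of the quantize step using the paper's names (grad_scale, round_pass, scale); the exact body may differ from the code added in this PR.

    import torch

    def grad_scale(x, scale):
        # forward value is x; the gradient flowing back into x is multiplied by `scale`
        y_grad = x * scale
        return (x - y_grad).detach() + y_grad

    def round_pass(x):
        # round in the forward pass, identity in the backward pass (straight-through)
        return (x.round() - x).detach() + x

    def quantize(x, scale, zero_point, qmin, qmax):
        # the step-size gradient is scaled by g = 1 / sqrt(qmax * numel), as in the paper
        grad_scale_factor = 1.0 / ((qmax * x.numel()) ** 0.5)
        scale = grad_scale(scale, grad_scale_factor)
        x = torch.clamp(x / scale + zero_point, qmin, qmax)
        x = round_pass(x)
        return (x - zero_point) * scale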

module = wrapper.module
output = self.quantize(output, module.scale, module.zero_point, module.activation_qmin, module.activation_qmax)
return output
Contributor:
Can this quantization algorithm support exporting the model and the related quantization parameters? If yes, maybe we can add an export_model() function based on which parameters should be exported to an inference framework like TensorRT.

Contributor Author:
I will check it out.

def __init__(self, model, config_list, optimizer=None):
super().__init__(model, config_list, optimizer)
self.quant_grad = QuantForward()
Contributor:
If we keep the original forward and backward structure, LSQ can run the forward pass as usual and the backward pass via STE. Would anything go wrong that way? It might have something to do with the update of scale and zero_point.

Contributor Author:
Nothing will go wrong if the gradients are handled carefully. However, the original framework has one major limitation: we must customize the gradients for all learnable parameters. If a gradient-based algorithm becomes complex, that customization gets troublesome and error-prone. In this situation, I think letting the auto-grad system determine the gradients is more convenient for users.

@linbinskn (Contributor):
Many good points in this PR! Please test the exported model on TensorRT and modify the initialization of the activation scale.

@chenbohua3 (Contributor Author):
I have added code that uses the first batch of data to initialize the activation scale. After that, we get about 99.20% top-1 accuracy with the provided example.

I have also tested exporting the model to TensorRT. The converted TensorRT model gets almost the same accuracy as the PyTorch model.
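
For reference, the export path goes through the quantizer's export_model API, roughly as below; the file names are placeholders and the exact argument names are an assumption, so check the NNI Quantizer docs. Here `quantizer` is the LsqQuantizer instance from the example.

    # sketch only: export quantized weights plus the calibration information
    calibration_config = quantizer.export_model('mnist_model.pth', 'mnist_calibration.pth')
    # calibration_config carries the bit widths and dynamic ranges that the
    # TensorRT integration consumes when building the engine.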

qmax = module.activation_qmax
init_oup_scale = output.data.detach().abs().mean() * 2 / (qmax ** 0.5)
module.scale.data = init_oup_scale
Contributor:
It seems that the weight and the activation use the same scale within a single module, i.e. they share one rescale parameter, and that scale will be updated by the gradients of the weight and the activation simultaneously. What would happen if we quantized both the weight and the activation of the same layer? Could that cause problems?

Contributor Author:
Yes, you are right :) Now each layer constructs input_scale/weight_scale/output_scale according to the config setting.
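
A small illustrative sketch of what "one scale per quantization type" can look like; the helper and parameter names below are hypothetical, not necessarily the PR's exact attributes.

    import torch
    import torch.nn as nn

    def add_lsq_scales(module: nn.Module, quant_types):
        # attach one learnable step size per configured quantization type
        for qtype in quant_types:                  # e.g. ["weight", "input", "output"]
            module.register_parameter(f"{qtype}_scale", nn.Parameter(torch.tensor(1.0)))

    conv = nn.Conv2d(3, 16, 3)
    add_lsq_scales(conv, ["weight", "output"])
    # conv.weight_scale and conv.output_scale are now separate learnable parameters,
    # so the weight and the activation no longer share one scale.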

calibration_config[name]['tracked_min_activation'] = -abs_max_activation
calibration_config[name]['tracked_max_activation'] = abs_max_activation
if hasattr(module, 'input_bit'):
calibration_config[name]['weight_bit'] = int(module.input_bit)
Contributor:
Why assign calibration_config[name]['weight_bit'] from module.input_bit instead of module.weight_bit? If weight_bit is not equal to input_bit in the config, the exported result will be incorrect.

Contributor Author:
According to here, weight_bit is used to decide whether to set the input tensor's dynamic range, which I think may not be appropriate. Assigning input_bit to weight_bit here is just to stay consistent with that behavior.

Contributor:
Currently we record the range of the input tensor while quantizing the weight in the QAT algorithm. We handle it this way because the TensorRT integration needs the input tensor's dynamic range when setting a layer's precision to 8 bit, so we record the input dynamic range as here.
If we want to export an LSQ model to TensorRT, the input dynamic range should also be set in most situations, and input_bit should be the same as weight_bit.
However, it is still strange not to set calibration_config[name]['weight_bit'] from weight_bit, since we already have the value of weight_bit.

Contributor Author:
Got it. How about changing the code like this:

 if hasattr(module, 'weight_bit'):
     calibration_config[name]['weight_bit'] = int(module.weight_bit)
     abs_max_input = float(module.input_scale * module.input_qmax)
     calibration_config[name]['tracked_min_input'] = -abs_max_input
     calibration_config[name]['tracked_max_input'] = abs_max_input

@linbinskn (Contributor) commented May 17, 2021:
Looks good. Completing the related docs is necessary. Please refer to overview, quantization and Quantizer. Feel free to ask me if you have any questions.

Learned Step Size Quantization (ICLR 2020)
https://arxiv.org/pdf/1902.08153.pdf
"""
Contributor:
please align

Contributor Author:
done

new_input = self.quantize(inputs[0], module.input_scale, module.input_qmin, module.input_qmax)
list_inp = list(inputs)
list_inp[0] = new_input
Contributor:
Why do we only quantize the first input?

Contributor Author:
It seems that the quantization framework currently only supports layers with a single input (see here; so does the TensorRT backend, see here), so the current implementation does not support layers with multiple inputs. It may be better to extend the LSQ quantizer to multi-input layers once the framework supports them.

Contributor:
Got it, that is reasonable.

@chenbohua3 (Contributor Author):
Docs have been added :)

type of quantization you want to apply, currently support 'weight', 'input', 'output'
- quant_bits : int or dict of {str : int}
bits length of quantization, key is the quantization type, value is the length, eg. {'weight', 8},
Contributor:
{'weight', 8} -> {'weight': 8}

Contributor Author:
done


if "input" in config.get("quant_types", []):
# scale of activation will be initialized using the first batch data
Contributor:
activation -> input

Contributor Author:
done

def grad_scale(x, scale):
"""
Used to scale the gradient
Contributor:
I recommend explaining this function in detail, since both reviewers were confused while reviewing this part. Besides, I think this function is a key part of the LSQ implementation, and explaining it can help others understand the insight of the algorithm.

Contributor Author:
done
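
For reference: grad_scale(x, g) returns a value equal to x in the forward pass while multiplying the gradient that flows into x by g, via the (x - x*g).detach() + x*g pattern sketched earlier in this thread. The paper chooses g = 1 / sqrt(N * Q_P) so that the step-size update stays on roughly the same order as the weight updates; for example, an 8-bit signed weight tensor with N = 4608 elements has Q_P = 127 and g = 1 / sqrt(127 * 4608) ≈ 0.0013.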


..

We introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer’s quantizer step size, such that it can be learned in conjunction with other network parameters.
Contributor:
We -> The authors

Contributor Author:
done

quantizer = LsqQuantizer(model, configure_list, optimizer)
quantizer.compress()

You can view example for more information
Contributor:
better to add a hyperlink to the example

Contributor Author:
done
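
For completeness, a slightly fuller sketch of that usage snippet; the import path follows the NNI v2.x layout and the config values are placeholders, so treat them as assumptions.

    import torch
    from nni.algorithms.compression.pytorch.quantization import LsqQuantizer

    configure_list = [{
        'quant_types': ['weight', 'output'],
        'quant_bits': {'weight': 8, 'output': 8},
        'op_types': ['Conv2d', 'Linear']
    }]

    model = Mnist()   # the model class from the linked example
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    quantizer = LsqQuantizer(model, configure_list, optimizer)
    quantizer.compress()
    # train as usual afterwards; the step sizes receive gradients and are updated by the optimizer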

model = Mnist()
'''you can change this to DoReFaQuantizer to implement it
DoReFaQuantizer(configure_list).compress(model)
'''
Contributor:
this comment can be removed

Contributor Author:
done

v2.3 automation moved this from Review in progress to Reviewer approved on May 18, 2021

@QuanluZhang (Contributor):
@chenbohua3 looks great, thanks for your contribution!

@QuanluZhang merged commit af929fd into microsoft:master on May 18, 2021
v2.3 automation moved this from Reviewer approved to Done on May 18, 2021