This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Why does QAT_torch_quantizer.py quantize the output of ReLU? #4984

Unanswered

Simba98 asked this question in Q&A

Simba98
Jul 4, 2022

https://github.com/microsoft/nni/blob/master/examples/model_compress/quantization/QAT_torch_quantizer.py#L53

Its comment says:

# 2. Same tensor should be quantized only once. For example, if a tensor is the output of layer A and the input
# of the layer B, you should configure either {'quant_types': ['output'], 'op_names': ['a']} or
# {'quant_types': ['input'], 'op_names': ['b']} in the configure_list.
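A minimal sketch of the rule in that comment, assuming a simple linear chain of ops where each op feeds only the next one. `find_double_quantized` is a hypothetical helper, not part of NNI; it only reads the 'quant_types' and 'op_names' keys shown in the comment and reports (producer, consumer) pairs whose shared tensor would be quantized twice.

```python
from typing import Dict, List, Tuple


def find_double_quantized(chain: List[str], config_list: List[Dict]) -> List[Tuple[str, str]]:
    """Hypothetical helper (not part of NNI).

    Returns (producer, consumer) pairs where config_list quantizes the same tensor
    twice: once as the output of `producer` and again as the input of `consumer`.
    Assumes `chain` is a linear sequence of op names in which each op feeds only the next.
    """
    quantized_outputs, quantized_inputs = set(), set()
    for entry in config_list:
        for name in entry["op_names"]:
            if "output" in entry["quant_types"]:
                quantized_outputs.add(name)
            if "input" in entry["quant_types"]:
                quantized_inputs.add(name)
    # Adjacent ops share one tensor: the producer's output is the consumer's input.
    return [
        (producer, consumer)
        for producer, consumer in zip(chain, chain[1:])
        if producer in quantized_outputs and consumer in quantized_inputs
    ]


if __name__ == "__main__":
    chain = ["a", "b"]
    ok = [{"quant_types": ["output"], "op_names": ["a"]}]
    bad = ok + [{"quant_types": ["input"], "op_names": ["b"]}]
    print(find_double_quantized(chain, ok))   # []
    print(find_double_quantized(chain, bad))  # [('a', 'b')]
```

This strict reading only compares directly adjacent ops, so any op sitting between the producer and the consumer (a pooling layer, for instance) keeps the pair from being flagged.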

The ops in mnist.naive (https://github.com/microsoft/nni/blob/master/examples/model_compress/models/mnist/naive.py#L19) are conv1 -> relu1 -> max_pool -> conv2.
So, according to hint 2: the input of conv2 is the max-pooled output of relu1, and since the input of conv2 is already quantized to 8 bits, also quantizing the output of relu1 looks like a duplicate.
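A small numeric sketch of that argument, in plain Python with made-up ReLU values. It assumes uniform 8-bit affine fake quantization and, importantly, that the quantizer on conv2's input ends up with the same scale and zero point as the one on relu1's output; in QAT each quantizer tracks its own range, so that is not guaranteed. Because max pooling only selects existing elements, the pooled values stay on relu1's quantization grid, and re-quantizing them changes nothing. `fake_quantize` and `max_pool_1d` are illustrative helpers, not NNI or PyTorch APIs.

```python
from typing import List


def fake_quantize(xs: List[float], scale: float, zero_point: int, bits: int = 8) -> List[float]:
    """Illustrative uniform affine fake quantization: round to int, clamp, dequantize."""
    qmin, qmax = 0, 2 ** bits - 1
    out = []
    for x in xs:
        q = round(x / scale) + zero_point
        q = max(qmin, min(qmax, q))  # clamp to the unsigned 8-bit range
        out.append((q - zero_point) * scale)
    return out


def max_pool_1d(xs: List[float], kernel: int = 2) -> List[float]:
    """Illustrative non-overlapping 1-D max pooling (stride == kernel)."""
    return [max(xs[i:i + kernel]) for i in range(0, len(xs) - kernel + 1, kernel)]


if __name__ == "__main__":
    relu1_out = [0.0, 0.37, 1.2, 0.05, 2.49, 0.8]  # made-up ReLU outputs
    scale, zero_point = 2.5 / 255, 0  # hypothetical params covering [0, 2.5]
    q_relu = fake_quantize(relu1_out, scale, zero_point)  # "output" quantization of relu1
    pooled = max_pool_1d(q_relu)  # picks existing grid values
    # Assumed: conv2's "input" quantizer shares relu1's scale and zero point.
    requantized = fake_quantize(pooled, scale, zero_point)
    print(requantized == pooled)  # True: the second quantization is a no-op
```

If the two quantizers learn different ranges, the second pass does change the values, which is one reason the answer may depend on how the quantizer tracks ranges.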

Thx.

Replies: 0 comments
