
All quantized ops should take scalar_type as an argument #34351

Closed
jerryzh168 opened this issue Mar 6, 2020 · 21 comments
Labels
low priority (We're unlikely to get around to doing this in the near future) · oncall: quantization (Quantization support in PyTorch) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@jerryzh168 (Contributor) commented Mar 6, 2020

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qconcat.cpp#L111

quantized::cat should take scalar_type as an argument, because quantized::cat is fused from dequant - aten::cat - quantize, and quantize has a scalar_type argument.

We'll first need to implement a requantize op that can change the scalar_type of a Tensor (maybe just use the to API?), and then add a scalar_type argument to all the quantized ops; we can change one op per PR.
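For example, something like the following (a rough Python sketch on top of the existing quantize_per_tensor/dequantize APIs; the helper name is hypothetical, and a real kernel would avoid the float round trip):

import torch

def requantize(qx, scale, zero_point, dtype):
    # Hypothetical helper: round-trip through float and quantize again with
    # the requested parameters and scalar_type. A real kernel would do this
    # with integer/fixed-point math instead of dequantizing.
    return torch.quantize_per_tensor(qx.dequantize(), scale, zero_point, dtype)

q8 = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=128, dtype=torch.quint8)
q32 = requantize(q8, scale=0.1, zero_point=0, dtype=torch.qint32)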

cc @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @jerryzh168 @dzhulgakov

@jerryzh168 added the oncall: quantization and triaged labels on Mar 6, 2020
@imskr commented Mar 6, 2020

Can I work on this? @jerryzh168

@jerryzh168 (Contributor, Author) commented:

@imskr sure, feel free to submit a PR and tag us :)

@imskr commented Mar 6, 2020

@jerryzh168 Thanks. Do I have to add scalar_type as an argument to quantized::cat?

@jerryzh168 (Contributor, Author) commented Mar 6, 2020

@imskr yeah, I think what we need here is:
PR 0. Implement a requantize op that can convert between different scalar_types.
PR 1. Add scalar_type as an argument to quantized::cat, and call requantize at the end of the op.
PR 2-?. Add scalar_type to the other quantized ops.

I just discovered that none of our quantized ops has a scalar_type argument; we want to add scalar_type to all of them.

@jerryzh168 changed the title from "quantized::cat should take scalar_type as an argument" to "All quantized ops should take scalar_type as an argument" on Mar 6, 2020
@imskr commented Mar 7, 2020

I have to add the scalar_type argument to the functions in here, right? @jerryzh168

@jerryzh168 (Contributor, Author) commented:

@imskr yeah, and requantize is implemented in this PR: #34376; you can wait until that PR has landed.

@imskr commented Mar 10, 2020

@jerryzh168 Thanks

@imskr commented Mar 12, 2020

@jerryzh168 Since requantize is implemented in PR #34376, I can now add SCALAR_TYPE in the file, right?

@masahi commented Mar 16, 2020

@jerryzh168 I believe the output dtype of the concat op should be the same as that of the input qtensors, so adding scalar_type to the qconcat args doesn't seem right to me.

Instead, now that you have a requantize op, I think a better way is to use it to make the input quant params the same as the output's, and then use the normal concat op (aten::cat). This is how TVM implements qconcat: https://github.com/apache/incubator-tvm/blob/master/src/relay/qnn/op/concatenate.cc#L132-L143

In my PR on translating quantized torch models to TVM apache/tvm#4977, I tried two different approaches for converting quantized::cat:

  • Dequantize -> normal concat -> Quantize (Torch way)
  • Requantize inputs and use normal concat (TVM way linked above)

I didn't see any accuracy loss using the second approach. And the second approach is better because there are no float values in the middle, if requantize is implemented with fixed-point math (which doesn't seem to be the case for Torch, https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/quantized/Quantizer.cpp#L359).
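Roughly, the two options look like this in Python (hypothetical helper names; the requantization is sketched through a float round trip for brevity, unlike TVM's fixed-point path):

import torch

def qcat_dequant_quant(qxs, dim, scale, zero_point, dtype=torch.quint8):
    # Approach 1 (current Torch pattern): dequantize -> aten::cat -> quantize.
    fx = [qx.dequantize() for qx in qxs]
    return torch.quantize_per_tensor(torch.cat(fx, dim=dim), scale, zero_point, dtype)

def qcat_requantize_inputs(qxs, dim, scale, zero_point, dtype=torch.quint8):
    # Approach 2 (TVM-style): align every input to the output quantization
    # parameters first; the concatenation itself is then conceptually a pure
    # data copy of the integer values.
    aligned = [torch.quantize_per_tensor(qx.dequantize(), scale, zero_point, dtype)
               for qx in qxs]
    return torch.cat(aligned, dim=dim)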

@jerryzh168 (Contributor, Author) commented Mar 17, 2020

@masahi Thanks for providing your input! Yes, for quantized::cat we might want to enforce that inputs and outputs have the same scalar_type, but this is not the case in general. If we take a step back and look at how we get quantized ops, it is through patterns like the following, taking quantized::conv2d from graph mode quantization as an example:

  std::string conv2d = R"(
graph(%a_quant, %packed_params, %r_scale, %r_zero_point, %r_dtype, %stride, %padding, %dilation, %groups):
        %a_dequant = aten::dequantize(%a_quant)
        %w_quant : Tensor, %b : Tensor? = quantized::conv2d_unpack(%packed_params)
        %w_dequant = aten::dequantize(%w_quant)
        %r = aten::conv2d(%a_dequant, %w_dequant, %b, %stride, %padding, %dilation, %groups)
        %r_quant = aten::quantize_per_tensor(%r, %r_scale, %r_zero_point, %r_dtype)
        return (%r_quant) )";

  std::string quantized_conv2d = R"(
graph(%a_quant, %packed_params, %r_scale, %r_zero_point, %r_dtype, %stride, %padding, %dilation, %groups):
        %r_quant = quantized::conv2d(%a_quant, %packed_params, %stride, %padding, %dilation, %groups, %r_scale, %r_zero_point)
        return (%r_quant) )";

Notice that %r_dtype in aten::quantize_per_tensor is ignored in quantized::conv2d right now, which means this is not an equivalent transformation. And we do have cases where the scalar_type matters; cc @z-a-f for examples.

As for implementation, we can still explicitly enforce that the input quantized Tensors have the same scalar_type as the output for ops like quantized::cat.
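For example, a check along these lines (a hypothetical Python sketch, not the actual kernel code):

import torch

def check_cat_scalar_type(qxs, out_dtype):
    # Hypothetical guard for quantized::cat: refuse an output scalar_type that
    # differs from the inputs' scalar_type instead of silently converting.
    if any(qx.dtype != out_dtype for qx in qxs):
        raise RuntimeError(
            "quantized::cat: input scalar_type must match the requested output scalar_type")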

@masahi commented Mar 17, 2020

I believe the concat op doesn't need a scalar_type argument (assuming it means the output dtype), but for other ops it makes sense to add scalar_type.

From what I've seen, right now all Torch quantized ops output a quint8 tensor, so there is no need to pass around scalar_type. Can I use this occasion to ask whether the Torch team has a plan for other output types, like uint4? If that is the case, the output scalar_type really does matter. Personally I'm very interested in 4-bit quantization support.

@jerryzh168 (Contributor, Author) commented:

Yeah, but we would like to keep things consistent across all quantized ops. quantized::cat and similar ops can have checks that the input/output scalar_types are the same.

I think we'll support more scalar_types if we see them become popular on different hardware, but right now you can already do 4-bit quantization during training using fake quant to estimate the accuracy impact.
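A minimal sketch of that, assuming the FakeQuantize and MovingAverageMinMaxObserver modules in torch.quantization, with the clamped range [0, 15] standing in for an unsigned 4-bit type:

import torch
import torch.quantization as tq

# Fake-quantize to 16 levels: values stay float but are rounded onto a 4-bit
# grid, so the accuracy impact can be measured without real 4-bit kernels.
fq = tq.FakeQuantize(observer=tq.MovingAverageMinMaxObserver,
                     quant_min=0, quant_max=15, dtype=torch.quint8)
x = torch.randn(4, 8)
y = fq(x)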

@z-a-f (Contributor) commented Mar 18, 2020

FYI: It is good to have a scalar_type. For example, in our LSTM implementation we want 32-bit quant for the internal state (c), but the input/output and gates are 8-bit. At the same time, the sigmoid output is unsigned (could be signed though), while tanh is signed. It makes it more "natural" to have different scalar types for different parts. Of course, we could make an exception and adapt the gates to specific examples, so this is not a strong preference for me.
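As a rough illustration (made-up scales, not the actual quantized LSTM kernel), the different pieces would simply be quantized with different scalar types:

import torch

c = torch.randn(4, 16)                 # internal state: higher precision
g = torch.sigmoid(torch.randn(4, 16))  # sigmoid output: non-negative

qc = torch.quantize_per_tensor(c, scale=2 ** -16, zero_point=0, dtype=torch.qint32)
qg = torch.quantize_per_tensor(g, scale=1 / 256, zero_point=0, dtype=torch.quint8)              # unsigned
qt = torch.quantize_per_tensor(torch.tanh(c), scale=2 / 256, zero_point=0, dtype=torch.qint8)   # signed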

@z-a-f (Contributor) commented Mar 18, 2020

> I believe the concat op doesn't need a scalar_type argument (assuming it means the output dtype), but for other ops it makes sense to add scalar_type.
>
> From what I've seen, right now all Torch quantized ops output a quint8 tensor, so there is no need to pass around scalar_type. Can I use this occasion to ask whether the Torch team has a plan for other output types, like uint4? If that is the case, the output scalar_type really does matter. Personally I'm very interested in 4-bit quantization support.

For the concat_out you are right, we can infer the type from the output tensor. The problem is that when different paths with different quantization params come together, it is not 100% clear what the result should be. During the concatenation we have to requantize even if the scalar_type is the same for all tensors, for example when the scale and zero_point differ. In such a case, imho, changing the scalar_type comes for free.

I think a good approach would be to allow scalar_type in the cat, but also allow a default None arg.

But then again, this is not a strong preference for me; I can live with either implementation.
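A sketch of that proposal at the Python level (hypothetical signature; the requantization is shown through float for brevity):

import torch
from typing import List, Optional

def quantized_cat(qxs: List[torch.Tensor], dim: int, scale: float, zero_point: int,
                  scalar_type: Optional[torch.dtype] = None) -> torch.Tensor:
    # When scalar_type is None, keep the inputs' scalar type; otherwise the
    # requantization we do anyway also changes the output dtype.
    out_dtype = scalar_type if scalar_type is not None else qxs[0].dtype
    fx = torch.cat([qx.dequantize() for qx in qxs], dim=dim)
    return torch.quantize_per_tensor(fx, scale, zero_point, out_dtype)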

@z-a-f (Contributor) commented Mar 18, 2020

BTW FYI: torch.cat tries promoting the types when concatenating. The problem is that it is not behaving properly (@jerryzh168 I wonder if we should open an issue about that):

xt = torch.ones((2, 1, 3), dtype=torch.int32)
yt = torch.ones((2, 1, 3), dtype=torch.int8)
zt = torch.cat([xt, yt])
print(zt)
# tensor([[[          1,           1,           1]],
#         [[          1,           1,           1]],
#         [[   16843009,         257,   176279536]],
#         [[          1,    94132789, -1342175233]]], dtype=torch.int32)
zt = torch.cat([xt, yt.to(torch.int32)])
print(zt)
# tensor([[[1, 1, 1]],
#         [[1, 1, 1]],
#         [[1, 1, 1]],
#         [[1, 1, 1]]], dtype=torch.int32)

In numpy this is behaving properly:

x = np.ones((2, 1, 3), dtype=np.int32)
y = np.ones((2, 1, 3), dtype=np.int8)
z = np.concatenate([x,y])
print(z)
# array([[[1, 1, 1]],
#        [[1, 1, 1]],
#        [[1, 1, 1]],
#        [[1, 1, 1]]], dtype=int32)

@jerryzh168 (Contributor, Author) commented:

> BTW FYI: torch.cat tries promoting the types when concatenating. The problem is that it is not behaving properly (@jerryzh168 I wonder if we should open an issue about that): [...]

We don't allow type promotion for quantized data types, since quantized data types are not meaningful independently: they are coupled with other quantization parameters like scale/zero_point to represent the quantization, so the usual idea of data type promotion doesn't really apply here, I think.

@raghuramank100 (Contributor) commented:

Let's not do type promotion in concat. The input data types should be the same. I do not see the need for a scalar_type for cat; the output type should be the same as that of the input.

@dzhulgakov (Collaborator) commented:

Btw, in any case, for regular cat we should not return garbage; let's error out or fix it properly (meaning yes, let's create an issue for it).

@z-a-f (Contributor) commented Mar 19, 2020

@dzhulgakov #35014

@andrewor14 added the low priority label on Nov 17, 2023
@andrewor14 (Contributor) commented:

@jerryzh168 Is this something we still want to do?

@jerryzh168 (Contributor, Author) commented:

Probably not, since the native PyTorch quantized ops on CPU are no longer important for us.
