All quantized ops should take scalar_type as an argument #34351

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qconcat.cpp#L111
because quantized::cat is fused from dequant - aten::cat - quantize, and quantize has a scalar_type argument. We'll need to first implement a requantize that can change the scalar_type of a Tensor, maybe just using the Tensor.to API? And then add a scalar_type argument for all the quantized ops; we can change one op in each PR.

cc @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @jerryzh168 @dzhulgakov
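For reference, a minimal sketch of the unfused pattern the issue describes (the scale/zero_point values are arbitrary placeholders):

```python
import torch

# quantized::cat is the fusion of dequantize -> aten::cat -> quantize;
# the target dtype (scalar_type) enters through the final quantize step.
qx = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=0, dtype=torch.quint8)
qy = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=0, dtype=torch.quint8)
out = torch.quantize_per_tensor(
    torch.cat([qx.dequantize(), qy.dequantize()], dim=0),
    scale=0.1, zero_point=0, dtype=torch.quint8)
```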
Can I work on this? @jerryzh168
@imskr sure, feel free to submit a PR and tag us :)
@jerryzh168 Thanks. Do I have to add a scalar_type argument to all the quantized ops?
@imskr yeah, I think what we need here is a scalar_type argument. I just discovered that none of our quantized ops has one.
So I have to add requantize first, and then the scalar_type argument?
@jerryzh168 Thanks
@jerryzh168 Since quantized::cat is the example in the issue, should I start with that op?
@jerryzh168 I believe the output dtype of the concat op should be the same as that of the input qtensors, so adding scalar_type to it doesn't seem necessary. Instead, now that you have a requantize op, I think a better way is to use the requantize op to make the input quant params the same as the output's, and then use the normal concat op (aten::cat). In my PR on translating quantized torch models to TVM apache/tvm#4977, I tried two different approaches for converting quantized::cat: (1) dequantize the inputs, concat in float, then quantize the result, and (2) requantize the inputs to the output quant params and concat the quantized values directly.

I didn't see any accuracy loss using the second approach. And the second approach is better because there are no float values in the middle, as long as requantize is implemented with fixed point math (which doesn't seem to be the case for Torch: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/quantized/Quantizer.cpp#L359).
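A sketch of the second approach in PyTorch terms (the helper name is illustrative; note that requantizing via dequantize + quantize does pass through float, matching the caveat above):

```python
import torch

def cat_via_requantize(qtensors, out_scale, out_zero_point, dim=0):
    # Requantize each input to the output quant params (here via
    # dequantize + quantize, which is what Torch's quantizer does),
    # then concatenate the integer representations directly.
    requantized = [
        torch.quantize_per_tensor(t.dequantize(), out_scale, out_zero_point, torch.quint8)
        for t in qtensors
    ]
    ints = torch.cat([t.int_repr() for t in requantized], dim=dim)
    return torch._make_per_tensor_quantized_tensor(ints, out_scale, out_zero_point)
```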
@masahi Thanks for providing your input! Yes, for quantized::cat the output dtype should stay the same as the inputs. Notice that the scalar_type argument is mainly about keeping the API consistent across quantized ops. For implementations, we can still explicitly enforce that input quantized Tensors have the same scalar_type as the output for ops like quantized::cat.
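A minimal sketch of that enforcement (the helper name and Python-level check are illustrative; the real op lives in C++):

```python
import torch

def quantized_cat_dtype_check(qtensors, out_dtype):
    # Illustrative check: quantized cat requires every input to already have
    # the output's scalar_type; there is no promotion between quantized dtypes.
    for t in qtensors:
        if t.dtype != out_dtype:
            raise RuntimeError(
                f"quantized cat: input dtype {t.dtype} must match output scalar_type {out_dtype}")
```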
I believe the concat op doesn't need scalar_type. From what I've seen, right now all Torch quantized ops output quint8 tensors, so there is no need to pass scalar_type around.
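A quick check of that observation (assuming the quantized::cat schema of the time, Tensor[] qx, int dim, float? scale, int? zero_point):

```python
import torch

# The fused quantized cat takes explicit output scale/zero_point but no
# output dtype; with quint8 inputs it produces a quint8 output.
qx = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=0, dtype=torch.quint8)
qy = torch.ops.quantized.cat([qx, qx], dim=0, scale=0.1, zero_point=0)
print(qy.dtype)  # torch.quint8
```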
yeah, but we would like to keep things consistent across all quantized ops. I think we'll support more scalar_types if we see them become popular on different hardware, but right now you can already do 4-bit quantization during training, using fake quant to estimate the accuracy impact.
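A minimal sketch of what that looks like (the scale and zero_point values here are arbitrary placeholders):

```python
import torch

# Simulate 4-bit quantization with fake quant: values are rounded onto a
# 4-bit grid (quant_min=0, quant_max=15) but stay in float, so training can
# proceed while the accuracy impact of 4-bit quantization is estimated.
x = torch.randn(4)
x_fq = torch.fake_quantize_per_tensor_affine(
    x, scale=0.1, zero_point=0, quant_min=0, quant_max=15)
print(x_fq)
```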
FYI: It is good to have a scalar_type argument for consistency.
For the cat op, I think a good approach would be to allow type promotion. But then again, this is not a strong preference for me; I can live with either implementation.
BTW FYI, torch.cat currently returns garbage when the input dtypes differ:

```python
import torch

xt = torch.ones((2, 1, 3), dtype=torch.int32)
yt = torch.ones((2, 1, 3), dtype=torch.int8)

zt = torch.cat([xt, yt])
print(zt)
# tensor([[[        1,          1,           1]],
#         [[        1,          1,           1]],
#         [[ 16843009,        257,   176279536]],
#         [[        1,   94132789, -1342175233]]], dtype=torch.int32)

zt = torch.cat([xt, yt.to(torch.int32)])
print(zt)
# tensor([[[1, 1, 1]],
#         [[1, 1, 1]],
#         [[1, 1, 1]],
#         [[1, 1, 1]]], dtype=torch.int32)
```

In numpy this behaves properly:

```python
import numpy as np

x = np.ones((2, 1, 3), dtype=np.int32)
y = np.ones((2, 1, 3), dtype=np.int8)
z = np.concatenate([x, y])
print(z)
# array([[[1, 1, 1]],
#        [[1, 1, 1]],
#        [[1, 1, 1]],
#        [[1, 1, 1]]], dtype=int32)
```
We don't allow type promotion for quantized data types, since quantized data types are not meaningful independently: they are coupled with other quantization parameters like scale/zero_point to represent the quantization. So the idea of ordinary data type promotion doesn't really apply here, I think.
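For example (a small sketch; the scales are arbitrary), the same real value is stored as different integers under different quant params, so promoting the stored dtype alone would change the represented values:

```python
import torch

# 1.0 quantized with scale=0.5 is stored as 2; with scale=0.1 it is stored as 10.
# The stored integers only mean something together with scale/zero_point,
# so dtype promotion on the raw values alone would be meaningless.
a = torch.quantize_per_tensor(torch.tensor([1.0]), scale=0.5, zero_point=0, dtype=torch.quint8)
b = torch.quantize_per_tensor(torch.tensor([1.0]), scale=0.1, zero_point=0, dtype=torch.quint8)
print(a.int_repr(), b.int_repr())  # tensor([2], ...) tensor([10], ...)
```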
Let's not do type promotion in concat. The input data types should be the same. I do not see the need for a scalar_type for cat; the output type should be the same as that of the inputs.
Btw, in any case, for regular cat we should not return garbage. Let's error out or fix it properly (meaning yes, let's create an issue for it).
@jerryzh168 Is this something we still want to do?
Probably not, since the native PyTorch quantized ops on CPU are no longer important for us.