Finalize the constraints on QuantizationZeroPoint #1405
Comments
Hi, I didn't get a chance to comment during the discussion meeting today, so I'll add my comment here. I feel an 8-bit zero point is more of a TFLite x INT8 quant-specific choice, and in the StableHLO spec it is better to be more general than that. For example, if we think about INT4 quantization, do we want to restrict the zero point to 8 bits too? There, 8-bit seems a much more magic/arbitrary number (why not 4-bit, 5-bit, etc.).
Hi, thanks for today's discussion!
If
As you can see, the zero point is -510, which doesn't fit into the u8 data type.
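For readers following along, here is a minimal sketch of how a zero point can land outside the u8 range. It assumes the usual affine scheme `real = scale * (q - zero_point)`; the helper `affine_params` and the real range `[2.0, 3.0]` are hypothetical and chosen only for illustration, so they are not necessarily the example from the comment above.

```python
def affine_params(rmin, rmax, qmin, qmax):
    """Return (scale, zero_point) mapping the real range [rmin, rmax]
    onto the quantized range [qmin, qmax], assuming
    real = scale * (q - zero_point)."""
    scale = (rmax - rmin) / (qmax - qmin)
    # zero_point is the quantized value that represents real 0.0.
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point


# Hypothetical real range that does not contain 0.0, stored as u8.
scale, zp = affine_params(rmin=2.0, rmax=3.0, qmin=0, qmax=255)
print(zp)               # -510
print(0 <= zp <= 255)   # False: the zero point is not representable in u8
```

Whenever the real range excludes 0.0, the quantized value that would represent 0.0 falls outside the storage range, which is the situation described above.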
Thanks @igorsafo for bringing this up and thanks for the super useful example. Let me share a bit of history on why we ended up at Later based on the discussion, we decided to add the constraint Coming back to the current discussion: I agree that, per the current form of specification, the running example is not allowed as the zero-point goes out of Just to get a bit more context: Do you mind sharing on the motivation for not including
@sdasgup3 Thank you for the links! It makes a lot of sense!
Thanks for the information! update
I think it is just a catch all to express zero-point in any supported quantized dtype. Per the formula For now, let us close this issue with the resolution that zp should be the same or narrower than the storage type enforcing that the
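One possible reading of this resolution is that every zero point must be representable in the storage type. The sketch below checks that under this assumption; the `STORAGE_RANGES` table, its type labels, and `zero_point_fits` are hypothetical names used for illustration and are not taken from the StableHLO spec.

```python
# Hypothetical table of integer storage types and their value ranges.
STORAGE_RANGES = {
    "i4": (-8, 7),
    "ui4": (0, 15),
    "i8": (-128, 127),
    "ui8": (0, 255),
}


def zero_point_fits(zero_point, storage_type):
    """Return True if zero_point is representable in storage_type."""
    lo, hi = STORAGE_RANGES[storage_type]
    return lo <= zero_point <= hi


print(zero_point_fits(-510, "ui8"))  # False: the running example is rejected
print(zero_point_fits(3, "i4"))      # True
print(zero_point_fits(100, "i4"))    # False: fits in 8 bits but not in 4
```

Tying the bound to the storage type rather than to a fixed 8 bits also covers the INT4 case raised in the first comment.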
The goal of the ticket is to resolve the following discussions around the quantization zero_point:
update
3. Following the TFLite op behavior, should zero_points be restricted to certain ranges (e.g., tanh op)?