Consider adding support for unknown scales and zero_points #1407
## Summary

The PR proposes the spec for the quantized `dot_general` op, along with specifications for a few other ops that `dot_general` depends on, for example `slice`, `transpose`, and `reshape`.

## A few details

Given `fp = tensor with floating-point type` and `q = tensor with uniform quantized type`, the PR covers the semantics of:

1. Static range quantized `dot_general`: `dot_general(q, q)`, and
2. ~~Hybrid quantized `dot_general`: `dot_general(fp, q)`. Currently, this version of the op only supports dynamic range quantization, where the on-the-fly quantization of `lhs` is fused into the op semantics. IMO, once we support #1407, the quantization logic can be un-fused and made explicit in the MLIR graph (cc @sngyhan).~~

**update**: As per the [discussion](#1413 (comment)), it was decided to include only (1) in the spec. It might be too early to introduce (2), the "dynamic range quantized" variant of the op, mainly because (a) only the TFLite CPU backend implements it, and (b) in the long term, there are plans to implement dynamic range quantization explicitly at the graph level.

## What comes next

The plan is to propose a PR for the convolution op in the very near future. I realized that the spec for convolution depends on dot-general, and splitting the two should help the review process. Please let me know your review feedback.
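(As a rough illustration of the terminology above: a minimal NumPy sketch of what "static range quantized `dot_general(q, q)`" amounts to, assuming per-tensor int8 quantization and a plain matrix multiply. This is editorial and not the PR's spec text; the actual spec is written against StableHLO types and dot dimension numbers.)

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # Map int8 values back to approximate floats.
    return (q.astype(np.float32) - zero_point) * scale

def requantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Map floats onto the int8 grid of the output's static scale/zero_point.
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def quantized_dot_general(lhs_q, lhs_scale, lhs_zp,
                          rhs_q, rhs_scale, rhs_zp,
                          out_scale, out_zp):
    # Static range quantized dot_general(q, q): dequantize both operands,
    # do the floating-point contraction, then requantize the result using
    # the output's known (static) quantization parameters.
    acc = np.matmul(dequantize(lhs_q, lhs_scale, lhs_zp),
                    dequantize(rhs_q, rhs_scale, rhs_zp))
    return requantize(acc, out_scale, out_zp)
```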
Support for unknown scales would be incredibly useful for quantization aware training (QAT). What is the current status on this?

A bit of context: our use case is focused on QAT targeting a fully int8 quantized TFLite inference model. Currently we're relying on …. @abattery mentioned that he's interested in QAT as well, and the odml team seems to have a way to inject ….

@sdasgup3 do you know whether there is interest in supporting QAT workflows via StableHLO from frontends like JAX or PyTorch? I'd be very interested in getting involved and contributing towards any consolidated effort here, since QAT is much easier to deal with from an ML training standpoint compared to post training quantization, which always has the potential to introduce accuracy degradation if not done carefully.
@lgeiger Thanks for bringing this up and providing details about your case. This has been on our radar for some time, but we didn't have a sufficiently motivating use case (and the bandwidth) to initiate work on it.

Your willingness to contribute is much appreciated! I will get back to you on the question.
The goal of this ticket is to track support for unknown scales and zero-points. This is required to represent, in a StableHLO graph, scales and zero-points that are calculated on the fly by the training program while quantizing activations.
Please refer to the relevant discussion here.
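(For concreteness: a hypothetical NumPy sketch of the kind of on-the-fly activation quantization a QAT or dynamic-range training program performs. The scale is derived from the data at runtime, which is exactly why it cannot be encoded as a static constant in the quantized tensor type and motivates "unknown" scales and zero-points.)

```python
import numpy as np

def quantize_activation_on_the_fly(x, qmin=-128, qmax=127):
    # The scale is computed from the current batch of activations, so it is
    # only known at runtime -- this is the "unknown scale" this issue asks
    # StableHLO to be able to represent.
    scale = np.max(np.abs(x)) / qmax
    zero_point = 0  # symmetric quantization, for simplicity
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point
```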