Consider adding support for unknown scales and zero_points #1407

Open
sdasgup3 opened this issue Apr 14, 2023 · 2 comments
@sdasgup3
Member

The goal of this ticket is to track support for unknown scales and zero points. This is required to represent, in a StableHLO graph, scales and zero points that are calculated on the fly by the training program while quantizing activations.

Please refer to the relevant discussion here.
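
For illustration, here is a minimal Python sketch (the helper names below are made up for this example and are not part of any spec or API) of how a training program can derive a scale and zero point from activation statistics at runtime, which is why these values are not known when the graph is built:

```python
# Illustrative only: on-the-fly derivation of int8 quantization parameters
# from observed activations. Names are invented for this sketch.
import numpy as np

def compute_qparams(activations, qmin=-128, qmax=127):
    """Derive an asymmetric int8 scale/zero_point from a batch of activations."""
    lo = min(float(activations.min()), 0.0)  # keep 0.0 exactly representable
    hi = max(float(activations.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1e-8  # degenerate all-zero activations
    zero_point = int(np.clip(np.round(qmin - lo / scale), qmin, qmax))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)

# Because scale/zero_point are computed from data at every step, they are not
# known constants at graph-construction time -- hence "unknown" scales/zero points.
acts = np.random.randn(4, 16).astype(np.float32)
s, zp = compute_qparams(acts)
q = quantize(acts, s, zp)
```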

@sdasgup3 sdasgup3 added the Spec label Apr 14, 2023
@sdasgup3 sdasgup3 self-assigned this Apr 14, 2023
@burmako burmako changed the title Represent unknown scales and zero-points Consider adding support for unknown scales and zero_points Apr 14, 2023
@sdasgup3 sdasgup3 removed their assignment Apr 14, 2023
sdasgup3 added a commit that referenced this issue May 10, 2023
## Summary 
The PR proposes the spec for the quantized dot-general op, along with specifications for a few other ops on which dot-general depends, for example `slice`, `transpose`, and `reshape`.

## A few details
Given `fp = tensor with floating-point type` and `q = tensor with
uniform quantized type`, the PR covers the semantics of
(1) Static range quantized `dot_general` op `dot_general(q, q)`, and 
~~(2) Hybrid quantized `dot_general` op `dot_general(fp, q)`: Currently,
this version of the op only supports dynamic range quantization, where
the on-the-fly quantization of `lhs` is fused into the op semantics. IMO,
once we support #1407, the
quantization logic can be un-fused and made explicit in the MLIR graph
(cc @sngyhan).~~

**update**: As per the
[discussion](#1413 (comment)),
it was decided to include only (1) in the spec. It might be too early to
introduce (2), the "dynamic range quantized" variant of the op, mainly
because (a) only TFLite CPU implements it and (b) in the long run, there are
plans to implement dynamic range quantization explicitly at the graph
level.
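
For intuition only, here is a rough Python sketch of variant (1), the static range quantized `dot_general`, in dequantize-compute-requantize form. It is not the normative spec wording, and every name in it is invented for this example:

```python
# Illustrative sketch only -- the authoritative semantics are in the spec PR.
import numpy as np

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)

def quantized_dot_general(lhs_q, lhs_scale, lhs_zp,
                          rhs_q, rhs_scale, rhs_zp,
                          out_scale, out_zp):
    """Variant (1): both operands and the result use static quantization params."""
    acc = dequantize(lhs_q, lhs_scale, lhs_zp) @ dequantize(rhs_q, rhs_scale, rhs_zp)
    return quantize(acc, out_scale, out_zp)
```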


## What comes next
The plan going forward is to propose a PR for the convolution op in the very near
future. The spec for convolution depends on dot-general, so splitting them
should help the review process.

Please let me know your review feedback.
@lgeiger

lgeiger commented Jun 20, 2024

Support for unknown scales would be incredibly useful for quantization aware training (QAT). What is the current status on this?

Maybe a bit of context, our use case is focused on QAT targeting a fully int8 quantized TFLite inference model. Currently we're relying on tf.quantization.fake_quant_with_min_max_vars on the training side. As far as I'm aware this is the only supported way at the moment but it would be great to be able to directly output StableHLO from jax or maybe even PyTorch for greater flexibility and better usability.
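
For concreteness, the fake-quant setup I have in mind looks roughly like the sketch below (the EMA-based range tracking is just an illustrative pattern, not our exact code):

```python
# Hedged illustration of the fake-quant QAT path mentioned above.
import tensorflow as tf

act_min = tf.Variable(0.0, trainable=False)
act_max = tf.Variable(6.0, trainable=False)

def fake_quant_activations(x, ema_decay=0.99):
    # Track the observed activation range with an exponential moving average.
    act_min.assign(ema_decay * act_min + (1.0 - ema_decay) * tf.reduce_min(x))
    act_max.assign(ema_decay * act_max + (1.0 - ema_decay) * tf.reduce_max(x))
    # Simulate int8 quantization in the forward pass; gradients pass through.
    return tf.quantization.fake_quant_with_min_max_vars(
        x, act_min, act_max, num_bits=8, narrow_range=False)
```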

@abattery mentioned that he's interested in QAT as well, and the odml team seems to have a way to inject stablehlo.uniform_quantize ops, but I'm not sure what the latest status of these efforts is.

@sdasgup3 do you know whether there is interest in supporting QAT workflows via StableHLO from frontends like jax or PyTorch? I'd be very interested in getting involved and contributing towards any consolidated effort here, since QAT is much easier to deal with from an ML training standpoint compared to post-training quantization, which always has the potential to introduce accuracy degradations if not done carefully.

@sdasgup3
Member Author

@lgeiger Thanks for bringing this up and providing details about your use case. This has been on our radar for some time, but we did not get a sufficiently motivating use case (and the bandwidth) to initiate work on it.

> do you know whether there is interest in supporting QAT workflows via StableHLO from frontends like jax or PyTorch? I'd be very interested in getting involved and contributing towards any consolidated effort here

Your willingness to contribute is much appreciated! I will get back to you on the question.
