Consider adding support for unknown scales and zero_points #1407
## Summary

The PR proposes the spec for the quantized `dot_general` op, along with specifications for a few other ops that `dot_general` depends on, for example `slice`, `transpose`, and `reshape`.

## A few details

Given `fp = tensor with floating-point type` and `q = tensor with uniform quantized type`, the PR covers the semantics of:

1. Static range quantized `dot_general`: `dot_general(q, q)`, and
2. ~~Hybrid quantized `dot_general`: `dot_general(fp, q)`. Currently, this version of the op only supports dynamic range quantization, where the on-the-fly quantization of `lhs` is fused into the op semantics. IMO, once we support #1407, the quantization logic can be un-fused and made explicit in the MLIR graph (cc @sngyhan).~~

**update**: As per the [discussion](#1413 (comment)), it was decided to include only (1) in the spec. It might be too early to introduce (2), the "dynamic range quantized" variant of the op, mainly because (a) only the TFLite CPU backend implements it, and (b) in the long term, there are plans to implement dynamic range quantization explicitly at the graph level.

## What comes next

The plan is to propose a PR for the convolution op in the very near future. I realized that the spec for convolution depends on dot-general, and splitting the two should help the review process. Please let me know your review feedback.
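(As a rough illustration of the terminology above: a minimal NumPy sketch of what "static range quantized `dot_general(q, q)`" amounts to, assuming per-tensor int8 quantization and a plain matrix multiply. This is editorial and not the PR's spec text; the actual spec is written against StableHLO types and dot dimension numbers.)

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # Map int8 values back to approximate floats.
    return (q.astype(np.float32) - zero_point) * scale

def requantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Map floats onto the int8 grid of the output's static scale/zero_point.
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def quantized_dot_general(lhs_q, lhs_scale, lhs_zp,
                          rhs_q, rhs_scale, rhs_zp,
                          out_scale, out_zp):
    # Static range quantized dot_general(q, q): dequantize both operands,
    # do the floating-point contraction, then requantize the result using
    # the output's known (static) quantization parameters.
    acc = np.matmul(dequantize(lhs_q, lhs_scale, lhs_zp),
                    dequantize(rhs_q, rhs_scale, rhs_zp))
    return requantize(acc, out_scale, out_zp)
```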
Support for unknown scales would be incredibly useful for quantization aware training (QAT). What is the current status on this?

A bit of context: our use case is focused on QAT targeting a fully int8 quantized TFLite inference model. Currently we're relying on …. @abattery mentioned that he's interested in QAT as well, and the odml team seems to have a way to inject ….

@sdasgup3 do you know whether there is interest in supporting QAT workflows via StableHLO from frontends like JAX or PyTorch? I'd be very interested in getting involved and contributing towards any consolidated effort here, since QAT is much easier to deal with from an ML training standpoint compared to post training quantization, which always has the potential to introduce accuracy degradation if not done carefully.
@lgeiger Thanks for bringing this up and providing details about your case. This has been on our radar for some time, but we didn't have a sufficiently motivating use case (and the bandwidth) to initiate work on it.

Your willingness to contribute is much appreciated! I will get back to you on the question.
The goal of this ticket is to track support for unknown scales and zero-points. This is required to represent, in a StableHLO graph, scales and zero-points that are calculated on the fly by the training program while quantizing activations.
Please refer to the relevant discussion here.
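(For concreteness: a hypothetical NumPy sketch of the kind of on-the-fly activation quantization a QAT or dynamic-range training program performs. The scale is derived from the data at runtime, which is exactly why it cannot be encoded as a static constant in the quantized tensor type and motivates "unknown" scales and zero-points.)

```python
import numpy as np

def quantize_activation_on_the_fly(x, qmin=-128, qmax=127):
    # The scale is computed from the current batch of activations, so it is
    # only known at runtime -- this is the "unknown scale" this issue asks
    # StableHLO to be able to represent.
    scale = np.max(np.abs(x)) / qmax
    zero_point = 0  # symmetric quantization, for simplicity
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point
```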