Figure out the story for hybrid quantization #1575
Comments
Thanks @burmako for bringing up the topic. Let me add a bit of context around it to further the discussion, starting with a few definitions of quantization techniques which might be handy.
Let us have a look at the convolution op, which can be used to implement one of these techniques:
We note that we also call this a hybrid op, where the operands are of different types (e.g. a float activation and a quantized weight), with the op unifying the operand types before performing the computation.
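To make that concrete, here is a minimal sketch (ours, not from the original comment) of what such a hybrid convolution could look like in StableHLO, with a float activation and a per-tensor quantized weight; the function name, shapes, and quantization parameters are purely illustrative:

```mlir
// Hypothetical hybrid convolution: float activation (lhs), quantized weight (rhs),
// float result. Shapes and quantization parameters are illustrative only.
func.func @hybrid_convolution(
    %activation: tensor<1x4x4x3xf32>,
    %weight: tensor<2x2x3x8x!quant.uniform<i8:f32, 0.0039>>) -> tensor<1x3x3x8xf32> {
  %0 = "stablehlo.convolution"(%activation, %weight) {
    dimension_numbers = #stablehlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]>,
    feature_group_count = 1 : i64,
    batch_group_count = 1 : i64
  } : (tensor<1x4x4x3xf32>, tensor<2x2x3x8x!quant.uniform<i8:f32, 0.0039>>)
      -> tensor<1x3x3x8xf32>
  return %0 : tensor<1x3x3x8xf32>
}
```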
### Summary

The PR proposes the specification for quantized elementwise operations (47 in total).

### Details

Overall, we propose treating quantized elements as floating-point elements, and therefore ops on quantized tensors as ops on floating-point tensors, along the lines of dequantize -> float computation -> quantize. This principle works for most elementwise ops, although there are some exceptions discussed below.

Furthermore, the proposal is to only support per-tensor quantization for elementwise ops. We haven't yet come across use cases for per-axis quantization for these ops, so let's start small. The story for per-axis quantization will be worked out in #1574. Finally, we propose to not support hybrid quantization (i.e. situations where some inputs/outputs are quantized and some are not) for now. The story for this will be worked out in #1575.

- **Ops that support tensors of floating-point types (33)**: These ops support quantized tensors, with semantics following dequantize -> float computation -> quantize. We are using the `dequantize_op_quantize` function to express it (see the sketch after this list):
  - Binary (10): `add, atan2, compare, divide, maximum, minimum, multiply, power, remainder, subtract`.
  - Unary (20): `abs, cbrt, ceil, cosine, exponential, exponential_minus_one, floor, is_finite, log, logistic, log_plus_one, negate, reduce_precision, round_nearest_afz, round_nearest_even, rsqrt, sign, sine, sqrt, tanh`.
  - Ternary (2): `clamp, select`.
  - Other (1): `map`.
- **Ops that don't support tensors of floating-point types (9)**: These ops (`and, count_leading_zeros, not, or, popcnt, shift_left, shift_right_arithmetic, shift_right_logical, xor`) don't support quantized tensors. If there is a need to perform computations on the underlying integer representation of these tensors, they can be bitcast_convert'ed to integers.
- **Ops that involve complex types (3)**: These ops (`complex`, `imag`, `real`) don't support quantized tensors because quantization doesn't compose with complex types at the moment.
- **Conversion ops (2)**:
  - `convert`: A convert from a quantized type to any type can be realized using `stablehlo.uniform_dequantize` followed by `stablehlo.convert` to convert the dequantized floating-point type to the type of choice. Similarly, a convert from any type to a quantized type can be realized using `stablehlo.convert` to a floating-point type followed by `stablehlo.uniform_quantize`. It's not necessarily great that we have 3 ops to represent something that could theoretically be represented by 1 op, and we're planning to explore a potential simplification in #1576.
  - `bitcast_convert`: Works with low-level representations, so it treats quantized elements as integer elements.
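As an illustration of the principle above (not part of the original comment), here is a minimal sketch of the dequantize -> float computation -> quantize decomposition, roughly what the `dequantize_op_quantize` helper expresses, applied to `add`; the function name and quantization parameters are illustrative:

```mlir
// Per-tensor quantized add expressed as dequantize -> float add -> quantize.
// Quantization parameters (scale 0.025, zero point -17) are illustrative only.
func.func @quantized_add(
    %lhs: tensor<4x!quant.uniform<i8:f32, 0.025:-17>>,
    %rhs: tensor<4x!quant.uniform<i8:f32, 0.025:-17>>)
    -> tensor<4x!quant.uniform<i8:f32, 0.025:-17>> {
  // Dequantize both operands to float.
  %lhs_f = stablehlo.uniform_dequantize %lhs
      : (tensor<4x!quant.uniform<i8:f32, 0.025:-17>>) -> tensor<4xf32>
  %rhs_f = stablehlo.uniform_dequantize %rhs
      : (tensor<4x!quant.uniform<i8:f32, 0.025:-17>>) -> tensor<4xf32>
  // Float computation.
  %sum_f = stablehlo.add %lhs_f, %rhs_f : tensor<4xf32>
  // Quantize the result back to the (per-tensor) quantized type.
  %sum_q = stablehlo.uniform_quantize %sum_f
      : (tensor<4xf32>) -> tensor<4x!quant.uniform<i8:f32, 0.025:-17>>
  return %sum_q : tensor<4x!quant.uniform<i8:f32, 0.025:-17>>
}
```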
#1792 proposes semantic changes in StableHLO to support weight-only quantization for the convolution and dot_general ops. Remaining tasks:
…y quantization (#1792) This RFC proposes to add hybrid quantized convolution and dot_general for weight-only quantization. Please let me know your feedback on this. The RFC partially addresses issue #1575 w.r.t. supporting weight-only quantization in StableHLO. The remaining tasks for the above are highlighted [here](#1575 (comment)).
With #1792 merged, let us close this issue. We will open separate issues for the remaining items (#1575 (comment)) once we have more information around them.
At the moment, dot_general (as well as convolution, as proposed in #1477) doesn't support hybrid quantization, e.g. a float lhs and a quantized rhs. However, this is an important practical use case. How do we represent it?
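One possible representation (a sketch on our part, along the lines of what #1792 later proposed) is to allow mixed operand types directly on the op, with a float result; the function name, shapes, and quantization parameters below are illustrative:

```mlir
// Hypothetical hybrid dot_general: float lhs, per-tensor quantized rhs, float result.
// Shapes and quantization parameters are illustrative only.
func.func @hybrid_dot_general(
    %lhs: tensor<2x3xf32>,
    %rhs: tensor<3x4x!quant.uniform<i8:f32, 0.0015>>) -> tensor<2x4xf32> {
  %0 = "stablehlo.dot_general"(%lhs, %rhs) {
    dot_dimension_numbers = #stablehlo.dot<
      lhs_contracting_dimensions = [1],
      rhs_contracting_dimensions = [0]
    >
  } : (tensor<2x3xf32>, tensor<3x4x!quant.uniform<i8:f32, 0.0015>>) -> tensor<2x4xf32>
  return %0 : tensor<2x4xf32>
}
```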