[CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose #20362
Conversation
```cpp
    }                                                                         \
  }                                                                           \
  assert(output_index == static_cast<size_t>(N * broadcast_dim * block_size)); \
}
```
Although this variable is called "block_size", it is not the same thing as the new block_size attribute. #Resolved
```cpp
for (size_t bd = 0; bd < static_cast<size_t>(broadcast_dim); bd++) {          \
  size_t bd_i = bd >> 1; /* bd / 2 */                                         \
  size_t bd_j = bd & 0x1; /* bd % 2 */                                        \
  INT4_TYPE::UnpackedType zp = zero_point ? zero_point[bd_i].GetElem(bd_j) : 0; \
```
The scale and zero-point inputs do have the same shape. Please refer to the ONNX spec: https://onnx.ai/onnx/operators/onnx__QuantizeLinear.html
Both zero-point and scale have the same shape in this code as well. The zero-point input is stored as a packed int4, so we have to get the correct 4-bit element.
Can you please clarify what you think needs to be updated? #Resolved
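For readers following this thread, here is a minimal, self-contained sketch of the indexing scheme being discussed. `PackedInt4Pair` and its values are illustrative stand-ins (not ORT's actual `Int4x2` class), assuming two sign-extended 4-bit elements per byte, low nibble first:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Illustrative stand-in for a packed int4 pair: element 0 lives in bits
// [3:0] of the byte, element 1 in bits [7:4].
struct PackedInt4Pair {
  uint8_t bits;

  int8_t GetElem(size_t j) const {
    uint8_t nibble = (j == 0) ? (bits & 0x0F) : (bits >> 4);
    // Sign-extend the 4-bit value to 8 bits.
    return static_cast<int8_t>(static_cast<int8_t>(nibble << 4) >> 4);
  }
};

int main() {
  // Two packed bytes hold the four zero-points {1, -2, 7, -8}.
  PackedInt4Pair zero_point[2] = {{0xE1}, {0x87}};
  for (size_t bd = 0; bd < 4; ++bd) {
    size_t bd_i = bd >> 1;   // bd / 2: which byte
    size_t bd_j = bd & 0x1;  // bd % 2: which nibble within the byte
    printf("zp[%zu] = %d\n", bd, zero_point[bd_i].GetElem(bd_j));
  }
  return 0;
}
```

This is why the loop above halves `bd` to find the byte and uses the low bit to pick the nibble, even though scale and zero-point have the same logical shape.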
```cpp
static bool Pack(gsl::span<Int4x2Base<Signed>> dst, gsl::span<const UnpackedType> src) {
  if (src.empty() || (CalcNumInt4Pairs(src.size()) != dst.size())) {
    return false;
```
does a return value of false mean it failed? regarding return value, can the handling of an empty src be made consistent with Unpack()?
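As a reference point for this question, here is a hedged sketch of the size invariant the check enforces; `CalcNumInt4PairsSketch` and `PackSketch` are hypothetical names, not the actual ORT helpers:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packing N int4 values produces ceil(N / 2) bytes.
inline size_t CalcNumInt4PairsSketch(size_t num_elements) {
  return (num_elements + 1) / 2;
}

// Assuming false signals failure (the question raised above): fail when the
// destination does not have exactly ceil(src.size() / 2) packed slots.
bool PackSketch(std::vector<uint8_t>& dst, const std::vector<int8_t>& src) {
  if (CalcNumInt4PairsSketch(src.size()) != dst.size()) {
    return false;
  }
  for (size_t i = 0; i < src.size(); ++i) {
    uint8_t nibble = static_cast<uint8_t>(src[i]) & 0x0F;
    if ((i & 0x1) == 0) {
      dst[i >> 1] = nibble;  // low nibble
    } else {
      dst[i >> 1] = static_cast<uint8_t>(dst[i >> 1] | (nibble << 4));  // high nibble
    }
  }
  return true;
}
```

Note that the code under review rejects an empty `src` outright, while this sketch would accept an empty `src` paired with an empty `dst`; that difference is exactly the consistency question raised above.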
```cpp
/// </summary>
/// <param name="elt_type">Data type of the tensor elements.</param>
/// <param name="shape_size">The number of elements indicated by the shape (i.e., shape.Size()).</param>
/// <returns>Number of Tensor elements. Returns -1 if shape_size is negative.</returns>
```
nit: it is returning `shape_size` for a negative `shape_size`, but I guess -1 is the only expected value for a negative `shape_size`.
```cpp
gsl::span<const INT4_TYPE> src_span = gsl::make_span(reinterpret_cast<const INT4_TYPE*>(unpacked_tensor.data()), \
                                                     num_packed_pairs);       \
gsl::span<INT4_TYPE> dst_span = gsl::make_span(p_data, expected_num_elements); \
```
Regarding `gsl::make_span(p_data, expected_num_elements)`: should the span length be `num_packed_pairs`? Is there much benefit to using spans here if they're just provided to memcpy?
```cpp
for (size_t n = 0; n < N; n++) {
  float FloatValue = std::nearbyintf(Input[n] / Scale) + static_cast<float>(ZeroPoint);
```
will `std::nearbyintf` round to nearest even here? assuming we want that mode, as it's specified for ONNX QuantizeLinear: https://github.com/onnx/onnx/blob/093a8d335a66ea136eb1f16b3a1ce6237ee353ab/docs/Operators.md?plain=1#L20288
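For reference, a quick self-contained check (not from the PR) suggesting why `std::nearbyintf` gives ties-to-even under the default floating-point environment, which is the mode ONNX QuantizeLinear specifies:

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  // std::nearbyintf rounds according to the current rounding mode;
  // FE_TONEAREST (round half to even) is the default at program start.
  std::fesetround(FE_TONEAREST);
  printf("%.1f %.1f %.1f %.1f\n",
         std::nearbyintf(0.5f),    // 0.0 (tie -> even)
         std::nearbyintf(1.5f),    // 2.0 (tie -> even)
         std::nearbyintf(2.5f),    // 2.0 (tie -> even)
         std::nearbyintf(-1.5f));  // -2.0 (tie -> even)
  return 0;
}
```

The caveat is that `std::nearbyintf` honors whatever rounding mode is currently set, so ties-to-even holds only as long as nothing in the process changes the floating-point environment.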
```cpp
MLASCALL
MlasQuantizeLinearU4(
    const float* Input,
    uint8_t* Output,
```
nit: `Output` is to be interpreted as bytes vs 8-bit unsigned integers, right? If so, would `std::byte` be clearer?
…led (#20889)

### Description

The recent [PR for int4 support](#20362) breaks builds with the `onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS` option enabled. This PR adds utility functions for debug printing of int4 tensor statistics and data.
```cpp
auto ShortVector1 = vec_pack(IntegerVector2, IntegerVector3);

auto CharVector = vec_pack(ShortVector0, ShortVector1);
vec_xst(CharVector, 0, static_cast<int8_t *>(&TmpOutput[0]));
```
This line has broken the build for some compiler versions. Vector commands need C-style casting:

```cpp
vec_xst(CharVector, 0, (int8_t *)(&TmpOutput[0]));
```

Let me know if you are fixing this or if you want me to create a PR.
Hi @ChipKerchner, apologies for the inconvenience. Here's the PR: #20957
### Description

Uses C-style casting for Power vector instructions in `MlasQuantizeLinearInt4Kernel`.

### Motivation and Context

Vector commands (e.g., `vec_xst`) need C-style casting to support various compiler versions. ONNX Runtime CI pipelines do not build with all compiler versions. The recent INT4 PR broke the PowerPC build for certain compiler versions because it used a C++-style `static_cast<>`. See: #20362 (comment)

Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>
### Description

- Adds int4/uint4 support for QuantizeLinear (including the new `block_size` attribute)
- Adds int4/uint4 support for DequantizeLinear (including the new `block_size` attribute)
- Adds int4/uint4 support for Transpose

### Notes
To calculate a tensor's storage size, we normally get the number of elements from the shape (i.e., `tensor_shape.Size()`) and multiply by the size of a single element. This does not directly work for sub-byte elements like int4, as each element in a `Tensor<Int4x2>` stores two packed int4 elements in a byte. `Tensor::CalculateTensorStorageSize` should be called to perform the correct calculation for any tensor element type.
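A hedged sketch of that storage-size rule (a stand-in, not the actual `Tensor::CalculateTensorStorageSize` signature):

```cpp
#include <cstdint>

// For sub-byte element types such as int4, two elements share one byte, so
// the storage size is ceil(shape_size / 2) bytes rather than
// shape_size * sizeof(element).
int64_t Int4StorageBytesSketch(int64_t shape_size) {
  if (shape_size < 0) {
    return -1;  // mirrors the negative-shape_size convention discussed above
  }
  return (shape_size + 1) / 2;  // each byte packs two int4 elements
}

// Example: a tensor with shape {3, 5} has 15 int4 elements -> 8 bytes.
```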
### Motivation and Context

ONNX 1.16 added the int4 and uint4 types. This initial PR adds the int4 type to ORT and adds int4 implementations for the Quant, Dequant, and Transpose ops on the CPU EP. We still need to add int4 support for many more ops and execution providers. See the ONNX 1.16 release notes: https://github.com/onnx/onnx/releases.