Clean up and fix numpy_helper and subbyte #6124

justinchuby · 2024-05-03T02:37:07Z

- Fix complex number handling in `numpy_helper`. Otherwise the line
  ```python
  return np.asarray(data, dtype=storage_np_dtype).astype(np_dtype).reshape(dims)
  ```
  raises `TypeError: float() argument must be a string or a real number, not 'complex'`, because `storage_np_dtype` is a float type while `data` already holds complex values.
- Vectorize the float8 conversion functions and improve readability. This speeds up `float8e4m3_to_float32` by about 11.2x (1000x1000 input, 10 iterations: 34.829s -> 3.11s).
- Clean up the int4 NumPy helpers to make them more useful and performant with NumPy-native vectorization, and move all int4-related functions to the `subbyte` module.
- Improve handling of big-endian systems.
- Remove the `dims` parameter from the NumPy helper functions to simplify the implementation.
- Improve the reference evaluator's `to_array_extended`.
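A minimal sketch of the complex fix, assuming complex values arrive as interleaved (real, imag) floats, as they do in `TensorProto.float_data` for COMPLEX64. The names `combine_pairs_to_complex_sketch` and `complex_to_array_sketch` are hypothetical illustrations, not the ONNX API. The point is that once the pairs are combined, the values are cast directly to the complex target dtype instead of passing through the float storage dtype.

```python
import numpy as np


def combine_pairs_to_complex_sketch(data):
    """Hypothetical helper: turn [re0, im0, re1, im1, ...] into [re0+im0j, ...]."""
    pairs = np.asarray(data, dtype=np.float64).reshape(-1, 2)
    return pairs[:, 0] + 1j * pairs[:, 1]


def complex_to_array_sketch(data, np_dtype, dims):
    """Hypothetical to_array branch for COMPLEX64/COMPLEX128 tensors.

    The combined values are already complex, so they are cast straight to
    the complex target dtype; casting them to a float storage dtype first
    is what raises the TypeError.
    """
    combined = combine_pairs_to_complex_sketch(data)
    return np.asarray(combined).astype(np_dtype).reshape(dims)


# Usage: a 2x1 complex64 tensor stored as interleaved floats.
arr = complex_to_array_sketch([1.0, 2.0, 3.0, -4.0], np.complex64, (2, 1))
print(arr.dtype, arr.ravel())
```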

@galagam for the int4 updates, @AlexandreEichenberger for the big-endian handling, and @xadupre for the float8 functions and the reference evaluator. Thanks!

Float 8 util speed test

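```python
import numpy as np
import onnx
import pyinstrument
from onnx import numpy_helper

print(onnx.__version__)

# 1000x1000 random float8 bit patterns; the upper bound of randint is
# exclusive, so 256 covers every byte value.
floats = np.random.randint(0, 256, [1000, 1000], dtype=np.uint8)

profiler = pyinstrument.Profiler()
profiler.start()
for i in range(10):  # 10 iterations, as in the timings quoted above
    print(i)
    numpy_helper.float8e4m3_to_float32(floats)
profiler.stop()
profiler.print()
```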

TODO: Unit tests

Fixes #6126

github-actions · 2024-05-03T02:53:45Z

Test Results

3 files ±0, 3 suites ±0, 2m 15s ⏱️ +8s

| | Total | ✅ Passed | 💤 Skipped | ❌ Failed |
|---|---|---|---|---|
| Tests | 7 486 (±0) | 4 386 (-70) | 3 030 (±0) | 70 (+70) |
| Runs | 22 441 (+5) | 13 051 (-135) | 9 180 (-70) | 210 (+210) |

For more details on these failures, see this check.

Results for commit 64be57c. ± Comparison against base commit 013eb5e.

♻️ This comment has been updated with latest results.

xadupre · 2024-05-03T08:44:46Z

Maybe it is worth adding a unit test.

third_party/benchmark

justinchuby · 2024-05-04T02:15:41Z

onnx/numpy_helper.py

```diff
     """
-    single_func = lambda x: subbyte.unpack_single_4bitx2(x, signed)  # noqa: E731
-    func = np.frompyfunc(single_func, 1, 2)
```


`np.frompyfunc` is not performant: it still calls the Python function once per element.
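To illustrate the point, here is a hedged comparison for the unsigned case: a per-element `np.frompyfunc` wrapper around a hypothetical scalar helper (`unpack_single_sketch`, not the real `subbyte.unpack_single_4bitx2`) versus plain bitwise operations on the whole array. Both produce the same low and high nibbles, but the second form runs entirely in NumPy's compiled loops.

```python
import numpy as np


def unpack_single_sketch(byte):
    """Hypothetical scalar helper: split one byte into (low, high) nibbles."""
    return byte & 0x0F, byte >> 4


def unpack_frompyfunc(x):
    # frompyfunc builds a ufunc that calls the Python function per element
    # and returns object arrays, which then have to be converted back.
    func = np.frompyfunc(unpack_single_sketch, 1, 2)
    low, high = func(x)
    return low.astype(np.uint8), high.astype(np.uint8)


def unpack_vectorized(x):
    # The same split expressed with array-wide bitwise operations.
    x = np.asarray(x, dtype=np.uint8)
    return x & np.uint8(0x0F), x >> np.uint8(4)
```

The vectorized version also keeps the `uint8` dtype throughout, so no object-array round trip is needed.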

onnx/numpy_helper.py

```diff
-    if tensor_dtype in (TensorProto.COMPLEX64, TensorProto.COMPLEX128):
-        data = combine_pairs_to_complex(data)  # type: ignore[assignment,arg-type]
+    if tensor_dtype in (onnx.TensorProto.COMPLEX64, onnx.TensorProto.COMPLEX128):
+        return np.asarray(combine_pairs_to_complex(data)).astype(np_dtype).reshape(dims)
```


onnx/subbyte.py

```diff
-    clip_high = INT4_MAX if signed else UINT4_MAX
-    if not isinstance(x, np.ndarray):
-        x = np.asarray(x)
+    return np.rint(np.clip(x, INT4_MIN, INT4_MAX)).astype(np.int8)
```


onnx/subbyte.py

```diff
+    Returns:
+        An ndarray with a single int4 element.
+    """
+    return np.rint(np.clip(x, UINT4_MIN, UINT4_MAX)).astype(np.uint8)
```
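The two one-liners in these hunks saturate to the 4-bit range and round to the nearest integer. A self-contained sketch of that behavior, assuming the ONNX ranges INT4 = [-8, 7] and UINT4 = [0, 15]; the constants and the `_sketch` function names are defined here for illustration and may differ from what `onnx.subbyte` exports.

```python
import numpy as np

# Assumed 4-bit ranges (signed: two's complement, unsigned: 0..15).
INT4_MIN, INT4_MAX = -8, 7
UINT4_MIN, UINT4_MAX = 0, 15


def cast_int4_sketch(x):
    """Saturate to [-8, 7], round half to even, store in int8."""
    return np.rint(np.clip(x, INT4_MIN, INT4_MAX)).astype(np.int8)


def cast_uint4_sketch(x):
    """Saturate to [0, 15], round half to even, store in uint8."""
    return np.rint(np.clip(x, UINT4_MIN, UINT4_MAX)).astype(np.uint8)
```

Clipping before rounding is safe here because the bounds are integers, so rounding cannot push a value back out of range.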


onnx/subbyte.py

```diff
+    else:
+        i8_low = cast_uint4(val_low)
+        i8_high = cast_uint4(val_high)
+    i8_high <<= 4
```
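Packing is the inverse of unpacking: the first element of each pair goes into the low nibble and the second into the high nibble (the `i8_high <<= 4` step in the hunk). A vectorized sketch, assuming the inputs are already in range and that an odd-length input is padded with a zero nibble; `pack_int4_sketch` is a hypothetical name, not the `subbyte` API.

```python
import numpy as np


def pack_int4_sketch(values):
    """Pack 4-bit values (int4 in [-8, 7] or uint4 in [0, 15]) two per byte."""
    flat = np.asarray(values).ravel().astype(np.int8)
    if flat.size % 2:
        # Pad odd-length input with a zero nibble.
        flat = np.append(flat, np.int8(0))
    # Reinterpret as uint8 so negative int4 values keep their two's-complement
    # bits, then keep only the low 4 bits of each value.
    nibbles = flat.view(np.uint8) & np.uint8(0x0F)
    low = nibbles[0::2]   # elements 0, 2, 4, ... -> low nibble
    high = nibbles[1::2]  # elements 1, 3, 5, ... -> high nibble
    return low | (high << np.uint8(4))
```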


onnx/subbyte.py

```diff
+    x_low = x & np.uint8(0x0F)
+    x_high = (x >> 4).astype(np.uint8)
+    if signed:
+        x_low = _int4_to_int8(x_low)
```


onnx/subbyte.py

```diff
+    x_high = (x >> 4).astype(np.uint8)
+    if signed:
+        x_low = _int4_to_int8(x_low)
+        x_high = _int4_to_int8(x_high)
```
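For signed data, each nibble in [0, 15] must be sign-extended to int8, because nibbles 8..15 represent -8..-1. The body of `_int4_to_int8` is not shown in the diff; one branch-free way to do it, offered here as a sketch with hypothetical names, is to shift the nibble to the top of an int8 and shift it back arithmetically.

```python
import numpy as np


def int4_to_int8_sketch(nibbles):
    """Sign-extend 4-bit two's-complement values stored in uint8."""
    as_int8 = (np.asarray(nibbles, dtype=np.uint8) << np.uint8(4)).view(np.int8)
    # Right shift on a signed type is arithmetic, which replicates the sign bit.
    return as_int8 >> np.int8(4)


def unpack_int4_pairs_sketch(packed, signed):
    """Split each byte into (low, high) nibbles, optionally sign-extended."""
    x = np.asarray(packed, dtype=np.uint8)
    low = x & np.uint8(0x0F)
    high = x >> np.uint8(4)
    if signed:
        low = int4_to_int8_sketch(low)
        high = int4_to_int8_sketch(high)
    return low, high
```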


onnx/numpy_helper.py

```diff
+    # if mantissa > 0:
+    #     exponent = 127 - exponent_bias
+    #     if mantissa & 0b100 == 0:
+    #         mantissa &= 0b011
+    #         mantissa <<= 1
+    #         exponent -= 1
+    #     if mantissa & 0b100 == 0:
+    #         mantissa &= 0b011
+    #         mantissa <<= 1
+    #         exponent -= 1
+    #     result |= (mantissa & 0b011) << 21
+    #     result |= exponent << 23
```
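The commented-out block is the scalar subnormal normalization kept from the old implementation. Below is a vectorized sketch of a full e4m3fn decode in the style the PR uses (build float32 bit patterns under boolean masks, then `view(np.float32)`), assuming the FN variant: bias 7, no infinities, and 0x7F/0xFF as NaN. As a simplification that is not the PR's code, subnormals are computed arithmetically as `mantissa * 2**-9` (exact in float32) instead of renormalizing bits. `float8e4m3fn_to_float32_sketch` is a hypothetical name.

```python
import numpy as np


def float8e4m3fn_to_float32_sketch(x):
    """Decode float8 e4m3fn bit patterns (stored as uint8) into float32."""
    x = np.asarray(x, dtype=np.uint8)
    is_scalar = x.ndim == 0
    bits = np.atleast_1d(x).astype(np.uint32)

    sign = (bits & np.uint32(0x80)) << np.uint32(24)  # sign bit -> bit 31
    exponent = (bits >> np.uint32(3)) & np.uint32(0x0F)
    mantissa = bits & np.uint32(0x07)

    result = np.zeros(bits.shape, dtype=np.uint32)
    # Normal numbers: rebias the exponent (-7 + 127 = +120) and move the
    # 3 mantissa bits to the top of float32's 23-bit mantissa field.
    normal = exponent > 0
    result[normal] = ((exponent[normal] + np.uint32(120)) << np.uint32(23)) | (
        mantissa[normal] << np.uint32(20)
    )
    # Subnormals: value = mantissa * 2**-9, which is exact in float32,
    # so compute it arithmetically and copy its bit pattern.
    subnormal = (exponent == 0) & (mantissa > 0)
    result[subnormal] = (
        mantissa[subnormal].astype(np.float32) * np.float32(2.0**-9)
    ).view(np.uint32)
    result |= sign
    # e4m3fn has no infinities; S.1111.111 encodes NaN.
    result[(bits & np.uint32(0x7F)) == np.uint32(0x7F)] = np.uint32(0x7FC00000)

    out = result.view(np.float32)
    return out[0] if is_scalar else out
```

Boolean-mask assignment works for multi-dimensional inputs too, since each mask has the same shape as `result`.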


onnx/numpy_helper.py

justinchuby · 2024-05-04T05:26:47Z

onnx/numpy_helper.py

```diff
-    return f
-
-
-_float8e4m3_to_float32 = np.vectorize(
```


Removed the use of `np.vectorize`: it is essentially a Python for loop over the elements, so it is not performant.

onnx/numpy_helper.py

```diff
+    result[normal_mask] |= exponents[normal_mask] << 23
+    result = result.view(np.float32)
+    if is_scalar:
+        return result[0]
```


onnx/numpy_helper.py

```diff
+    result = result.view(np.float32)
+    if is_scalar:
+        return result[0]
+    return result
```


onnx/numpy_helper.py

```diff
+    # if exponent == 0:
+    #     # Subnormal number
+    #     if mantissa > 0:
+    #         exponent = 127 - exponent_bias
+    #         if mantissa & 0b10 == 0:
+    #             mantissa &= 0b01
+    #             mantissa <<= 1
+    #             exponent -= 1
+    #         result |= (mantissa & 0b01) << 22
+    #         result |= exponent << 23
```
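This block is the e5m2 subnormal branch of the old scalar code. For the non-FNUZ e5m2 format there is a shortcut worth noting as an alternative technique (not what the PR implements): e5m2 uses the same 5-bit exponent and bias 15 as IEEE float16, so an e5m2 byte is exactly the upper byte of a float16. Shifting it left by 8 and reinterpreting yields the value, including subnormals, infinities, and NaNs. It does not apply to e5m2fnuz, which uses a different bias and NaN encoding. `float8e5m2_to_float32_sketch` is a hypothetical name.

```python
import numpy as np


def float8e5m2_to_float32_sketch(x):
    """Decode float8 e5m2 bit patterns (stored as uint8) into float32.

    Assumes the IEEE-like e5m2 variant (bias 15, with inf/NaN), not e5m2fnuz.
    """
    bits = np.asarray(x, dtype=np.uint8).astype(np.uint16)
    # Put the 8 bits in the high byte of a 16-bit word; float16's low
    # 8 mantissa bits stay zero. view() reinterprets in native byte order
    # on both sides, so this also works on big-endian hosts.
    half = np.asarray(bits << np.uint16(8)).view(np.float16)
    return half.astype(np.float32)
```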


Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Signed-off-by: Justin Chu <justinchu@microsoft.com>

onnx/reference/op_run.py

```diff
-            y[i] = d
-        return y.reshape(shape)
+            data = np.array(tensor.int32_data, dtype=np.uint8)
+        data = data.view(dtype_mapping[elem_type])
```


onnx/reference/op_run.py

```diff
-        for i, d in enumerate(data):
-            y[i] = d
+        dtype_mapping = {TensorProto.INT4: int4, TensorProto.UINT4: uint4}
+        dtype = dtype_mapping[elem_type]
```


onnx/reference/op_run.py

```diff
-            y[i] = d
+        dtype_mapping = {TensorProto.INT4: int4, TensorProto.UINT4: uint4}
+        dtype = dtype_mapping[elem_type]
+        return subbyte.unpack_int4(data, dims=tensor.dims, signed=signed).view(dtype)
```
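The new `to_array_extended` branch reads the packed bytes from `int32_data` as uint8 and delegates to `subbyte.unpack_int4(data, dims=..., signed=...)`. A sketch of what such an unpack has to do for a tensor of shape `dims`, assuming the ONNX packing order (first element in the low nibble) and that an odd element count leaves the last high nibble unused. `unpack_int4_sketch` is a hypothetical stand-in, and it returns plain int8/uint8 rather than the custom `int4`/`uint4` dtypes.

```python
import numpy as np


def unpack_int4_sketch(packed, dims, signed):
    """Unpack bytes holding two 4-bit values each into an array of shape dims."""
    x = np.asarray(packed, dtype=np.uint8).ravel()
    low = x & np.uint8(0x0F)
    high = x >> np.uint8(4)
    # Interleave so element 2*i comes from the low nibble of byte i and
    # element 2*i + 1 from its high nibble.
    nibbles = np.stack([low, high], axis=-1).ravel()
    size = int(np.prod(dims, dtype=np.int64))
    nibbles = nibbles[:size]  # drop the padding nibble for odd sizes
    if signed:
        # Sign-extend: move the nibble to the top of an int8 and shift back.
        nibbles = (nibbles << np.uint8(4)).view(np.int8) >> np.int8(4)
    return nibbles.reshape(dims)
```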


Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby · 2024-05-04T19:01:23Z

onnx/numpy_helper.py

```diff
+    return result
+
+
+def _small_endian_dtype(dtype) -> np.dtype:
```


Suggested change

```diff
-def _small_endian_dtype(dtype) -> np.dtype:
+def _little_endian_dtype(dtype) -> np.dtype:
```
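The body of `_little_endian_dtype` is not shown here. Presumably it returns the little-endian version of a dtype, matching the little-endian byte order ONNX uses for `raw_data`. A minimal sketch of that idea, assuming it is built on `np.dtype.newbyteorder`, plus how such a helper lets raw bytes be read portably on big-endian hosts; both function names are hypothetical.

```python
import numpy as np


def little_endian_dtype_sketch(dtype) -> np.dtype:
    """Return dtype with an explicit little-endian byte order.

    Single-byte types have no byte order, so they come back unchanged.
    """
    return np.dtype(dtype).newbyteorder("<")


def from_raw_data_sketch(raw: bytes, dtype) -> np.ndarray:
    # Interpret the bytes as little-endian, then convert to the host's
    # native byte order so downstream code sees an ordinary array.
    arr = np.frombuffer(raw, dtype=little_endian_dtype_sketch(dtype))
    return arr.astype(np.dtype(dtype).newbyteorder("="))
```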

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

justinchuby · 2024-05-05T03:34:09Z

For float8 usage, we may be better off using https://github.com/jax-ml/ml_dtypes?

xadupre · 2024-05-06T07:43:31Z

onnx/numpy_helper.py

```diff
-    return shift(data.astype(np.int32)).reshape(dims).view(np.float32)  # type: ignore[no-any-return]
-
-
-def _float8e4m3_to_float32_scalar(ival: int, fn: bool, uz: bool) -> np.float32:
```


I would keep the code of the old function in the documentation. Its logic is easier to read, which makes the new code easier to understand.
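One way to follow this suggestion is to keep a readable scalar reference next to the fast path and test them against each other. Because a float8 type has only 256 bit patterns, the scalar function can also be turned into a vectorized decoder through a 256-entry lookup table (a different technique from the PR's mask-based code). Both functions below are hypothetical illustrations for e4m3fn (bias 7, NaN at S.1111.111), not the removed `_float8e4m3_to_float32_scalar`.

```python
import math

import numpy as np


def e4m3fn_scalar_reference(ival: int) -> float:
    """Readable scalar decoder for one e4m3fn byte."""
    sign = -1.0 if ival & 0x80 else 1.0
    exponent = (ival >> 3) & 0x0F
    mantissa = ival & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # e4m3fn has no infinities
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent 1 - bias = -6.
        return sign * (mantissa / 8.0) * 2.0**-6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Decode every possible byte once; indexing the table vectorizes the decoder.
E4M3FN_TABLE = np.array(
    [e4m3fn_scalar_reference(i) for i in range(256)], dtype=np.float32
)


def e4m3fn_to_float32_lut(x):
    return E4M3FN_TABLE[np.asarray(x, dtype=np.uint8)]
```

The table costs 1 KiB and turns the conversion into a single gather, at the price of hiding the bit-level logic, which is why keeping the scalar reference for documentation and tests is still useful.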

gramalingam · 2024-07-03T20:54:52Z

Is this active? It is still marked "draft". Maybe we should get this into the 1.17 release?

justinchuby · 2024-07-03T21:58:50Z

> Is this active? It is still marked "draft". Maybe we should get this into the 1.17 release?

I am personally fine with missing the release. The IR handles NumPy arrays more efficiently and does not rely on the helper right now, so we are not blocked.

justinchuby requested a review from a team as a code owner May 3, 2024 02:37

justinchuby changed the title ~~Fix numpy_helper to_array when tensor is complex~~ Fix numpy_helper to_array errors May 3, 2024

justinchuby requested review from xadupre and gramalingam May 3, 2024 03:04

justinchuby added this to the 1.17 milestone May 3, 2024

justinchuby marked this pull request as draft May 3, 2024 14:53

justinchuby changed the title ~~Fix numpy_helper to_array errors~~ Clean up numpy_helper and subbyte May 4, 2024

justinchuby marked this pull request as ready for review May 4, 2024 02:03

justinchuby force-pushed the justinchu/complex-numpy branch 2 times, most recently from 9ba8ff1 to b688548 May 4, 2024 02:07

justinchuby added the better engineering Improve engineering quality of the project label May 4, 2024

justinchuby requested a review from AlexandreEichenberger May 4, 2024 02:11

justinchuby commented May 4, 2024

third_party/benchmark Outdated

justinchuby changed the title ~~Clean up numpy_helper and subbyte~~ Clean up and fix numpy_helper and subbyte May 4, 2024

justinchuby commented May 4, 2024

onnx/numpy_helper.py Outdated

justinchuby commented May 4, 2024

onnx/numpy_helper.py Outdated

github-advanced-security bot found potential problems May 4, 2024

justinchuby marked this pull request as draft May 4, 2024 04:29

github-advanced-security bot found potential problems May 4, 2024

onnx/numpy_helper.py Fixed

onnx/numpy_helper.py Fixed

justinchuby marked this pull request as ready for review May 4, 2024 05:25

justinchuby commented May 4, 2024

github-advanced-security bot found potential problems May 4, 2024

justinchuby and others added 3 commits May 4, 2024 15:37

Fix numpy_helper to_array when tensor is complex

4a7013b

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update numpy_helper.py

4da0d00

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

float8

b397529

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby and others added 12 commits May 4, 2024 15:37

More

756d915

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Remove bench mark

8e5baaa

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update onnx/numpy_helper.py

c511da9

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update onnx/numpy_helper.py

b50d1b7

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

WIP float8e4m3_to_float32

73e2a79

Signed-off-by: Justin Chu <justinchu@microsoft.com>

float8e4m3_to_float32

126da8b

Signed-off-by: Justin Chu <justinchu@microsoft.com>

array

a9d7e76

Signed-off-by: Justin Chu <justinchu@microsoft.com>

dtype

ecb131e

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Fix

a01b2f1

Signed-off-by: Justin Chu <justinchu@microsoft.com>

test_float8_e5m2fnuz_out_of_range

475f513

Signed-off-by: Justin Chu <justinchu@microsoft.com>

float8e5m2_to_float32

93c3448

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update reference

9fd4909

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby force-pushed the justinchu/complex-numpy branch from 15aec39 to 9fd4909 May 4, 2024 15:37

github-advanced-security bot found potential problems May 4, 2024

justinchuby requested a review from a team as a code owner May 4, 2024 16:01

Ref tests

4c18a15

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby force-pushed the justinchu/complex-numpy branch from cd51e75 to 4c18a15 May 4, 2024 16:04

justinchuby added 2 commits May 4, 2024 16:07

Fix sign mask

c4545cf

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Use put mask to handle multi-d masks

01d1f5d

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby marked this pull request as draft May 4, 2024 16:22

justinchuby marked this pull request as ready for review May 4, 2024 16:26

justinchuby mentioned this pull request May 4, 2024

[IR] Allow tensor created with numpy unsupported dtypes microsoft/onnxscript#1441

Merged

justinchuby commented May 4, 2024

docs

64be57c

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

xadupre reviewed May 6, 2024

justinchuby marked this pull request as draft June 21, 2024 12:50

justinchuby removed this from the 1.17 milestone Jul 4, 2024
