s390x test failures - test_make_tensor_raw #6181

tehbone · 2024-06-14T15:06:41Z

Bug Report

Is the issue related to model conversion?

No

Describe the bug

Running the test suite on s390x produces failures involving test_make_tensor_raw as well as test_make_xxx_tensor_raw:

FAILED helper_test.py::TestHelperTensorFunctions::test_make_bfloat16_tensor_raw - AssertionError: 
FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e4m3fn_tensor_raw - ValueError: buffer size must be a multiple of element size
FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e4m3fnuz_tensor_raw - ValueError: buffer size must be a multiple of element size
FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e5m2_tensor_raw - ValueError: buffer size must be a multiple of element size
FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e5m2fnuz_tensor_raw - ValueError: buffer size must be a multiple of element size
FAILED helper_test.py::test_make_tensor_raw[TensorProto.FLOAT] - AssertionError: 
FAILED helper_test.py::test_make_tensor_raw[TensorProto.FLOAT16] - AssertionError: 
FAILED helper_test.py::test_make_tensor_raw[TensorProto.DOUBLE] - AssertionError: 
FAILED helper_test.py::test_make_tensor_raw[TensorProto.COMPLEX64] - AssertionError: 
FAILED helper_test.py::test_make_tensor_raw[TensorProto.COMPLEX128] - AssertionError: 
FAILED helper_test.py::test_make_tensor_raw[TensorProto.UINT32] - AssertionError: 
FAILED helper_test.py::test_make_tensor_raw[TensorProto.UINT64] - AssertionError:

System information

OS Platform and Distribution (e.g. Linux Ubuntu 20.04): RHEL 9.3
ONNX version (e.g. 1.13): Latest
Python version: 3.11

Reproduction instructions

Run the test suite on s390x

Expected behavior

No failures

Notes

I'm trying to determine whether this is simply a test-case error, with the actual code working as designed, but that depends on the intended behavior of onnx.helper:make_tensor. Another function, onnx.numpy_helper:from_array, does a full conversion of a numpy array to raw bytes, including the necessary float32->bfloat16 conversion and any other special-case handling. The numpy helpers also convert endianness to and from the little-endian format that TensorProto requires. onnx.helper:make_tensor performs no such conversions beyond flattening the array. Should make_tensor be responsible for converting the data when given raw input, or is raw input intended to be taken as is?
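To illustrate why the byte order matters here, the following is a minimal sketch using only the standard library. It does not call onnx at all; `decode_little_endian` is a hypothetical stand-in for any reader that assumes little-endian raw data, and the sample values are arbitrary.

```python
import struct
import sys

values = [1.0, 2.5, -3.0]  # arbitrary values, exactly representable in float32
count = len(values)

# "=" packs in the host's native byte order: what a plain buffer dump produces.
native_raw = struct.pack(f"={count}f", *values)
# "<" and ">" force little-endian and big-endian layouts respectively.
little_raw = struct.pack(f"<{count}f", *values)
big_raw = struct.pack(f">{count}f", *values)


def decode_little_endian(raw: bytes) -> list:
    """Hypothetical reader that always interprets raw bytes as little-endian float32."""
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


print("host byte order:", sys.byteorder)
print("little-endian bytes decoded:", decode_little_endian(little_raw))
print("big-endian bytes decoded:   ", decode_little_endian(big_raw))
```

On a little-endian host, `native_raw` matches `little_raw`, so unconverted raw data happens to be correct. On a big-endian host such as s390x, `native_raw` matches `big_raw`, and a little-endian reader no longer recovers the original values unless the bytes are swapped first.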

Either way, I can make the corresponding updates once the intended behavior is determined.

For additional context, I've taken a look at tf2onnx, PyTorch's ONNX exporter, and onnx2torch, and with the exception of 2 test cases, none of those projects would be affected by assuming that raw=True on onnx.helper:make_tensor also implies pre-byteswapped data.


tehbone · 2024-06-14T19:43:22Z

Addendum - there is actually a mess of bugs here, so look for a PR regardless:

1. I'm going to assume that make_tensor(raw=True) takes raw data that has already been converted. That is more or less consistent with how it is used and just needs test updates to affirm that position.
2. These errors:

FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e4m3fn_tensor_raw - ValueError: buffer size must be a multiple of element size
FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e4m3fnuz_tensor_raw - ValueError: buffer size must be a multiple of element size
FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e5m2_tensor_raw - ValueError: buffer size must be a multiple of element size
FAILED helper_test.py::TestHelperTensorFunctions::test_make_float8e5m2fnuz_tensor_raw - ValueError: buffer size must be a multiple of element size

are a result of the incorrect data type being used in numpy_helper. The conversions use the np_dtype, but they should use the corresponding storage np_dtype.
3. The 4-bit data types have incorrect storage dtypes. The comment indicates they should be bytes, yet they are coded as integers.
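For item 2, here is a minimal sketch of how this ValueError can arise in numpy, independent of onnx. It assumes, for illustration only, that the float8 payload is one byte per element and that the non-storage dtype is a wider type such as float32; the byte values are arbitrary.

```python
import numpy as np

# Three one-byte float8 payloads (arbitrary bit patterns, for illustration only).
raw = bytes([0x38, 0x40, 0x44])

# Interpreting the buffer with a one-byte storage dtype works for any length.
storage_view = np.frombuffer(raw, dtype=np.uint8)

# Interpreting it with a four-byte dtype fails when the length is not a multiple of 4.
try:
    np.frombuffer(raw, dtype=np.float32)
    error = None
except ValueError as exc:
    error = str(exc)

print(storage_view)
print(error)
```

Reading the buffer with the one-byte storage dtype avoids the size mismatch, and a byteswap on single-byte elements changes nothing.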

tehbone added the bug label Jun 14, 2024

tehbone mentioned this issue Jun 15, 2024

Fixes s390x byteswapping issues #6183

Open
