
S390x test fixes#27404

Merged
tianleiwu merged 30 commits into microsoft:main from AlekseiNikiforovIBM:s390x_test_fixes
Mar 31, 2026

Conversation

@AlekseiNikiforovIBM
Contributor

Description

This PR contains fixes for various big-endian support issues in onnxruntime, in both the libraries and the tests.

Motivation and Context

Currently, some tests in the onnxruntime test suite fail on s390x.
This change fixes all failing tests when onnxruntime is built without training support.
It also includes a fix for a linking issue.

The following tests are fixed on s390x:
OrtModelOnlyTests.ValidateOrtFormatModelDoesNotRunOptimizersInFullBuild
FlatbufferUtilsTest.ExternalWriteReadWithLoadInitializers
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices64
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices32
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices16
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices8
SparseTensorConversionTests.SparseTensorProtoToDense_Rank2Indices_COO
SparseTensorConversionTests.TestConstantNodeConversion
OrtModelOnlyTests.SparseInitializerHandling
SparseTensorConversionTests.TestDenseToSparseConversion
ExecutionFrameTestInit.SparseInitializerAsOutput
CApiTest.SparseOutputModel

@AlekseiNikiforovIBM
Contributor Author

I've added fixes for the tests enabled with training. Although the tests are for training, many of the fixes are actually in common code.

@AlekseiNikiforovIBM
Contributor Author

@tianleiwu @amarin16 @baijumeswani, could you please take a look?

@tianleiwu
Contributor

tianleiwu commented Mar 3, 2026

1. onnxruntime/core/graph/graph_flatbuffers_utils.cc (Changes Requested)

In SaveInitializerOrtFormat, the handling of HasExternalData(be_copy) for big-endian machines introduces a bug where external data will be saved into the ORT Flatbuffer as Big-Endian bytes instead of Little-Endian.

Concrete byte-level trace of the bug:

For the inline data path (correct), using int32 value 1 as an example:

  1. be_copy.raw_data() starts as LE bytes from the ONNX proto: [01, 00, 00, 00]
  2. ConvertRawDataInTensorProto(be_copy) swaps in-place → [00, 00, 00, 01] (now BE)
  3. UnpackInitializerData → has raw_data → UnpackTensorWithRawDataReadLittleEndian:
    interprets [00, 00, 00, 01] as LE, swaps to native BE → output = [01, 00, 00, 00]
  4. Result: unpacked_tensor = [01, 00, 00, 00] = LE bytes ✅

For the external data path (buggy):

  1. ConvertRawDataInTensorProto(be_copy) → no-op (external data, no inline data to swap)
  2. TensorProtoWithExternalDataToTensorProto(be_copy_external_data, {}, be_copy) — loads LE bytes from disk into a brand new TensorProto result (see tensorprotoutils.cc:296), which only copies name/data_type/dims but NOT external_data or data_location. So be_copy becomes an inline-data TensorProto with raw_data = [01, 00, 00, 00] (LE from disk).
  3. UnpackInitializerData(be_copy) → HasExternalData now returns false, so it takes the non-external path → CASE_UNPACK → UnpackTensorWithRawDataReadLittleEndian:
    interprets [01, 00, 00, 00] as LE, swaps to native BE → output = [00, 00, 00, 01]
  4. Result: unpacked_tensor = [00, 00, 00, 01] = BE bytes ❌ — flatbuffer stores BE data.

Root cause: The inline data path relies on a double-swap (step 2 + step 3) that cancels out, producing LE. The external data path only has a single swap (step 3), producing BE.

Proposed fix — add ConvertRawDataInTensorProto after loading external data to match the inline path's double-swap:

      if (onnxruntime::utils::HasExternalData(be_copy)) {
        auto be_copy_external_data{be_copy};
        ORT_RETURN_IF_ERROR(onnxruntime::utils::TensorProtoWithExternalDataToTensorProto(be_copy_external_data, {}, be_copy));
        // Swap the newly loaded LE raw_data to BE, matching what ConvertRawDataInTensorProto
        // would have done for inline data. UnpackInitializerData's ReadLittleEndian will then
        // swap it back to LE, producing the correct result.
        onnxruntime::utils::ConvertRawDataInTensorProto(be_copy);
      }

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses big-endian (s390x) correctness issues in ORT by standardizing raw tensor data handling (writing/reading in little-endian form where required) and adjusting affected tests/build wiring so the test suite passes without training enabled.

Changes:

  • Replace direct TensorProto::set_raw_data(...) usage with onnxruntime::utils::SetRawDataInTensorProto(...) across multiple components/tests to centralize endianness handling.
  • Improve test robustness on big-endian by unpacking tensor proto data via ORT utilities (e.g., UnpackTensor, ConvertRawDataInTensorProto) instead of memcpy/reinterpretation.
  • Update ORT-format/flatbuffer initializer handling and unit-test build configuration to support big-endian scenarios and non-training builds.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
winml/adapter/winml_adapter_model.cpp Use ORT helper to set TensorProto raw data with endian-awareness.
orttraining/orttraining/training_api/checkpoint.cc Remove little-endian-only guard for training checkpoints.
orttraining/orttraining/test/graph/optimizer_graph_builder_test.cc Read raw_data via UnpackTensor to be endian-correct.
orttraining/orttraining/core/optimizer/shape_optimizer.cc Use endian-aware raw data setter for constant initializers.
orttraining/orttraining/core/optimizer/megatron_transformer.cc Use endian-aware raw data setter for partitioned initializers.
orttraining/orttraining/core/optimizer/conv1d_replacement.cc Use endian-aware raw data setter for initializer creation.
orttraining/orttraining/core/framework/checkpointing.cc Remove big-endian “not implemented” restriction in checkpoint saving path.
onnxruntime/test/providers/nv_tensorrt_rtx/test_nv_trt_rtx_ep_util.cc Use endian-aware raw data setter in test model building utilities.
onnxruntime/test/framework/sparse_kernels_test.cc Convert/check raw_data in a big-endian-safe way (copy-by-value + conversion).
onnxruntime/test/framework/int2_test.cc Use endian-aware raw data setter in Int2 round-trip test.
onnxruntime/test/framework/endian_test.cc Use endian-aware raw data setter and add stronger assertions about conversion effects.
onnxruntime/test/flatbuffers/flatbuffer_utils_test.cc Remove manual conversion now handled elsewhere.
onnxruntime/core/optimizer/qdq_transformer/where_dummy_dq.cc Use endian-aware raw data setter for dummy initializer scalars.
onnxruntime/core/graph/graph_flatbuffers_utils.cc Add big-endian handling for saving/loading ORT-format initializers and tensor dims.
onnxruntime/core/graph/graph.cc Remove prior sparse-constant endian workaround; use endian-aware raw data setter for editor API.
onnxruntime/core/framework/tensorprotoutils.cc Enhance raw-data setter and conversion logic; ensure sparse raw bytes are little-endian.
onnxruntime/core/framework/data_transfer_utils.h Add byte-swapping to CopyTensorDataToByteSpan on big-endian.
cmake/onnxruntime_unittests.cmake Ensure endian_utils is linked into tests when training is disabled.


@AlekseiNikiforovIBM
Contributor Author

Thanks for the review; I'll rework that change. It is likely that the byteswapping is excessive in that place but missing in some other place.

Build command:
./build.sh --config Debug --parallel 0 --enable_pybind --build_wheel --allow_running_as_root

Later this data is narrowed:
*p_data++ = static_cast<T>(*data_iter);

If, for example, BE int32_t data is 4 bytes:
0x00 0x00 0x00 0x01
then after byteswapping it becomes:
0x01 0x00 0x00 0x00
and after narrowing to int16_t, the two rightmost bytes are used on big endian, giving 0x00 0x00.

If we instead byteswap it as two shorts, the byteswapping result is:
0x00 0x00 0x01 0x00
and the narrowing result is 0x01 0x00, which is the correct LE representation of that number.

This change fixes the following test on s390x:
FlatbufferUtilsTest.ExternalWriteReadWithLoadInitializers
Raw data is expected to be in LE.

This change fixes the following tests:
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices64
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices32
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices16
SparseTensorConversionTests.SparseTensorProtoToDense_Rank1Indices8
SparseTensorConversionTests.SparseTensorProtoToDense_Rank2Indices_COO

This change fixes the following tests:
OrtModelOnlyTests.SparseInitializerHandling
SparseTensorConversionTests.TestConstantNodeConversion

This change fixes the following tests on s390x:
ExecutionFrameTestInit.SparseInitializerAsOutput
CApiTest.SparseOutputModel

This change will allow assessing and fixing big-endian-specific issues in training-related code.
It fixes approximately 40 tests.

This change fixes the test CheckpointingTest.SaveAndLoad on s390x.

…ronTransformer class

This change fixes the following tests on s390x:
GraphTransformationTests.MegatronMLPPartitionRank0
GraphTransformationTests.MegatronMLPPartitionRank1
GraphTransformationTests.MegatronSelfAttentionPartitionRank0
GraphTransformationTests.MegatronSelfAttentionPartitionRank1

…roto

This should fix a lot of potential endianness issues on s390x.
This change fixes the following tests on s390x:
OptimizerGraphBuilderTest.LoadOptimState_FullPrecision_Adam
OptimizerGraphBuilderTest.LoadOptimState_FullPrecision_Lamb
In-memory data is in native endian format, while on-disk data should already be in little-endian format.

Move a part of the ConvertRawDataInTensorProto function out into a separate one for convenience.

This change fixes the test
OrtModelOnlyTests.ValidateOrtFormatModelDoesNotRunOptimizersInFullBuild
on s390x.

This change fixes the following tests on s390x:
SaveWithExternalInitializers.Mnist
SaveWithExternalInitializers.ModelWithOriginalExternalData
SaveWithExternalInitializers.ModelWithOriginalExternalDataAlignOffset

…unction

ReadExternalDataForTensor already returns data in native endian format.

Also remove the ConvertEndianessForVector function. It does a const_cast and unexpectedly modifies the original data. When needed, use WriteLittleEndian instead.

These changes fix the following tests on s390x:
TensorProtoUtilsTest.UnpackTensorWithExternalData
TensorProtoUtilsTest.ConstantTensorProtoWithExternalData
@AlekseiNikiforovIBM
Contributor Author

Your finding is entirely correct if the external data is in a file. However, in-memory external data is actually in native endian format, i.e. big endian on big-endian systems:

if (use_tensor_buffer && tensor.SizeInBytes() > kSmallTensorExternalDataThreshold) {
  // https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/graph/graph_flatbuffers_utils.cc#L302
  const auto* raw_data = tensor.DataRaw();
  ORT_ENFORCE(raw_data, "Missing raw data for tensor proto. Invalid tensor.");
  static_assert(sizeof(void*) <= sizeof(ExternalDataInfo::OFFSET_TYPE));
  // we reinterpret_cast this back to void* in tensorprotoutils.cc:GetExtDataFromTensorProto.
  // use intptr_t as OFFSET_TYPE is signed. in theory you could get a weird looking value if the address uses the
  // high bit, but that should be unlikely in a scenario where we care about memory usage enough to use this path.
  auto offset = narrow<ExternalDataInfo::OFFSET_TYPE>(reinterpret_cast<intptr_t>(raw_data));
  ExternalDataInfo::SetExternalLocationToProto(onnxruntime::utils::kTensorProtoMemoryAddressTag,
                                               offset, tensor.SizeInBytes(), tensor_proto);

It seems like a bad idea to byteswap this in-memory data in advance, so I've instead added byteswapping of data loaded from a file, right after reading it. This also revealed a couple of additional byteswapping issues, which I investigated and fixed.

@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

tianleiwu previously approved these changes Mar 26, 2026
@tianleiwu tianleiwu enabled auto-merge (squash) March 26, 2026 03:06
auto-merge was automatically disabled March 26, 2026 15:30

Head branch was pushed to by a user without write access

@AlekseiNikiforovIBM
Contributor Author

AlekseiNikiforovIBM commented Mar 26, 2026

I've added a change to include endian headers in a couple of files, since a failing pipeline indicated this issue in one of the build configurations.

Edit: failing pipeline: https://github.com/microsoft/onnxruntime/actions/runs/23539560419

@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu
Contributor

@AlekseiNikiforovIBM, there are still build errors in the DirectML builds (please also check the other failed builds):

D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(128,31): error C2653: 'endian': is not a class or namespace name [D:\a\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_dml.vcxproj]
Error: D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(128,39): error C2065: 'native': undeclared identifier [D:\a\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_dml.vcxproj]
Error: D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(128,49): error C2653: 'endian': is not a class or namespace name [D:\a\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_dml.vcxproj]
Error: D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(128,57): error C2065: 'little': undeclared identifier [D:\a\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_dml.vcxproj]
Error: D:\a\_work\onnxruntime\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(134,63): error C2664: 'size_t onnxruntime::utils::GetElementSizeOfTensor(onnx::TensorProto_DataType)': cannot convert argument 1 from 'int32_t' to 'onnx::TensorProto_DataType' [D:\a\_work\_temp\build\RelWithDebInfo\onnxruntime_providers_dml.vcxproj]

@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

tianleiwu previously approved these changes Mar 30, 2026
@tianleiwu tianleiwu enabled auto-merge (squash) March 30, 2026 08:15
auto-merge was automatically disabled March 30, 2026 14:33

Head branch was pushed to by a user without write access

@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu enabled auto-merge (squash) March 31, 2026 23:30
@tianleiwu tianleiwu merged commit f2c28e2 into microsoft:main Mar 31, 2026
105 of 181 checks passed