
Do not create numpy on top of Tensor non-owning buffer #28088

Merged
yuslepukhin merged 4 commits into main from yuslepukhin/fix_numpy_zero_copy
Apr 16, 2026

Conversation

@yuslepukhin
Member

This pull request improves the safety and correctness of tensor-to-numpy conversions in the ONNX Runtime Python bindings, specifically addressing the issue of dangling pointers when model outputs alias input buffers. It introduces logic to ensure that numpy arrays returned from session outputs do not share memory with input arrays unless it is safe to do so, and adds targeted tests to prevent regressions.

Tensor-to-Numpy Conversion Safety Improvements:

  • Updated the GetPyObjFromTensor function signature in both the header (onnxruntime_pybind_mlvalue.h) and the implementation (onnxruntime_pybind_state.cc) to accept a new boolean parameter, zero_copy_non_owning, allowing explicit control over zero-copy behavior.
  • Enhanced the logic in GetPyObjFromTensor so that zero-copy numpy arrays are only created if the tensor owns its buffer or if zero_copy_non_owning is explicitly set. Otherwise, the data is copied to prevent use-after-free errors when the original input memory might be released.
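
The ownership rule described in the bullets above can be sketched in plain Python. This is only an illustrative model, not the C++ code from the PR: FakeTensor, use_zero_copy, and to_python_buffer are hypothetical names, a bytearray stands in for the tensor buffer, a memoryview stands in for a zero-copy numpy array, and the False default for zero_copy_non_owning is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a CPU tensor; not an ONNX Runtime type."""
    buffer: bytearray
    owns_buffer: bool


def use_zero_copy(owns_buffer: bool, zero_copy_non_owning: bool) -> bool:
    # Zero-copy is allowed when the tensor owns its memory, or when the
    # caller explicitly opts in and accepts responsibility for the lifetime.
    return owns_buffer or zero_copy_non_owning


def to_python_buffer(tensor: FakeTensor, zero_copy_non_owning: bool = False):
    # The False default is an assumption made for this sketch.
    if use_zero_copy(tensor.owns_buffer, zero_copy_non_owning):
        return memoryview(tensor.buffer)  # shares memory with the tensor
    return bytes(tensor.buffer)  # independent snapshot of the data
```

With a non-owning tensor and the flag left unset, the result is a snapshot that later writes to the original buffer cannot change. Opting in with zero_copy_non_owning=True returns a view, which is only safe while the caller keeps the underlying memory alive.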

Device Handling Updates:

  • Modified device-specific code paths in onnxruntime_pybind_ortvalue.cc to always request zero-copy for outputs from non-CPU devices, ensuring consistent and safe behavior across all supported hardware backends.

Testing and Regression Coverage:

  • Added comprehensive tests in onnxruntime_test_python.py to verify that outputs which alias inputs are returned as independent numpy arrays, preventing data corruption from dangling pointers. Also added a test to confirm that session-allocated outputs still use efficient zero-copy numpy arrays.
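
The shape of such an aliasing regression check can be illustrated without ONNX Runtime. The sketch below is not the test added in this PR: run_identity_model is a hypothetical stand-in for a session whose output aliases its input, and bytearray and memoryview stand in for the input buffer and a zero-copy output.

```python
def run_identity_model(x: bytearray, copy_output: bool):
    """Hypothetical stand-in for a session whose output aliases its input."""
    return bytes(x) if copy_output else memoryview(x)


def output_survives_input_reuse(copy_output: bool) -> bool:
    x = bytearray([1, 2, 3])
    expected = bytes(x)
    y = run_identity_model(x, copy_output)
    for i in range(len(x)):
        x[i] = 0  # simulate the caller reusing the input memory
    # An independent output still holds the original values;
    # an aliased view now reflects the overwritten input.
    return bytes(y) == expected
```

Here output_survives_input_reuse(True) holds and output_survives_input_reuse(False) does not, which is the behavioral difference the regression tests guard. Against real numpy arrays, a comparable check could use np.shares_memory on the input and output arrays.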

Addresses issue: #21922

Comment thread onnxruntime/test/python/onnxruntime_test_python.py Fixed
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens ONNX Runtime’s Python tensor-to-numpy conversion to avoid returning numpy arrays that reference non-owning (externally backed) CPU tensor buffers, which can lead to dangling pointers when a model output aliases an input buffer (issue #21922).

Changes:

  • Added a zero_copy_non_owning parameter to GetPyObjFromTensor and updated CPU handling to only return zero-copy numpy arrays when the tensor owns its buffer (otherwise copy).
  • Updated OrtValue.numpy() device-specific paths to explicitly allow zero-copy behavior via the new parameter.
  • Added Python regression tests covering input-as-output aliasing and confirming session-owned outputs remain zero-copy.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/python/onnxruntime_test_python.py | Adds regression test for input/output aliasing corruption and a test that session-owned outputs remain zero-copy. |
| onnxruntime/python/onnxruntime_pybind_state.cc | Adds ownership-aware CPU tensor-to-numpy conversion to avoid dangling pointers for non-owning buffers. |
| onnxruntime/python/onnxruntime_pybind_ortvalue.cc | Updates OrtValue.numpy() to pass the new zero_copy_non_owning flag through device-specific paths. |
| onnxruntime/python/onnxruntime_pybind_mlvalue.h | Extends GetPyObjFromTensor declaration with zero_copy_non_owning (defaulted). |

Comment thread onnxruntime/python/onnxruntime_pybind_ortvalue.cc
Comment thread onnxruntime/test/python/onnxruntime_test_python.py
Contributor

@tianleiwu tianleiwu left a comment

Thanks for the focused fix. I avoided repeating the existing open threads on the direct non-alias test assertion and the OrtValue.numpy() zero-copy rationale. I found one small maintainability item around documenting the new lifetime-sensitive opt-out flag.

Comment thread onnxruntime/python/onnxruntime_pybind_mlvalue.h
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


Comment thread onnxruntime/test/python/onnxruntime_test_python.py Outdated
@yuslepukhin yuslepukhin linked an issue Apr 16, 2026 that may be closed by this pull request
@yuslepukhin yuslepukhin enabled auto-merge (squash) April 16, 2026 20:08
@yuslepukhin yuslepukhin merged commit de0b292 into main Apr 16, 2026
97 of 99 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/fix_numpy_zero_copy branch April 16, 2026 20:14

Linked issue: Corrupted value for model outputs that are also model inputs

5 participants