Do not create numpy on top of Tensor non-owning buffer#28088
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens ONNX Runtime’s Python tensor-to-numpy conversion to avoid returning numpy arrays that reference non-owning (externally backed) CPU tensor buffers, which can lead to dangling pointers when a model output aliases an input buffer (issue #21922).
Changes:
- Added a
zero_copy_non_owningparameter toGetPyObjFromTensorand updated CPU handling to only return zero-copy numpy arrays when the tensor owns its buffer (otherwise copy). - Updated
OrtValue.numpy()device-specific paths to explicitly allow zero-copy behavior via the new parameter. - Added Python regression tests covering input-as-output aliasing and confirming session-owned outputs remain zero-copy.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/test/python/onnxruntime_test_python.py | Adds regression test for input/output aliasing corruption and a test that session-owned outputs remain zero-copy. |
| onnxruntime/python/onnxruntime_pybind_state.cc | Adds ownership-aware CPU tensor-to-numpy conversion to avoid dangling pointers for non-owning buffers. |
| onnxruntime/python/onnxruntime_pybind_ortvalue.cc | Updates OrtValue.numpy() to pass the new zero_copy_non_owning flag through device-specific paths. |
| onnxruntime/python/onnxruntime_pybind_mlvalue.h | Extends GetPyObjFromTensor declaration with zero_copy_non_owning (defaulted). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tianleiwu
left a comment
There was a problem hiding this comment.
Thanks for the focused fix. I avoided repeating the existing open threads on the direct non-alias test assertion and the OrtValue.numpy() zero-copy rationale. I found one small maintainability item around documenting the new lifetime-sensitive opt-out flag.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This pull request improves the safety and correctness of tensor-to-numpy conversions in the ONNX Runtime Python bindings, specifically addressing the issue of dangling pointers when model outputs alias input buffers. It introduces logic to ensure that numpy arrays returned from session outputs do not share memory with input arrays unless it is safe to do so, and adds targeted tests to prevent regressions.
Tensor-to-Numpy Conversion Safety Improvements:
GetPyObjFromTensorfunction signature in both the header (onnxruntime_pybind_mlvalue.h) and implementation (onnxruntime_pybind_state.cc) to accept a newzero_copy_non_owningboolean parameter, allowing explicit control over zero-copy behavior. [1] [2]GetPyObjFromTensorso that zero-copy numpy arrays are only created if the tensor owns its buffer or ifzero_copy_non_owningis explicitly set. Otherwise, the data is copied to prevent use-after-free errors when the original input memory might be released.Device Handling Updates:
onnxruntime_pybind_ortvalue.ccto always request zero-copy for outputs from non-CPU devices, ensuring consistent and safe behavior across all supported hardware backends.Testing and Regression Coverage:
onnxruntime_test_python.pyto verify that outputs which alias inputs are returned as independent numpy arrays, preventing data corruption from dangling pointers. Also added a test to confirm that session-allocated outputs still use efficient zero-copy numpy arrays.Addresses issue: #21922