Skip to content

Address external path references in loaded models#28709

Merged
yuslepukhin merged 2 commits into
mainfrom
yuslepukhin/sparse_in_memory
May 29, 2026
Merged

Address external path references in loaded models#28709
yuslepukhin merged 2 commits into
mainfrom
yuslepukhin/sparse_in_memory

Conversation

@yuslepukhin
Copy link
Copy Markdown
Member

This pull request strengthens security checks around loading external tensor data in ONNX Runtime, particularly to prevent malicious models from referencing unsafe file paths or in-memory address markers that could lead to arbitrary file access or unsafe memory dereferencing. The changes introduce stricter validation for external data paths and add explicit rejections for ORT in-memory address markers found in model protobufs, along with new and improved regression tests to verify this behavior.

Security hardening for external data loading:

  • Added ValidateExternalFilePathForTensor to enforce that external data paths are validated for all code paths loading external data (including those outside Graph::Resolve), rejecting absolute or directory-escaping paths and passing through only trusted in-memory markers. This is now called in GetExtDataFromTensorProto and LoadExtDataToTensorFromTensorProto to ensure defense-in-depth. [1] [2]
  • Updated the validation logic for sparse tensor sub-tensors with ValidateSparseSubTensorExternalDataPath, clarifying the handling of in-memory markers and ensuring only legitimate file paths are accepted.
  • Changed SparseTensorProtoToDenseTensorProto to use the new sparse sub-tensor validation for both values and indices.

Model loading and graph construction protections:

  • In Graph::Graph, added explicit rejection of ORT in-memory address markers in sparse tensor attributes and initializers when loading from a protobuf, preventing attackers from crafting models that could cause unsafe memory access during sparse-to-dense conversion or initializer resolution. [1] [2] [3]

Expanded and improved testing:

  • Added new unit tests to verify that absolute and directory-escaping external paths are rejected even when loading tensors directly (not via graph resolution), and that in-memory address markers are not accepted in dense or sparse initializers loaded from protobufs. [1] [2]
  • Updated an optimizer initializer test to reflect the new error handling for invalid external data paths.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens external data loading against malicious models by (1) adding a backstop path-validation call in GetExtDataFromTensorProto/LoadExtDataToTensorFromTensorProto so direct callers outside Graph::Resolve are also protected, and (2) explicitly rejecting ORT in-memory address markers (*/_ORT_MEM_ADDR_/*, */_ORT_NATIVE_ENDIAN_MEM_ADDR_/*) when they appear in dense initializers, sparse initializers, or sparse Constant-node attributes loaded from a protobuf. The in-memory markers are ORT-internal sentinels that cause reinterpret_cast<void*>(offset)-based dereferences; allowing them from an .onnx protobuf is a memory-safety hazard. ORT-format flatbuffer loading and the ModelEditor API use different Graph constructors and are unaffected.

Changes:

  • Introduce ValidateExternalFilePathForTensor and call it from GetExtDataFromTensorProto / LoadExtDataToTensorFromTensorProto; rename/refactor the sparse-only validator to ValidateSparseSubTensorExternalDataPath.
  • In Graph::Graph (GraphProto ctor), ORT_ENFORCE against in-memory address markers on dense initializers, sparse initializers (values + indices), and Constant-node sparse-tensor attributes.
  • Add direct loader tests for absolute and directory-escaping paths, plus model-load tests that reject in-memory markers on dense and sparse initializers; update one optimizer test to expect OnnxRuntimeException instead of filesystem_error.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
onnxruntime/core/framework/tensorprotoutils.cc Adds ValidateExternalFilePathForTensor backstop and renames the sparse helper to ValidateSparseSubTensorExternalDataPath.
onnxruntime/core/graph/graph.cc Rejects in-memory address markers on dense initializers, sparse initializers, and sparse Constant attributes in the GraphProto ctor.
onnxruntime/test/framework/tensorutils_test.cc Adds direct GetExtDataFromTensorProto tests for absolute and escaping external paths.
onnxruntime/test/ir/graph_test.cc Adds model-load tests verifying in-memory markers are rejected on dense and sparse (values/indices) initializers.
onnxruntime/test/optimizer/initializer_test.cc Updates expected exception type for an invalid external directory to OnnxRuntimeException.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tianleiwu
tianleiwu previously approved these changes May 28, 2026
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Hardening Review

Strong defense-in-depth PR. The layered approach — Graph ctor as the gatekeeper for untrusted protobufs, with validation helpers enforcing path constraints at data-loading callsites — is well-designed.

Positives:

  • Adding path validation inside GetExtDataFromTensorProto and LoadExtDataToTensorFromTensorProto ensures callers outside Graph::Resolve (training checkpoints, custom-op initializers) are protected.
  • Rejecting in-memory markers at the protobuf entry point (Graph ctor) is the correct architectural choice — prevents attacker-controlled pointers from ever reaching data-loading code.
  • Renaming to ValidateSparseSubTensorExternalDataPath makes scope and intent clearer.
  • The trust-boundary comments (ORT-format flatbuffer = trusted, protobuf = untrusted) are valuable for future maintainers.
  • Test coverage is thorough: sparse values/indices markers, dense initializer markers, absolute paths, and escaping paths.

One minor suggestion inline.

Comment thread onnxruntime/core/framework/tensorprotoutils.cc
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yuslepukhin yuslepukhin enabled auto-merge (squash) May 29, 2026 00:01
@yuslepukhin yuslepukhin merged commit 77d32c3 into main May 29, 2026
86 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/sparse_in_memory branch May 29, 2026 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Validate sparse tensors that use ORT's in-memory address markers

3 participants