Enable Robust Symlink Support for External Data (HF Cache Support) #27374
Merged
Pull request overview
This PR updates ONNX Runtime’s external data path validation to properly handle models stored in symlinked directory layouts (notably Hugging Face Hub cache layouts), while keeping path-escape protections in place.
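The minimal Python sketch below shows the layout and the resulting failure. It is not ONNX Runtime code. It builds a simplified Hub-style cache: the real bytes sit in a flat `blobs/` directory, and `snapshots/<revision>/` holds relative symlinks to them. The names `rev0`, `blob-model`, and `blob-data` are hypothetical placeholders. `logical_check` is an assumed approximation of the old logical-only validation, written with `pathlib`, and it needs Python 3.9+ for `Path.is_relative_to`. On Windows, creating symlinks may require extra privileges.

```python
import os
import tempfile
from pathlib import Path


def make_hf_style_layout(root: Path) -> Path:
    """Create a simplified Hub-style cache under `root`.

    Returns the logical model path, root/snapshots/rev0/model.onnx.
    Names and file contents are hypothetical placeholders.
    """
    blobs = root / "blobs"
    snapshot = root / "snapshots" / "rev0"
    blobs.mkdir(parents=True)
    snapshot.mkdir(parents=True)

    # The real bytes live in a flat blobs/ directory.
    (blobs / "blob-model").write_bytes(b"placeholder graph")
    (blobs / "blob-data").write_bytes(b"placeholder tensors")

    # The snapshot directory holds only relative symlinks into blobs/.
    os.symlink(os.path.join("..", "..", "blobs", "blob-model"), snapshot / "model.onnx")
    os.symlink(os.path.join("..", "..", "blobs", "blob-data"), snapshot / "model.onnx.data")
    return snapshot / "model.onnx"


def logical_check(model_path: Path, location: str) -> bool:
    """Assumed logical-only check: the resolved data path must stay under
    the directory the model was loaded from."""
    base_dir = model_path.parent.resolve()
    resolved = (model_path.parent / location).resolve()
    return resolved.is_relative_to(base_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = make_hf_style_layout(Path(tmp))
        data = model.parent / "model.onnx.data"
        print(data)            # logical path, under snapshots/rev0/
        print(data.resolve())  # physical path, under blobs/
        print(logical_check(model, "model.onnx.data"))  # False
```

The data file is a sibling of the model in the snapshot directory. Its resolved path, however, lands in `blobs/`, so a check against the logical directory alone rejects it.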
Changes:
- Extend `utils::ValidateExternalDataPath` to optionally validate against the canonical (real) model directory when the logical `base_dir` check fails.
- Update `Graph::ConvertInitializersIntoOrtValues` to pass `model_path` into `ValidateExternalDataPath`.
- Remove provider-bridge exposure of `ValidateExternalDataPath` and add a Python symlink-based regression test.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| onnxruntime/core/framework/tensorprotoutils.h | Extends ValidateExternalDataPath API to accept optional model_path for canonical validation. |
| onnxruntime/core/framework/tensorprotoutils.cc | Implements dual logical+canonical directory validation for external data paths. |
| onnxruntime/core/graph/graph.cc | Passes model_path into external data validation for initializer loading. |
| onnxruntime/core/session/provider_bridge_ort.cc | Removes provider-host bridge method for external data path validation. |
| onnxruntime/core/providers/shared_library/provider_interfaces.h | Removes Utils__ValidateExternalDataPath from the ProviderHost interface. |
| onnxruntime/core/providers/shared_library/provider_api.h | Removes inline ValidateExternalDataPath wrapper from provider API. |
| onnxruntime/test/python/onnxruntime_test_python_symlink_data.py | Adds Python unittest coverage for HF-style symlink layouts and a negative escape case. |
- tianleiwu commented on Feb 18, 2026.
- edgchen1 reviewed on Feb 18, 2026.
- edgchen1 approved these changes on Feb 18, 2026.
- tianleiwu added a commit that referenced this pull request on Feb 18, 2026: …27374)
- tianleiwu added a commit that referenced this pull request on Feb 18, 2026, cherry-picking the following commits for the release:
  - #27350 [Build] Fix python packaging pipeline
  - #27349 [Build] Fix DML Nuget Pipeline for Release
  - #27334 [DNNL] Fix DNNL build
  - #27374 Enable Robust Symlink Support for External Data (HF Cache Support)

  Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Summary
This PR enables ONNX Runtime to correctly load models with external data when they are stored in symlinked directory structures. This is a common scenario for models cached by the Hugging Face Hub, where both the model file and data files are symlinks pointing into a flat `blobs/` directory.

The Problem
Previously, ONNX Runtime's external data path validation only checked whether the resolved data path was under the logical directory of the model. In symlinked structures (like Hugging Face's `snapshots/` and `blobs/` layout), the resolved data path often sits in a different physical directory than the logical model path. This led to a "path escapes model directory" error even when the data was safely associated with the model.

The Fix
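I have updated `ValidateExternalDataPath` to implement a dual-check mechanism:

1. **Logical Check:** Verify that the resolved data path is under the provided `base_dir` (the directory the model was loaded from).
2. **Physical Check:** If the logical check fails, verify that the resolved data path is under the parent directory of the **real/canonical** model path.

This approach lets models load external data from both their logical siblings and their physical siblings, while data paths that resolve outside both directories are still rejected.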
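The Python sketch below approximates the two checks. It is illustrative only and is not the C++ `ValidateExternalDataPath` implementation. The function name `validate_external_data_path` and its argument order are hypothetical, and `Path.resolve()` stands in for canonicalization, which is an assumption. The demo also creates a symlink that points outside both directories to show that the escape case is still refused. It requires Python 3.9+, and on Windows symlink creation may need extra privileges.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def validate_external_data_path(
    base_dir: Path, location: str, model_path: Optional[Path] = None
) -> bool:
    """Hypothetical Python analogue of the dual check.

    Returns True if the data file named by `location` (relative to
    `base_dir`) passes the logical check or, when `model_path` is given,
    the physical check.
    """
    # Path.resolve() follows symlinks, standing in for canonicalization.
    resolved = (Path(base_dir) / location).resolve()

    # 1. Logical check: under the directory the model was loaded from.
    if resolved.is_relative_to(Path(base_dir).resolve()):
        return True

    # 2. Physical check: under the parent of the model's real path.
    if model_path is not None:
        real_model_dir = Path(model_path).resolve().parent
        if resolved.is_relative_to(real_model_dir):
            return True

    # Resolves outside both directories: treat as a path escape.
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        snap = root / "snapshots" / "rev0"
        snap.mkdir(parents=True)
        for sub in ("blobs", "outside"):
            (root / sub).mkdir()
        (root / "blobs" / "blob-model").write_bytes(b"placeholder")
        (root / "blobs" / "blob-data").write_bytes(b"placeholder")
        (root / "outside" / "secret.bin").write_bytes(b"unrelated")

        # Hub-style relative symlinks, plus one that points elsewhere.
        os.symlink(os.path.join("..", "..", "blobs", "blob-model"), snap / "model.onnx")
        os.symlink(os.path.join("..", "..", "blobs", "blob-data"), snap / "model.onnx.data")
        os.symlink(os.path.join("..", "..", "outside", "secret.bin"), snap / "escape.data")

        model = snap / "model.onnx"
        print(validate_external_data_path(snap, "model.onnx.data"))         # False (logical check only)
        print(validate_external_data_path(snap, "model.onnx.data", model))  # True (physical check)
        print(validate_external_data_path(snap, "escape.data", model))      # False (escapes both)
```

In this sketch, the physical check accepts anything under the real model directory. In a Hub cache, that means any file in the same `blobs/` directory passes, while a symlink that resolves anywhere else is still rejected.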
Changes
- **Core Logic:**
  - `onnxruntime/core/framework/tensorprotoutils.cc/h`: Updated `ValidateExternalDataPath` to accept `model_path` and perform the dual check.
  - `onnxruntime/core/graph/graph.cc`: Updated the call site in `ConvertInitializersIntoOrtValues` to pass the `model_path`.
- **Cleanup:**
  - Removed the unused `ValidateExternalDataPath` from `provider_api.h`, `provider_interfaces.h`, and `provider_bridge_ort.cc`. These were redundant because the core already handles path validation during session initialization, so providers have no need to validate it themselves.
- **Testing:**
  - Added `onnxruntime/test/python/onnxruntime_test_python_symlink_data.py` to simulate the Hugging Face Hub symlink structure and verify the fix.