Fix missing cuDNN DLL preload from NVIDIA site packages on Windows #28787
Add `cudnn_engines_tensor_ir64_9.dll` to the Windows cuDNN DLL preload list.
Pull request overview
This PR updates ONNX Runtime’s Windows CUDA/cuDNN DLL preloading helper to include an additional cuDNN engine DLL when loading from NVIDIA “site-packages” installs, addressing a runtime failure when users rely on onnxruntime.preload_dlls(directory="") (i.e., without PyTorch and without modifying PATH).
Changes:
- Add `cudnn_engines_tensor_ir64_9.dll` to the Windows cuDNN DLL preload list used for NVIDIA site-packages discovery.
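For context, the usage pattern this change targets might look roughly like the sketch below. It assumes a GPU build of ONNX Runtime plus the NVIDIA CUDA/cuDNN wheels are installed; the helper name `make_cuda_session` and the provider list are illustrative choices, not part of this PR.

```python
def make_cuda_session(model_path: str):
    """Preload CUDA/cuDNN DLLs from NVIDIA site packages, then create a session.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that defining this helper does not require onnxruntime.
    import onnxruntime

    # directory="" requests loading from the NVIDIA site packages rather than
    # relying on PyTorch's bundled libraries or on PATH.
    onnxruntime.preload_dlls(directory="")

    # Fall back to CPU if the CUDA provider cannot be initialized.
    return onnxruntime.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
```

The preload call has to happen before the session is created, since the CUDA execution provider resolves its cuDNN dependencies at that point.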
tianleiwu left a comment
Review: Fix missing cuDNN DLL preload from NVIDIA site packages on Windows
The change correctly adds cudnn_engines_tensor_ir64_9.dll so that session.run works when relying solely on NVIDIA site packages on Windows. Adding it at the end of the list is fine for the site-packages full-path load, since its dependency (cudnn_graph64_9.dll) is already loaded earlier in the list.
Verdict: COMMENT — one backward-compatibility concern worth addressing before merge.
⚠️ Backward compatibility with older cuDNN (e.g. 9.13)
cudnn_engines_tensor_ir64_9.dll is a newer cuDNN file. It is present in cuDNN 9.23 but absent in older releases such as 9.13.0.50:
| cuDNN 9.23 (has it) | cuDNN 9.13 (missing it) |
|---|---|
| `cudnn_engines_tensor_ir64_9.dll` ✅ | ❌ not present |
| `cudnn_ext64_9.dll` ✅ | ❌ not present |
`preload_dlls()` loads the list in two passes:
- Site-packages pass: guarded by `os.path.isfile(dll_path)`, so a missing DLL is silently skipped. ✅ Older cuDNN is handled correctly here.
- Default-path fallback pass: loads each not-yet-loaded entry by bare filename with no existence check:

```python
for relative_path in dll_paths:
    dll_filename = relative_path[-1]
    if dll_filename not in loaded_dlls:
        try:
            _ = ctypes.CDLL(dll_filename)
        except Exception as e:
            has_failure = True
            print(f"Failed to load {dll_filename}: {e}")
```

With an older cuDNN (whether on `PATH` or in site packages where `isfile` skipped it), `ctypes.CDLL("cudnn_engines_tensor_ir64_9.dll")` raises, sets `has_failure = True`, and prints both `Failed to load cudnn_engines_tensor_ir64_9.dll: ...` and the misleading `Please follow ... to install CUDA and CuDNN.` message, even though inference actually works fine without this optional engine DLL. This is a regression introduced by this PR for users on older cuDNN.
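The failure mode can be reproduced without a GPU by swapping `ctypes.CDLL` for a stub. The sketch below is only an illustration of the fallback loop's behavior: `fallback_pass`, `old_cudnn_loader`, and the path tuples are hypothetical names and values, not code from ONNX Runtime.

```python
from typing import Callable, Iterable, List, Set, Tuple


def fallback_pass(
    dll_paths: Iterable[Tuple[str, ...]],
    loaded_dlls: Set[str],
    load: Callable[[str], object],
) -> Tuple[bool, List[str]]:
    """Mimic the fallback pass: any load error marks the whole preload as failed."""
    has_failure = False
    messages: List[str] = []
    for relative_path in dll_paths:
        dll_filename = relative_path[-1]
        if dll_filename not in loaded_dlls:
            try:
                load(dll_filename)
            except Exception as e:
                has_failure = True
                messages.append(f"Failed to load {dll_filename}: {e}")
    return has_failure, messages


def old_cudnn_loader(name: str) -> object:
    """Stub standing in for ctypes.CDLL on a machine with an older cuDNN."""
    if name == "cudnn_engines_tensor_ir64_9.dll":
        raise OSError("module not found")
    return object()


# Illustrative entries; the real list in ONNX Runtime is longer.
dll_paths = [
    ("nvidia", "cudnn", "bin", "cudnn_graph64_9.dll"),
    ("nvidia", "cudnn", "bin", "cudnn_engines_tensor_ir64_9.dll"),
]

# cudnn_graph64_9.dll was already loaded in the site-packages pass.
has_failure, messages = fallback_pass(
    dll_paths, {"cudnn_graph64_9.dll"}, old_cudnn_loader
)
```

Here `has_failure` ends up `True` solely because of the newer engine DLL, which is the condition that triggers the misleading install message.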
Suggested fix: treat version-specific/optional cuDNN DLLs as best-effort in the fallback pass so their absence does not flag has_failure. For example, keep a small set of optional filenames and skip the failure accounting for them:
```python
# cuDNN DLLs that only exist in newer cuDNN releases (e.g. >= 9.23) and are
# optional for inference. Missing them on older cuDNN must not be treated as a failure.
_optional_dll_filenames = {"cudnn_engines_tensor_ir64_9.dll"}

for relative_path in dll_paths:
    dll_filename = relative_path[-1]
    if dll_filename not in loaded_dlls:
        try:
            _ = ctypes.CDLL(dll_filename)
        except Exception as e:
            if dll_filename not in _optional_dll_filenames:
                has_failure = True
                print(f"Failed to load {dll_filename}: {e}")
```

That keeps the new fix for cuDNN 9.23 while avoiding a spurious failure message on cuDNN 9.13.
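One way to sanity-check the best-effort behavior is to run the loop against stub loaders. This is a rough sketch under stated assumptions: `fallback_pass_best_effort`, `make_loader`, and the sample filenames in `dll_paths` are hypothetical, and the stub replaces `ctypes.CDLL` so the check runs on any platform.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Assumption: only this DLL is treated as optional, as in the suggestion above.
OPTIONAL_DLLS: FrozenSet[str] = frozenset({"cudnn_engines_tensor_ir64_9.dll"})


def fallback_pass_best_effort(
    dll_paths: Iterable[Tuple[str, ...]],
    loaded_dlls: Set[str],
    load: Callable[[str], object],
    optional: FrozenSet[str] = OPTIONAL_DLLS,
) -> bool:
    """Return True only if a non-optional DLL failed to load."""
    has_failure = False
    for relative_path in dll_paths:
        dll_filename = relative_path[-1]
        if dll_filename in loaded_dlls:
            continue
        try:
            load(dll_filename)
        except Exception as e:
            # Optional DLLs are best-effort: no failure flag, no message.
            if dll_filename not in optional:
                has_failure = True
                print(f"Failed to load {dll_filename}: {e}")
    return has_failure


def make_loader(missing: Set[str]) -> Callable[[str], object]:
    """Build a stub loader that raises OSError for the given filenames."""
    def load(name: str) -> object:
        if name in missing:
            raise OSError("module not found")
        return object()
    return load


# Illustrative entries; the real list in ONNX Runtime is longer.
dll_paths = [
    ("nvidia", "cudnn", "bin", "cudnn64_9.dll"),
    ("nvidia", "cudnn", "bin", "cudnn_engines_tensor_ir64_9.dll"),
]

# Older cuDNN: only the optional engine DLL is absent.
old_cudnn_failed = fallback_pass_best_effort(
    dll_paths, set(), make_loader({"cudnn_engines_tensor_ir64_9.dll"})
)
# Broken install: a required DLL is absent.
broken_install_failed = fallback_pass_best_effort(
    dll_paths, set(), make_loader({"cudnn64_9.dll"})
)
```

With these stubs, the older-cuDNN case reports no failure while a genuinely missing required DLL is still flagged, which is the distinction the suggestion is after.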
Minor / out of scope
`cudnn_ext64_9.dll` is also new in 9.23 and not in the list. If it is not needed by `session.run`, no action needed; just confirming the omission is intentional.
Thanks, I added your suggested changes.
Yes, it is intentional. It was not needed in my tests.

@badranX, please merge main to include a fix for ASan CI.
…28787)

Add `cudnn_engines_tensor_ir64_9.dll` to the Windows cuDNN DLL preload list when trying to load them from NVIDIA site packages.

### Motivation and Context

When relying solely on NVIDIA site packages (without PyTorch installed and without adding DLLs to the system PATH), the recommended approach is:
`onnxruntime.preload_dlls(directory="")`

However, on the tested models, `session.run` fails on Windows because `cudnn_engines_tensor_ir64_9.dll` is not preloaded.
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| 4743936 | #28787 | Fix missing cuDNN DLL preload from NVIDIA site packages on Windows |
| 15d9b13 | #28816 | fix(ci): nodejs on nuget cuda 13 hangs |

---------

Co-authored-by: Badran <yahya.badran@gmail.com>