Onnxruntime-directml 1.18.0 broken multithreading inference session #20713
Comments
Tagging @PatriceVignola @smk2007 @fdwr for visibility.

Same here on Windows: versions 1.16.0 through 1.17.3 work fine across multiple threads, but 1.18.0 gives
We’ve noted the issue with GPU resource contention caused by multiple threads. This usage pattern is not recommended: it makes multiple threads request all of the GPU resources at once, which can cause contention. Also, the allocator in the Python API (both CUDA and DML) is explicitly not thread safe, because it is initialized as a global singleton that lives outside the session. We’re investigating the recent failure and will address it. In the meantime, please avoid this pattern to prevent GPU contention.
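Until the regression is fixed, one way to follow the maintainers' advice is to serialize all `session.run` calls behind a single process-wide lock, so only one thread touches the GPU at a time. This is a minimal sketch of that workaround (the `safe_run` helper name is ours, not part of the ONNX Runtime API), and it trades away parallelism for stability:

```python
import threading

# One process-wide lock guarding the shared DML device.
_gpu_lock = threading.Lock()

def safe_run(session, output_names, input_feed):
    """Serialize InferenceSession.run calls across threads.

    This is a contention workaround, not a fix: inference requests
    from different threads execute one at a time on the GPU.
    """
    with _gpu_lock:
        return session.run(output_names, input_feed)
```

Each thread can keep its own `InferenceSession`, but must route every `run` through `safe_run` for the lock to be effective.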
Describe the issue
With the new version 1.18.0, when several threads each create an InferenceSession on the same DirectML device, all threads stall indefinitely without raising any exception or error.
To reproduce
Thread 1
Thread n (where n can be any number)
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
1.18.0