Onnxruntime-directml 1.18.0 broken multithreading inference session #20713

Open
Djdefrag opened this issue May 17, 2024 · 3 comments
Labels
ep:DML (issues related to the DirectML execution provider)
platform:windows (issues related to the Windows platform)

Comments

Djdefrag commented May 17, 2024

Describe the issue

With the new version 1.18.0, creating separate InferenceSession objects in different threads on the same DirectML device causes all threads to stall, with no exception or error raised.

To reproduce

Thread 1

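# Assumed aliases; the original report does not show its imports.
from onnx import load as onnx_load
from onnxruntime import InferenceSession as onnxruntime_inferenceSession

AI_model_loaded = onnx_load(AI_model_path)

AI_model = onnxruntime_inferenceSession(
    path_or_bytes = AI_model_loaded.SerializeToString(),
    providers = [('DmlExecutionProvider', {"device_id": "0"})]
)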

onnx_input  = {AI_model.get_inputs()[0].name: image}
onnx_output = AI_model.run(None, onnx_input)[0]

Thread n (where n can be any number)

AI_model_loaded = onnx_load(AI_model_path)

AI_model = onnxruntime_inferenceSession(
    path_or_bytes = AI_model_loaded.SerializeToString(), 
    providers =  [('DmlExecutionProvider', {"device_id": "0"})]
)    

onnx_input  = {AI_model.get_inputs()[0].name: image}
onnx_output = AI_model.run(None, onnx_input)[0]
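
Since the threads above stall without raising anything, a small harness with a join timeout can make the hang visible. The following is only a sketch under assumptions: `run_concurrently` and `make_dml_worker` are hypothetical helper names, `model_path` and `image` stand in for the reporter's own model file and input array, and the timeout value is arbitrary. It cannot catch a hard crash such as an access violation, because that terminates the whole process.

```python
import threading
import time


def make_dml_worker(model_path, image):
    """Build a worker that mirrors the per-thread code in this report.

    Hypothetical helper: model_path and image are placeholders for the
    reporter's own model file and input array.
    """
    def worker():
        # Imported lazily so the harness itself only needs the standard library.
        import onnxruntime

        session = onnxruntime.InferenceSession(
            model_path,
            providers=[("DmlExecutionProvider", {"device_id": "0"})],
        )
        input_name = session.get_inputs()[0].name
        return session.run(None, {input_name: image})[0]

    return worker


def run_concurrently(worker, n_threads, timeout_s):
    """Run worker() in n_threads threads; report results and stalled threads."""
    results = {}

    def wrapped(index):
        try:
            results[index] = ("ok", worker())
        except Exception as exc:  # keep the exception for later inspection
            results[index] = ("error", exc)

    # Daemon threads let the interpreter exit even if a worker never returns.
    threads = [
        threading.Thread(target=wrapped, args=(i,), daemon=True)
        for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()

    # All threads share a single overall deadline.
    deadline = time.monotonic() + timeout_s
    for thread in threads:
        thread.join(max(0.0, deadline - time.monotonic()))

    stalled = [i for i, thread in enumerate(threads) if thread.is_alive()]
    return results, stalled
```

Calling `run_concurrently(make_dml_worker(path, image), n_threads=4, timeout_s=60)` would then list the indices of any threads still blocked after the timeout, which is the symptom described here.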

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

1.18.0

github-actions bot added the ep:DML and platform:windows labels May 17, 2024

@sophies927
Contributor

Tagging @PatriceVignola @smk2007 @fdwr for visibility.

saulthu commented Jun 3, 2024

Same here on Windows. Versions 1.16.0 to 1.17.3 work fine across multiple threads, but 1.18.0 fails with "Windows fatal exception: access violation". The following stack trace was produced by my own Windows SEH handler:

-----------
Caught unhandled exception...
-----------

Terminating from thread id 10152

Non-C++ exception:
  Error: EXCEPTION_ACCESS_VIOLATION
  Type: Read
  Addr: 0x0

Trace:
 40:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 39:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 38:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 37:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 36:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 35:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 34:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 33:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 32:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 31:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 30:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 29:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 28:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 27:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 26:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 25:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 24:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 23:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 22:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 21:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 20:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 19:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 18:  ?: pybind11::error_already_set::discard_as_unraisable  (onnxruntime_pybind11_state.pyd)
 17:  ?: PyObject_MakeTpCall  (python311.dll)
 16:  ?: PyObject_Vectorcall  (python311.dll)
 15:  ?: PyEval_EvalFrameDefault  (python311.dll)
 14:  ?: PyFunction_Vectorcall  (python311.dll)
 13:  ?: PyFunction_Vectorcall  (python311.dll)
 12:  ?: PyObject_CallObject  (python311.dll)
 11:  ?: PyEval_EvalFrameDefault  (python311.dll)
 10:  ?: PyFunction_Vectorcall  (python311.dll)
  9:  ?: PyObject_CallObject  (python311.dll)
  8:  ?: PyEval_EvalFrameDefault  (python311.dll)
  7:  ?: PyFunction_Vectorcall  (python311.dll)
  6:  ?: PyFunction_Vectorcall  (python311.dll)
  5:  ?: PyObject_Call  (python311.dll)
  4:  ?: PyInterpreterState_Delete  (python311.dll)
  3:  ?: PyInterpreterState_Delete  (python311.dll)
  2:  ?: recalloc  (ucrtbase.dll)
  1:  ?: BaseThreadInitThunk  (KERNEL32.DLL)
  0:  ?: RtlUserThreadStart  (ntdll.dll)
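
Going by the version range reported in this comment, an application could check the installed runtime before starting multithreaded DirectML inference. This is a rough sketch, not an official compatibility table: the bounds come only from this one report (1.16.0 to 1.17.3 working), and `reported_ok_for_multithreaded_dml` is a hypothetical helper name.

```python
import re

# Bounds taken from this comment only; other versions are simply unreported.
REPORTED_GOOD_MIN = (1, 16, 0)
REPORTED_GOOD_MAX = (1, 17, 3)


def parse_version(text):
    """Turn a string such as '1.18.0' into a tuple such as (1, 18, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def reported_ok_for_multithreaded_dml(version_text):
    """True if the version lies inside the range reported as working here."""
    version = parse_version(version_text)
    return REPORTED_GOOD_MIN <= version <= REPORTED_GOOD_MAX
```

In practice the argument would be `onnxruntime.__version__`. A `False` result only means the version is outside the range reported as working in this thread, not that it is known to fail.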

liuyunms commented Jun 7, 2024

We’ve noted the issue with GPU resource contention due to multiple threads. This usage pattern is not recommended: each thread requests all of the GPU resources, which can cause contention. Also, the allocator in the Python API (both CUDA and DML) is explicitly not thread-safe, because it is initialized as a global singleton that lives outside of the session.

We’re investigating the recent failure and will address it. Meanwhile, please avoid this pattern to prevent GPU contention.
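
One way to avoid the pattern described above is to route every inference call through a single dedicated thread that owns the session. This is a minimal sketch, assuming that serializing the calls is acceptable for the workload; `SingleThreadInference` and `session_factory` are hypothetical names, and whether this sidesteps the 1.18.0 failure has not been verified in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


class SingleThreadInference:
    """Create and use one session on one dedicated worker thread.

    session_factory is a zero-argument callable returning an object with a
    run(output_names, inputs) method, for example an
    onnxruntime.InferenceSession configured for DmlExecutionProvider.
    """

    def __init__(self, session_factory):
        # max_workers=1 means every submitted task runs on the same thread.
        self._executor = ThreadPoolExecutor(max_workers=1)
        # The session is also created on that thread, not on the caller's.
        self._session = self._executor.submit(session_factory).result()

    def run(self, inputs):
        # Callers on any thread block here until the worker has finished.
        future = self._executor.submit(self._session.run, None, inputs)
        return future.result()

    def close(self):
        self._executor.shutdown(wait=True)
```

Application threads can then share a single `SingleThreadInference` instance and call `run` freely. The calls are queued and executed one at a time, so only one thread ever touches the DirectML device.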
