[CUDA] support user_compute_stream in python API by tianleiwu · Pull Request #19229 · microsoft/onnxruntime

tianleiwu · 2024-01-22T23:43:32Z

Description

Passing a user-provided CUDA stream is an important feature for avoiding synchronization in the Python API. This change lets users pass a CUDA stream to the CUDA provider. Note that the TRT and ROCm providers need similar changes, which are not included in this pull request.

Note that has_user_compute_stream is set automatically based on whether a CUDA stream is passed, so setting has_user_compute_stream through the Python API has no effect.
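As a rough illustration of the option described above, the sketch below builds a CUDA provider entry that carries a stream handle and hands it to a session. The helper names `cuda_provider_with_stream` and `create_session` are hypothetical and not part of this pull request. It assumes the stream is supplied as its raw integer address converted to a string, as in the `str(s.cuda_stream)` usage shown for the TensorRT EP in this thread.

```python
from typing import Any, Dict, List, Tuple


def cuda_provider_with_stream(stream_handle: int) -> Tuple[str, Dict[str, str]]:
    """Build a CUDA provider entry that carries a user compute stream.

    stream_handle is the raw stream address as an integer, for example
    torch.cuda.Stream().cuda_stream. It is passed as a string option.
    has_user_compute_stream is deliberately not set here, since the
    description says it is derived from the presence of the stream.
    """
    return ("CUDAExecutionProvider", {"user_compute_stream": str(stream_handle)})


def create_session(model_path: str, stream_handle: int) -> Any:
    """Hypothetical helper: create a session that runs on the given stream."""
    # Imported lazily so the option builder above works without a GPU build.
    import onnxruntime as ort

    providers: List[Any] = [
        cuda_provider_with_stream(stream_handle),
        "CPUExecutionProvider",  # fallback for nodes the CUDA EP does not take
    ]
    return ort.InferenceSession(model_path, providers=providers)
```

With PyTorch installed, a call might look like `create_session("model.onnx", torch.cuda.Stream().cuda_stream)`, where `model.onnx` is a placeholder path. Keeping the option builder separate from session creation makes it possible to check the option values without a GPU.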

Motivation and Context

#19094

jywu-msft · 2024-01-24T06:58:49Z

+@chilo-ms FYI, we can do this with the TRT EP as well.

### Description

Update the Python documentation for user_compute_stream in the CUDA Python API, following #19229.

### Motivation and Context

### Description

Following PR #19229, which lets the CUDA EP use an external compute stream, this change adds the same support for the ROCm EP. While testing this feature with torch, we found that torch uses stream 0 as the default stream, so `torch.cuda.current_stream()` returns `0` for the current stream, but ORT treats `0` or `nullptr` as invalid and resets has_user_compute_stream to false. The has_user_compute_stream option will be removed in the future.

### Motivation and Context

We want to use torch.cuda.graph to capture the kernels that ORT runs, which requires torch and ORT to run on the same stream, so we use this API to set ORT's working stream.

### Description

* Implement the `user_compute_stream` Python API for the TensorRT EP
* Using this option implicitly sets `has_user_compute_stream` to `true`
* Extend the existing TRT EP unit test to verify the `user_compute_stream` option
* This has been verified in a local PyTorch environment, with `torch.cuda.Stream()` passed as `user_compute_stream`:

```python
...
# Before inference
if torch.cuda.is_available():
    s = torch.cuda.Stream()
    option = {"user_compute_stream": str(s.cuda_stream)}
    sess.set_providers(["TensorrtExecutionProvider"], [option])
    options = sess.get_provider_options()

    assert "TensorrtExecutionProvider" in options
    assert options["TensorrtExecutionProvider"].get("user_compute_stream", "") == str(s.cuda_stream)
    assert options["TensorrtExecutionProvider"].get("has_user_compute_stream", "") == "1"
...
```

### Motivation and Context

Align with the existing `user_compute_stream` Python implementations for the [CUDA EP](https://github.com/microsoft/onnxruntime/pull/19229)/[ROCm EP](#19619)

parse user_compute_stream

6034ad9

tianleiwu requested review from hariharans29 and yufenglee January 23, 2024 17:18

tianleiwu mentioned this pull request Jan 23, 2024

[CUDA] update python doc for user_compute_stream #19245

Merged

yufenglee approved these changes Jan 26, 2024

tianleiwu merged commit d7ff81d into main Jan 26, 2024

tianleiwu deleted the tlwu/python_cuda_ep_option_stream branch January 26, 2024 18:34

kailums mentioned this pull request Feb 23, 2024

support user_compute_stream for rocm ep #19619

Merged

yf711 mentioned this pull request Apr 11, 2024

[TensorRT EP] support user_compute_stream in python API #20168

Merged

chilo-ms mentioned this pull request Jul 16, 2024

[TensorRT] Enable refitting an embedded engine when provided as byte stream #21357

Merged

tianleiwu merged 1 commit into main from tlwu/python_cuda_ep_option_stream

3 participants
