[CUDA] support user_compute_stream in python API#19229
Merged
Merged
Conversation
Member
|
+@chilo-ms FYI , we can do this with TRT EP as well. |
yufenglee
approved these changes
Jan 26, 2024
tianleiwu
added a commit
that referenced
this pull request
Jan 26, 2024
### Description Update python doc about user_compute_stream in CUDA python API for #19229. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->
kailums
added a commit
that referenced
this pull request
Feb 27, 2024
### Description <!-- Describe your changes. --> According to the pr #19229 supporting cuda EP use external compute stream, we add support for rocm EP. And when we testing this feature with torch, we found torch use stream 0 for the default stream, and `torch.cuda.current_stream()` returns `0` for current stream, but ort treat `0` or `nullptr` as invalid, and reset has_user_compute_stream to false. Will remove has_user_compute_stream option in the future. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> The motivation for this pr is that we want to use torch.cuda.graph to capture ort running kernel, which requires torch and ort are running in the same stream, so we use this API to set ort's working stream.
yf711
referenced
this pull request
Apr 16, 2024
### Description
<!-- Describe your changes. -->
* Implement `user_compute_stream` python api for TensorRT EP
* Using this option will implicitly set `has_user_compute_stream` as
`true`
* Extend existing TRTEP unit test to verify `user_compute_stream` option
* This has been verified in local pytorch env, with
`torch.cuda.Stream()` passing into `user_compute_stream`:
```python
...
# Before inference
if torch.cuda.is_available():
s = torch.cuda.Stream()
option = {"user_compute_stream": str(s.cuda_stream)}
sess.set_providers(["TensorrtExecutionProvider"], [option])
options = sess.get_provider_options()
assert "TensorrtExecutionProvider" in options
assert options["TensorrtExecutionProvider"].get("user_compute_stream", "") == str(s.cuda_stream)
assert options["TensorrtExecutionProvider"].get("has_user_compute_stream", "") == "1"
...
```
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Align with existing `user_compute_stream` python implementations for
[CUDA EP](https://github.com/microsoft/onnxruntime/pull/19229)/[ROCm
EP](#19619)
TedThemistokleous
referenced
this pull request
in TedThemistokleous/onnxruntime
May 7, 2024
) ### Description <!-- Describe your changes. --> * Implement `user_compute_stream` python api for TensorRT EP * Using this option will implicitly set `has_user_compute_stream` as `true` * Extend existing TRTEP unit test to verify `user_compute_stream` option * This has been verified in local pytorch env, with `torch.cuda.Stream()` passing into `user_compute_stream`: ```python ... # Before inference if torch.cuda.is_available(): s = torch.cuda.Stream() option = {"user_compute_stream": str(s.cuda_stream)} sess.set_providers(["TensorrtExecutionProvider"], [option]) options = sess.get_provider_options() assert "TensorrtExecutionProvider" in options assert options["TensorrtExecutionProvider"].get("user_compute_stream", "") == str(s.cuda_stream) assert options["TensorrtExecutionProvider"].get("has_user_compute_stream", "") == "1" ... ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Align with existing `user_compute_stream` python implementations for [CUDA EP](https://github.com/microsoft/onnxruntime/pull/19229)/[ROCm EP](microsoft#19619)
mindest
pushed a commit
that referenced
this pull request
Jun 14, 2024
<!-- Describe your changes. --> According to the pr #19229 supporting cuda EP use external compute stream, we add support for rocm EP. And when we testing this feature with torch, we found torch use stream 0 for the default stream, and `torch.cuda.current_stream()` returns `0` for current stream, but ort treat `0` or `nullptr` as invalid, and reset has_user_compute_stream to false. Will remove has_user_compute_stream option in the future. <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> The motivation for this pr is that we want to use torch.cuda.graph to capture ort running kernel, which requires torch and ort are running in the same stream, so we use this API to set ort's working stream.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
It is an important feature to pass user cuda stream to avoid synchronization in python API. Here we allow user to pass cuda stream for CUDA provider. Note that TRT or ROCm provider need similar change, which are not included in this pull request.
Note that we will set
has_user_compute_streamautomatically based on whether there is cuda stream passed, so settinghas_user_compute_streamthrough python API has no effect.Motivation and Context
#19094