[CUDA Host Allocator] Add support of CudaHostRegister #108488
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108488
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 unrelated failures) As of commit 7b5470a with merge base a0cea51: UNSTABLE - the following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D45843715
Force-pushed from cf9a231 to 1eeddd4.
Summary: This diff adds another option to create CUDA pinned memory using cudaHostRegister. Differential Revision: D45843715
Force-pushed from 1eeddd4 to ffb60f7.
Force-pushed from ffb60f7 to 415844f.
Summary: This diff adds another option to create CUDA pinned memory using cudaHostRegister, to avoid long lock wait times with cudaHostAlloc. Differential Revision: D45843715
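Background on the approach (a sketch under assumptions, not the PR's actual C++ implementation): cudaHostAlloc allocates and pins host memory in a single call, which is where the lock wait time mentioned above arises, whereas cudaHostRegister pins memory that was already allocated by ordinary means, so the allocation itself stays cheap and the registration work can even be split across threads (which is what a setting like pinned_num_register_threads would control). A minimal Python/ctypes illustration of the cudaHostRegister path, which degrades gracefully when no CUDA runtime is installed:

```python
import ctypes
import ctypes.util

# Flag value from cuda_runtime_api.h.
cudaHostRegisterDefault = 0

def register_pinned(buf):
    """Try to page-lock (pin) an already-allocated host buffer via
    cudaHostRegister, so the GPU can DMA to/from it directly.

    Returns True if cudaHostRegister reported cudaSuccess, False if
    the CUDA runtime is unavailable or the call failed.
    Illustrative sketch only -- PyTorch does this inside its C++
    CUDA host allocator, not through ctypes.
    """
    libname = ctypes.util.find_library("cudart")
    if libname is None:
        return False  # no CUDA runtime on this machine
    cudart = ctypes.CDLL(libname)
    err = cudart.cudaHostRegister(
        ctypes.cast(buf, ctypes.c_void_p),       # host pointer
        ctypes.c_size_t(ctypes.sizeof(buf)),     # size in bytes
        ctypes.c_uint(cudaHostRegisterDefault),  # flags
    )
    return err == 0  # 0 == cudaSuccess

# Plain host memory, allocated without the CUDA runtime involved.
buf = (ctypes.c_char * 4096)()
print(register_pinned(buf))  # True only with a working CUDA runtime
```

The key point the diff exploits: because the allocation and the pinning are now two separate steps, only the (comparatively cheap) registration step needs to touch the CUDA runtime.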
Force-pushed from 415844f to 3ace388.
Force-pushed from 3ace388 to e3efe51.
Force-pushed from e3efe51 to f3e7a3d.
Force-pushed from f3e7a3d to 77048d1.
Force-pushed from 77048d1 to 20b7bb6.
Force-pushed from d20ba0c to 8e49959.
test/test_cuda.py
Outdated
@@ -76,6 +76,13 @@ def tearDown(self):
        del self.autocast_lists
        super().tearDown()

    def test_pinned_memory_with_cudaregister(self):
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "pinned_use_cuda_host_register:True,pinned_num_register_threads:8"
This should be torch.cuda.memory._set_allocator_settings, otherwise it won't have an effect. Make sure to set it back after the test.
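As context for the settings string used in the test above: PYTORCH_CUDA_ALLOC_CONF takes a comma-separated list of key:value pairs. A small stand-alone toy parser (illustrative only, not PyTorch's actual parsing code) shows the shape of the format:

```python
def parse_alloc_conf(conf):
    """Parse a comma-separated 'key:value' settings string into a dict,
    coercing 'True'/'False' to booleans and digit strings to ints.
    Toy sketch of the format only, not PyTorch's real parser."""
    settings = {}
    for item in conf.split(","):
        key, _, value = item.partition(":")
        if value in ("True", "False"):
            settings[key] = (value == "True")
        elif value.isdigit():
            settings[key] = int(value)
        else:
            settings[key] = value
    return settings

conf = "pinned_use_cuda_host_register:True,pinned_num_register_threads:8"
print(parse_alloc_conf(conf))
# {'pinned_use_cuda_host_register': True, 'pinned_num_register_threads': 8}
```

The first key switches pinned-memory creation from cudaHostAlloc to cudaHostRegister; the second controls how many threads perform the registration.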
Force-pushed from 8e49959 to b035fb5.
Summary: This diff adds another option to create CUDA pinned memory using cudaHostRegister, to avoid long lock wait times with cudaHostAlloc. Reviewed By: zdevito. Differential Revision: D45843715
Force-pushed from b035fb5 to 7b5470a.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Follow-up to #110123, removing the CUDA_VERSION check for ROCm because HIP already has hipMallocAsync() and doesn't need the version check there. Follow-up to #108488, fixing the failing unit tests by accepting either a "cuda" or "hip" attribute for the caching allocator options. This is aligned with the masquerading strategy for ROCm/HIP. Pull Request resolved: #110715. Approved by: https://github.com/ezyang