[CUDA][CUDA Graphs] Fix potential race with autograd thread during a graph capture 2 #106570
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106570
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 537dd71.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This looks a lot cleaner as a way of resolving the race conditions. I don't see anything obviously wrong with CUDAGraph coming up with its own IDs. I think using the capture ID was a holdover from when the allocator would look up what pool to use via that ID. Now that it is per-stream anyway, this seems much simpler.
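For illustration, a minimal sketch of what "coming up with its own IDs" could look like, assuming a process-wide atomic counter (the alias and helper name below are hypothetical, not the actual `CUDAGraph.cpp` code):

```cpp
#include <atomic>

using CaptureId_t = unsigned long long;  // hypothetical alias for this sketch

// Process-wide counter: each capture gets a fresh, monotonically increasing id,
// with no dependency on the id CUDA itself assigns during capture.
static std::atomic<CaptureId_t> next_capture_uid{1};

CaptureId_t new_capture_uid() {
  return next_capture_uid.fetch_add(1, std::memory_order_relaxed);
}
```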
aten/src/ATen/cuda/CUDAGraph.cpp
I guess we don't need this call anymore.
@pytorchmergebot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased fe52f3d to a68f8af.
@pytorchmergebot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased e94cf13 to 6ab4622.
@pytorchmergebot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…graph capture 2 (pytorch#106570)
Pull Request resolved: pytorch#106570
Approved by: https://github.com/zdevito
An alternative to #106235 that just adds our own uid generation so that we can call `beginAllocateStreamToPool` (which notifies the caching allocator that a capture is starting) before actually starting the capture. Note that this does appear to change the uid generation behavior a bit from the CUDA API call (which seems to increment by 3 each time instead of 1).

Looking at the changes again, I'm not sure whether the begin-capture ordering change is needed in addition to the end-capture ordering change, but leaving it out makes me uneasy, as I'm not sure anything prevents the autograd thread from running cleanup code "in-between" captures.
CC @zdevito @eellison
cc @mcarilli @ezyang
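For illustration only, here is a rough sketch of the ordering described above, under assumed names and signatures (the allocator stub and `capture_begin_sketch` below are hypothetical; the real change lives in `aten/src/ATen/cuda/CUDAGraph.cpp`):

```cpp
#include <atomic>
#include <utility>
#include <cuda_runtime.h>

using CaptureId_t = unsigned long long;
using MempoolId_t = std::pair<CaptureId_t, CaptureId_t>;

static std::atomic<CaptureId_t> next_uid{1};

// Hypothetical stand-in for the allocator notification mentioned above
// (`beginAllocateStreamToPool`); the signature here is an assumption.
void begin_allocate_stream_to_pool(int device, cudaStream_t stream,
                                   MempoolId_t pool) {
  // Stub: the real allocator would start routing allocations on `stream`
  // into the private pool identified by `pool` for this `device`.
  (void)device; (void)stream; (void)pool;
}

void capture_begin_sketch(int device, cudaStream_t stream) {
  // 1. Generate our own uid rather than waiting for the id that
  //    cudaStreamGetCaptureInfo only reports once capture has started.
  MempoolId_t pool{next_uid.fetch_add(1), 0};

  // 2. Notify the caching allocator first, so allocations issued concurrently
  //    (e.g. from the autograd thread) already land in the capture's pool.
  begin_allocate_stream_to_pool(device, stream, pool);

  // 3. Only then begin the actual CUDA graph capture.
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
}
```

The point of the sketch is the ordering in steps 1-3: because the uid is generated locally, the allocator can be told about the capture before `cudaStreamBeginCapture` runs, closing the window in which another thread could allocate outside the capture's pool.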