
Conversation

@eqy (Collaborator) commented Aug 3, 2023

An alternative to #106235 that just adds our own uid generation so that we can call `beginAllocateStreamToPool` (which notifies the caching allocator that a capture is starting) before actually starting the capture. Note that this does appear to change the uid generation behavior a bit relative to the CUDA API call (which seems to increment the id by 3 each time instead of 1).

Looking at the changes again, I'm not sure whether the begin-capture ordering change is needed in addition to the end-capture ordering change, but dropping it makes me uneasy, as I'm not sure anything prevents the autograd thread from running cleanup code "in between" captures.
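
As a rough illustration of the intended ordering (a minimal sketch, not the actual PyTorch code; only `beginAllocateStreamToPool` and the CUDA stream-capture API are real names, the counter and helper below are assumptions):

```cpp
#include <atomic>
#include <cuda_runtime.h>

// PR-style uid, generated independently of the id CUDA assigns during stream
// capture (which, as noted above, seems to advance by 3 per capture, not 1).
static std::atomic<unsigned long long> next_capture_uid{1};

void begin_capture(cudaStream_t stream) {
  unsigned long long uid = next_capture_uid.fetch_add(1);

  // 1) Notify the caching allocator that a capture is starting *before* the
  //    capture actually begins, so the autograd thread cannot run allocator
  //    cleanup in between. In PyTorch this is the beginAllocateStreamToPool
  //    notification; the line below is a commented-out placeholder for it.
  // CUDACachingAllocator::beginAllocateStreamToPool(device, stream, /*pool id from*/ uid);

  // 2) Only then start the CUDA stream capture itself.
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
}
```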

CC @zdevito @eellison

cc @mcarilli @ezyang

@eqy added the open source, module: cuda graphs, topic: not user facing, and module: CUDACachingAllocator labels on Aug 3, 2023
@eqy requested review from eellison and zdevito on August 3, 2023 19:29
@pytorch-bot (bot) commented Aug 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106570

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 537dd71:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy changed the title from "[CUDA][CUDA Graphs] Fix potential race with autograd thread during a graph capture" to "[CUDA][CUDA Graphs] Fix potential race with autograd thread during a graph capture 2" on Aug 3, 2023
@zdevito (Contributor) left a comment

This looks a lot cleaner as a way of resolving the race conditions. I don't see anything obviously wrong with CUDAGraph coming up with its own IDs. I think using the capture ID was a holdover from when the allocator would look up what pool to use via that ID. Now that it is per-stream anyway, this seems much simpler.

Contributor left a comment

I guess we don't need this call anymore.

@eqy (Collaborator, Author) commented Aug 4, 2023

@pytorchmergebot rebase

@pytorchmergebot (Collaborator) commented:
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator) commented:
Successfully rebased graphsuid onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout graphsuid && git pull --rebase)

@eqy added the ciflow/trunk and ciflow/periodic labels on Aug 7, 2023
@eqy (Collaborator, Author) commented Aug 7, 2023

@pytorchmergebot rebase

@pytorchmergebot (Collaborator) commented:
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator) commented:
Successfully rebased graphsuid onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout graphsuid && git pull --rebase)

@eqy (Collaborator, Author) commented Aug 8, 2023

@pytorchmergebot merge

@pytorchmergebot (Collaborator) commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

Cyril-Anto pushed a commit to Cyril-Anto/pytorch that referenced this pull request Aug 17, 2023
…graph capture 2 (pytorch#106570)

Pull Request resolved: pytorch#106570
Approved by: https://github.com/zdevito

Labels

ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: cuda graphs (Ability to capture and then replay streams of CUDA kernels), module: CUDACachingAllocator, open source, topic: not user facing (topic category)


3 participants