Allow consumer ops to sync on GraphRoot's gradient #45787
Conversation
Very good catch! Thanks for the fix and the doc update!
Codecov Report
@@            Coverage Diff            @@
##           master   #45787   +/-   ##
========================================
  Coverage   68.32%   68.32%
========================================
  Files         410      410
  Lines       52978    52978
========================================
+ Hits        36195    36196        +1
+ Misses      16783    16782        -1
Continue to review full report at Codecov.
LGTM, thanks for the update!
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@albanD and @mcarilli, if you could lend your insight: the new test fails reliably for us. I'd like to know whether this indicates a problem in our runtime or a problem in PyTorch. This is the line that reliably fails: Line 1774 in 7731370
@jeffdaily the test might be exposing https://github.com/pytorch/pytorch/pull/45787/files#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R1772-R1773, which would be a separate issue with a separate fix. Can you put a `torch.cuda.synchronize()` right after `backward()` and see if that helps?
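(A minimal, runnable sketch of the suggested workaround; the tensor, op, and gradient check are placeholders, not the actual test code.)

```python
import torch

x = torch.randn(4, device="cuda", requires_grad=True)

side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    loss = (x * 2.0).sum()
    loss.backward()

# Suggested workaround: device-wide sync after backward() so the gradient
# check below doesn't race with work still queued on the backward stream.
torch.cuda.synchronize()
assert torch.equal(x.grad, torch.full_like(x, 2.0))
```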
@mcarilli That seems to help. How should we proceed? File a new issue? Do you have an idea of how to fix this already?
Created new issue: #47028
Currently, a GraphRoot instance doesn't have an associated stream. Streaming backward synchronization logic assumes the instance ran on the default stream, and tells consumer ops to sync with the default stream. If the gradient the GraphRoot instance passes to consumer backward ops was populated on a non-default stream, we have a race condition.
The race condition can exist even if the user doesn't give a manually populated gradient:
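For illustration only (not code from the PR), a sketch of how that can happen, assuming a CUDA build; the model and shapes are placeholders:

```python
import torch

model = torch.nn.Linear(4, 4).cuda()
x = torch.randn(8, 4, device="cuda")

side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    loss = model(x).sum()
    # No gradient is passed explicitly, but autograd populates the implicit
    # incoming gradient (torch.ones_like(loss)) on the current (side) stream.
    # Without this PR, GraphRoot told consumer backward ops to sync with the
    # default stream, so they could consume that gradient before it was ready.
    loss.backward()
```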
This PR fixes the race condition by associating a GraphRoot instance, at construction time, with the current stream(s) on the device(s) of the grads it will pass to consumers. (I think this relies on GraphRoot executing in the main thread, before backward thread(s) fork, because the grads were populated on the main thread.)
The test demonstrates the race condition. It fails reliably without the PR's GraphRoot diffs and passes with the GraphRoot diffs.
With the GraphRoot diffs, manually populating an incoming-gradient arg for `backward` (or `torch.autograd.grad`) and the actual call to `autograd.backward` will have the same stream-semantics relationship as any other pair of ops:
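For example (a hedged sketch, not code from the PR, assuming a CUDA build; the explicit `wait_stream`/`current_stream` calls just follow the usual stream-semantics rules for any producer/consumer pair):

```python
import torch

lin = torch.nn.Linear(8, 1).cuda()
x = torch.randn(16, 8, device="cuda")

side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    loss = lin(x).sum()
    grad = torch.ones_like(loss)    # incoming gradient populated on `side`
    loss.backward(gradient=grad)    # backward called on the same stream: no extra sync needed

# Sync the default stream with `side` before consuming the leaf .grads.
torch.cuda.current_stream().wait_stream(side)
print(lin.weight.grad)
```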
This PR also adds the last three examples above to cuda docs and references them from autograd docstrings.