Add nondeterministic alert to index_copy, median CUDA and kthvalue CUDA #46942
Conversation
💊 CI failures summary and remediations — as of commit c42a254 (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
Force-pushed from 2b49191 to 5f0265a
Quick notes (did not do a full review)
Hi @kurtamohler! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but we do not have a signature on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Hey @kurtamohler! Great to see more nondeterministic behavior caught by this flag. For now @ngimel and I think we should just always error, like we do with other operations. It's an interesting follow-up design question whether we should add these checks to all operations that are nondeterministic when given duplicate indices.
Follow-up: we should also throw an error if the determinism flag is set and indices are returned for median or kthvalue on CUDA.
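As a sketch of why duplicate indices make the result order-dependent (an illustrative CPU example, not the PR's test code):

```python
import torch

# Two source rows both target destination row 0. Whichever write lands
# last determines the result, which is why CUDA's parallel writes make
# the outcome nondeterministic. (Illustrative sketch, not PR test code.)
dest = torch.zeros(3, 2)
src = torch.tensor([[1.0, 1.0], [2.0, 2.0]])
index = torch.tensor([0, 0])  # duplicate index: both rows target row 0
dest.index_copy_(0, index, src)
print(dest[0])  # one of the two source rows; order-dependent
```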
Force-pushed from 30ae80e to 8596d36
Tests for …
Looks like the …
Force-pushed from 8596d36 to 7c99047
Barring any additional test failures, I think this is ready for a re-review.
Force-pushed from 2da9bf6 to a564999
Hey @kurtamohler, made a few inline suggestions to align the comments and documentation. Is there a good determinism test that verifies this runtime error will trigger when these functions are called? It'd be good to see it triggered on the function, method, inplace, and out variants. For median it should not be triggered if indices aren't returned, but triggered if they are.
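For reference, the distinction the last point relies on (a small CPU sketch): the full reduction returns only a value, while the per-dimension overload also returns indices.

```python
import torch

x = torch.tensor([[1.0, 3.0],
                  [2.0, 4.0]])

# Full reduction: a single value, no indices are computed.
v = torch.median(x)

# Per-dimension reduction: returns (values, indices); it is the index
# computation that is nondeterministic on CUDA for duplicate medians.
values, indices = torch.median(x, dim=0)

print(v)                 # lower median of all elements
print(values, indices)   # per-column medians and their row indices
```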
Force-pushed from a564999 to c83faf3
Codecov Report

```
@@            Coverage Diff             @@
##           master   #46942      +/-   ##
==========================================
+ Coverage   80.91%   81.14%   +0.23%
==========================================
  Files        1855     1838      -17
  Lines      200241   198605    -1636
==========================================
- Hits       162021   161164     -857
+ Misses      38220    37441     -779
```
Hey @kurtamohler!
Overall the actual change looks straightforward and good. Just a few questions/comments about the Python style of the testing code.
Force-pushed from 27823d0 to 330ba8a
I'm not sure what is causing the …
There was an upstream failure causing this; rebase and you should be fine.
Force-pushed from 330ba8a to c6048a1
Force-pushed from e7386ba to 9f68aaf
Had to rebase to fix a conflict. I think this is ready to go.
```diff
@@ -3894,6 +3894,10 @@ def merge_dicts(*dicts):
 (see :func:`torch.squeeze`), resulting in both the :attr:`values` and
 :attr:`indices` tensors having 1 fewer dimension than the :attr:`input` tensor.

 .. note::
```
`index_copy_` needs a note, too:
https://pytorch.org/docs/master/tensors.html?highlight=index_copy#torch.Tensor.index_copy_
but `median` already has one:
https://pytorch.org/docs/master/generated/torch.median.html?highlight=median#torch.median
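The requested docs note might look like this (a sketch; the exact wording merged in the PR may differ):

```rst
.. note::

    If :attr:`index` contains duplicate entries, multiple elements are
    copied to the same index of :attr:`self`. The result is
    nondeterministic since it depends on which copy occurs last.
```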
Done
Hey @kurtamohler! Had a look, just a few more comments/suggestions.
```diff
@@ -408,33 +421,28 @@ def wrapper(*args, **kwargs):
 def wrapDeterministicFlagAPITest(fn):
```
Please elaborate in this comment that tests using this wrapper need to start a subprocess (how and why) so that their cuBLAS is initialized with the new workspace config.
Added, let me know what you think
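The reason for the subprocess requirement discussed above can be sketched as follows. cuBLAS reads `CUBLAS_WORKSPACE_CONFIG` when it initializes, so changing the variable in an already-running process has no effect; the snippet below demonstrates only the env-propagation mechanism, not PyTorch's actual test wrapper.

```python
import os
import subprocess
import sys

# A fresh interpreter inherits the modified environment, so any library
# it initializes (e.g. cuBLAS) sees the new workspace config. The parent
# process, whose libraries may already be initialized, would not.
child_code = "import os; print(os.environ.get('CUBLAS_WORKSPACE_CONFIG'))"
env = dict(os.environ, CUBLAS_WORKSPACE_CONFIG=":4096:8")
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env, capture_output=True, text=True,
)
print(result.stdout.strip())  # the child process sees the new setting
```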
Force-pushed from 9f68aaf to c42a254
Cool! Thanks Kurt!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Also fixes issue where skipped tests did not properly restore deterministic flag.
Fixes #46743