Add a check for stride==0 in gradcheck #38774
Conversation
💊 CI failures summary (Dr. CI, as of commit 260f0dd): ci.pytorch.org: 1 failed.
Force-pushed from 57c4d17 to 260f0dd.
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```python
'This check will likely fail if all the inputs are '
'not of double precision floating point or complex. ')
content = inp._values() if inp.is_sparse else inp
if content.layout is not torch._mkldnn and any([s == 0 for s in content.stride()]):
```
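For context (an illustration, not part of the diff): stride-0 tensors typically come from `torch.Tensor.expand`, which repeats data without copying, so a gradcheck perturbation of one logical element would touch several logical indices at once:

```python
import torch

t = torch.ones(1, 3)   # contiguous, stride (3, 1)
e = t.expand(4, 3)     # no copy: all four rows alias the same memory
print(e.stride())      # (0, 1) -- the expanded dimension has stride 0
```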
No... you should also check if size > 1. If size <= 1 and stride == 0, it is okay. Also, didn't we have some preliminary code for checking overlapping indices?
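The suggested refinement could be sketched like this (hypothetical helper name, not the actual patch):

```python
def has_ambiguous_zero_stride(sizes, strides):
    # A zero stride only makes indices overlap when the dimension
    # actually repeats elements, i.e. its size is greater than 1.
    return any(sz > 1 and st == 0 for sz, st in zip(sizes, strides))

print(has_ambiguous_zero_stride((4, 3), (0, 1)))  # True: dim 0 repeats its row
print(has_ambiguous_zero_stride((1, 3), (0, 1)))  # False: a size-1 dim is harmless
print(has_ambiguous_zero_stride((2, 3), (3, 1)))  # False: contiguous
```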
here:
pytorch/aten/src/ATen/templates/TensorBody.h, line 213 (at 959afe0):

```cpp
bool is_non_overlapping_and_dense() const {
```
hmm, also https://github.com/pytorch/pytorch/blob/2af64ba3ede8906c4b48e1d0988665520ad03c9d/aten/src/ATen/MemoryOverlap.cpp and

```cpp
bool maybeOverlappingIndices(const Tensor& t) {
```

we should really merge them... (cf. #23586)
Yes, we have a few of these, but none of them is exposed to Python... so I went with the simple check here. I guess Tensors with nelement == 1 can indeed be excluded. Not sure we have many of them with stride 0, though...
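For illustration (a hypothetical example, not from the PR): a one-element tensor with stride 0 is safe because no two distinct logical indices can alias:

```python
import torch

# Build a size-(1,) view with stride 0 explicitly via as_strided.
v = torch.zeros(1).as_strided((1,), (0,))
v[0] = 3.0          # only one element; the write is unambiguous
print(v.stride())   # (0,)
```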
The check in IndexUtils incorrectly computes the extent of the dimension:
pytorch/aten/src/ATen/cuda/detail/IndexUtils.cu, lines 64 to 68 (at 3def765):
```cpp
for (int i = 0; i < (nonSize1Dims - 1); ++i) {
  if (((info[i].size - 1) * info[i].stride) >= info[i + 1].stride) {
    return true;
  }
}
```
but we did not fix it because it looks like nothing is using it. The extent of the i-th sorted dimension is (size-1)*stride + extent[i-1], not (size-1)*stride + 1 as implied here. But for autograd this may be too involved, and just checking for 0 strides will weed out 99% of erroneous cases.
Okay, but I don't think the check should be overly conservative (erroring out when it could be fine). It should detect a subset of error cases and error out only on those. So how about we add the size check and a TODO note to use the existing functionality once it is ready?
That sounds good. I opened a PR to do that: #38877
Fix #38586
Raise a proper error and fix the failing test.