Don't call sum() on a tensor that is not summable in layer_norm #156600
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156600
Note: Links to docs will display an error until the docs builds have been completed.
⏳ 1 Pending, 1 Unrelated Failure as of commit 65c15aa with merge base d061a02.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Is this reachable with a test case?
I don't know the exact conditions under which these are null, but this is failing inside Meta for some reason, and my hypothesis is that it is due to these tensors being null. I am still testing the hypothesis.
@eqy I added a test and verified that it fails on the baseline. PTAL.
@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
In the only place where this function is called, dgamma and dbeta are defined:
Tensor dgamma;
Re: calling sum() on a tensor that is not summable in layer_norm — the if condition guard was correct because we don't perform the assignment in that case, but it was not that readable. Moreover, the PR description was misleading (gamma was actually not null). PTAL.
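For context, a Tensor declared without an initializer in ATen (like the "Tensor dgamma;" quoted above) is default-constructed and stays undefined until assigned. Below is a minimal sketch of that behavior; it is illustrative only and not the actual layer_norm source.

```cpp
#include <ATen/ATen.h>
#include <iostream>

// Illustrative sketch: a default-constructed ATen tensor is "undefined" and
// owns no storage, so calling an op such as sum() on it throws at runtime.
int main() {
  at::Tensor dgamma;                // default-constructed: undefined
  at::Tensor x = at::ones({3, 4});  // a defined tensor

  std::cout << "dgamma defined: " << dgamma.defined() << "\n";  // prints 0
  std::cout << "x defined: " << x.defined() << "\n";            // prints 1

  at::Tensor col_sums = x.sum(0);   // fine: x is defined, result has shape [4]

  // dgamma.sum(0) here would raise a runtime error, which is the failure
  // mode discussed in this review thread.
  return 0;
}
```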
modulo 2 small comments
@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
After I landed PR #156600, this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices. We now raise the tolerances for larger tensors. Pull Request resolved: #156699. Approved by: https://github.com/eqy, https://github.com/ngimel
…#156699) After I landed PR pytorch#156600, this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices. We now raise the tolerances for larger tensors. Pull Request resolved: pytorch#156699. Approved by: https://github.com/eqy, https://github.com/ngimel (cherry picked from commit 36dd598)
…nsors (#2583) After PR pytorch#156600, this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices. We now raise the tolerances for larger tensors. Pull Request resolved: pytorch#156699. Approved by: https://github.com/eqy, https://github.com/ngimel (cherry picked from commit 36dd598) Fixes SWDEV-547998 Co-authored-by: Ahmad Sharif <ahmads@fb.com>
Don't call sum() on a tensor that is default constructed.

Previously we could call sum() on a tensor that was default-constructed, which would lead to a runtime error. Now we only call sum(0) on tensors that are defined, and we properly guard both the sum(0) call and the assignment.
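For illustration, here is a minimal sketch of the guard pattern the description refers to. The function and variable names (reduce_param_grads, dgamma_rows, dbeta_rows) are hypothetical stand-ins, not the actual identifiers in the layer_norm backward code.

```cpp
#include <ATen/ATen.h>

// Illustrative sketch of the guard described above: only perform the sum(0)
// reduction and the assignment when the partial-gradient tensor is defined.
// Otherwise leave dgamma/dbeta untouched (they may be intentionally undefined
// when that gradient was not requested).
void reduce_param_grads(at::Tensor& dgamma, at::Tensor& dbeta,
                        const at::Tensor& dgamma_rows,
                        const at::Tensor& dbeta_rows) {
  if (dgamma_rows.defined()) {
    dgamma = dgamma_rows.sum(0);  // reduce per-row partials over dim 0
  }
  if (dbeta_rows.defined()) {
    dbeta = dbeta_rows.sum(0);
  }
}
```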