Renorm fix #59615
Conversation
💊 CI failures summary and remediations

As of commit 0192aa2 (more details on the Dr. CI page):
- 3 failures not recognized by patterns
- 3 jobs timed out

This comment was automatically generated by Dr. CI.
Thanks for the update!

It is known that fast gradcheck is less precise and can hide failures in some cases; that's why we keep the periodic slow-gradcheck build. Note that this is the first time since it was introduced that it has actually hidden a failure, so it's not too common.

You can also set `gradcheck_fast_mode=False` on the OpInfo if you want to force an op to run with slow gradcheck.

I am also working on adding a label to be able to run slow gradcheck on PRs, to make it easier to check when we have doubts.
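To illustrate the distinction, here is a minimal sketch of forcing slow gradcheck from user code (the `gradcheck_fast_mode` flag mentioned above is the OpInfo/test-suite equivalent of this). `fast_mode=False` is a real parameter of `torch.autograd.gradcheck`; the choice of shape and tolerances here is arbitrary.

```python
import torch

# Double-precision input is required for reliable finite-difference checks.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)

# renorm(input, p, dim, maxnorm): clamp each slice along `dim` to norm <= maxnorm.
# fast_mode=False runs the slow, full-Jacobian gradcheck that caught this bug.
ok = torch.autograd.gradcheck(
    lambda t: torch.renorm(t, 2, 0, 1.0), (x,), fast_mode=False
)
print(ok)
```

Slow gradcheck compares the full numerical and analytical Jacobians entry by entry, while fast mode only compares random projections of them, which is what allowed the wrong `renorm` gradient to slip through.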
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
"Can hide failures in some cases" and "doesn't flag a completely wrong gradient computation" are different failure modes, and this is the latter. These Jacobians mostly consist of 0's, so if we are only effectively checking that fact (not even the positions of the 0's), that's not very informative.
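The sparsity claim above is easy to verify numerically. Below is a hypothetical scalar NumPy re-implementation of `renorm` along dim 0 (not PyTorch's actual kernel), with a finite-difference Jacobian showing that the vast majority of entries are exactly zero: each output row depends only on its own input row.

```python
import numpy as np

def renorm(x, maxnorm=1.0, eps=1e-7):
    # Hypothetical reference: rescale each row whose 2-norm exceeds maxnorm.
    out = x.copy()
    for i in range(x.shape[0]):
        n = np.linalg.norm(x[i])
        if n > maxnorm:
            out[i] = x[i] * (maxnorm / (n + eps))
    return out

def num_jacobian(f, x, h=1e-6):
    # One-sided finite-difference Jacobian of f at x, flattened.
    y0 = f(x).ravel()
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        xp = x.ravel().copy()
        xp[j] += h
        J[:, j] = (f(xp.reshape(x.shape)).ravel() - y0) / h
    return J

np.random.seed(0)
x = np.random.randn(4, 3) * 2.0
J = num_jacobian(renorm, x)  # 12 x 12, block-diagonal with 3x3 blocks
frac_zero = np.mean(np.abs(J) < 1e-4)
print(frac_zero)  # at least 0.75: all cross-row entries vanish
```

A check that is dominated by these structural zeros can agree with a wrong analytical gradient as long as the wrong gradient is also mostly zero in the same places.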
```diff
 [maxnorm_v, eps_v, one_v](vec_t norm) -> vec_t {
   auto fct = maxnorm_v / (norm + eps_v);
-  return vec_t::blendv(fct, one_v, norm > maxnorm_v);
+  return vec_t::blendv(one_v, fct, norm > maxnorm_v);
```
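For readers unfamiliar with `blendv` semantics: it selects from its second argument where the mask is true and from its first otherwise. The original code therefore scaled rows that were already within `maxnorm` and left oversized rows untouched, the opposite of what `renorm` should do. A scalar Python sketch (names mirror the C++ snippet, but this is an illustration, not the actual kernel):

```python
def blendv(a, b, mask):
    # Mirrors vec_t::blendv: pick b where mask is true, a otherwise.
    return b if mask else a

def renorm_factor(norm, maxnorm, eps=1e-7):
    fct = maxnorm / (norm + eps)
    # Fixed argument order: scale only rows whose norm exceeds maxnorm.
    return blendv(1.0, fct, norm > maxnorm)

print(renorm_factor(0.5, 1.0))  # 1.0: already within maxnorm, untouched
print(renorm_factor(2.0, 1.0))  # < 1.0: scaled down toward maxnorm
```

With the swapped (buggy) order, the first call would return a scale factor above 1 and the second would return 1.0, producing the completely wrong CPU results mentioned below.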
How does this fix the issue?
This is a separate issue where the CPU path produced completely wrong results.
Summary: Fixes pytorch#59584. albanD, soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job.
Pull Request resolved: pytorch#59615
Reviewed By: jbschlosser
Differential Revision: D28964271
Pulled By: ngimel
fbshipit-source-id: b6878cd24db9189b64b67eb58bd2cd8956cda78a
Fixes #59584
@albanD, @soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job.