Renorm fix #59615

ngimel · 2021-06-08T05:35:04Z

Fixes #59584
@albanD, @soulitzer, renorm grad was completely busted. Fast gradcheck is definitely not doing its job.

facebook-github-bot · 2021-06-08T05:35:10Z

💊 CI failures summary and remediations

As of commit 0192aa2 (more details on the Dr. CI page):

3/3 failures introduced in this PR

3 failures not recognized by patterns:

Job	Step	Action
pytorch_linux_bionic_py3_6_clang9_noarch_test	Report results	🔁 rerun
pytorch_linux_xenial_py3_6_gcc5_4_test	Report results	🔁 rerun
pytorch_macos_10_13_py3_test	Report results	🔁 rerun

3 jobs timed out:

pytorch_linux_bionic_py3_6_clang9_noarch_test
pytorch_linux_xenial_py3_6_gcc5_4_test
pytorch_macos_10_13_py3_test

This comment was automatically generated by Dr. CI (expand for details).


albanD

Thanks for the update!

Fast gradcheck is known to be less precise and can hide failures in some cases, which is why we are keeping the periodic slow gradcheck build. Note that this is the first time since it was introduced that it has actually hidden a failure, so it's not too common.
You can also set gradcheck_fast_mode=False on the OpInfo if you want to force the op to run with slow gradcheck.
I am also working on adding a label that lets us test with slow gradcheck on PRs, to make it easier to run when we have doubts.

facebook-github-bot · 2021-06-08T16:03:33Z

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ngimel · 2021-06-08T16:05:14Z

"Can hide failure in some cases" and "doesn't flag completely wrong gradient computation" are different failure modes. This is the latter. Jacobians mostly consist of 0's, so if we are checking just this fact (not even positions of 0's), that's not very informative.
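To make the "mostly 0's" remark concrete, here is a minimal pure-Python sketch (not PyTorch code, and not a reproduction of what gradcheck does internally). It assumes a row-wise 2-norm renorm; the helper names `renorm_rows` and `numerical_jacobian` and the `eps` value are illustrative assumptions. Each output row depends only on its own input row, so the full Jacobian is block-diagonal and every entry outside those blocks is zero.

```python
import math

def renorm_rows(x, maxnorm, eps=1e-7):
    """Rescale each row whose 2-norm exceeds maxnorm (hypothetical helper)."""
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        factor = maxnorm / (norm + eps) if norm > maxnorm else 1.0
        out.append([v * factor for v in row])
    return out

def flatten(x):
    return [v for row in x for v in row]

def numerical_jacobian(fn, x, h=1e-6):
    """Central differences; jac[i][j] = d out_i / d in_j on flattened tensors."""
    cols = len(x[0])
    n = len(x) * cols
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r, c = divmod(j, cols)
        plus = [row[:] for row in x]
        minus = [row[:] for row in x]
        plus[r][c] += h
        minus[r][c] -= h
        f_plus = flatten(fn(plus))
        f_minus = flatten(fn(minus))
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

# Row norms are 5, 0.5 and 10, so rows 0 and 2 are rescaled and row 1 is not.
x = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
jac = numerical_jacobian(lambda t: renorm_rows(t, maxnorm=1.0), x)
zeros = sum(1 for row in jac for v in row if abs(v) < 1e-9)
print(f"{zeros} of {len(jac) ** 2} Jacobian entries are zero")
```

A check that only confirms "most entries are zero" would pass for many wrong gradients; the informative part is the small set of non-zero blocks and where they sit.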

facebook-github-bot · 2021-06-08T17:08:15Z

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

anjali411 · 2021-06-08T17:39:21Z

aten/src/ATen/native/cpu/RenormKernel.cpp

      [maxnorm_v, eps_v, one_v](vec_t norm) -> vec_t {
        auto fct = maxnorm_v / (norm + eps_v);
-        return vec_t::blendv(fct, one_v, norm > maxnorm_v);
+        return vec_t::blendv(one_v, fct, norm > maxnorm_v);


How does this fix the issue?

ngimel · 2021-06-08

This is a separate issue, where the CPU kernel produced completely wrong results.
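The effect of swapping the two `blendv` arguments can be sketched in plain Python. This assumes that `blendv(a, b, mask)` returns `b` where the mask is set and `a` elsewhere, which is what the corrected line in the diff implies; the list-based `blendv`, the `renorm_factor` helper, its `fixed` flag, and the `eps` value are hypothetical stand-ins, not the ATen implementation.

```python
def blendv(a, b, mask):
    """Elementwise select: b where mask is set, a otherwise (assumed semantics)."""
    return [bv if m else av for av, bv, m in zip(a, b, mask)]

def renorm_factor(norms, maxnorm, eps=1e-7, fixed=True):
    """Per-slice scale factor; fixed=False mimics the pre-PR argument order."""
    fct = [maxnorm / (n + eps) for n in norms]
    one = [1.0] * len(norms)
    mask = [n > maxnorm for n in norms]
    if fixed:
        # Scale down only the slices whose norm exceeds maxnorm.
        return blendv(one, fct, mask)
    # Old order: oversized slices get 1.0, small slices get rescaled.
    return blendv(fct, one, mask)

norms = [0.5, 5.0]          # one slice under maxnorm, one over
fixed = renorm_factor(norms, maxnorm=1.0)
buggy = renorm_factor(norms, maxnorm=1.0, fixed=False)
print("fixed:", fixed)
print("buggy:", buggy)
```

With the old argument order the slice that is already within `maxnorm` is scaled up, and the slice that exceeds it is left untouched, which is the opposite of what renorm is meant to do.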

facebook-github-bot · 2021-06-08T22:01:01Z

@ngimel merged this pull request in 9d533ef.

Summary:
Fixes pytorch#59584
albanD, soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job.

Pull Request resolved: pytorch#59615
Reviewed By: jbschlosser
Differential Revision: D28964271
Pulled By: ngimel
fbshipit-source-id: b6878cd24db9189b64b67eb58bd2cd8956cda78a

ngimel requested review from albanD and soulitzer as code owners June 8, 2021 05:35

facebook-github-bot added the cla signed label Jun 8, 2021

ngimel requested a review from peterbell10 June 8, 2021 05:35

albanD approved these changes Jun 8, 2021

peterbell10 approved these changes Jun 8, 2021

Natalia Gimelshein added 3 commits June 8, 2021 10:04

fix renorm

1d9f84a

lint

3f31f2e

remove skips

0192aa2

ngimel force-pushed the ngimel/renorm_fix branch from 2c2f77b to 0192aa2 Compare June 8, 2021 17:06

anjali411 reviewed Jun 8, 2021

facebook-github-bot closed this in 9d533ef Jun 8, 2021

facebook-github-bot added the Merged label Jun 8, 2021

github-actions bot deleted the ngimel/renorm_fix branch February 12, 2024 21:13
