Fix torch.nn.functional.grid_sample crashes if grid has NaNs #42703
Conversation
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
💊 CI failures summary and remediations. As of commit a58b4b2 (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
Force-pushed 6a5ea42 to c9185c6
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Wait. Isn't NaN propagation what we are supposed to do?
I thought we decided to propagate NaN as much as we could. I suppose it is more important in grid_sample, which is often used as a layer/op in a neural network. Yet this PR stops propagation IIRC.
But if there are
Why can't we? If there is
Force-pushed c9185c6 to 3f3a793
Propagating NaNs from grid into output would be quite expensive, and before this patch it would almost certainly result in a segfault.
Force-pushed 3f3a793 to 82437bf
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Hmm sure. Maybe worth saying so in comments then? The previous commit says something like "it is important to clamp NaN", which was unintuitive to me.
Can you please clarify in the docs what happens for nan/inf values in the grid? Interestingly, the original issue was about a very large value turning into inf; it looks like that should already have been handled by the `minimum`/`maximum` clipping?
Also, this fix is for CPU only; what happens on CUDA?
```cpp
@@ -203,7 +203,8 @@ struct ComputeLocationBase<scalar_t, /*align_corners=*/true> {
   }

   inline Vec clip_coordinates(const Vec &in) const {
-    return minimum(Vec(max_val), maximum(in, Vec(0)));
+    // Invert order of clamp_min operands in order to clamp Nans to zero
```
nice!
In `clip_coordinates` replace `minimum(maximum(in))` composition with `clamp_max(clamp_min(in))` and swap order of `clamp_min` operands to clamp Nans in grid to 0
Force-pushed 82437bf to a58b4b2
Done
It was turning into
Somehow it's already working like that on CUDA (perhaps the GPU automatically turns
@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Great, can you add that to the docs note too, that large values can be converted to
FYI, the cuda fix is at #35506. Perhaps we can merge the test in this PR with this and remove Lines 10098 to 10139 in 3cf2551
In `clip_coordinates`, replace the `minimum(maximum(in))` composition with `clamp_max(clamp_min(in))`, and swap the order of the `clamp_min` operands to clamp NaNs in grid to 0.
Fixes #42616