Implement NumPy-like function torch.fmax() & torch.fmin() #49312
Conversation
Codecov Report
@@           Coverage Diff            @@
##           master   #49312    +/-   ##
=========================================
  Coverage   80.65%   80.65%
=========================================
  Files        1913     1913
  Lines      208121   208145    +24
=========================================
+ Hits       167859   167887    +28
+ Misses      40262    40258     -4
Hey @Kiyosora! Thanks for another PR. We'll review this shortly.
docs/source/torch.rst
Outdated
@@ -362,6 +362,8 @@ Reduction Ops
    amin
    max
    min
    fmax
These should be in the binary ops near minimum and maximum
Addressed, thanks for the correction!
torch/_torch_docs.py
Outdated
Computes the element-wise maximum of :attr:`input` and :attr:`other`.

.. note::
    If one of the elements being compared is a NaN, then the non-nan element is returned.
"non-nan" -> "non-NaN"
I think we can remove this note and just write the following:
Computes the element-wise maximum of :attr:`input` and :attr:`other`. This is like :func:`torch.maximum` except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the maximum. Only if both elements are NaN is NaN propagated.
This function is a wrapper around C++'s ``std::fmax`` and is similar to NumPy's ``fmax`` function.
Supports :ref:broadcasting to a common shape <broadcasting-semantics>, :ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
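For illustration, the behavior this describes, as a doctest-style sketch (assuming a PyTorch build where torch.fmax is available):

>>> a = torch.tensor([1., float('nan'), float('nan')])
>>> b = torch.tensor([float('nan'), 2., float('nan')])
>>> torch.maximum(a, b)  # maximum propagates any NaN
tensor([nan, nan, nan])
>>> torch.fmax(a, b)  # fmax returns NaN only when both elements are NaN
tensor([1., 2., nan])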
Addressed!
This PR is looking great. I left a suggestion to simplify the mask logic. It also needs to be made c10 compliant (see the comments on native_functions.yaml) and needs some more tests.
aten/src/ATen/native/BinaryOps.cpp
Outdated
@@ -862,6 +862,23 @@ Tensor max(const Tensor& self, const Tensor& other) {
  return at::maximum(self, other);
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
  auto self_isnan = self.isnan();
Include a fast path for integer dtypes. Since integers cannot be NaN, you can simply return at::gt_out(result, self, other).
at::maximum_out, not at::gt_out, though
But see my latest comment below about switching to use TensorIterator for simplicity and performance.
I've improved the code to use TensorIterator instead.
aten/src/ATen/native/BinaryOps.cpp
Outdated
auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::maximum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
This can be written more efficiently as: other_isnan || (!self_isnan && self > other)
auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self > other));
return at::where(mask, self, other);
You'd have to do the type promotion beforehand which I think is preferable.
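For concreteness, a minimal Python sketch of this mask-based approach (fmax_where is a hypothetical helper name, and it assumes the inputs have already been promoted to a common dtype, per the note above):

import torch

def fmax_where(self, other):
    # take self where other is NaN, or where self is a non-NaN value greater
    # than other; when both are NaN the mask is true and self's NaN is returned
    mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self > other))
    return torch.where(mask, self, other)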
You could do something similar to the out version, though torch.where does not have an out version so you'd have to copy the result.
This is an interesting idea, but I think it'll be easier to write this using TensorIterator, which will handle broadcasting and type promotion and be much faster. See my comment below.
I've improved the code to use TensorIterator instead.
aten/src/ATen/native/BinaryOps.cpp
Outdated
auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::minimum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
Similarly, this can be written more efficiently as: other_isnan || (!self_isnan && self < other)
auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self < other));
return at::where(mask, self, other);
You'd have to do the type promotion beforehand which I think is preferable.
In this case I think (self >= other).logical_or(other.isnan()) would work?
However, I would like to recommend this be written like minimum and maximum to use TensorIterator. That will be much more performant and readable. It will require, however, that a gradient be added. The gradient for maximum and minimum should be a good guide. The dispatch will also have to change to no longer be a Math kernel.
The TensorIterator kernels can just be wrappers around std::fmin and std::fmax in C++. I propose we adopt Python's standard for floating-point NaN handling and treat all NaNs as identical, so in this case we don't need to worry about which NaN to return.
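For reference, the per-element rule std::fmax implements, written as a pure-Python sketch (scalar_fmax is just an illustrative name):

import math

def scalar_fmax(a, b):
    # NaN is returned only when both inputs are NaN;
    # if exactly one input is NaN, the other input wins
    if math.isnan(a):
        return b  # b may itself be NaN (the both-NaN case)
    if math.isnan(b):
        return a
    return max(a, b)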
Your equation works if we can assume that comparisons with NaN always return False. This is usually the case, but I believe it is not guaranteed by the C++ standard. I agree that writing this using TensorIterator will be better and more future-proof.
That's true, we should probably add a note that we assume IEEE 754.
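For example, Python floats follow IEEE 754 here, which is exactly what the mask logic relies on:

>>> nan = float('nan')
>>> nan > 1.0, nan < 1.0, nan == nan
(False, False, False)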
aten/src/ATen/native/BinaryOps.cpp
Outdated
return at::minimum(self, other);
}

Tensor& fmin_out(Tensor& result, const Tensor& self, const Tensor& other) {
The result tensor must be the last parameter (see comments on native_functions.yaml)
Addressed!
aten/src/ATen/native/BinaryOps.cpp
Outdated
@@ -862,6 +862,23 @@ Tensor max(const Tensor& self, const Tensor& other) {
  return at::maximum(self, other);
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
The result tensor must be the last parameter (see comments on native_functions.yaml)
Addressed!
test/test_binary_ufuncs.py
Outdated
self.assertEqual(tensor_result, numpy_result)
self.assertEqual(out, numpy_result)

@dtypes(*(torch.testing.get_all_fp_dtypes()))
def test_maximum_minimum_float_nan_and_inf(self, device, dtype):
    # np.maximum and np.minimum compare input arrays element-wise.
    # If one of the elements being compared is a NaN, then that element is returned.
-   ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum))
+   ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum),
+          (torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin))
    a_vals = (float('inf'), -float('inf'), float('nan'), float('nan'))
    b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'))
We should also test a combination of nan and non-nan values to ensure we cover the cases where the value returned should come from one tensor vs the other.
I think the current test does check this with a_vals and b_vals?
I think the only comparison is between float('nan') and float('inf') but what about where the only nan value is in the second tensor?
Good point. Looks like the test could be more thorough.
I've added the case of float('inf') and float('nan') to cover the situation where the only NaN value is in the second tensor.
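For concreteness, the paired values now look roughly like this (b_vals below is inferred and illustrative only):

a_vals = (float('inf'), -float('inf'), float('nan'), float('inf'), float('nan'))
b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'), float('nan'))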
Hey @Kiyosora! Thanks for implementing torch.fmax and torch.fmin. This looks good, but there are a few changes I'd like to propose:
Let me know your thoughts. Looking forward to adding these functions to PyTorch!
Force-pushed from 9f2feb5 to d041f7f (Compare)
@@ -505,6 +505,60 @@ void minimum_kernel(TensorIterator& iter) {
  }
}

void fmax_kernel(TensorIterator& iter) {
  if (iter.dtype() == ScalarType::Bool) {
There are a few updates needed here:
- use iter.common_dtype() to test for whether it's a floating type or not and to dispatch
- if iter.common_dtype() is not a floating point type then just call maximum
- in the floating point kernel I don't think you need the NaN checks, I think this can just call std::fmax
Similar changes will need to be made for fmin and the CUDA code.
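Once the integer path just calls maximum/minimum, a quick sanity check could look like this (a sketch; assumes a build where torch.fmax and torch.fmin exist):

import torch

a = torch.randint(-10, 10, (5,))
b = torch.randint(-10, 10, (5,))
# integer tensors can never contain NaN, so fmax/fmin must agree with maximum/minimum
assert torch.equal(torch.fmax(a, b), torch.maximum(a, b))
assert torch.equal(torch.fmin(a, b), torch.minimum(a, b))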
Gotcha, I'll fix it right away.
Addressed!
test/test_binary_ufuncs.py
Outdated
b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'))
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum),
       (torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin))
a_vals = (float('inf'), -float('inf'), float('nan'), float('inf'), float('nan'))
The extension is good. Also pair a float('nan') with a real number like 0, 1, or .5.
Addressed!
tools/autograd/derivatives.yaml
Outdated
@@ -714,6 +714,10 @@
  self: grad.clone().masked_fill_(self <= other, 0)
  other: grad.clone().masked_fill_(self > other, 0)

- name: fmax(Tensor self, Tensor other) -> Tensor
  self: grad.clone().masked_fill_(self <= other, 0)
This is the same formula as for maximum:
That seems odd since these are different functions.
Let's look at the truth table:
- if self and other are numbers, then gradient goes to self if self is > other, sure
- if self and other are nan then this comparison is False and self and other both receive gradient, which is interesting
- if either of self or other are nan then this comparison is false and both receive gradient, that seems odd
In particular, if self is a number and other is NaN then it seems like gradient should go to self but not to other? And, symmetrically, if self is NaN and other is a number then it seems like gradient should go to other and not to self?
I think we should also bias towards self receiving grad, all else being equal, and rewrite these expressions as:
(self >= other).logical_or_(other.isnan()).logical_not_()
Then the masked_fill for other would be:
(self >= other).logical_or_(other.isnan())
@heitorschueroff Does that look correct to you? What do you think, @Kiyosora?
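A small autograd check of the proposed masks (a sketch; the expected grads follow the truth table above, including the bias toward self on ties and in the both-NaN case):

import torch

a = torch.tensor([3., 1., float('nan'), float('nan')], requires_grad=True)
b = torch.tensor([1., float('nan'), 2., float('nan')], requires_grad=True)
torch.fmax(a, b).sum().backward()
# expected under the proposed masks:
# a.grad -> tensor([1., 1., 0., 1.])  (self wins when larger, when other is NaN, and when both are NaN)
# b.grad -> tensor([0., 0., 1., 0.])  (other wins only when self is NaN, or both are numbers and self < other)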
It does make sense to me, thanks for pointing this out. I will fix it soon.
Addressed!
Updates look really good, @Kiyosora!
I think the implementation of the function can be simplified by relying on minimum/maximum, and I have some questions about the gradients, too. Looking forward to hearing your thoughts!
Force-pushed from d041f7f to 1863539 (Compare)
Hi @mruberry, sorry for the delay, and thank you so much for your advice!
Thanks @Kiyosora! @heitorschueroff and I will take a look when we're back at work next week.
This PR is looking great overall except for some minor fixes for which I left comments. The test failures are unrelated.
torch/_torch_docs.py
Outdated
This function is a wrapper around C++'s ``std::fmax`` and is similar to NumPy's ``fmax`` function.

Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
This is causing formatting issues, just a little tweak will do it as follows:
Supports :ref:`broadcasting to a common shape <broadcasting-semantics>`,
:ref:`type promotion <type-promotion-doc>`, and integer and floating-point inputs.
Addressed!
Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
See comment above for fmax on the formatting.
>>> a = torch.tensor([9.7, float('nan'), 3.1, float('nan')])
>>> b = torch.tensor([-2.2, 0.5, float('nan'), float('nan')])
>>> torch.fmax(a, b)
tensor([9.7000, 0.5000, 3.1000, nan])
Good example.
>>> a = torch.tensor([2.2, float('nan'), 2.1, float('nan')])
>>> b = torch.tensor([-9.3, 0.1, float('nan'), float('nan')])
>>> torch.fmin(a, b)
tensor([-9.3000, 0.1000, 2.1000, nan])
Good example.
if (isFloatingType(iter.common_dtype())) {
  AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cpu", [&]() {
    cpu_kernel(iter,
      [](scalar_t a, scalar_t b) -> scalar_t {return std::fmax(a, b);
I believe the linter will complain that the return statement is on the same line. Could you move it to its own line?
Addressed!
  }
}

void fmin_kernel(TensorIterator& iter) {
These kernels are looking good, just the same changes as for the fmax kernel.
@@ -62,7 +62,33 @@ void minimum_kernel_cuda(TensorIterator& iter) {
  }
}

void fmax_kernel_cuda(TensorIterator& iter) {
  if (isFloatingType(iter.common_dtype())) {
    AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cuda", [&]() {
iter.dtype -> iter.common_dtype
Addressed!
void fmax_kernel_cuda(TensorIterator& iter) {
  if (isFloatingType(iter.common_dtype())) {
    AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cuda", [&]() {
      gpu_kernel_with_scalars(iter, []GPU_LAMBDA(double a, double b) -> scalar_t {
double -> scalar_t ? or do we need double here?
It should be scalar_t here; my mistake, thanks for correcting!
  }
}

void fmin_kernel_cuda(TensorIterator& iter) {
Same changes as for the fmax.
Force-pushed from 80d56a2 to ef015b6 (Compare)
Thank you so much for being so patient with the code review, @heitorschueroff. 👍
@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@Kiyosora There were some internal issues preventing this PR from landing, and it looks like it picked up a merge conflict now. Do you mind rebasing and fixing the conflict?
Force-pushed from ef015b6 to 87927d9 (Compare)
@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@heitorschueroff merged this pull request in 4803eaf.
torch.fmax() and torch.fmin() were recommended in Implement NumPy-like function torch.msort() #48440.