
Implement NumPy-like function torch.fmax() & torch.fmin() #49312

Closed
Kiyosora wants to merge 8 commits

Conversation

Kiyosora
Contributor

@Kiyosora changed the title from "[WIP] Implement NumPy-like function torch.fmax() & torch.fmin()" to "Implement NumPy-like function torch.fmax() & torch.fmin()" on Dec 14, 2020
@Kiyosora marked this pull request as ready for review on December 14, 2020 09:27
@codecov

codecov bot commented Dec 14, 2020

Codecov Report

Merging #49312 (33af88b) into master (ce30dba) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #49312   +/-   ##
=======================================
  Coverage   80.65%   80.65%           
=======================================
  Files        1913     1913           
  Lines      208121   208145   +24     
=======================================
+ Hits       167859   167887   +28     
+ Misses      40262    40258    -4     

@mrshenli added the "module: numpy" label (related to NumPy support, and also NumPy compatibility of our operators) on Dec 18, 2020
@mrshenli added the "triaged" label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Dec 18, 2020
@mruberry
Collaborator

Hey @Kiyosora! Thanks for another PR. We'll review this shortly.

@@ -362,6 +362,8 @@ Reduction Ops
amin
max
min
fmax
Collaborator

These should be in the binary ops near minimum and maximum

Contributor Author

Addressed, thanks for the correction!

Computes the element-wise maximum of :attr:`input` and :attr:`other`.

.. note::
If one of the elements being compared is a NaN, then the non-nan element is returned.
Collaborator

"non-nan" -> "non-NaN"

Collaborator

I think we can remove this note and just write the following:

Computes the element-wise maximum of :attr:`input` and :attr:`other`. This is like :func:`torch.maximum` except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the maximum. Only if both elements are NaN is NaN propagated.

This function is a wrapper around C++'s ``std::fmax`` and is similar to NumPy's ``fmax`` function.

Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
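
For illustration, here is a minimal sketch of the behaviour the proposed wording describes, using torch.fmax and torch.maximum as eventually merged (the printed outputs are expectations, not part of this PR):

```python
import torch

a = torch.tensor([1.0, float('nan'), float('nan')])
b = torch.tensor([float('nan'), 2.0, float('nan')])

# torch.maximum propagates NaN whenever either element is NaN.
print(torch.maximum(a, b))  # tensor([nan, nan, nan])

# torch.fmax only propagates NaN when both elements are NaN;
# otherwise the non-NaN element is taken as the maximum.
print(torch.fmax(a, b))     # tensor([1., 2., nan])
```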

Contributor Author

Addressed!

Contributor

@heitorschueroff left a comment

This PR is looking great. I left a suggestion to simplify the mask logic. This also needs to be made c10-compliant (see the comments on native_functions.yaml), and some more tests need to be added.

@@ -862,6 +862,23 @@ Tensor max(const Tensor& self, const Tensor& other) {
return at::maximum(self, other);
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
auto self_isnan = self.isnan();
Contributor

Include a fast path for integer dtypes. Since integers cannot be NaN, you can simply return at::gt_out(result, self, other).

Collaborator

at::maximum_out, not at::gt_out, though
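
A quick hedged check of the equivalence this fast path relies on (using the merged torch.fmax/torch.fmin; since integer tensors cannot hold NaN, the ops must agree with maximum/minimum):

```python
import torch

a = torch.randint(-10, 10, (5,), dtype=torch.int64)
b = torch.randint(-10, 10, (5,), dtype=torch.int64)

# No NaNs are possible for integer dtypes, so fmax/fmin reduce to maximum/minimum.
assert torch.equal(torch.fmax(a, b), torch.maximum(a, b))
assert torch.equal(torch.fmin(a, b), torch.minimum(a, b))
```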

Collaborator

But see my latest comment below about switching to use TensorIterator for simplicity and performance.

Contributor Author

I've improved the code to use TensorIterator instead.

auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::maximum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
Contributor

This can be written more efficiently as: other_isnan || (!self_isnan && self > other)

auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self > other));
return at::where(mask, self, other);

You'd have to do the type promotion beforehand which I think is preferable.
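
A rough Python translation of this mask-based approach, for readers following along (illustrative only; fmax_via_mask is a hypothetical helper, and the PR ultimately switched to TensorIterator kernels instead):

```python
import torch

def fmax_via_mask(self, other):
    # Promote to a common dtype up front, as suggested above.
    dtype = torch.promote_types(self.dtype, other.dtype)
    self, other = self.to(dtype), other.to(dtype)
    # Keep `self` where `other` is NaN, or where neither is NaN and self > other.
    mask = other.isnan() | (~self.isnan() & (self > other))
    return torch.where(mask, self, other)

a = torch.tensor([9.7, float('nan'), 3.1, float('nan')])
b = torch.tensor([-2.2, 0.5, float('nan'), float('nan')])
print(fmax_via_mask(a, b))  # tensor([9.7000, 0.5000, 3.1000, nan])
```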

Contributor

You could do something similar to the out version, though torch.where does not have an out version so you'd have to copy the result.

Collaborator

This is an interesting idea, but I think it'll be easier to write this using TensorIterator, which will handle broadcasting and type promotion and be much faster. See my comment below.

Contributor Author

I've improved the code to use TensorIterator instead.

auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::minimum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
Contributor

Similarly, this can be written more efficiently as: other_isnan || (!self_isnan && self < other)

auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self < other));
return at::where(mask, self, other);

You'd have to do the type promotion beforehand which I think is preferable.

Collaborator

In this case I think (self >= other).logical_or(other.isnan()) would work?

However, I would like to recommend this be written like minimum and maximum to use TensorIterator. That will be much more performant and readable. It will require, however, that a gradient be added. The gradient for maximum and minimum should be a good guide. The dispatch will also have to change to no longer be a Math kernel.

The TensorIterator kernels can just be wrappers around std::fmin and std::fmax in C++. I propose we adopt Python's standard for floating-point NaN handling and treat all NaNs as identical, so in this case we don't need to worry about which NaN to return.
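
For reference, NumPy's fmax/fmin already follow this convention, which is the behaviour the std::fmax/std::fmin wrappers would reproduce (a small check, not part of this PR):

```python
import numpy as np

a = np.array([1.0, np.nan, np.nan])
b = np.array([np.nan, 2.0, np.nan])

# A lone NaN is ignored; NaN is returned only when both elements are NaN.
print(np.fmax(a, b))  # [ 1.  2. nan]
print(np.fmin(a, b))  # [ 1.  2. nan]
```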

Contributor

Your equation works if we can assume that comparisons involving NaN always return False. This is usually the case, but I believe it is not guaranteed by the C++ standard. I agree that writing this using TensorIterator will be better and more future-proof.

Collaborator

That's true, we should probably add a note that we assume IEEE 754.
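
To make the IEEE 754 assumption concrete (plain Python floats follow it, so this is only an illustration):

```python
nan = float('nan')

# Under IEEE 754, every ordered comparison involving NaN evaluates to False,
# which is exactly what the mask expression above relies on.
print(nan > 1.0, nan < 1.0, nan >= nan, nan == nan)  # False False False False

# Python's built-in max/min become order-dependent for the same reason.
print(max(nan, 1.0), max(1.0, nan))  # nan 1.0
```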

aten/src/ATen/native/native_functions.yaml (resolved review thread)
return at::minimum(self, other);
}

Tensor& fmin_out(Tensor& result, const Tensor& self, const Tensor& other) {
Contributor

The result tensor must be the last parameter (see comments on native_functions.yaml)

Contributor Author

Addressed!

@@ -862,6 +862,23 @@ Tensor max(const Tensor& self, const Tensor& other) {
return at::maximum(self, other);
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
Contributor

The result tensor must be the last parameter (see comments on native_functions.yaml)

Contributor Author

Addressed!

aten/src/ATen/native/native_functions.yaml (resolved review thread)
self.assertEqual(tensor_result, numpy_result)
self.assertEqual(out, numpy_result)

@dtypes(*(torch.testing.get_all_fp_dtypes()))
def test_maximum_minimum_float_nan_and_inf(self, device, dtype):
# np.maximum and np.minimum functions compare input arrays element-wise.
# if one of the elements being compared is a NaN, then that element is returned.
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum))
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum),
(torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin))
a_vals = (float('inf'), -float('inf'), float('nan'), float('nan'))
b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'))
Contributor

We should also test a combination of nan and non-nan values to ensure we cover the cases where the value returned should come from one tensor vs the other.

Collaborator

I think the current test does check this with a_vals and b_vals?

Contributor

I think the only comparison is between float('nan') and float('inf') but what about where the only nan value is in the second tensor?

Collaborator

Good point. Looks like the test could be more thorough.

Contributor Author

I've added a case pairing float('inf') with float('nan') to cover the situation where the only NaN value is in the second tensor.
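
A hedged sketch of the kind of coverage being discussed, pairing NaN with finite and infinite values in both positions and checking against NumPy (illustrative names only, not the actual test code added in this PR):

```python
import torch
import numpy as np

a = torch.tensor([float('inf'), -float('inf'), float('nan'), float('inf'), float('nan'), 0.5])
b = torch.tensor([-float('inf'), float('inf'), float('inf'), float('nan'), float('nan'), float('nan')])

for torch_op, np_op in ((torch.fmax, np.fmax), (torch.fmin, np.fmin)):
    expected = torch.from_numpy(np_op(a.numpy(), b.numpy()))
    # equal_nan=True so matching NaN positions count as equal.
    assert torch.allclose(torch_op(a, b), expected, equal_nan=True)
```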

@mruberry
Copy link
Collaborator

Hey @Kiyosora!

Thanks for implementing torch.fmax and torch.fmin. This looks good, but there are a few changes I'd like to propose:

  • as @heitorschueroff points out, dispatch will need to start using c10-full, and the argument order needs to be the same going forward between the signatures in native_functions.yaml and in the *.cpp files like BinaryOps.cpp. This is a new change and the documentation is not in the build yet. See Enforce c10-fullness for all ops #49619, where the documentation is being added.
  • I think we can implement this more efficiently and readably as a wrapper around C++'s fmin and fmax
  • Some documentation updates

Let me know your thoughts. Looking forward to adding these functions to PyTorch!

@Kiyosora force-pushed the implement_fmax_fmin branch 4 times, most recently from 9f2feb5 to d041f7f, on December 25, 2020 01:22
@@ -505,6 +505,60 @@ void minimum_kernel(TensorIterator& iter) {
}
}

void fmax_kernel(TensorIterator& iter) {
if (iter.dtype() == ScalarType::Bool) {
Collaborator

@mruberry commented Dec 25, 2020

There are a few updates needed here:

  • use iter.common_dtype() to test for whether it's a floating type or not and to dispatch
  • if iter.common_dtype() is not a floating point type then just call maximum
  • in the floating point kernel I don't think you need the NaN checks, I think this can just call std::fmax

Similar changes will need to be made for fmin and the CUDA code.

Contributor Author

Gotcha, I'll fix it right away.

Contributor Author

Addressed!

b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'))
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum),
(torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin))
a_vals = (float('inf'), -float('inf'), float('nan'), float('inf'), float('nan'))
Collaborator

The extension is good. Also pair a float('nan') with a real number like 0, 1, or .5.

Contributor Author

Addressed!

@@ -714,6 +714,10 @@
self: grad.clone().masked_fill_(self <= other, 0)
other: grad.clone().masked_fill_(self > other, 0)

- name: fmax(Tensor self, Tensor other) -> Tensor
self: grad.clone().masked_fill_(self <= other, 0)
Collaborator

This is the same formula as for maximum:

https://github.com/pytorch/pytorch/blob/d041f7f2fd46f7f646d959d2ea7c6ca9613d7211/tools/autograd/derivatives.yaml#L714

That seems odd since these are different functions.

Let's look at the truth table:

  • if self and other are numbers, then the gradient goes to self if self > other, sure
  • if self and other are NaN, then this comparison is False and self and other both receive gradient, which is interesting
  • if either self or other is NaN, then this comparison is False and both receive gradient, which seems odd

In particular, if self is a number and other is NaN then it seems like gradient should go to self but not to other? And, symmetrically, if self is NaN and other is a number then it seems like gradient should go to other and not to self?

I think we should also bias towards self receiving grad, all else being equal, and rewrite these expressions as:

(self >= other).logical_or_(other.isnan()).logical_not_()

Then the masked_fill for other would be:

(self >= other).logical_or_(other.isnan())

@heitorschueroff Does that look correct to you? What do you think, @Kiyosora?
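
A small autograd sketch of the routing this proposal implies; the printed gradients are what the masks above would produce, assuming the merged gradients match these formulas:

```python
import torch

a = torch.tensor([2.0, float('nan'), 1.0, float('nan')], requires_grad=True)
b = torch.tensor([1.0, 3.0, float('nan'), float('nan')], requires_grad=True)

torch.fmax(a, b).sum().backward()

# Gradient follows the selected element: `a` wins where a >= b or where b is
# NaN (including the all-NaN case, biasing towards self); otherwise `b` wins.
print(a.grad)  # expected: tensor([1., 0., 1., 1.])
print(b.grad)  # expected: tensor([0., 1., 0., 0.])
```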

Contributor Author

It does make sense to me. Thanks for pointing this out, I will fix it soon.

Contributor Author

Addressed!

Collaborator

@mruberry left a comment

Updates look really good, @Kiyosora!

I think the implementation of the function can be simplified by relying on minimum/maximum, and I have some questions about the gradients, too. Looking forward to hearing your thoughts!

@Kiyosora
Contributor Author

Hi @mruberry, sorry for the delay, and thank you so much for your advice!
I've improved the implementation and the gradients of the function; it looks much better now, I think.
Please take a look at your convenience. 😃

@mruberry
Collaborator

mruberry commented Jan 2, 2021

Thanks @Kiyosora! @heitorschueroff and I will take a look when we're back at work next week.

Contributor

@heitorschueroff left a comment

This PR is looking great overall except for some minor fixes for which I left comments. The test failures are unrelated.

This function is a wrapper around C++'s ``std::fmax`` and is similar to NumPy's ``fmax`` function.

Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
Contributor

This is causing formatting issues; a little tweak will do it, as follows:

Supports :ref:`broadcasting to a common shape <broadcasting-semantics>`,
:ref:`type promotion <type-promotion-doc>`, and integer and floating-point inputs.

Contributor Author

Addressed!


Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.

Contributor

See comment above for fmax on the formatting.

>>> a = torch.tensor([9.7, float('nan'), 3.1, float('nan')])
>>> b = torch.tensor([-2.2, 0.5, float('nan'), float('nan')])
>>> torch.fmax(a, b)
tensor([9.7000, 0.5000, 3.1000, nan])
Contributor

Good example.

>>> a = torch.tensor([2.2, float('nan'), 2.1, float('nan')])
>>> b = torch.tensor([-9.3, 0.1, float('nan'), float('nan')])
>>> torch.fmin(a, b)
tensor([-9.3000, 0.1000, 2.1000, nan])
Contributor

Good example.

aten/src/ATen/native/cpu/BinaryOpsKernel.cpp (outdated; resolved review thread)
if (isFloatingType(iter.common_dtype())) {
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cpu", [&]() {
cpu_kernel(iter,
[](scalar_t a, scalar_t b) -> scalar_t {return std::fmax(a, b);
Contributor

I believe the linter will complain that the return statement is on the same line. Could you move it to its own line?

Contributor Author

Addressed!

}
}

void fmin_kernel(TensorIterator& iter) {
Contributor

These kernels are looking good, just the same changes as for the fmax kernel.

@@ -62,7 +62,33 @@ void minimum_kernel_cuda(TensorIterator& iter) {
}
}

void fmax_kernel_cuda(TensorIterator& iter) {
if (isFloatingType(iter.common_dtype())) {
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cuda", [&]() {
Contributor

iter.dtype -> iter.common_dtype

Contributor Author

Addressed!

void fmax_kernel_cuda(TensorIterator& iter) {
if (isFloatingType(iter.common_dtype())) {
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cuda", [&]() {
gpu_kernel_with_scalars(iter, []GPU_LAMBDA(double a, double b) -> scalar_t {
Contributor

double -> scalar_t ? or do we need double here?

Contributor Author

It should be scalar_t here; I just made a mistake, thanks for correcting!

}
}

void fmin_kernel_cuda(TensorIterator& iter) {
Contributor

Same changes as for the fmax.

@Kiyosora force-pushed the implement_fmax_fmin branch 3 times, most recently from 80d56a2 to ef015b6, on January 11, 2021 07:05
@Kiyosora
Contributor Author

Thank you so much for being so patient with the code review, @heitorschueroff. 👍
I have corrected my code; please take a look at your convenience.

Contributor

@heitorschueroff left a comment

@mruberry and I reviewed it and it's looking good to go. Thank you for this great contribution @Kiyosora.

Contributor

@facebook-github-bot left a comment

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@heitorschueroff
Contributor

@Kiyosora There were some internal issues preventing this PR from landing, and it looks like it has picked up a merge conflict now. Do you mind rebasing and fixing the conflict?

Contributor

@facebook-github-bot left a comment

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@heitorschueroff merged this pull request in 4803eaf.
