
Implement NumPy-like function torch.fmax() & torch.fmin() #49312

Closed
Kiyosora wants to merge 8 commits

Conversation

Kiyosora
Contributor

@Kiyosora changed the title from "[WIP] Implement NumPy-like function torch.fmax() & torch.fmin()" to "Implement NumPy-like function torch.fmax() & torch.fmin()" on Dec 14, 2020
@Kiyosora marked this pull request as ready for review on December 14, 2020 09:27
@codecov

codecov bot commented Dec 14, 2020

Codecov Report

Merging #49312 (33af88b) into master (ce30dba) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #49312   +/-   ##
=======================================
  Coverage   80.65%   80.65%           
=======================================
  Files        1913     1913           
  Lines      208121   208145   +24     
=======================================
+ Hits       167859   167887   +28     
+ Misses      40262    40258    -4     

@mrshenli added the "module: numpy" label (related to NumPy support, and also NumPy compatibility of our operators) on Dec 18, 2020
@mrshenli added the "triaged" label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Dec 18, 2020
@mruberry
Collaborator

Hey @Kiyosora! Thanks for another PR. We'll review this shortly.

@@ -362,6 +362,8 @@ Reduction Ops
amin
max
min
fmax
Collaborator

These should be in the binary ops near minimum and maximum

Contributor Author

Addressed, thanks for the correction!

Computes the element-wise maximum of :attr:`input` and :attr:`other`.

.. note::
If one of the elements being compared is a NaN, then the non-nan element is returned.
Collaborator

"non-nan" -> "non-NaN"

Collaborator

I think we can remove this note and just write the following:

Computes the element-wise maximum of :attr:`input` and :attr:`other`. This is like :func:`torch.maximum` except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the maximum. Only if both elements are NaN is NaN propagated.

This function is a wrapper around C++'s ``std::fmax`` and is similar to NumPy's ``fmax`` function.

Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
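
For illustration, here is a minimal sketch of the behaviour the proposed wording describes, using torch.fmax and torch.maximum as eventually merged (the printed outputs are expectations, not part of this PR):

```python
import torch

a = torch.tensor([1.0, float('nan'), float('nan')])
b = torch.tensor([float('nan'), 2.0, float('nan')])

# torch.maximum propagates NaN whenever either element is NaN.
print(torch.maximum(a, b))  # tensor([nan, nan, nan])

# torch.fmax only propagates NaN when both elements are NaN;
# otherwise the non-NaN element is taken as the maximum.
print(torch.fmax(a, b))     # tensor([1., 2., nan])
```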

Contributor Author

Addressed!

Contributor

@heitorschueroff left a comment

This PR is looking great. I left a suggestion to simplify the mask logic. This also needs to be made c10-compliant (see the comments on native_functions.yaml), and some more tests need to be added.

@@ -862,6 +862,23 @@ Tensor max(const Tensor& self, const Tensor& other) {
return at::maximum(self, other);
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
auto self_isnan = self.isnan();
Contributor

Include a fast path for integer dtypes. Since integers cannot be NaN, you can simply return at::gt_out(result, self, other).

Collaborator

at::maximum_out, not at::gt_out, though
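
A quick hedged check of the equivalence this fast path relies on (using the merged torch.fmax/torch.fmin; since integer tensors cannot hold NaN, the ops must agree with maximum/minimum):

```python
import torch

a = torch.randint(-10, 10, (5,), dtype=torch.int64)
b = torch.randint(-10, 10, (5,), dtype=torch.int64)

# No NaNs are possible for integer dtypes, so fmax/fmin reduce to maximum/minimum.
assert torch.equal(torch.fmax(a, b), torch.maximum(a, b))
assert torch.equal(torch.fmin(a, b), torch.minimum(a, b))
```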

Collaborator

But see my latest comment below about switching to use TensorIterator for simplicity and performance.

Contributor Author

I've improved the code to use TensorIterator instead.

auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::maximum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
Contributor

This can be written more efficiently as: other_isnan || (!self_isnan && self > other)

auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self > other));
return at::where(mask, self, other);

You'd have to do the type promotion beforehand which I think is preferable.
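
A rough Python translation of this mask-based approach, for readers following along (illustrative only; fmax_via_mask is a hypothetical helper, and the PR ultimately switched to TensorIterator kernels instead):

```python
import torch

def fmax_via_mask(self, other):
    # Promote to a common dtype up front, as suggested above.
    dtype = torch.promote_types(self.dtype, other.dtype)
    self, other = self.to(dtype), other.to(dtype)
    # Keep `self` where `other` is NaN, or where neither is NaN and self > other.
    mask = other.isnan() | (~self.isnan() & (self > other))
    return torch.where(mask, self, other)

a = torch.tensor([9.7, float('nan'), 3.1, float('nan')])
b = torch.tensor([-2.2, 0.5, float('nan'), float('nan')])
print(fmax_via_mask(a, b))  # tensor([9.7000, 0.5000, 3.1000, nan])
```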

Contributor

You could do something similar to the out version, though torch.where does not have an out version so you'd have to copy the result.

Collaborator

This is an interesting idea, but I think it'll be easier to write this using TensorIterator, which will handle broadcasting and type promotion and be much faster. See my comment below.

Contributor Author

I've improved the code to use TensorIterator instead.

auto self_isnan = self.isnan();
auto other_isnan = other.isnan();
auto both_nan = self_isnan.logical_and(other_isnan);
return at::minimum(at::where(self_isnan == both_nan, self, other.to(self.dtype())),
Contributor

Similarly, this can be written more efficiently as: other_isnan || (!self_isnan && self < other)

auto mask = other.isnan().logical_or_(self.isnan().logical_not_().logical_and_(self < other));
return at::where(mask, self, other);

You'd have to do the type promotion beforehand which I think is preferable.

Collaborator

In this case I think (self >= other).logical_or(other.isnan()) would work?

However, I would like to recommend this be written like minimum and maximum to use TensorIterator. That will be much more performant and readable. It will require, however, that a gradient be added. The gradient for maximum and minimum should be a good guide. The dispatch will also have to change to no longer be a Math kernel.

The TensorIterator kernels can just be wrappers around std::fmin and std::fmax in C++. I propose we adopt Python's standard for floating-point NaN handling and treat all NaNs as identical, so in this case we don't need to worry about which NaN to return.
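
For reference, NumPy's fmax/fmin already follow this convention, which is the behaviour the std::fmax/std::fmin wrappers would reproduce (a small check, not part of this PR):

```python
import numpy as np

a = np.array([1.0, np.nan, np.nan])
b = np.array([np.nan, 2.0, np.nan])

# A lone NaN is ignored; NaN is returned only when both elements are NaN.
print(np.fmax(a, b))  # [ 1.  2. nan]
print(np.fmin(a, b))  # [ 1.  2. nan]
```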

Contributor

Your equation works if we can assume that comparisons involving NaN always return False. This is usually the case, but I believe it is not guaranteed by the C++ standard. I agree that writing this using TensorIterator will be better and more future-proof.

Collaborator

That's true, we should probably add a note that we assume IEEE 754.
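
To make the IEEE 754 assumption concrete (plain Python floats follow it, so this is only an illustration):

```python
nan = float('nan')

# Under IEEE 754, every ordered comparison involving NaN evaluates to False,
# which is exactly what the mask expression above relies on.
print(nan > 1.0, nan < 1.0, nan >= nan, nan == nan)  # False False False False

# Python's built-in max/min become order-dependent for the same reason.
print(max(nan, 1.0), max(1.0, nan))  # nan 1.0
```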

aten/src/ATen/native/native_functions.yaml (resolved review thread)
return at::minimum(self, other);
}

Tensor& fmin_out(Tensor& result, const Tensor& self, const Tensor& other) {
Contributor

The result tensor must be the last parameter (see comments on native_functions.yaml)

Contributor Author

Addressed!

@@ -862,6 +862,23 @@ Tensor max(const Tensor& self, const Tensor& other) {
return at::maximum(self, other);
}

Tensor& fmax_out(Tensor& result, const Tensor& self, const Tensor& other) {
Contributor

The result tensor must be the last parameter (see comments on native_functions.yaml)

Contributor Author

Addressed!

aten/src/ATen/native/native_functions.yaml (resolved review thread)
self.assertEqual(tensor_result, numpy_result)
self.assertEqual(out, numpy_result)

@dtypes(*(torch.testing.get_all_fp_dtypes()))
def test_maximum_minimum_float_nan_and_inf(self, device, dtype):
# np.maximum and np.minimum functions compare input arrays element-wise.
# if one of the elements being compared is a NaN, then that element is returned.
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum))
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum),
(torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin))
a_vals = (float('inf'), -float('inf'), float('nan'), float('nan'))
b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'))
Contributor

We should also test a combination of nan and non-nan values to ensure we cover the cases where the value returned should come from one tensor vs the other.

Collaborator

I think the current test does check this with a_vals and b_vals?

Contributor

I think the only comparison is between float('nan') and float('inf') but what about where the only nan value is in the second tensor?

Collaborator

Good point. Looks like the test could be more thorough.

Contributor Author

I've added a case pairing float('inf') with float('nan') to cover the situation where the only NaN value is in the second tensor.
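
A hedged sketch of the kind of coverage being discussed, pairing NaN with finite and infinite values in both positions and checking against NumPy (illustrative names only, not the actual test code added in this PR):

```python
import torch
import numpy as np

a = torch.tensor([float('inf'), -float('inf'), float('nan'), float('inf'), float('nan'), 0.5])
b = torch.tensor([-float('inf'), float('inf'), float('inf'), float('nan'), float('nan'), float('nan')])

for torch_op, np_op in ((torch.fmax, np.fmax), (torch.fmin, np.fmin)):
    expected = torch.from_numpy(np_op(a.numpy(), b.numpy()))
    # equal_nan=True so matching NaN positions count as equal.
    assert torch.allclose(torch_op(a, b), expected, equal_nan=True)
```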

@mruberry
Copy link
Collaborator

Hey @Kiyosora!

Thanks for implementing torch.fmax and torch.fmin. This looks good, but there are a few changes I'd like to propose:

  • as @heitorschueroff points out, dispatch will need to start using c10-full, and the argument order needs to be the same going forward between the signatures in native_functions.yaml and in the *.cpp files like BinaryOps.cpp. This is a new change and the documentation is not in the build yet. See Enforce c10-fullness for all ops #49619, where the documentation is being added.
  • I think we can implement this more efficiently and readably as a wrapper around C++'s fmin and fmax
  • Some documentation updates

Let me know your thoughts. Looking forward to adding these functions to PyTorch!

@Kiyosora force-pushed the implement_fmax_fmin branch 4 times, most recently from 9f2feb5 to d041f7f, on December 25, 2020 01:22
@@ -505,6 +505,60 @@ void minimum_kernel(TensorIterator& iter) {
}
}

void fmax_kernel(TensorIterator& iter) {
if (iter.dtype() == ScalarType::Bool) {
Collaborator

@mruberry commented Dec 25, 2020

There are a few updates needed here:

  • use iter.common_dtype() to test for whether it's a floating type or not and to dispatch
  • if iter.common_dtype() is not a floating point type then just call maximum
  • in the floating point kernel I don't think you need the NaN checks, I think this can just call std::fmax

Similar changes will need to be made for fmin and the CUDA code.

Contributor Author

Gotcha, I'll fix it right away.

Contributor Author

Addressed!

b_vals = (-float('inf'), float('inf'), float('inf'), float('nan'))
ops = ((torch.maximum, torch.max, np.maximum), (torch.minimum, torch.min, np.minimum),
(torch.fmax, None, np.fmax), (torch.fmin, None, np.fmin))
a_vals = (float('inf'), -float('inf'), float('nan'), float('inf'), float('nan'))
Collaborator

The extension is good. Also pair a float('nan') with a real number like 0, 1, or .5.

Contributor Author

Addressed!

@@ -714,6 +714,10 @@
self: grad.clone().masked_fill_(self <= other, 0)
other: grad.clone().masked_fill_(self > other, 0)

- name: fmax(Tensor self, Tensor other) -> Tensor
self: grad.clone().masked_fill_(self <= other, 0)
Collaborator

This is the same formula as for maximum:

https://github.com/pytorch/pytorch/blob/d041f7f2fd46f7f646d959d2ea7c6ca9613d7211/tools/autograd/derivatives.yaml#L714

That seems odd since these are different functions.

Let's look at the truth table:

  • if self and other are numbers, then the gradient goes to self if self > other, sure
  • if self and other are NaN, then this comparison is False and self and other both receive gradient, which is interesting
  • if either self or other is NaN, then this comparison is False and both receive gradient, which seems odd

In particular, if self is a number and other is NaN then it seems like gradient should go to self but not to other? And, symmetrically, if self is NaN and other is a number then it seems like gradient should go to other and not to self?

I think we should also bias towards self receiving grad, all else being equal, and rewrite these expressions as:

(self >= other).logical_or_(other.isnan()).logical_not_()

Then the masked_fill for other would be:

(self >= other).logical_or_(other.isnan())

@heitorschueroff Does that look correct to you? What do you think, @Kiyosora?
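
A small autograd sketch of the routing this proposal implies; the printed gradients are what the masks above would produce, assuming the merged gradients match these formulas:

```python
import torch

a = torch.tensor([2.0, float('nan'), 1.0, float('nan')], requires_grad=True)
b = torch.tensor([1.0, 3.0, float('nan'), float('nan')], requires_grad=True)

torch.fmax(a, b).sum().backward()

# Gradient follows the selected element: `a` wins where a >= b or where b is
# NaN (including the all-NaN case, biasing towards self); otherwise `b` wins.
print(a.grad)  # expected: tensor([1., 0., 1., 1.])
print(b.grad)  # expected: tensor([0., 1., 0., 0.])
```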

Contributor Author

It does make sense to me. Thanks for pointing this out, I will fix it soon.

Contributor Author

Addressed!

Collaborator

@mruberry left a comment

Updates look really good, @Kiyosora!

I think the implementation of the function can be simplified by relying on minimum/maximum, and I have some questions about the gradients, too. Looking forward to hearing your thoughts!

@Kiyosora
Contributor Author

Hi @mruberry, sorry for the delay, and thank you so much for your advice!
I've improved the implementation and the gradients of the function; it looks much better now, I think.
Please take a look at your convenience. 😃

@mruberry
Collaborator

mruberry commented Jan 2, 2021

Thanks @Kiyosora! @heitorschueroff and I will take a look when we're back at work next week.

Contributor

@heitorschueroff left a comment

This PR is looking great overall except for some minor fixes for which I left comments. The test failures are unrelated.

This function is a wrapper around C++'s ``std::fmax`` and is similar to NumPy's ``fmax`` function.

Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.
Contributor

This is causing formatting issues; a little tweak will do it, as follows:

Supports :ref:`broadcasting to a common shape <broadcasting-semantics>`,
:ref:`type promotion <type-promotion-doc>`, and integer and floating-point inputs.

Contributor Author

Addressed!


Supports :ref:broadcasting to a common shape <broadcasting-semantics>,
:ref:type promotion <type-promotion-doc>, and integer and floating-point inputs.

Contributor

See comment above for fmax on the formatting.

>>> a = torch.tensor([9.7, float('nan'), 3.1, float('nan')])
>>> b = torch.tensor([-2.2, 0.5, float('nan'), float('nan')])
>>> torch.fmax(a, b)
tensor([9.7000, 0.5000, 3.1000, nan])
Contributor

Good example.

>>> a = torch.tensor([2.2, float('nan'), 2.1, float('nan')])
>>> b = torch.tensor([-9.3, 0.1, float('nan'), float('nan')])
>>> torch.fmin(a, b)
tensor([-9.3000, 0.1000, 2.1000, nan])
Contributor

Good example.

aten/src/ATen/native/cpu/BinaryOpsKernel.cpp (outdated; resolved review thread)
if (isFloatingType(iter.common_dtype())) {
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cpu", [&]() {
cpu_kernel(iter,
[](scalar_t a, scalar_t b) -> scalar_t {return std::fmax(a, b);
Contributor

I believe the linter will complain that the return statement is on the same line. Could you move it to its own line?

Contributor Author

Addressed!

}
}

void fmin_kernel(TensorIterator& iter) {
Contributor

These kernels are looking good, just the same changes as for the fmax kernel.

@@ -62,7 +62,33 @@ void minimum_kernel_cuda(TensorIterator& iter) {
}
}

void fmax_kernel_cuda(TensorIterator& iter) {
if (isFloatingType(iter.common_dtype())) {
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cuda", [&]() {
Contributor

iter.dtype -> iter.common_dtype

Contributor Author

Addressed!

void fmax_kernel_cuda(TensorIterator& iter) {
if (isFloatingType(iter.common_dtype())) {
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, iter.dtype(), "fmax_cuda", [&]() {
gpu_kernel_with_scalars(iter, []GPU_LAMBDA(double a, double b) -> scalar_t {
Contributor

double -> scalar_t ? or do we need double here?

Contributor Author

It should be scalar_t here; I just made a mistake, thanks for correcting!

}
}

void fmin_kernel_cuda(TensorIterator& iter) {
Contributor

Same changes as for the fmax.

@Kiyosora force-pushed the implement_fmax_fmin branch 3 times, most recently from 80d56a2 to ef015b6, on January 11, 2021 07:05
@Kiyosora
Contributor Author

Thank you so much for being so patient with the code review, @heitorschueroff. 👍
I have corrected my code; please take a look at your convenience.

Contributor

@heitorschueroff left a comment

@mruberry and I reviewed it and it's looking good to go. Thank you for this great contribution @Kiyosora.

Contributor

@facebook-github-bot left a comment

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@heitorschueroff
Contributor

@Kiyosora There were some internal issues preventing this PR from landing, and it looks like it has picked up a merge conflict now. Do you mind rebasing and fixing the conflict?

Contributor

@facebook-github-bot left a comment

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@heitorschueroff merged this pull request in 4803eaf.
