
Throwing errors for min and max reductions in empty CUDA tensors #19612

Closed
wants to merge 18 commits

Conversation

LeviViana
Contributor

Related to #17750.

@pytorchbot pytorchbot added module: cpu CPU specific problem (e.g., perf, algorithm) module: operators labels Apr 23, 2019
@ezyang
Contributor

ezyang commented Apr 23, 2019

This needs a test. I would also really appreciate it if you could rewrite the code to use nDimension and/or numel. Here's the relevant comment explaining the distinction:

// [NOTE: nDimension vs nDimensionLegacyNoScalars vs nDimensionLegacyAll]
// nDimension                 corresponds to the "true" ATen dimension.
// nDimensionLegacyNoScalars  corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors.
// nDimensionLegacyAll        corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors
//                            and tensors with a dimension of size zero are collapsed to 0-dimensional tensors.
//
// Eventually, everything should go through nDimension or tensor->dim().
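
To make the distinction concrete, here is how it shows up from Python (a small illustration; the nDimension* helpers themselves are internal C++ APIs and are not exposed at this level):

import torch
scalar = torch.tensor(5.)  # "true" ATen dimension is 0, but numel() == 1
empty = torch.ones(0)      # dimension is 1 with a size-0 axis, numel() == 0
print(scalar.dim(), scalar.numel())  # 0 1
print(empty.dim(), empty.numel())    # 1 0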

@ezyang ezyang requested a review from gchanan April 23, 2019 16:06
Contributor

@ezyang ezyang left a comment

needs test

@LeviViana
Contributor Author

LeviViana commented Apr 24, 2019

The tests are not passing for CUDA Tensors. There is some weird behavior going on for integral types:

import torch
x = torch.ones(0)
print(x.int().max().item()) # Gives -2147483648
x = x.cuda()
print(x.int().max().item()) # Gives 0

Is this behavior expected?

@ezyang
Contributor

ezyang commented Apr 24, 2019

Looking at the linked issue, one plausible explanation is that the CUDA float kernel returns -Inf, but the CUDA int kernel returns 0. In which case, that would need to be fixed too.

I don't think it necessarily has to be fixed in this PR, but you would have to adjust the tests to expect the broken behavior for now (until it is fixed).
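
A quick way to probe that hypothesis from Python (assuming a CUDA build; before this PR the result of reducing an empty tensor is essentially garbage, so the exact values may vary by device and version):

import torch
x = torch.ones(0, device="cuda")
print(x.max().item())        # float kernel: reportedly -inf
print(x.int().max().item())  # int kernel: reportedly 0, unlike -2147483648 on CPU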

Contributor

@gchanan gchanan left a comment

See my reply to #17750 -- we should have this match the reduction-over-dimension case, which is what NumPy does.

@LeviViana
Contributor Author

The initial option hasn't been implemented yet. Please give me some feedback on these changes. I've replicated the numpy approach to reducing empty tensors, as you can see below:

import torch
import numpy as np

print(np.ones((0, 3, 4)).max(1).shape) # (0, 4)
print(np.ones((0, 3, 4)).max(2).shape) # (0, 3)

print(*map(lambda x:x.size(), torch.ones((0, 3, 4)).max(1))) # (0, 4) (0, 4)
print(*map(lambda x:x.size(), torch.ones((0, 3, 4)).max(2))) # (0, 3) (0, 3)

print(np.ones((0, 3, 4)).max(0).shape) # Raises an identity error
print(*map(lambda x:x.size(), torch.ones((0, 3, 4)).max(0))) # Raises an identity error

The tests have been adjusted and are all passing.

@ezyang
Contributor

ezyang commented Apr 26, 2019

I'll let @gchanan review this, the zero dim reduction logic makes my head hurt XD

Contributor

@gchanan gchanan left a comment

this doesn't match what the linked comment suggests, which is that these should be error cases.

@@ -37,6 +37,10 @@ static bool _dimreduce_return_trivial_no_ident(Tensor &result, const Tensor &sel
return true;
}

if (self.numel() == 0 && self.ndimension() > 0) {
Contributor

this can't be correct, because a tensor with numel == 0 can't have 0 dimensions (all 0-dimensional tensors have 1 element).

Contributor Author

@LeviViana LeviViana May 2, 2019

I actually didn't understand your comment in #17750. I thought you wanted to replicate numpy's behavior, and that's what I did. This condition was added to allow numel == 0 tensors to have dimensions, again, to match numpy's behavior.

@@ -109,8 +109,26 @@ std::tuple<Tensor &,Tensor &> mode_out(Tensor& values, Tensor& indices,
"mode only supports CPU AND CUDA backend, got: ", toString(self.type().backend()));
dim = maybe_wrap_dim(dim, self.dim());
if (_dimreduce_return_trivial_no_ident(values, self, dim, keepdim, "mode")) {
AT_ASSERT(values.dim() == 0);
indices.resize_({}).fill_(0);
if (self.dim() == 0){
Contributor

I'm not sure why this is changed -- wasn't this working before for mode?

Contributor Author

This change was made to handle the new case that arises due to the change in aten/src/ATen/native/ReduceOpsUtils.h.

@@ -569,7 +569,10 @@ scalar_t THTensor_(minall)(THTensor *tensor)
scalar_t theMin;
scalar_t value;

THArgCheck(THTensor_nDimensionLegacyAll(tensor) > 0, 1, "tensor must have one dimension");
if (THTensor_(nElement)(tensor) == 0){
return std::numeric_limits<scalar_t>::max();
Contributor

I thought we weren't going to do this but were going to have the same behavior as the dimensional case?

Contributor Author

That's precisely the opposite of what I was thinking :)

@@ -587,7 +590,10 @@ scalar_t THTensor_(maxall)(THTensor *tensor)
scalar_t theMax;
scalar_t value;

THArgCheck(THTensor_nDimensionLegacyAll(tensor) > 0, 1, "tensor must have one dimension");
if (THTensor_(nElement)(tensor) == 0){
Contributor

same here.

"on tensor with no elements because the "
"operation does not have an identity");

std::vector<int64_t> sizes = {};
Contributor

I don't think you need this; as noted above, this case should be an error.

This is also incorrect in that it doesn't take keepdim into account.

Contributor Author

Again, this was done to replicate numpy's behavior. Sure, keepdim isn't taken into account, but my idea was to get initial feedback on the direction I was taking for the PR, and you have just made it clear that basically nothing here matches your plans, so I won't write any more code until I'm sure I understand what you want.

@LeviViana
Contributor Author

@gchanan, it is pretty clear that I just don't know what I'm supposed to do in this PR. Regarding the reduction operation, I honestly thought that replicating numpy's behavior was the way to go. Regarding the numeric limits, I tried to follow this recommendation. Could you explain exactly what your plans are?

For instance, what should these statements return?

  • print(torch.ones(0).max().item())
  • print(*map(lambda x:x.size(), torch.ones((0, 3, 4)).max(1)))
  • print(*map(lambda x:x.size(), torch.ones((0, 3, 4)).max(0)))

@ezyang
Contributor

ezyang commented May 6, 2019

I can't 100% speak for @gchanan, but I believe the behavior he is requesting is specifically:

  • If you run minall/maxall on an empty tensor with no extra kwargs, it gives an error
  • If you add a new kwarg initial to maxall, you can have it default to that value for empty tensors when it is set

So, for your examples, all of them would error.

I think the confusion here is that if you don't add the initial kwarg (which I assume you'd prefer not to do in this PR), the correct thing to do in this case is to error, which means you aren't really adding empty tensor support, you're just making something that used to give garbage raise an error instead. But it's consistent with eventually adding actual support, via the initial kwarg. Does that make sense?

@LeviViana
Contributor Author

Thanks @ezyang, it makes a lot of sense. I'll wait for @gchanan to confirm this understanding before making changes.

@gchanan
Contributor

gchanan commented May 6, 2019

@ezyang is correct.

Here are some relevant test cases (you can ignore the initial= ones for now, they are just there for completeness):

>>> np.amax(np.random.randn(0,3))
ValueError: zero-size array to reduction operation maximum which has no identity

>>> np.amax(np.random.randn(0,3), initial=5)
5.0

>>> np.amax(np.random.randn(0,3), axis=0)
ValueError: zero-size array to reduction operation maximum which has no identity

>>> np.amax(np.random.randn(0,3), axis=1)
array([], dtype=float64)

>>> np.amax(np.random.randn(0,3), axis=0, initial=5)
array([5., 5., 5.])

>>> np.amax(np.random.randn(0,3), axis=1, initial=5)
array([], dtype=float64)
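
Translated to PyTorch, the expectations for the non-initial cases would look roughly like this (a sketch only; it assumes the errors surface as RuntimeError in Python, and omits initial= since that keyword does not exist in torch):

import torch

x = torch.randn(0, 3)

# Full reduction and reduction over the zero-sized dim have no identity -> error
for reduce in (lambda: x.max(), lambda: x.max(0)):
    try:
        reduce()
    except RuntimeError as e:
        print("raised:", e)

# Reduction over a non-zero dim is well defined and stays empty
values, indices = x.max(1)
print(values.shape, indices.shape)  # torch.Size([0]) torch.Size([0])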

@pytorchbot pytorchbot added the module: cuda Related to torch.cuda, and CUDA support in general label May 28, 2019
@LeviViana LeviViana changed the title Enabling empty tensors to minall and maxall Throwing errors for min and max reductions in CUDA tensors May 28, 2019
@LeviViana
Contributor Author

Now torch.max(torch.rand(0,4).cuda()) throws an error. Tests have been added. Is it OK?

@LeviViana LeviViana changed the title Throwing errors for min and max reductions in CUDA tensors Throwing errors for min and max reductions in empty CUDA tensors May 28, 2019
@soumith
Member

soumith commented May 29, 2019

fyi, @gchanan is on leave till the end of this week

Contributor

@gchanan gchanan left a comment

I think this is good to go. I also made a suggestion below about how to improve it if you want. Let me know if you want me to merge as is or you want to do the improvement.

@@ -302,6 +302,7 @@ accreal THCTensor_(meanall)(THCState *state, THCTensor *self)

scalar_t THCTensor_(minall)(THCState *state, THCTensor *self) {
THCAssertSameGPU(THCTensor_(checkGPU)(state, 1, self));
THArgCheck(THTensor_nDimensionLegacyAll(self) > 0, 1, "tensor must have one dimension");
Contributor

This is good because it makes the CPU and CUDA error messages match up. Let me know if you want me to merge it as is (can you fix the merge conflict?); alternatively, I have a suggestion for improvement here.

This isn't quite the right error message (even in the CPU case, which you didn't touch), and it's not at all clear why, because this is legacy code -- sorry about that!

Basically, tensors with 0 elements used to have 0 dimensions, and THTensor_nDimensionLegacyAll is correctly checking for that, but:

  1. we shouldn't be using that API in any new code; we should just check that self->numel() > 0
  2. as above, the error message isn't quite correct, because "must have one dimension" refers to the legacy notion of dimension, whereas what we really want is a message about identity that matches the case where a dimension is specified: "cannot perform reduction function max on tensor with no elements because the operation does not have an identity".
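
In terms of user-visible behavior, the suggested check makes the full reduction fail with the same message as the dimensional case (a sketch, assuming a CUDA build; the wording matches the error quoted later in this thread):

import torch

x = torch.rand(0, 4, device="cuda")
try:
    x.min()
except RuntimeError as e:
    # e.g. "cannot perform reduction function min on tensor with no elements
    # because the operation does not have an identity"
    print(e)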

@LeviViana
Contributor Author

Thanks @gchanan, I implemented the improvement and changed the tests to fit the error message.

@fmassa
Member

fmassa commented Jun 6, 2019

There are some test failures that seem related

Jun 06 16:16:21 C++ exception with description "invalid argument 1: cannot perform reduction function min on tensor with no elements because the operation does not have an identity at /var/lib/jenkins/workspace/aten/src/THC/generic/THCTensorMathReduce.cu:64" thrown in the test body.
Jun 06 16:16:21 [  FAILED  ] TestNative.NativeTestGPU (1738 ms)
Jun 06 16:16:21 [----------] 2 tests from TestNative (4796 ms total)

@ezyang ezyang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jun 6, 2019
@ezyang
Contributor

ezyang commented Jun 6, 2019

@pytorchbot rebase this please

@LeviViana
Contributor Author

There are some test failures that seem related

Jun 06 16:16:21 C++ exception with description "invalid argument 1: cannot perform reduction function min on tensor with no elements because the operation does not have an identity at /var/lib/jenkins/workspace/aten/src/THC/generic/THCTensorMathReduce.cu:64" thrown in the test body.
Jun 06 16:16:21 [  FAILED  ] TestNative.NativeTestGPU (1738 ms)
Jun 06 16:16:21 [----------] 2 tests from TestNative (4796 ms total)

Thanks @fmassa, I guess the error comes from the native tests implemented here, but I didn't find any reduction operation being tested. Could you please help me identify the source of this error?

@gchanan
Contributor

gchanan commented Jun 10, 2019

@LeviViana here's the backtrace I get for your error:

#0  THCudaByteTensor_minall (state=0x1233d10, self=0x3fd883b0)
    at /data/users/gchanan/_pytorch8/aten/src/THC/generic/THCTensorMathReduce.cu:57
#1  0x00007fffd3f7e016 in THCudaTensor_equal (state=0x1233d10, self_=0x1715190, src_=0x3fd88080)
    at /data/users/gchanan/_pytorch8/aten/src/THC/generic/THCTensorMathPairwise.cu:22
#2  0x00007fffd66b01b0 in at::native::legacy::cuda::_th_equal (self=..., other=...) at aten/src/ATen/LegacyTHFunctionsCUDA.cpp:2918
#3  0x00007fffd668266b in at::CUDAType::equal (this=0x1234880, self=..., other=...) at aten/src/ATen/CUDAType.cpp:1630
#4  0x00000000004578e4 in at::Tensor::equal (this=0x17133f0, other=...) at ../aten/src/ATen/core/TensorMethods.h:1290
#5  0x000000000044cf41 in requireEqualTensorList (t1=..., t2=...) at ../aten/src/ATen/test/native_test.cpp:20
#6  0x000000000044d19d in TestSplit (T=..., t=...) at ../aten/src/ATen/test/native_test.cpp:28
#7  0x0000000000455606 in test (T=..., AccT=...) at ../aten/src/ATen/test/native_test.cpp:199
#8  0x000000000045572e in TestNative_NativeTestGPU_Test::TestBody (this=0x1220f10) at ../aten/src/ATen/test/native_test.cpp:218
#9  0x0000000000487d20 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x1220f10,
    method=&virtual testing::Test::TestBody(), location=0x4984eb "the test body")
    at ../third_party/googletest/googletest/src/gtest.cc:2443
#10 0x00000000004825b8 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x1220f10,
    method=&virtual testing::Test::TestBody(), location=0x4984eb "the test body")
    at ../third_party/googletest/googletest/src/gtest.cc:2479
#11 0x00000000004644eb in testing::Test::Run (this=0x1220f10) at ../third_party/googletest/googletest/src/gtest.cc:2518

so the issue is that the test calls equal, and equal now needs to be guarded so that it doesn't call minall when the tensors are empty.

@fmassa
Member

fmassa commented Jun 12, 2019

@LeviViana to further explain @gchanan's comment:

Note that torch.equal on the GPU calls into minall

unsigned char min = THCudaByteTensor_minall(state, buf);

And this now raises an error with empty tensors. You need to guard it to avoid calling minall when empty tensors are passed.
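
A minimal reproduction of the failing path (assuming a CUDA device; as the backtrace above shows, torch.equal on GPU tensors goes through the legacy _th_equal path):

import torch

a = torch.empty(0, 3, device="cuda")
b = torch.empty(0, 3, device="cuda")

# Two empty tensors of the same shape are trivially equal, but without a guard
# the legacy GPU kernel reduces an empty byte buffer with minall, which now
# raises the "no identity" error instead of returning True.
print(torch.equal(a, b))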

Contributor

@gchanan gchanan left a comment

lgtm.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@gchanan merged this pull request in deb2140.

@LeviViana LeviViana deleted the empty_max branch June 14, 2019 16:23
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 14, 2019
…612)

Summary:
Related to pytorch/pytorch#17750.
Pull Request resolved: pytorch/pytorch#19612

Differential Revision: D15813649

Pulled By: gchanan

fbshipit-source-id: aa3dc34dd1e6d8bb24fa4c18891204108759bb35
@gchanan
Contributor

gchanan commented Jun 14, 2019

Thanks @LeviViana!

@LeviViana
Contributor Author

Thanks @gchanan for the help and review!

Labels
module: cpu CPU specific problem (e.g., perf, algorithm) module: cuda Related to torch.cuda, and CUDA support in general open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module