Throwing errors for min and max reductions in empty CUDA tensors #19612
Conversation
This needs a test. I would also really appreciate it if you could rewrite the code to use nDimension and/or numel. Here's the relevant comment explaining the distinction:
needs test
The tests are not passing for CUDA tensors. There is some odd behavior for integral types:

```python
import torch
x = torch.ones(0)
print(x.int().max().item())  # Gives -2147483648
x = x.cuda()
print(x.int().max().item())  # Gives 0
```

Is this behavior expected?
Looking at the linked issue, one plausible explanation is that the CUDA float kernel returns -Inf, but the CUDA int kernel returns 0. In which case, that would need to be fixed too. I don't think it necessarily has to be fixed in this PR, but you would have to adjust the tests to expect the broken behavior for now (until it is fixed).
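As a pure-Python sketch (not PyTorch code, and only an assumption about what the kernels do), the divergence is consistent with each kernel seeding its accumulator with a different "identity" value, which is exactly what an empty reduction returns:

```python
# Hypothetical sketch of a max reduction: on an empty input it returns
# whatever value the accumulator was initialized with.
def reduce_max(values, init):
    acc = init
    for v in values:
        acc = max(acc, v)
    return acc

print(reduce_max([], float("-inf")))  # a float kernel seeded with -inf -> -inf
print(reduce_max([], -2**31))         # a kernel seeded with INT32_MIN -> -2147483648
print(reduce_max([], 0))              # a kernel seeded with 0 -> 0
```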
See my reply to #17750 -- we should have this match the reduction-over-dimension case, which is what NumPy does.
```python
import torch
import numpy as np
print(np.ones((0, 3, 4)).max(1).shape)  # (0, 4)
print(np.ones((0, 3, 4)).max(2).shape)  # (0, 3)
print(*map(lambda x: x.size(), torch.ones((0, 3, 4)).max(1)))  # (0, 4) (0, 4)
print(*map(lambda x: x.size(), torch.ones((0, 3, 4)).max(2)))  # (0, 3) (0, 3)
print(np.ones((0, 3, 4)).max(0).shape)  # Raises an identity error
print(*map(lambda x: x.size(), torch.ones((0, 3, 4)).max(0)))  # Raises an identity error
```

The tests have been adjusted and are all passing.
I'll let @gchanan review this; the zero-dim reduction logic makes my head hurt XD
this doesn't match what the linked comment suggests, which is that these should be error cases.
```diff
@@ -37,6 +37,10 @@ static bool _dimreduce_return_trivial_no_ident(Tensor &result, const Tensor &self
     return true;
   }

   if (self.numel() == 0 && self.ndimension() > 0) {
```
this can't be correct, because a tensor with numel == 0 can't have 0 dimensions (all 0-dimensional tensors have exactly 1 element).
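That invariant can be checked with a small self-contained sketch: numel is the product of the sizes, and the empty product is 1, so a 0-dimensional tensor always has exactly one element (`numel` here is an illustrative helper, not the PyTorch API):

```python
import math

# numel(shape) = product of the sizes; the empty product is 1.
def numel(shape):
    return math.prod(shape)

print(numel(()))         # 0-dim tensor: 1 element
print(numel((0, 3, 4)))  # empty tensor: 0 elements, but 3 dimensions
```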
I actually didn't understand your comment in #17750. I thought you wanted to replicate NumPy's behavior, and that's what I did. This condition was added to allow numel == 0 tensors to have dimensions, again, to match NumPy's behavior.
```diff
@@ -109,8 +109,26 @@ std::tuple<Tensor &,Tensor &> mode_out(Tensor& values, Tensor& indices,
       "mode only supports CPU AND CUDA backend, got: ", toString(self.type().backend()));
   dim = maybe_wrap_dim(dim, self.dim());
   if (_dimreduce_return_trivial_no_ident(values, self, dim, keepdim, "mode")) {
     AT_ASSERT(values.dim() == 0);
     indices.resize_({}).fill_(0);
     if (self.dim() == 0){
```
I'm not sure why this is changed -- wasn't this working before for mode?
This change was made to handle the new case that arises due to the change in aten/src/ATen/native/ReduceOpsUtils.h.
```diff
@@ -569,7 +569,10 @@ scalar_t THTensor_(minall)(THTensor *tensor)
   scalar_t theMin;
   scalar_t value;

   THArgCheck(THTensor_nDimensionLegacyAll(tensor) > 0, 1, "tensor must have one dimension");
   if (THTensor_(nElement)(tensor) == 0){
     return std::numeric_limits<scalar_t>::max();
```
I thought we weren't going to do this but were going to have the same behavior as the dimensional case?
That's precisely the opposite of what I was thinking :)
```diff
@@ -587,7 +590,10 @@ scalar_t THTensor_(maxall)(THTensor *tensor)
   scalar_t theMax;
   scalar_t value;

   THArgCheck(THTensor_nDimensionLegacyAll(tensor) > 0, 1, "tensor must have one dimension");
   if (THTensor_(nElement)(tensor) == 0){
```
same here.
```diff
       "on tensor with no elements because the "
       "operation does not have an identity");

   std::vector<int64_t> sizes = {};
```
I don't think you need this; as noted above, this case should be an error. This is also incorrect in that it doesn't take keepdim into account.
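For reference, the keepdim shape rule being alluded to can be sketched in plain Python (`reduced_shape` is a hypothetical helper for illustration, not a PyTorch API):

```python
def reduced_shape(shape, dim, keepdim=False):
    # Shape after reducing over `dim`: the reduced dimension is kept as
    # size 1 when keepdim=True, and dropped otherwise.
    if keepdim:
        return tuple(1 if i == dim else s for i, s in enumerate(shape))
    return tuple(s for i, s in enumerate(shape) if i != dim)

print(reduced_shape((0, 3, 4), 1))                # (0, 4)
print(reduced_shape((0, 3, 4), 1, keepdim=True))  # (0, 1, 4)
```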
Again, this was done to replicate NumPy's behavior. Sure, keepdim isn't taken into account, but my idea was to get early feedback on the direction I was taking for the PR, and you have made it clear that basically nothing matches your plans, so I won't write any more code until I'm sure I understand what you want.
@gchanan, it is pretty clear that I just don't know what I'm supposed to do in this PR. Regarding the reduction operation, I honestly thought that replicating NumPy's behavior was the way to go. Regarding the numeric limits, I tried to follow this recommendation. Could you explain exactly what your plans are? For instance, what should these statements return?
I can't 100% speak for @gchanan, but I believe the behavior he is requesting is specifically:
So, for your examples, all of them would error. I think the confusion here is that if you don't add the initial kwarg (which I assume you'd prefer not to do in this PR), the correct thing to do in this case is to error. That means you aren't really adding empty tensor support; you're just making something that used to give garbage raise an error instead. But it's consistent with eventually adding actual support, via the initial kwarg. Does that make sense?
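A pure-Python sketch of those proposed semantics (the `initial` keyword mirrors NumPy's; nothing here is actual PyTorch code):

```python
def reduce_min(values, initial=None):
    # With no `initial`, an empty reduction has no identity and must raise;
    # with `initial`, the reduction is well-defined even on empty input.
    acc = initial
    for v in values:
        acc = v if acc is None else min(acc, v)
    if acc is None:
        raise ValueError(
            "cannot perform reduction function min on tensor with no "
            "elements because the operation does not have an identity")
    return acc

print(reduce_min([3, 1, 2]))       # 1
print(reduce_min([], initial=42))  # 42
# reduce_min([]) raises ValueError
```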
@ezyang is correct. Here are some relevant test cases (you can ignore the
Now
FYI, @gchanan is on leave till the end of this week.
I think this is good to go. I also made a suggestion below about how to improve it if you want. Let me know if you want me to merge as is or you want to do the improvement.
```diff
@@ -302,6 +302,7 @@ accreal THCTensor_(meanall)(THCState *state, THCTensor *self)

 scalar_t THCTensor_(minall)(THCState *state, THCTensor *self) {
   THCAssertSameGPU(THCTensor_(checkGPU)(state, 1, self));
   THArgCheck(THTensor_nDimensionLegacyAll(self) > 0, 1, "tensor must have one dimension");
```
This is good because it makes the CPU and CUDA error messages match up. Let me know if you want me to merge it as is (can you fix the merge conflict?), or alternatively, I have a suggestion for improvement here.
This isn't quite the right error message (even in the CPU case which you didn't touch), and it's totally not clear why because this is legacy code -- sorry about that!
But basically, tensors with 0 elements, used to have 0-dimensions, and THTensor_nDimensionLegacyAll
is properly checking that, but:
- we shouldn't be using that API in any new code; we should just check that self->numel() > 0
- as above, the error message isn't quite correct, because "must have one dimension" refers to the legacy notion of dimension, but what we really want is a message about identity that matches the case where a dimension is specified: "cannot perform reduction function max on tensor with no elements because the operation does not have an identity".
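Put together, the suggested guard amounts to checking numel directly and reusing the dimensional-reduction message. Sketched here in Python for brevity (`check_nonempty` is a hypothetical helper, not the actual C++ change):

```python
def check_nonempty(numel, op_name):
    # Check numel directly (instead of the legacy dimension API) and emit
    # the same message as the dimension-specified reduction case.
    if numel == 0:
        raise RuntimeError(
            f"cannot perform reduction function {op_name} on tensor with "
            "no elements because the operation does not have an identity")

check_nonempty(6, "max")  # 6 elements: passes silently
# check_nonempty(0, "max") raises RuntimeError
```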
Thanks @gchanan, I implemented the improvement and changed the tests to fit the error message.
There are some test failures that seem related.
@pytorchbot rebase this please
Thanks @fmassa, I guess the error comes from the Natives Tests implemented here, but I didn't find any reduction operation being tested. Could you please help me identify the source of this error?
@LeviViana here's the backtrace I get for your error:
so the issue looks like where calls
@LeviViana, to further explain @gchanan's comment: note that
And this now raises an error with empty tensors. You need to guard it to avoid calling
lgtm.
@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…612)
Summary: Related to pytorch/pytorch#17750.
Pull Request resolved: pytorch/pytorch#19612
Differential Revision: D15813649
Pulled By: gchanan
fbshipit-source-id: aa3dc34dd1e6d8bb24fa4c18891204108759bb35
Thanks @LeviViana!
Thanks @gchanan for the help and review!
Related to #17750.