torch.topk has trouble with inf's and nan's #16762
I think a fix similar to #15886 should be applied to topk (and maybe kthvalue as well).

cc @umanwizard

It's not clear to me what the "correct" output is. Should NaN be regarded as smaller than any other element, larger than any other element, or does it depend on the direction? (For …

Make it an option, maybe? It would also seem weird if the output of …
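For illustration, one way to pin down such a convention is a comparator key that always orders NaN after every real value, matching NumPy's sort order. This is a plain-Python sketch, not PyTorch's actual implementation:

```python
import math

def nan_last_key(v):
    # NaN compares greater than everything else (NumPy's convention):
    # non-NaN values sort by value first, NaNs go to the end.
    return (math.isnan(v), v)

vals = [1.0, float('nan'), float('inf'), 3.0]
print(sorted(vals, key=nan_last_key))
# [1.0, 3.0, inf, nan]
```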
I might look into this when I think about moving topk to ATen.

@umanwizard, are you still planning to work on this? I think it is more important for topk to be consistent with sort than for it to return non-NaNs when there are NaNs in the input. NaN means something was done wrong.

I completely forgot about this. Thanks for the reminder. Yes, I will fix it.

I think the updated …

@VitalyFedyunin to check if this is fixed.
I'm also having issues with topk, even with the latest version available on conda. The differences in behaviour between inf and NaN handling on CPU and GPU are quite disruptive.
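A sketch of the kind of CPU-vs-GPU comparison being described; this is a hypothetical repro, and the exact outputs depended on the PyTorch build:

```python
import torch

# A tensor containing both inf and nan
x = torch.tensor([1.0, float('inf'), 2.0, float('nan'), 3.0])

# At the time of this report, CPU and CUDA could disagree on where
# inf/nan land among the top-k values and indices.
print(x.topk(3))
if torch.cuda.is_available():
    print(x.cuda().topk(3))
```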
Sorry, I am not working on this anymore.

Just for tracking: this is still broken as of …
The original issue is fixed on CPU as of 1.5.0a0+1a589f5:

```python
import torch

x = torch.ones(10) / torch.zeros(10)    # inf everywhere
x[0] = torch.zeros(1) / torch.zeros(1)  # nan
x[3] = 3
print(x)
print(x.topk(1, largest=True))
print(x.topk(1, largest=False))
```

outputs:
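Under the NaN-is-largest convention that the thread converges on, the output would be expected to look roughly like this (reconstructed, not verbatim):

```
tensor([nan, inf, inf, 3., inf, inf, inf, inf, inf, inf])
torch.return_types.topk(values=tensor([nan]), indices=tensor([0]))
torch.return_types.topk(values=tensor([3.]), indices=tensor([3]))
```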
This is consistent with sort:

```python
In [6]: a = torch.tensor([1,2,float('nan'),3,float('inf'),4])

In [7]: a
Out[7]: tensor([1., 2., nan, 3., inf, 4.])

In [8]: b = a.clone().cuda()

In [9]: b
Out[9]: tensor([1., 2., nan, 3., inf, 4.], device='cuda:0')

In [10]: a.sort()
Out[10]:
torch.return_types.sort(
values=tensor([1., 2., 3., 4., inf, nan]),
indices=tensor([0, 1, 3, 5, 4, 2]))

In [11]: b.sort()
Out[11]:
torch.return_types.sort(
values=tensor([1., 2., 3., 4., inf, nan], device='cuda:0'),
indices=tensor([0, 1, 3, 5, 4, 2], device='cuda:0'))
```

This also seems to be the desired default behaviour because it's consistent with numpy, as per #15886.

The remaining two issues are:
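For context, NumPy's sort does treat NaN as greater than every other value, including inf:

```python
import numpy as np

a = np.array([1, 2, np.nan, 3, np.inf, 4])
print(np.sort(a))
# [ 1.  2.  3.  4. inf nan]
```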
CUDA topk also handles NaN and inf consistently now:

```python
In [4]: a = torch.tensor([1,2,float('nan'),3,float('inf'),4])

In [5]: a.topk(4)
Out[5]:
torch.return_types.topk(
values=tensor([nan, inf, 4., 3.]),
indices=tensor([2, 4, 5, 3]))

In [6]: a.cuda().topk(4)
Out[6]:
torch.return_types.topk(
values=tensor([nan, inf, 4., 3.], device='cuda:0'),
indices=tensor([2, 4, 5, 3], device='cuda:0'))
```
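Under the same convention, the smallest-k direction should exclude NaN until k is large enough to reach it; a quick check (expected behaviour on a build with the fix, not a verbatim transcript):

```python
import torch

a = torch.tensor([1, 2, float('nan'), 3, float('inf'), 4])
# With NaN treated as the largest value, the four smallest
# elements are the ordinary finite ones.
print(a.topk(4, largest=False))
# Expected: values=tensor([1., 2., 3., 4.]), indices=tensor([0, 1, 3, 5])
```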
Marking this as not triaged because it is high priority and nobody is assigned to work on it.

@cpuhrsch, "triaged" is for marking priority and modules, not for assigning work.

To clarify, I thought this issue had fallen through the cracks because some follow-up work wasn't done, and I wanted to resurface it during our triage review. The last few comments seem to indicate it might be resolved, but it's also still marked high priority, so I thought it was worth bringing up again.

Comments above say that the issue is fixed; leaving it open for now to double-check that there are tests.
Status of daniyar-niantic's script:
So hooray, CPU and CUDA are consistent now (as peterbell10 says), so we just need to make sure this is actually tested.

Which it is, in peterbell10's script, so I'm gonna close this issue.
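A sketch of the kind of regression test being asked for; this is hypothetical, and `check_topk_matches_sort` is not an actual PyTorch test helper:

```python
import torch

def check_topk_matches_sort(x, k):
    # topk should agree with the prefix of a descending sort, even when
    # the input contains inf/nan, and CPU/CUDA should behave the same.
    sorted_vals, sorted_idx = x.sort(descending=True)
    topk_vals, topk_idx = x.topk(k)
    torch.testing.assert_close(topk_vals, sorted_vals[:k], equal_nan=True)
    assert torch.equal(topk_idx, sorted_idx[:k])

a = torch.tensor([1., 2., float('nan'), 3., float('inf'), 4.])
check_topk_matches_sort(a, 4)
if torch.cuda.is_available():
    check_topk_matches_sort(a.cuda(), 4)
```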
In master, the code:
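(presumably the same script quoted in the comments above; a minimal sketch:)

```python
import torch

x = torch.ones(10) / torch.zeros(10)    # inf everywhere
x[0] = torch.zeros(1) / torch.zeros(1)  # nan
x[3] = 3
print(x)
print(x.topk(1, largest=True))
print(x.topk(1, largest=False))
```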
outputs:
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @heitorschueroff