
scatter_ throwing a RuntimeError #40359

Closed
alfoudari opened this issue Jun 21, 2020 · 9 comments

alfoudari commented Jun 21, 2020

🐛 Bug

I'm trying to turn a tensor into three different tensors for a CNN with three channels. The following error came up while testing.

To Reproduce

Steps to reproduce the behavior:

(Pdb) y
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0.])
(Pdb) (y == 0).nonzero()
tensor([[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8]])
(Pdb) y.scatter_(0, (y == 0).nonzero(), 1)
*** RuntimeError: size.size() == stride.size() INTERNAL ASSERT FAILED at /pytorch/aten/src/ATen/native/Resize.h:96, please report a bug to PyTorch.

Expected behavior

A tensor with 1 at every index.

Environment

$ python collect_env.py
Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Arch Linux
GCC version: (Arch Linux 9.3.0-1) 9.3.0
CMake version: version 3.17.1

Python version: 3.8
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.5.0
[conda] Could not collect

Additional context

So squeeze solves it; however, I'm leaving this issue open since an internal assert failed:

>>> y = torch.zeros([9])
>>> y.scatter(0, (y == 0).nonzero().squeeze(), 1)
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1.])
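
As an aside, the same expected result can be had without scatter_ at all; a minimal sketch using boolean mask assignment (equivalent for this case, not a fix for the assert itself):

>>> y = torch.zeros([9])
>>> y[y == 0] = 1
>>> y
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1.])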
ngimel (Collaborator) commented Jun 22, 2020

cc @nikitaved - is something missing in the shape checks? We should not be getting an internal assert.

nikitaved (Collaborator) commented Jun 22, 2020

I cannot reproduce this on master. This is what I see:

In [1]: import torch

In [2]: y = torch.zeros([9])

In [3]: y.scatter_(0, (y == 0).nonzero(), 1)
/home/nik/miniconda3/envs/pytorch-cuda-dev/bin/ipython:1: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  ../torch/csrc/utils/python_arg_parser.cpp:766.)
  #!/home/nik/miniconda3/envs/pytorch-cuda-dev/bin/python
Out[3]: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1.])
Collecting environment information...
PyTorch version: 1.6.0a0+04cc47d
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (crosstool-NG 1.23.0.450-d54ae) 7.3.0
CMake version: version 3.16.3

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: 
GPU 0: TITAN RTX
GPU 1: TITAN RTX

Nvidia driver version: 440.33.01
cuDNN version: /usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn.so.7

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.6.0a0+04cc47d
[conda] magma-cuda101             2.5.1                         1    pytorch
[conda] mkl                       2019.4                      243  
[conda] mkl-include               2019.5                      281    conda-forge
[conda] numpy                     1.18.1           py36h95a1406_0    conda-forge
[conda] torch                     1.6.0a0+04cc47d           dev_0    <develop>
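
As a side note, the deprecation warning in the transcript above can be avoided by passing as_tuple explicitly; a small sketch (both forms are current API):

In [4]: idx = torch.nonzero(y == 0, as_tuple=False)   # explicit form, shape [9, 1], no warning

In [5]: idx_1d = (y == 0).nonzero(as_tuple=True)[0]   # tuple form gives a 1-d index directly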

zou3519 (Contributor) commented Jun 23, 2020

I can confirm this bug exists on 1.5.0 and 1.5.1 but appears to be fixed on master, so I am closing this issue.

zou3519 closed this as completed Jun 23, 2020
ngimel (Collaborator) commented Jun 23, 2020

There's still something wrong with the shape checking, it seems. The documentation says that the number of dimensions should be the same for src and idx, yet in this case idx has 2 dimensions (as a result of .nonzero()), src is 1-d, and yet somehow it still works. @nikitaved can you please double-check it?
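
A minimal illustration of the mismatch being described (shapes as reported in this thread):

>>> y = torch.zeros([9])       # self is 1-d
>>> idx = (y == 0).nonzero()   # idx is 2-d, shape [9, 1]
>>> y.scatter_(0, idx, 1)      # violates the documented same-dims rule, yet succeeds on master
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1.])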

nikitaved (Collaborator) commented Jun 23, 2020

@ngimel, it works in this case because the shape check iterates only over self's dimensions. So, yeah, it requires a fix, I guess.
Shall we actually allow index/src to have one dimension more than self if that extra dimension has size 1?
But then the case self.dim() == 1, src.dim() == index.dim() == 2 with src.size(0) == index.size(0) == 1 should be allowed as well, I think. So scatter/gather seem to process only the first self.dim() dimensions of index and src.
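
Roughly, a hypothetical Python-level sketch of the check being described (illustrative only, not the actual ATen code):

def scatter_shape_check(self_t, dim, index):
    # Iterating only over the first self_t.dim() dimensions means the
    # trailing size-1 dimension of a [9, 1] index is never compared,
    # so it passes against a 1-d self.
    for d in range(self_t.dim()):
        if d != dim and index.size(d) > self_t.size(d):
            raise RuntimeError(f"index size exceeds self size at dim {d}")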

ngimel (Collaborator) commented Jun 23, 2020

I'm not sure what exactly we should allow, but in any case it should be documented, and it also should be more or less general. E.g. in the cases you describe, why stop at one extra dimension and not allow [N, 1, 1]? Also, can singleton dimensions be appended only, or appended and prepended? Seems easier to me to just disallow all questionable cases, with good error messages :-)

nikitaved (Collaborator) commented Jun 23, 2020

OK, it is much easier to fix the dim to be exactly the same for all inputs; otherwise we could allow broadcasting if needed, but that would change the documentation. All right, just a simple fix will do :)

ngimel (Collaborator) commented Jun 23, 2020

I'm a bit wary about allowing general broadcasting - it would require some thought to get right. E.g. for in-place scatter self cannot be broadcast, so values/indices should be broadcastable to self, right? For gather it's probably enough to require indices and self to be broadcastable to a common shape. Do you remember if there were enhancement requests to enable broadcasting for scatter/gather ops? In any case, imo it makes sense for now to bring the behavior into agreement with the documentation, and discuss broadcasting separately.
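
To make the in-place constraint concrete, a hedged sketch (explicit expand is what a user can do today, not a proposed broadcasting API):

>>> y = torch.zeros(3, 4)                         # self's shape is fixed in-place
>>> idx = torch.zeros(1, 4, dtype=torch.long)     # would need broadcasting to (3, 4)
>>> src = torch.ones(1, 4)
>>> y.scatter_(0, idx.expand(3, 4), src.expand(3, 4))   # explicit expand works today
tensor([[1., 1., 1., 1.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])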

nikitaved (Collaborator) commented

No, I do not remember any broadcasting requests, but I do remember expectations that index/src, for example, do broadcast over self in scatter. Sure, I will make it work as the documentation says.
