
simple v *= v_scale error #46820

Open

linqingfan opened this issue Oct 25, 2020 · 8 comments
Labels
actionable module: autograd Related to torch.autograd, and the autograd engine in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

linqingfan commented Oct 25, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:
v *= v_scale

RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at "../torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.

I printed the v value as

tensor([[[[ 1.2014e-02, 1.2068e-02, 9.7856e-03, 8.8714e-03, 8.5734e-03,
2.9168e-03, 2.1199e-05, -2.8829e-03, -8.2607e-03, -1.5328e-02,
-2.5013e-02, -3.1222e-02],
[ 1.0266e-02, 6.2610e-03, 5.2078e-03, 5.4408e-03, 4.0872e-03,
-5.5038e-04, -4.3396e-03, -7.8755e-03, -1.2391e-02, -1.8298e-02,
-2.3656e-02, -2.3437e-02],
[ 5.6735e-03, 4.5379e-03, 3.9397e-03, 4.7426e-03, 1.8061e-03,
-2.3692e-03, -7.8620e-03, -1.2199e-02, -1.4298e-02, -1.7047e-02,
-2.0498e-02, -2.0740e-02],
[ 8.3388e-03, 5.1143e-03, 2.9209e-03, 5.5275e-03, 2.9254e-03,
-2.9371e-04, -4.7937e-03, -1.0673e-02, -1.2947e-02, -1.5372e-02,
-1.9295e-02, -2.1778e-02],
[ 8.7581e-03, 7.9502e-03, 7.1754e-03, 7.7227e-03, 6.9899e-03,
5.8939e-03, 5.0108e-03, -7.9185e-03, -9.1207e-03, -1.3086e-02,
-1.6816e-02, -2.1605e-02],
[ 7.9434e-03, 6.1096e-03, 3.8051e-03, 3.1724e-03, 2.2082e-03,
2.1375e-03, -2.9212e-04, -3.9539e-03, -7.1065e-03, -1.2883e-02,
-1.7969e-02, -2.4725e-02]]],

[[[ 6.5310e-02, 5.6975e-02, 4.5132e-02, 5.0792e-02, 5.6704e-02,
6.2313e-02, 6.5036e-02, 6.4221e-02, 5.9827e-02, 6.2116e-02,
6.5767e-02, 7.4780e-02],
[ 5.6775e-02, 5.0288e-02, 5.4376e-02, 6.2040e-02, 6.0906e-02,
5.9159e-02, 5.9787e-02, 6.1453e-02, 5.8250e-02, 5.6893e-02,
6.1770e-02, 6.7854e-02],
[ 4.7995e-02, 4.8208e-02, 5.2908e-02, 5.5330e-02, 6.1871e-02,
5.6681e-02, 5.6556e-02, 6.0526e-02, 5.0920e-02, 5.3691e-02,
5.8827e-02, 6.3531e-02],
[ 3.0758e-02, 4.0276e-02, 4.7759e-02, 4.0098e-02, 4.0556e-02,
3.1987e-02, 3.8289e-02, 4.4429e-02, 4.1669e-02, 4.7020e-02,
5.2110e-02, 5.7387e-02],
[ 1.5687e-02, 2.2496e-02, 2.2303e-02, 3.8733e-03, -7.9459e-03,
-1.0541e-02, -6.2762e-03, 1.3099e-02, 2.7646e-02, 3.7377e-02,
4.5027e-02, 4.2339e-02],
[-1.0777e-02, -1.2646e-02, -1.3509e-02, -1.3324e-02, -1.8688e-02,
-3.3734e-02, -2.8426e-02, -1.3815e-02, 6.6503e-03, 1.6921e-02,
3.4458e-02, 3.6185e-02]]]], device='cuda:0', grad_fn=<SplitBackward>)

v_scale = 3.2000000000000006

It is very strange that there is an error during the multiplication of a tensor by a scalar.
I guess the error occurs in the autograd backward part.
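
For reference, a self-contained snippet that triggers this class of failure (a reconstruction based on the chunk-based code shown later in this thread; the shape and scale are illustrative):

import torch

base = torch.randn(2, 2, 6, 12, requires_grad=True)
flow = base * 1.0            # make flow a non-leaf so in-place ops are allowed in principle
u, v = flow.chunk(2, dim=1)  # chunk returns multiple views of flow
v *= 3.2                     # in-place op on a multi-output view: deprecation warning on 1.7, internal assert on the reporter's build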

Expected behavior

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

Collecting environment information...
PyTorch version: 1.8.0a0+37dbc61
Is debug build: True
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.18.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 3090
Nvidia driver version: 455.32.00
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.8.0a0
[pip3] torchvision==0.8.0a0+cffac64
[conda] blas 1.0 mkl
[conda] magma-cuda110 2.5.2 1 pytorch
[conda] mkl 2020.2 256
[conda] mkl-include 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.2.0 py38h23d657b_0
[conda] mkl_random 1.1.1 py38h0573a6f_0
[conda] numpy 1.19.1 py38hbc911f0_0
[conda] numpy-base 1.19.1 py38hfa32c7d_0
[conda] torch 1.8.0a0 pypi_0 pypi
[conda] torchvision 0.8.0a0+cffac64 pypi_0 pypi

Additional context

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved

@linqingfan linqingfan changed the title simple v *= v error simple v *= v*v_scale error Oct 25, 2020
@linqingfan linqingfan changed the title simple v *= v*v_scale error simple v *= v_scale error Oct 25, 2020
gchanan commented Oct 26, 2020

do you get a warning like the following?

main:1: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /opt/conda/conda-bld/pytorch_1601363278767/work/torch/csrc/autograd/variable.cpp:480.)

linqingfan (Author) replied:
> do you get a warning like the following?
>
> main:1: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /opt/conda/conda-bld/pytorch_1601363278767/work/torch/csrc/autograd/variable.cpp:480.)

Yes, that is the warning I get. What could the problem be?

UserWarning: Output 1 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at ../torch/csrc/autograd/variable.cpp:491.)
v *= v_scale

linqingfan (Author) replied:

> do you get a warning like the following?
>
> main:1: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /opt/conda/conda-bld/pytorch_1601363278767/work/torch/csrc/autograd/variable.cpp:480.)

Thanks for the hints. I just changed the following and it works now:

u, v = flow.chunk(2, dim=1)
# u *= u_scale            # old in-place version
u1 = u_scale * u          # out-of-place multiply instead
u = u1.clone()

# v *= v_scale            # old in-place version
v1 = v_scale * v
v = v1.clone()

gchanan commented Oct 26, 2020

related: huggingface/transformers#8022

gchanan commented Oct 26, 2020

Should we be throwing a better error message?

albanD commented Oct 26, 2020

After discussing it on Slack, the right solution here is most likely to finish the deprecation cycle for the view/inplace behavior, so that this raises a proper error message instead of an internal assert. This would consist of:

  • updating it here:
    TORCH_WARN(msg);
  • updating the messages just above in the same function to remove the mention of the deprecation,
  • updating the tests that check for the deprecation warning to now expect a full error

@albanD albanD added actionable module: autograd Related to torch.autograd, and the autograd engine in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module and removed triage review labels Oct 26, 2020
linqingfan commented Oct 27, 2020

Hi, I found that I get very bad results compared to the original in-place operation in PyTorch 1.6.0.
After verifying with v1.6.0, I got different results from the following two snippets:

original code:
u, v = flow.chunk(2, dim=1)
u *= u_scale
v *= v_scale
return torch.cat([u, v], dim=1)

vs new code:
u, v = flow.chunk(2, dim=1)
u1 = u_scale * u
v1 = v_scale * v
return torch.cat([u1, v1], dim=1)

I would expect the same results from these two snippets, wouldn't I?

albanD commented Oct 27, 2020

Unfortunately, the old version was not doing the right thing and so was silently returning wrong gradients. This is why we are doing this BC-breaking change to prevent people from doing that.
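
A quick way to sanity-check the out-of-place rewrite is to verify the gradient reaching the base tensor analytically. A minimal sketch (not from the thread; names, shapes, and scales chosen for illustration):

import torch

base = torch.ones(1, 2, 2, 2, requires_grad=True)
flow = base * 1.0
u, v = flow.chunk(2, dim=1)
out = torch.cat([2.0 * u, 3.0 * v], dim=1)  # out-of-place scaling
out.sum().backward()
# d(sum(out))/d(base) is just the per-channel scale, so:
print(base.grad[0, 0])  # all 2s
print(base.grad[0, 1])  # all 3s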

ImanHosseini added a commit to ImanHosseini/NVAE that referenced this issue Feb 2, 2021
Fixes this issue: pytorch/pytorch#46820
I came across this when I was running the code with pytorch==1.7, getting this error message (and this change would fix the issue):
"""
/home/iman/projs/NVAE/distributions.py:31: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at  /pytorch/torch/csrc/autograd/variable.cpp:491.)
  self.mu = soft_clamp5(mu)
/home/iman/projs/NVAE/distributions.py:32: UserWarning: Output 1 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at  /pytorch/torch/csrc/autograd/variable.cpp:491.)
  log_sigma = soft_clamp5(log_sigma)
Traceback (most recent call last):
  File "train.py", line 415, in <module>
    init_processes(0, size, main, args)
  File "train.py", line 281, in init_processes
    fn(args)
  File "train.py", line 92, in main
    train_nelbo, global_step = train(train_queue, model, cnn_optimizer, grad_scalar, global_step, warmup_iters, writer, logging)
  File "train.py", line 164, in train
    logits, log_q, log_p, kl_all, kl_diag = model(x)
  File "/home/iman/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/iman/projs/NVAE/model.py", line 358, in forward
    dist = Normal(mu_q, log_sig_q)   # for the first approx. posterior
  File "/home/iman/projs/NVAE/distributions.py", line 32, in __init__
    log_sigma = soft_clamp5(log_sigma)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/iman/projs/NVAE/distributions.py", line 19, in soft_clamp5
    # xx = 5.0*torch.tanh( x / 5.0)
    # return  5.0*torch.tanh( x / 5.0)
    return x.div_(5.).tanh_().mul(5.)    #  5. * torch.tanh(x / 5.) <--> soft differentiable clamp between [-5, 5]
           ~~~~~~ <--- HERE
RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.
"""