
simple v *= v_scale error #46820

Open

linqingfan opened this issue Oct 25, 2020 · 8 comments
Labels
actionable module: autograd Related to torch.autograd, and the autograd engine in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

linqingfan commented Oct 25, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:
v *= v_scale

RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at "../torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.

I printed the v value as

tensor([[[[ 1.2014e-02, 1.2068e-02, 9.7856e-03, 8.8714e-03, 8.5734e-03,
2.9168e-03, 2.1199e-05, -2.8829e-03, -8.2607e-03, -1.5328e-02,
-2.5013e-02, -3.1222e-02],
[ 1.0266e-02, 6.2610e-03, 5.2078e-03, 5.4408e-03, 4.0872e-03,
-5.5038e-04, -4.3396e-03, -7.8755e-03, -1.2391e-02, -1.8298e-02,
-2.3656e-02, -2.3437e-02],
[ 5.6735e-03, 4.5379e-03, 3.9397e-03, 4.7426e-03, 1.8061e-03,
-2.3692e-03, -7.8620e-03, -1.2199e-02, -1.4298e-02, -1.7047e-02,
-2.0498e-02, -2.0740e-02],
[ 8.3388e-03, 5.1143e-03, 2.9209e-03, 5.5275e-03, 2.9254e-03,
-2.9371e-04, -4.7937e-03, -1.0673e-02, -1.2947e-02, -1.5372e-02,
-1.9295e-02, -2.1778e-02],
[ 8.7581e-03, 7.9502e-03, 7.1754e-03, 7.7227e-03, 6.9899e-03,
5.8939e-03, 5.0108e-03, -7.9185e-03, -9.1207e-03, -1.3086e-02,
-1.6816e-02, -2.1605e-02],
[ 7.9434e-03, 6.1096e-03, 3.8051e-03, 3.1724e-03, 2.2082e-03,
2.1375e-03, -2.9212e-04, -3.9539e-03, -7.1065e-03, -1.2883e-02,
-1.7969e-02, -2.4725e-02]]],

[[[ 6.5310e-02, 5.6975e-02, 4.5132e-02, 5.0792e-02, 5.6704e-02,
6.2313e-02, 6.5036e-02, 6.4221e-02, 5.9827e-02, 6.2116e-02,
6.5767e-02, 7.4780e-02],
[ 5.6775e-02, 5.0288e-02, 5.4376e-02, 6.2040e-02, 6.0906e-02,
5.9159e-02, 5.9787e-02, 6.1453e-02, 5.8250e-02, 5.6893e-02,
6.1770e-02, 6.7854e-02],
[ 4.7995e-02, 4.8208e-02, 5.2908e-02, 5.5330e-02, 6.1871e-02,
5.6681e-02, 5.6556e-02, 6.0526e-02, 5.0920e-02, 5.3691e-02,
5.8827e-02, 6.3531e-02],
[ 3.0758e-02, 4.0276e-02, 4.7759e-02, 4.0098e-02, 4.0556e-02,
3.1987e-02, 3.8289e-02, 4.4429e-02, 4.1669e-02, 4.7020e-02,
5.2110e-02, 5.7387e-02],
[ 1.5687e-02, 2.2496e-02, 2.2303e-02, 3.8733e-03, -7.9459e-03,
-1.0541e-02, -6.2762e-03, 1.3099e-02, 2.7646e-02, 3.7377e-02,
4.5027e-02, 4.2339e-02],
[-1.0777e-02, -1.2646e-02, -1.3509e-02, -1.3324e-02, -1.8688e-02,
-3.3734e-02, -2.8426e-02, -1.3815e-02, 6.6503e-03, 1.6921e-02,
3.4458e-02, 3.6185e-02]]]], device='cuda:0', grad_fn=<SplitBackward>)

v_scale = 3.2000000000000006

It is very strange that there is an error during the multiplication of a tensor by a scalar.
I guess the error occurs in the autograd backward part.
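
For reference, a self-contained snippet that triggers this class of failure (a reconstruction based on the chunk-based code shown later in this thread; the shape and scale are illustrative):

import torch

base = torch.randn(2, 2, 6, 12, requires_grad=True)
flow = base * 1.0            # make flow a non-leaf so in-place ops are allowed in principle
u, v = flow.chunk(2, dim=1)  # chunk returns multiple views of flow
v *= 3.2                     # in-place op on a multi-output view: deprecation warning on 1.7, internal assert on the reporter's build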

Expected behavior

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

Collecting environment information...
PyTorch version: 1.8.0a0+37dbc61
Is debug build: True
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.18.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 3090
Nvidia driver version: 455.32.00
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.8.0a0
[pip3] torchvision==0.8.0a0+cffac64
[conda] blas 1.0 mkl
[conda] magma-cuda110 2.5.2 1 pytorch
[conda] mkl 2020.2 256
[conda] mkl-include 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.2.0 py38h23d657b_0
[conda] mkl_random 1.1.1 py38h0573a6f_0
[conda] numpy 1.19.1 py38hbc911f0_0
[conda] numpy-base 1.19.1 py38hfa32c7d_0
[conda] torch 1.8.0a0 pypi_0 pypi
[conda] torchvision 0.8.0a0+cffac64 pypi_0 pypi

Additional context

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved

@linqingfan linqingfan changed the title simple v *= v error simple v *= v*v_scale error Oct 25, 2020
@linqingfan linqingfan changed the title simple v *= v*v_scale error simple v *= v_scale error Oct 25, 2020
gchanan commented Oct 26, 2020

do you get a warning like the following?

main:1: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /opt/conda/conda-bld/pytorch_1601363278767/work/torch/csrc/autograd/variable.cpp:480.)

linqingfan (Author) replied:
> do you get a warning like the following?
>
> main:1: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /opt/conda/conda-bld/pytorch_1601363278767/work/torch/csrc/autograd/variable.cpp:480.)

Yes, that is the warning I get. What could the problem be?

UserWarning: Output 1 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at ../torch/csrc/autograd/variable.cpp:491.)
v *= v_scale

linqingfan (Author) replied:

> do you get a warning like the following?
>
> main:1: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using unsafe_ version of the function that produced this view or don't modify this view inplace. (Triggered internally at /opt/conda/conda-bld/pytorch_1601363278767/work/torch/csrc/autograd/variable.cpp:480.)

Thanks for the hints. I just changed the following and it works now:

u, v = flow.chunk(2, dim=1)
# u *= u_scale            # old in-place version
u1 = u_scale * u          # out-of-place multiply instead
u = u1.clone()

# v *= v_scale            # old in-place version
v1 = v_scale * v
v = v1.clone()

gchanan commented Oct 26, 2020

related: huggingface/transformers#8022

gchanan commented Oct 26, 2020

Should we be throwing a better error message?

albanD commented Oct 26, 2020

After discussing it on Slack, the right solution here is most likely to finish the deprecation cycle for the view/inplace behavior, so that this raises a proper error message instead of an internal assert. This would consist of:

  • updating it here:
    TORCH_WARN(msg);
  • updating the messages just above in the same function to remove the mention of the deprecation,
  • updating the tests that check for the deprecation warning to now expect a full error

@albanD albanD added actionable module: autograd Related to torch.autograd, and the autograd engine in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module and removed triage review labels Oct 26, 2020
linqingfan commented Oct 27, 2020

Hi, I found that I get very bad results compared to the original in-place operation in PyTorch 1.6.0.
After verifying with v1.6.0, I got different results from the following two snippets:

original code:
u, v = flow.chunk(2, dim=1)
u *= u_scale
v *= v_scale
return torch.cat([u, v], dim=1)

vs new code:
u, v = flow.chunk(2, dim=1)
u1 = u_scale * u
v1 = v_scale * v
return torch.cat([u1, v1], dim=1)

I would expect the same results from these two snippets, wouldn't I?

albanD commented Oct 27, 2020

Unfortunately, the old version was not doing the right thing and so was silently returning wrong gradients. This is why we are doing this BC-breaking change to prevent people from doing that.
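
A quick way to sanity-check the out-of-place rewrite is to verify the gradient reaching the base tensor analytically. A minimal sketch (not from the thread; names, shapes, and scales chosen for illustration):

import torch

base = torch.ones(1, 2, 2, 2, requires_grad=True)
flow = base * 1.0
u, v = flow.chunk(2, dim=1)
out = torch.cat([2.0 * u, 3.0 * v], dim=1)  # out-of-place scaling
out.sum().backward()
# d(sum(out))/d(base) is just the per-channel scale, so:
print(base.grad[0, 0])  # all 2s
print(base.grad[0, 1])  # all 3s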

ImanHosseini added a commit to ImanHosseini/NVAE that referenced this issue Feb 2, 2021
Fixes this issue: pytorch/pytorch#46820
I came across this when I was running the code with pytorch==1.7, getting this error message (and this change would fix the issue):
"""
/home/iman/projs/NVAE/distributions.py:31: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at  /pytorch/torch/csrc/autograd/variable.cpp:491.)
  self.mu = soft_clamp5(mu)
/home/iman/projs/NVAE/distributions.py:32: UserWarning: Output 1 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at  /pytorch/torch/csrc/autograd/variable.cpp:491.)
  log_sigma = soft_clamp5(log_sigma)
Traceback (most recent call last):
  File "train.py", line 415, in <module>
    init_processes(0, size, main, args)
  File "train.py", line 281, in init_processes
    fn(args)
  File "train.py", line 92, in main
    train_nelbo, global_step = train(train_queue, model, cnn_optimizer, grad_scalar, global_step, warmup_iters, writer, logging)
  File "train.py", line 164, in train
    logits, log_q, log_p, kl_all, kl_diag = model(x)
  File "/home/iman/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/iman/projs/NVAE/model.py", line 358, in forward
    dist = Normal(mu_q, log_sig_q)   # for the first approx. posterior
  File "/home/iman/projs/NVAE/distributions.py", line 32, in __init__
    log_sigma = soft_clamp5(log_sigma)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/iman/projs/NVAE/distributions.py", line 19, in soft_clamp5
    # xx = 5.0*torch.tanh( x / 5.0)
    # return  5.0*torch.tanh( x / 5.0)
    return x.div_(5.).tanh_().mul(5.)    #  5. * torch.tanh(x / 5.) <--> soft differentiable clamp between [-5, 5]
           ~~~~~~ <--- HERE
RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.
"""