Unexpected crash in autograd.grad #36903

Closed
el-hult opened this issue Apr 19, 2020 · 1 comment

Comments

el-hult commented Apr 19, 2020

🐛 Bug

When running torch.autograd.grad recursively to compute a Hessian, the code crashes. It does not crash if I multiply my variables by 1.

To Reproduce

Steps to reproduce the behavior:

  1. Run Adam Paszke's code that computes Hessians at https://gist.github.com/apaszke/226abdf867c4e9d6698bd198f3b45fb7#file-jacobian_hessian-py-L1-L15
import torch

def jacobian(y, x, create_graph=False):
    # Build the Jacobian dy/dx row by row: back-propagate a one-hot
    # grad_output for each element of y.
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)

def hessian(y, x):
    # The inner call keeps its graph (create_graph=True) so the outer
    # call can differentiate through it.
    return jacobian(jacobian(y, x, create_graph=True), x)
  2. Run torch.autograd.set_detect_anomaly(True) to get better error messages

  3. Try x = torch.tensor([1.],requires_grad=True); hessian(x * x * 1, x) and see that it returns 2 correctly

  4. Try x = torch.tensor([1.],requires_grad=True); hessian(x * x, x) and see that the code crashes
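Putting steps 2-4 together (assuming the jacobian and hessian helpers from step 1 are already defined), the crash can be reproduced with:

import torch

torch.autograd.set_detect_anomaly(True)   # better error messages

x = torch.tensor([1.], requires_grad=True)
print(hessian(x * x * 1, x))              # works, returns 2

x = torch.tensor([1.], requires_grad=True)
hessian(x * x, x)                         # crashes with the RuntimeError below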

The error message is

Warning: Traceback of forward call that caused the error:
  File "C:\Users\Ludvig\Miniconda3\envs\robustness2019\lib\traceback.py", line 197, in format_stack
    return format_list(extract_stack(f, limit=limit))
 (print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:57)
Traceback (most recent call last):
  File "C:\Users\Ludvig\Miniconda3\envs\robustness2019\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-31-789b47ffdeb8>", line 19, in <module>
    x = torch.tensor([1.],requires_grad=True);  hessian(x * x, x)     # breaks
  File "<ipython-input-31-789b47ffdeb8>", line 16, in hessian
    return jacobian(jacobian(y, x, create_graph=True), x)
  File "<ipython-input-31-789b47ffdeb8>", line 10, in jacobian
    grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
  File "C:\Users\Ludvig\Miniconda3\envs\robustness2019\lib\site-packages\torch\autograd\__init__.py", line 157, in grad
    inputs, allow_unused)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck

I cannot troubleshoot this further. It seems to happen somewhere in the C++ code?

Expected behavior

I expect both my runs (with and without multiplying by 1) to return the same answer.

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
  • PyTorch Version (e.g., 1.0): 1.4.0
  • OS (e.g., Linux): Windows 10
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source):
  • Python version: 3.7.5
  • CUDA/cuDNN version: cpuonly
  • GPU models and configuration: n/a
  • Any other relevant information:

Additional context

Related to this SO thread: https://stackoverflow.com/questions/61308237/cannot-find-in-place-operation-causing-runtimeerror-one-of-the-variables-neede

albanD (Collaborator) commented Apr 20, 2020

Hi,

We use github issues only for bugs or feature requests.
Please use the forum to ask questions: https://discuss.pytorch.org/

For your particular issue, this is not related to pytorch.
What happens is that when you do x * x only, the gradient returned in the jacobian computation is the same as the one that is passed as input, and it is modified in place to compute the full Jacobian. But that value is needed for the double backward computation.
You can fix this by:

  • Making sure the returned gradient is not the same as the input by having the backward clone it (this is what happens when you multiply by 1), or
  • Fixing the jacobian code to not reuse the original Tensor, by doing jac.append(grad_x.reshape(x.shape).clone()) (see the sketch below).
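
A minimal sketch of the second suggested fix, applied to the jacobian helper quoted above (only the append line and the comment around it differ):

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        # per the suggestion above: do not reuse the original Tensor;
        # clone the returned gradient before storing it
        jac.append(grad_x.reshape(x.shape).clone())
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)

With either workaround, hessian(x * x, x) and hessian(x * x * 1, x) should give the same answer.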

albanD closed this as completed Apr 20, 2020