Unexpected crash in autograd.grad #36903

Closed
el-hult opened this issue Apr 19, 2020 · 1 comment

Comments

el-hult commented Apr 19, 2020

🐛 Bug

When running torch.autograd.grad recursively to compute a Hessian, the code crashes. It does not crash if I multiply my variables by 1.

To Reproduce

Steps to reproduce the behavior:

  1. Run Adam Paszke's code that computes Hessians at https://gist.github.com/apaszke/226abdf867c4e9d6698bd198f3b45fb7#file-jacobian_hessian-py-L1-L15
import torch

def jacobian(y, x, create_graph=False):
    # Build the Jacobian dy/dx row by row: back-propagate a one-hot
    # grad_output for each element of y.
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)

def hessian(y, x):
    # The inner call keeps its graph (create_graph=True) so the outer
    # call can differentiate through it.
    return jacobian(jacobian(y, x, create_graph=True), x)
  2. Run torch.autograd.set_detect_anomaly(True) to get better error messages

  3. Try x = torch.tensor([1.],requires_grad=True); hessian(x * x * 1, x) and see that it returns 2 correctly

  4. Try x = torch.tensor([1.],requires_grad=True); hessian(x * x, x) and see that the code crashes
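Putting steps 2-4 together (assuming the jacobian and hessian helpers from step 1 are already defined), the crash can be reproduced with:

import torch

torch.autograd.set_detect_anomaly(True)   # better error messages

x = torch.tensor([1.], requires_grad=True)
print(hessian(x * x * 1, x))              # works, returns 2

x = torch.tensor([1.], requires_grad=True)
hessian(x * x, x)                         # crashes with the RuntimeError below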

The error message is

Warning: Traceback of forward call that caused the error:
  File "C:\Users\Ludvig\Miniconda3\envs\robustness2019\lib\traceback.py", line 197, in format_stack
    return format_list(extract_stack(f, limit=limit))
 (print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:57)
Traceback (most recent call last):
  File "C:\Users\Ludvig\Miniconda3\envs\robustness2019\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-31-789b47ffdeb8>", line 19, in <module>
    x = torch.tensor([1.],requires_grad=True);  hessian(x * x, x)     # breaks
  File "<ipython-input-31-789b47ffdeb8>", line 16, in hessian
    return jacobian(jacobian(y, x, create_graph=True), x)
  File "<ipython-input-31-789b47ffdeb8>", line 10, in jacobian
    grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
  File "C:\Users\Ludvig\Miniconda3\envs\robustness2019\lib\site-packages\torch\autograd\__init__.py", line 157, in grad
    inputs, allow_unused)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck

I cannot troubleshoot this further. It seems to happen somewhere in the C++ code?

Expected behavior

I expect both my runs (with and without multiplying by 1) to return the same answer.

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
  • PyTorch Version (e.g., 1.0): 1.4.0
  • OS (e.g., Linux): Windows 10
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source):
  • Python version: 3.7.5
  • CUDA/cuDNN version: cpuonly
  • GPU models and configuration: n/a
  • Any other relevant information:

Additional context

Related to this SO thread: https://stackoverflow.com/questions/61308237/cannot-find-in-place-operation-causing-runtimeerror-one-of-the-variables-neede

albanD (Collaborator) commented Apr 20, 2020

Hi,

We use github issues only for bugs or feature requests.
Please use the forum to ask questions: https://discuss.pytorch.org/

For your particular issue, this is not related to pytorch.
What happens is that when you do x * x only, the gradient returned in the jacobian computation is the same as the one that is passed as input, and it is modified in place to compute the full Jacobian. But that value is needed for the double backward computation.
You can fix this by:

  • Making sure the returned gradient is not the same as the input by having the backward clone it (this is what happens when you multiply by 1), or
  • Fixing the jacobian code to not reuse the original Tensor, by doing jac.append(grad_x.reshape(x.shape).clone()) (see the sketch below).
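
A minimal sketch of the second suggested fix, applied to the jacobian helper quoted above (only the append line and the comment around it differ):

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        # per the suggestion above: do not reuse the original Tensor;
        # clone the returned gradient before storing it
        jac.append(grad_x.reshape(x.shape).clone())
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)

With either workaround, hessian(x * x, x) and hessian(x * x * 1, x) should give the same answer.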

albanD closed this as completed Apr 20, 2020