
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead. #44023

Closed
curehabit opened this issue Sep 2, 2020 · 7 comments
Labels
hackathon module: docs Related to our documentation, both in docs/ and docblocks triaged This issue has been looked at by a team member and triaged and prioritized into an appropriate module

Comments

@curehabit

curehabit commented Sep 2, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Follow the MNIST tutorial (https://pytorch-lightning.readthedocs.io/en/stable/new-project.html)
  2. Use Trainer(tpu_cores=1)
  3. Run
Traceback (most recent call last):
  File "plmnist.py", line 80, in <module>
    trainer.fit(model, train_loader, val_loader)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1078, in fit
    self.accelerator_backend.train(model)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/accelerators/tpu_backend.py", line 87, in train
    start_method=self.start_method
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 284, in spawn
    return _run_direct(fn, args, nprocs, join, daemon, start_method)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 245, in _run_direct
    fn(0, *args)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/accelerators/tpu_backend.py", line 112, in tpu_train_in_process
    results = trainer.run_pretrain_routine(model)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1239, in run_pretrain_routine
    self.train()
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 394, in train
    self.run_training_epoch()
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 491, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 844, in run_training_batch
    self.hiddens
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 1049, in optimizer_closure
    training_step_output_for_epoch_end = copy(training_step_output)
  File "/***/anaconda3/envs/turing/lib/python3.7/copy.py", line 88, in copy
    return copier(x)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/pytorch_lightning/core/step_result.py", line 302, in __copy__
    newone[k] = copy(v)
  File "/***/anaconda3/envs/turing/lib/python3.7/copy.py", line 96, in copy
    rv = reductor(4)
  File "/***/anaconda3/envs/turing/lib/python3.7/site-packages/torch/tensor.py", line 87, in __reduce_ex__
    args = (self.cpu().numpy(),
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

Expected behavior

Environment

CUDA:

  • GPU:
  • available: False
  • version: None
    Packages:
  • numpy: 1.19.0
  • pyTorch_debug: False
  • pyTorch_version: 1.6.0.dev20200622
  • pytorch-lightning: 0.9.0
  • tensorboard: 2.2.0
  • tqdm: 4.48.2
    System:
  • OS: Linux
  • architecture:
  • 64bit
  • processor:
  • python: 3.7.7
  • version: #1 SMP Debian 4.14.81.bm.15 Sun Sep 8 05:02:31 UTC 2019

PyTorch Version (e.g., 1.0): 1.6
OS (e.g., Linux): Linux
How you installed PyTorch (conda, pip, source): pip
Build command you used (if compiling from source):
Python version: 3.7.8
CUDA/cuDNN version: None
GPU models and configuration: None
Any other relevant information: torch_xla: 1.6.0

Additional context

/pytorch_lightning/trainer/training_loop.py(1049)optimizer_closure()

1044
1045                # if the user decides to finally reduce things in epoch_end, save raw output without graphs
1046                if isinstance(training_step_output_for_epoch_end, torch.Tensor):
1047                    training_step_output_for_epoch_end = training_step_output_for_epoch_end.detach()
1048                elif is_result_obj:
1049 ->              training_step_output_for_epoch_end = copy(training_step_output)  ### <- there should be a detach before the copy
1050                    training_step_output_for_epoch_end.detach()
1051                else:
1052                    training_step_output_for_epoch_end = recursive_detach(training_step_output_for_epoch_end)
1053
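As a minimal illustration of why detaching first matters here (variable names are hypothetical, not from Lightning): in PyTorch 1.6, copy.copy on a grad-tracking tensor fell back to Tensor.__reduce_ex__, which called .cpu().numpy() and raised exactly the RuntimeError in the traceback above. Detaching before the copy sidesteps it:

```python
import copy

import torch

# A non-leaf tensor that tracks gradients, like a training-step loss.
loss = torch.ones(1, requires_grad=True) * 2.0

# In PyTorch 1.6, copy.copy(loss) routed through Tensor.__reduce_ex__,
# which called .cpu().numpy() and raised:
#   RuntimeError: Can't call numpy() on Tensor that requires grad.
# Detaching first yields a tensor the copy machinery can handle safely:
snapshot = copy.copy(loss.detach())
print(snapshot.requires_grad)  # False
```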

cc @brianjo @mruberry @jlin27

@smessmer smessmer added module: docs Related to our documentation, both in docs/ and docblocks triaged This issue has been looked at by a team member and triaged and prioritized into an appropriate module labels Sep 2, 2020
@mariosasko
Contributor

The error is pretty self-explanatory. You can't call .numpy() on a tensor if that tensor is part of the computation graph. You first have to detach it from the graph and this will return a new tensor that shares the same underlying storage but doesn't track gradients (requires_grad is False). Then you can call .numpy() safely. So just replace tensor.numpy() with tensor.detach().numpy().

If this doesn't work, please open an issue in the PyTorch Lightning repo.
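The pattern described above can be sketched in a few lines (the tensor here is a made-up example):

```python
import torch

# A tensor that is part of the autograd graph (requires_grad propagates).
t = torch.ones(3, requires_grad=True) * 2.0

# t.numpy() would raise the RuntimeError above. detach() returns a view that
# shares the same storage as t but has requires_grad=False, so .numpy() is safe:
arr = t.detach().numpy()
print(arr)  # [2. 2. 2.]
```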

@fe1w0

fe1w0 commented Jun 14, 2021

The error is pretty self-explanatory. You can't call .numpy() on a tensor if that tensor is part of the computation graph. You first have to detach it from the graph and this will return a new tensor that shares the same underlying storage but doesn't track gradients (requires_grad is False). Then you can call .numpy() safely. So just replace tensor.numpy() with tensor.detach().numpy().

If this doesn't work, please open an issue in the PytorchLightning repo.

My temporary workaround is to modify the source code directly, following the error message. I don't think this is a good solution, but in any case it did fix my problem, like so:
[screenshot of the modified source]

@mariosasko
Contributor

@fe1w0 Yes, modifying the source is not the best solution. Can you please provide the minimal reproducible example? I'm not even sure this issue is still relevant as I can't find the relevant code in the PyTorchLightning source. Which version of PyTorchLightning are you using?

@fe1w0

fe1w0 commented Jun 21, 2021

@fe1w0 Yes, modifying the source is not the best solution. Can you please provide the minimal reproducible example? I'm not even sure this issue is still relevant as I can't find the relevant code in the PyTorchLightning source. Which version of PyTorchLightning are you using?

I'm very sorry 😵‍💫 that I haven't responded until now. I think this is not a PyTorch problem; it seems to be related to my TensorFlow environment. After I reconfigured TensorFlow and ran conda install pytorch, I haven't encountered any problems so far. Thank you very much for your attention.

This is the reference link I used for reinstalling, suitable for Apple M1:
https://developer.apple.com/metal/tensorflow-plugin/

@mruberry
Collaborator

This appears to be an issue with PyTorch Lightning (and possibly only an older version of it). So closing it here.

You might want to open an issue at the PyTorch Lightning Github (https://github.com/PyTorchLightning/pytorch-lightning), @curehabit.

@solarshao1006

solarshao1006 commented Apr 22, 2022

When I run this:

#run style transfer
max_iter = 500
show_iter = 50
optimizer = optim.LBFGS([opt_img]);
n_iter=[0]

while n_iter[0] <= max_iter:

    def closure():
        optimizer.zero_grad()
        
        out = vgg(opt_img, loss_layers)
        layer_losses = [weights[a] * loss_fns[a](A, targets[a]) for a,A in enumerate(out)]
        
        loss = sum(layer_losses)
        loss.backward()
        n_iter[0]+=1
        #print loss
        if n_iter[0]%show_iter == (show_iter-1):
            print('Iteration: %d, loss: %f'%(n_iter[0]+1, loss.item()))
#             print([loss_layers[li] + ': ' +  str(l.data[0]) for li,l in enumerate(layer_losses)]) #loss of each layer
        return loss

    optimizer.step(closure)
    
#display result
out_img = postp(opt_img.data[0].cpu().squeeze())
imshow(out_img)
gcf().set_size_inches(10,10)

I got this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-40-f6b88457c654> in <module>
     22         return loss
     23 
---> 24     optimizer.step(closure)
     25 
     26 #display result

~/opt/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24         def decorate_context(*args, **kwargs):
     25             with self.__class__():
---> 26                 return func(*args, **kwargs)
     27         return cast(F, decorate_context)
     28 

~/opt/anaconda3/lib/python3.7/site-packages/torch/optim/lbfgs.py in step(self, closure)
    309 
    310         # evaluate initial f(x) and df/dx
--> 311         orig_loss = closure()
    312         loss = float(orig_loss)
    313         current_evals = 1

~/opt/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24         def decorate_context(*args, **kwargs):
     25             with self.__class__():
---> 26                 return func(*args, **kwargs)
     27         return cast(F, decorate_context)
     28 

<ipython-input-40-f6b88457c654> in closure()
     13         layer_losses = [weights[a] * loss_fns[a](A, targets[a]) for a,A in enumerate(out)]
     14         print(layer_losses)
---> 15         loss = sum(layer_losses)
     16         loss.backward()
     17         n_iter[0]+=1

<__array_function__ internals> in sum(*args, **kwargs)

~/opt/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
   2258 
   2259     return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
-> 2260                           initial=initial, where=where)
   2261 
   2262 

~/opt/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     84                 return reduction(axis=axis, out=out, **passkwargs)
     85 
---> 86     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     87 
     88 

~/opt/anaconda3/lib/python3.7/site-packages/torch/tensor.py in __array__(self, dtype)
    628             return handle_torch_function(Tensor.__array__, relevant_args, self, dtype=dtype)
    629         if dtype is None:
--> 630             return self.numpy()
    631         else:
    632             return self.numpy().astype(dtype, copy=False)

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

I believe this error is related to the source code and changing the tensor source code is not the best solution. Any recommendations on what to do?

@mruberry
Collaborator


It looks like you're calling NumPy's sum on a PyTorch tensor that requires grad, but NumPy doesn't support gradients. You probably want to use torch.sum.
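The fix suggested above can be sketched as follows (the tensors here are hypothetical stand-ins for the variables in the snippet; in the original code, sum had likely been shadowed by NumPy's sum via a star import):

```python
import torch

# Hypothetical stand-ins for opt_img / layer_losses in the snippet above.
opt_img = torch.rand(1, 3, 8, 8, requires_grad=True)
layer_losses = [w * (opt_img ** 2).mean() for w in (1.0, 0.5)]

# np.sum(layer_losses) converts each tensor to a NumPy array via __array__,
# which calls .numpy() and fails on grad-tracking tensors. Python's builtin
# sum (or an all-torch reduction) keeps the result in the autograd graph:
loss = sum(layer_losses)                    # builtin sum: tensor + tensor
loss_alt = torch.stack(layer_losses).sum()  # equivalent all-torch version

loss.backward()  # gradients flow back to opt_img
```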
