
Grad is None after using view #19778

Closed
sdaulton opened this issue Apr 25, 2019 · 4 comments
Labels
module: autograd - Related to torch.autograd, and the autograd engine in general
module: docs - Related to our documentation, both in docs/ and docblocks
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@sdaulton

🐛 Bug

After initializing a tensor with requires_grad=True, applying a view, summing, and calling backward, the gradient is None. This is not the case if the tensor is initialized directly with the dimensions specified in the view.

To Reproduce

import torch

# test without view
X = torch.tensor([[[0.25]], [[0.75]]], requires_grad=True)
print(f"X.shape: {X.shape}")
X.sum().backward()
print(f"X.grad: {X.grad}")

# test with view
X_view = torch.tensor([0.25, 0.75], requires_grad=True).view(2, 1, 1)
print(f"X_view.shape: {X_view.shape}")
X_view.sum().backward()
print(f"X_view.grad: {X_view.grad}")
print(f"X_view.grad is None: {X_view.grad is None}")

Output

X.shape: torch.Size([2, 1, 1])
X.grad: tensor([[[1.]],

        [[1.]]])
X_view.shape: torch.Size([2, 1, 1])
X_view.grad: None
X_view.grad is None: True

Expected behavior

X_view.grad is not None and is the same as X.grad in the first example.

Environment

Collecting environment information...
PyTorch version: 1.0.0a0
Is debug build: No
CUDA used to build PyTorch: 9.2.88

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
CMake version: Could not collect

Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect

@sdaulton
Author

So this isn't a bug per se, but it is definitely a source of confusion. The issue with the code above is that the gradient is attached to the initial tensor created before the view, not to the viewed tensor. Chaining the initialization and the view in a single expression means that only the viewed tensor is bound to a variable, so the gradient on the underlying leaf tensor is inaccessible. Splitting out the view works fine. It would be useful to call this out in the docs (maybe I missed it).

X0 = torch.tensor([0.25, 0.75], requires_grad=True)
X_view = X0.view(2, 1, 1)
print(f"X_view.shape: {X_view.shape}")
X_view.sum().backward()
print(f"X_view.grad: {X_view.grad}")
print(f"X_view.grad is None: {X_view.grad is None}")
print(f"X0.grad: {X0.grad}")

Output:

X_view.shape: torch.Size([2, 1, 1])
X_view.grad: None
X_view.grad is None: True
X0.grad: tensor([1., 1.])
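
A minimal sketch that makes the leaf/non-leaf distinction visible, assuming only the standard is_leaf and grad_fn tensor attributes:

import torch

X0 = torch.tensor([0.25, 0.75], requires_grad=True)
X_view = X0.view(2, 1, 1)

# X0 was created directly by the user, so it is a leaf with no grad_fn;
# X_view was produced by an operation, so it is a non-leaf with a grad_fn.
print(f"X0: is_leaf={X0.is_leaf}, grad_fn={X0.grad_fn}")              # is_leaf=True, grad_fn=None
print(f"X_view: is_leaf={X_view.is_leaf}, grad_fn={X_view.grad_fn}")  # is_leaf=False, grad_fn=<ViewBackward...>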

@fmassa
Member

fmassa commented Apr 26, 2019

Your analysis is correct: we only retain gradients of leaf variables. Once you call tensor.view(-1), it returns a new tensor that is no longer a leaf variable (and has a grad_fn).

Where do you think we could improve the documentation to explain this? Maybe in https://pytorch.org/docs/stable/notes/autograd.html or https://pytorch.org/docs/stable/notes/faq.html?

Also, could you send a PR improving the documentation?
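
A minimal sketch of the workaround this explanation suggests, assuming the standard Tensor.retain_grad() API, which asks autograd to also populate .grad on a non-leaf tensor:

import torch

X_view = torch.tensor([0.25, 0.75], requires_grad=True).view(2, 1, 1)
X_view.retain_grad()  # keep the gradient on this non-leaf tensor after backward
X_view.sum().backward()
print(X_view.grad)  # tensor of ones with shape (2, 1, 1), matching X.grad in the first example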

fmassa added the module: autograd, module: docs, and triaged labels on Apr 26, 2019
@kit1980
Member

kit1980 commented May 1, 2019

I think this "only retain gradients of leaf variables" should be in the FAQ.
Just saw a StackOverflow question about this: https://stackoverflow.com/questions/55942423/pytorch-backpropagating-from-sum-of-matrix-elements-to-leaf-variable

facebook-github-bot pushed a commit that referenced this issue Dec 11, 2019
…easons (#30531)

Summary:
Fix #2362 and #19778

To avoid issues with frozen models, we only warn for Tensors that require gradients and are neither leaves nor retain their gradients.
Pull Request resolved: #30531

Differential Revision: D18832767

Pulled By: albanD

fbshipit-source-id: 743e863dc14ab57713e66da78b2e4d759dfba0ff
@albanD
Collaborator

albanD commented Feb 9, 2020

We added a warning for this special case. Please re-open if you need more.
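
As a rough sketch of the new behavior (the exact message and the first release containing #30531 depend on the installed PyTorch version, so treat this as an assumption): accessing .grad on a non-leaf tensor that requires grad should now emit a UserWarning rather than silently returning None.

import warnings
import torch

X_view = torch.tensor([0.25, 0.75], requires_grad=True).view(2, 1, 1)
X_view.sum().backward()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = X_view.grad  # still None, but on recent versions this access warns
print(caught[0].message if caught else "no warning on this PyTorch version")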
