
Grad is None after using view #19778

Closed
sdaulton opened this issue Apr 25, 2019 · 4 comments
Labels
module: autograd - Related to torch.autograd, and the autograd engine in general
module: docs - Related to our documentation, both in docs/ and docblocks
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@sdaulton

🐛 Bug

After initializing a tensor with requires_grad=True, applying a view, summing, and calling backward, the gradient is None. This is not the case if the tensor is initialized directly with the dimensions specified in the view.

To Reproduce

import torch

# test without view
X = torch.tensor([[[0.25]], [[0.75]]], requires_grad=True)
print(f"X.shape: {X.shape}")
X.sum().backward()
print(f"X.grad: {X.grad}")

# test with view
X_view = torch.tensor([0.25, 0.75], requires_grad=True).view(2, 1, 1)
print(f"X_view.shape: {X_view.shape}")
X_view.sum().backward()
print(f"X_view.grad: {X_view.grad}")
print(f"X_view.grad is None: {X_view.grad is None}")

Output

X.shape: torch.Size([2, 1, 1])
X.grad: tensor([[[1.]],

        [[1.]]])
X_view.shape: torch.Size([2, 1, 1])
X_view.grad: None
X_view.grad is None: True

Expected behavior

X_view.grad is not None and is the same as X.grad in the first example.

Environment

Collecting environment information...
PyTorch version: 1.0.0a0
Is debug build: No
CUDA used to build PyTorch: 9.2.88

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
CMake version: Could not collect

Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect

@sdaulton
Author

So this isn't a bug per se, but it is definitely a source of confusion. The issue with the code above is that the gradient is attached to the initial tensor created before the view, not to the viewed tensor. Chaining the initialization and the view in a single expression means that only the viewed tensor is bound to a variable, so the gradient on the underlying leaf tensor is inaccessible. Splitting out the view works fine. It would be useful to call this out in the docs (maybe I missed it).

X0 = torch.tensor([0.25, 0.75], requires_grad=True)
X_view = X0.view(2, 1, 1)
print(f"X_view.shape: {X_view.shape}")
X_view.sum().backward()
print(f"X_view.grad: {X_view.grad}")
print(f"X_view.grad is None: {X_view.grad is None}")
print(f"X0.grad: {X0.grad}")

Output:

X_view.shape: torch.Size([2, 1, 1])
X_view.grad: None
X_view.grad is None: True
X0.grad: tensor([1., 1.])
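
A minimal sketch that makes the leaf/non-leaf distinction visible, assuming only the standard is_leaf and grad_fn tensor attributes:

import torch

X0 = torch.tensor([0.25, 0.75], requires_grad=True)
X_view = X0.view(2, 1, 1)

# X0 was created directly by the user, so it is a leaf with no grad_fn;
# X_view was produced by an operation, so it is a non-leaf with a grad_fn.
print(f"X0: is_leaf={X0.is_leaf}, grad_fn={X0.grad_fn}")              # is_leaf=True, grad_fn=None
print(f"X_view: is_leaf={X_view.is_leaf}, grad_fn={X_view.grad_fn}")  # is_leaf=False, grad_fn=<ViewBackward...>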

@fmassa
Member

fmassa commented Apr 26, 2019

Your analysis is correct: we only retain gradients of leaf variables. Once you call tensor.view(-1), it returns a new tensor that is no longer a leaf variable (and has a grad_fn).

Where do you think we could improve the documentation to explain this? Maybe in https://pytorch.org/docs/stable/notes/autograd.html or https://pytorch.org/docs/stable/notes/faq.html?

Also, could you send a PR improving the documentation?
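
A minimal sketch of the workaround this explanation suggests, assuming the standard Tensor.retain_grad() API, which asks autograd to also populate .grad on a non-leaf tensor:

import torch

X_view = torch.tensor([0.25, 0.75], requires_grad=True).view(2, 1, 1)
X_view.retain_grad()  # keep the gradient on this non-leaf tensor after backward
X_view.sum().backward()
print(X_view.grad)  # tensor of ones with shape (2, 1, 1), matching X.grad in the first example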

fmassa added the module: autograd, module: docs, and triaged labels on Apr 26, 2019
@kit1980
Member

kit1980 commented May 1, 2019

I think this "only retain gradients of leaf variables" should be in the FAQ.
Just saw a StackOverflow question about this: https://stackoverflow.com/questions/55942423/pytorch-backpropagating-from-sum-of-matrix-elements-to-leaf-variable

facebook-github-bot pushed a commit that referenced this issue Dec 11, 2019
…easons (#30531)

Summary:
Fix #2362 and #19778

To avoid issues with frozen models, we only warn for Tensors that require gradients and are neither leaves nor retain their gradients.
Pull Request resolved: #30531

Differential Revision: D18832767

Pulled By: albanD

fbshipit-source-id: 743e863dc14ab57713e66da78b2e4d759dfba0ff
@albanD
Collaborator

albanD commented Feb 9, 2020

We added a warning for this special case. Please re-open if you need more.
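
As a rough sketch of the new behavior (the exact message and the first release containing #30531 depend on the installed PyTorch version, so treat this as an assumption): accessing .grad on a non-leaf tensor that requires grad should now emit a UserWarning rather than silently returning None.

import warnings
import torch

X_view = torch.tensor([0.25, 0.75], requires_grad=True).view(2, 1, 1)
X_view.sum().backward()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = X_view.grad  # still None, but on recent versions this access warns
print(caught[0].message if caught else "no warning on this PyTorch version")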
