Jacobians computed by autograd.functional.jacobian with create_graph sometimes set requires_grad True #46918
Comments
Hi, this is expected, as mentioned in the doc.
I agree that it is expected that the results should not require gradients, but the problem is that in this case they do! The unexpected behavior that the result has requires_grad=True is the bug that I am referring to.
I am not sure what you expect?
As I wrote above, I expect that J2.requires_grad should be False, but it is True. J3.requires_grad is indeed False, as expected. I just included that to show that the behavior depends on the concrete function used for the jacobian computation.
I am afraid this is expected, as an independent gradient can show up as "a graph that returns a Tensor full of 0s" or a None gradient. I do agree that this is not the state we would have in an ideal world, but we cannot detect the reason these Tensors require gradients (a side effect of the computation we do, or a Tensor used internally by your function that requires gradients). So we have to just return them as is. Note that in any case, the gradient will be "correct" (under the definition that no graph / None / a Tensor full of 0s are the same thing).
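To illustrate with a hedged sketch (my reading of why this happens, not a claim about PyTorch's exact internals): jacobian must make its input require grad internally in order to differentiate through the function, and with create_graph=True the result is not detached afterwards. A function like f2(x) = x*x then produces a vjp that is connected to that internal grad-requiring input:

```python
import torch

# Sketch of the mechanism (assumed, not PyTorch's actual implementation):
# jacobian() needs a grad-requiring copy of the input to differentiate
# through f; with create_graph=True the result stays attached to it.
x = torch.tensor([4., 2., 2.])                # user input, requires_grad False
x_internal = x.detach().requires_grad_(True)  # grad-requiring internal copy
y = x_internal * x_internal                   # f2
(row_grad,) = torch.autograd.grad(y, x_internal, torch.ones(3), create_graph=True)
print(row_grad.requires_grad)  # True: 2*x_internal lives in the new graph
```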
I'm sorry, but I don't really understand this. None of the input variables requires grad, but the output of the jacobian computation does. It's not a leaf variable, so I can't manually set requires_grad=False. And .detach() is not an option either, because this jacobian computation is part of a function that should be differentiable itself. This is pretty annoying, because the (from my perspective) randomly occurring requires_grad flag leads to huge computation graphs and forces me to think very carefully about where to break gradient flow in a BPTT setting.
The problem here is that you set create_graph=True. I am not sure I understand your sentence: you say "None of the input variables requires_grad", but also that "this jacobian computation is part of a function that should be differentiable itself". If no input requires gradients, then no backward needs to run for that jacobian, so you could just set create_graph=False. Does this make it clearer?
First of all, thanks a lot albanD for taking the time for this. I highly, highly appreciate that! :) I understand what you write about the create_graph flag. I think your second paragraph helps get to the core of this. I am in fact not using backward() at the moment, because I'm not doing parameter learning yet. I am building the function / model that's supposed to be differentiable. This model includes various Jacobian computations "in its forward function" (e.g. it involves an extended Kalman filter). I tried to create a minimum example to illustrate the problem:
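The snippet below is a sketch of the kind of example described (the g/h structure matches the reply further down; the inner function `fun` and the loop length are illustrative assumptions):

```python
import torch

def fun(x):
    # some nonlinear function whose Jacobian is needed (illustrative)
    return torch.sin(x)

def g(x, fun):
    J = torch.autograd.functional.jacobian(fun, x, create_graph=True)
    return torch.sigmoid(J.mv(x + 1))  # some computation involving J

def h(x, fun):
    J = torch.autograd.functional.jacobian(fun, x, create_graph=True)
    return torch.tanh(J.mv(x * 2))  # some computation involving J

x = torch.tensor([4., 2., 2.])  # does not require grad
for step in range(100):
    x = g(x, fun)
    x = h(x, fun)
    # x.requires_grad is now True, so the graph keeps growing across steps
```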
The problem is that this code becomes extremely slow as the loop rolls out (and with more complex functions in reality). This is because the output of h requires grad (although, from the user's perspective, it is not a child of any other variable that requires grad). This leads to a large computation graph that spans all the time steps.
Thanks for the code sample, that makes things much clearer! I think what you want to change is the line that passes create_graph=True. Another approach, if you know that you are in the case where you want to create the graph iff the input you give requires grad (an assumption that we cannot make in general, unfortunately :/ ), is to change all your calls to jacobian to do:

```python
def g(x, fun):
    J = torch.autograd.functional.jacobian(fun, x, create_graph=x.requires_grad)
    return torch.sigmoid(J.mv(x + 1))  # some computation involving J

def h(x, fun):
    J = torch.autograd.functional.jacobian(fun, x, create_graph=x.requires_grad)
    return torch.tanh(J.mv(x * 2))  # some computation involving J
```

That will simplify your life quite a bit I think.
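With that change (a sketch, reusing the loop from the example above), no graph accumulates when the input does not require grad:

```python
x = torch.tensor([4., 2., 2.])  # requires_grad is False
for step in range(100):
    x = g(x, fun)
    x = h(x, fun)
# x.requires_grad stays False: each jacobian call ran with
# create_graph=False, so no graph spans the time steps
```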
🐛 Bug
The Jacobians computed by torch.autograd.functional.jacobian sometimes have requires_grad=True when create_graph=True is given, but not always. It depends on the concrete function that is input to the Jacobian computation.
To Reproduce
Steps to reproduce the behavior:
```python
import torch

def f1(x):
    return x

def f2(x):
    return x*x

x0 = torch.tensor([4., 2., 2.])
print("x0.requires_grad", x0.requires_grad)

print("--- Returning just x with create graph ---")
J1 = torch.autograd.functional.jacobian(f1, x0, create_graph=True)
print("J1.requires_grad", J1.requires_grad)

print("--- Returning x*x with create graph ---")
x0 = torch.tensor([4., 2., 2.])
J2 = torch.autograd.functional.jacobian(f2, x0, create_graph=True)
print("J2.requires_grad", J2.requires_grad)

print("--- Returning x*x without create graph ---")
x0 = torch.tensor([4., 2., 2.])
J3 = torch.autograd.functional.jacobian(f2, x0)
print("J3.requires_grad", J3.requires_grad)
```
The output here is:
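(Reconstructed from the discussion above: J1 and J3 do not require grad, while J2 unexpectedly does.)

```
x0.requires_grad False
--- Returning just x with create graph ---
J1.requires_grad False
--- Returning x*x with create graph ---
J2.requires_grad True
--- Returning x*x without create graph ---
J3.requires_grad False
```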
Expected behavior
In the example above, J2.requires_grad should be False.
Environment
```
PyTorch version: 1.8.0.dev20201027
Is debug build: True
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.8.0.dev20201027
[pip3] torchvision==0.8.0.dev20201026
[conda] Could not collect
```
Additional context
cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved