
v1.4.0 no longer seems to support backward() with the inputs parameter referencing a sub-module's parameters #233

Closed
cwognum opened this issue Nov 6, 2021 · 14 comments

cwognum commented Nov 6, 2021

I am playing around with the DomainBed repository. I noticed that the Fishr implementation specifically installs BackPACK version 1.3.0, and I was wondering why.

After a bit of experimentation, it seems that it is no longer possible to call backward(inputs=...) where inputs references a sub-module's parameters. I adjusted the example from your documentation to reproduce the issue:

from torch.nn import CrossEntropyLoss, Flatten, Linear, Sequential

from backpack import backpack, extend
from backpack.extensions import BatchGrad
from backpack.utils.examples import load_one_batch_mnist

X, y = load_one_batch_mnist(batch_size=512)

model = Sequential(Flatten(), Linear(784, 128), Linear(128, 10))  # I added an additional layer here
lossfunc = CrossEntropyLoss()

model = extend(model)
lossfunc = extend(lossfunc)

loss = lossfunc(model(X), y)
with backpack(BatchGrad()):
    loss.backward(inputs=list(model[-1].parameters()))  # I am trying to get the gradient with respect to the last submodule

for name, param in model[-1].named_parameters():  # I only loop over the parameters in the last submodule
    print(name)
    print(".grad.shape:             ", param.grad.shape)
    print(".grad_batch.shape:       ", param.grad_batch.shape)

With backpack-for-pytorch==1.4.0, this gives:

AttributeError: 'Parameter' object has no attribute 'grad_batch'

With backpack-for-pytorch==1.3.0, this prints the expected output:

weight
.grad.shape:              torch.Size([10, 128])
.grad_batch.shape:        torch.Size([512, 10, 128])
bias
.grad.shape:              torch.Size([10])
.grad_batch.shape:        torch.Size([512, 10])

I tried going through the git history of this repository to identify what changed between these two versions, but I have not managed to pin down the change that caused this. I was wondering whether this is intentional or a bug.

cwognum (Author) commented Nov 6, 2021

Changing list(model[-1].parameters()) to list(model.parameters())[2:] (which is effectively the same set of parameters) does work as expected. So the issue seems to be specifically caused by referencing a sub-module of the main module.
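
For reference, these two selections do pick out the same Parameter objects. A minimal plain-PyTorch check (not from the thread; the assertions are just illustrative):

from torch.nn import Flatten, Linear, Sequential

model = Sequential(Flatten(), Linear(784, 128), Linear(128, 10))

# Flatten has no parameters, so skipping the first two entries of
# model.parameters() (weight and bias of the first Linear) leaves exactly
# the parameters of the last Linear, in the same order.
via_submodule = list(model[-1].parameters())
via_slice = list(model.parameters())[2:]

assert len(via_submodule) == len(via_slice) == 2
assert all(a is b for a, b in zip(via_submodule, via_slice))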

f-dangel (Owner) commented Nov 8, 2021

Hi Cas,

thanks for your detailed description and the code snippet. One main difference between 1.3.0 and 1.4.0 is that we replaced backward_hooks with full_backward_hooks. One explanation for the behavior you observe is that, somehow, the full_backward_hook is not triggered with list(model[-1].parameters()), but is triggered with list(model.parameters())[2:].

  • Can you turn on debug=True in the calls to extend and backpack and post the output here? This should reveal which hooks are triggered.
  • Also, I'd be interested in the PyTorch version you're using.

Best,
Felix

f-dangel (Owner) commented Nov 8, 2021

As BackPACK was originally designed to work with loss.backward() without any arguments, you can try circumventing your issue by setting requires_grad=False for all parameters except those you are interested in, then running loss.backward() without specifying inputs=....
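
A minimal sketch of this workaround (assuming the extended model, loss function, and data from the snippet in the opening post; freezing everything but the last layer is just an example):

# Freeze all parameters except those of the last layer, then call
# backward() without inputs=... so that BackPACK's hooks run as usual.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

loss = lossfunc(model(X), y)  # forward pass after changing requires_grad
with backpack(BatchGrad()):
    loss.backward()

for name, param in model[-1].named_parameters():
    print(name, param.grad_batch.shape)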

cwognum (Author) commented Nov 8, 2021

Hi Felix,

Thank you for the quick response and the proposed workaround.

First of all, I think I made an error when I tried changing list(model[-1].parameters()) to list(model.parameters())[2:]: I can no longer reproduce this discrepancy. With version 1.4.0 both give the same AttributeError for me. With regard to your questions:

I am using torch==1.10.0

With backpack-for-pytorch==1.3.0, the debug information is:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7ffa5cb79070> on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7ffa5cb79070> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7ffa5cb79070> on Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)

With backpack-for-pytorch==1.4.0, the debug information is:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fb880926dc0> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()

The hooks do not seem to be called at all for any of the modules in this case. If I change list(model[-1].parameters()) to simply list(model.parameters()), it gives:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f4cb80b0df0> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f4cb80b0df0> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension hook on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f4cb80b0df0> on Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Running extension hook on Linear(in_features=784, out_features=128, bias=True)

You mentioned that BackPACK was originally designed to work with loss.backward() without any arguments. Could it be that, when extending the model, the first level of recursion gets "special treatment"? Is there any change between the two versions that would explain this behavior? And would you consider this expected behavior?

f-dangel (Owner) commented Nov 9, 2021

Hi,
thanks for your clarifications.

Could it be that when extending the model, the first level of recursion gets "special treatment"?

There's no special treatment of the first hierarchy level when extending a model. extend is called recursively on the submodules, as indicated by the DEBUG messages you posted.

Any change between the two versions that would explain this behavior?

From the DEBUG messages, I still believe the different behavior results from full_backward_hook (1.4.0) versus backward_hook (1.3.0). I don't know how to narrow down the cause further, but maybe backward works differently when inputs=... is specified.

I would recommend trying the workaround above. Let me know if it works.
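
For anyone who wants to probe the hook behavior without BackPACK, here is a small plain-PyTorch sketch (the module names and printout are illustrative): it registers a full backward hook on every submodule and prints which hooks fire, so an unrestricted loss.backward() can be compared with loss.backward(inputs=...).

import torch
from torch.nn import Flatten, Linear, Sequential

model = Sequential(Flatten(), Linear(784, 128), Linear(128, 10))

# Register a full backward hook on every (sub)module and report when it fires.
for name, module in model.named_modules():
    module.register_full_backward_hook(
        lambda mod, grad_in, grad_out, name=name: print(
            f"full_backward_hook fired on '{name}' ({type(mod).__name__})"
        )
    )

X = torch.randn(8, 1, 28, 28)
loss = model(X).sum()

# Hooks firing for the restricted call ...
loss.backward(inputs=list(model[-1].parameters()), retain_graph=True)
# ... versus hooks firing for an unrestricted backward pass.
loss.backward()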

ngonthier commented:

The above workaround doesn't seem to work.

f-dangel (Owner) commented:

Hi @ngonthier,

can you describe in more detail how/why the workaround does not seem to work?

ngonthier commented Jan 4, 2022

Hi,
Even if I set requires_grad=False for all parameters except the one I am interested in (namely Var1) and then run loss.backward(), the gradient is computed for all the parameters of the model and not only for Var1.
I am using version 1.4.0.

f-dangel (Owner) commented Jan 6, 2022

Hi,

that indeed sounds like unintended behavior.
Could you provide a minimum working example that reproduces this issue?

cwognum (Author) commented Jan 13, 2022

Hi @ngonthier and @f-dangel,

Sorry for not replying sooner. I believe @ngonthier's observation is correct. See the minimal working example below:

from torch.nn import CrossEntropyLoss, Flatten, Linear, Sequential

from backpack import backpack, extend
from backpack.extensions import BatchGrad
from backpack.utils.examples import load_one_batch_mnist

X, y = load_one_batch_mnist(batch_size=512)

l1 = Linear(784, 128)
l1.requires_grad = False
l2 = Linear(128, 10)

model = Sequential(Flatten(), l1, l2)
lossfunc = CrossEntropyLoss()

model = extend(model, debug=True)
lossfunc = extend(lossfunc, debug=True)

loss = lossfunc(model(X), y)
with backpack(BatchGrad(), debug=True):
    loss.backward()

# This should fail for the first layer, right? It doesn't!
for name, param in model.named_parameters():
    print(name)
    print(".grad.shape:             ", param.grad.shape)
    print(".grad_batch.shape:       ", param.grad_batch.shape)

This is the DEBUG output:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fe05e8db0d0> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fe05e8db0d0> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension hook on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fe05e8db0d0> on Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Running extension hook on Linear(in_features=784, out_features=128, bias=True)
1.weight
.grad.shape:              torch.Size([128, 784])
.grad_batch.shape:        torch.Size([512, 128, 784])
1.bias
.grad.shape:              torch.Size([128])
.grad_batch.shape:        torch.Size([512, 128])
2.weight
.grad.shape:              torch.Size([10, 128])
.grad_batch.shape:        torch.Size([512, 10, 128])
2.bias
.grad.shape:              torch.Size([10])
.grad_batch.shape:        torch.Size([512, 10])

f-dangel (Owner) commented:

Hi,

thanks for providing a script to reproduce the issue.

I think you're setting requires_grad incorrectly: it's an attribute of the module's parameters, not of the module itself (correct me if I'm wrong).

The correct way to disable gradients is

for p in l1.parameters():
    p.requires_grad = False

instead of

l1.requires_grad = False
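
As an aside (not mentioned in the thread): if you prefer a module-level call, nn.Module.requires_grad_() applies the flag recursively to all of the module's parameters. A minimal sketch:

from torch.nn import Linear

l1 = Linear(784, 128)
l1.requires_grad_(False)  # sets requires_grad=False on l1.weight and l1.bias

assert all(not p.requires_grad for p in l1.parameters())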

cwognum (Author) commented Jan 13, 2022

You're right! I was under the impression that this would recursively disable grad for all parameters... 👀

With the suggested change it does work. I also checked whether, once seeded, the output is the same for these two methods in version 1.3.0, and that is indeed the case:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f070e920f10> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f070e920f10> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension hook on Linear(in_features=128, out_features=10, bias=True)
weight
.grad.shape:              torch.Size([10, 128])
.grad_batch.shape:        torch.Size([512, 10, 128])
bias
.grad.shape:              torch.Size([10])
.grad_batch.shape:        torch.Size([512, 10])

I think that leaves me with one last question before closing the issue: should there be a more informative error/warning on BackPACK's side when using the inputs argument?

f-dangel (Owner) commented:

Should there be a more informative error/warning on BackPACK's side when using the inputs argument?

I'm not sure how one would detect that backward was called with the inputs argument from within BackPACK. Do you have an idea how to do that?

cwognum (Author) commented Jan 14, 2022

No, I'm not sure. I'm not familiar enough with the BackPACK codebase, I'm afraid... I'll close this issue then. Thank you for thinking along these last couple of weeks. 🙂 👍

cwognum closed this as completed Jan 14, 2022