
v1.4.0 no longer seems to support backward() with the inputs parameter referencing a sub-module's parameters #233

Closed
cwognum opened this issue Nov 6, 2021 · 14 comments

cwognum commented Nov 6, 2021

I am playing around with the DomainBed repository. I noticed that the Fishr implementation specifically installs BackPACK version 1.3.0, and I was wondering why.

After a bit of experimentation, it seems that it is no longer possible to call backward(inputs=...) where inputs references a sub-module's parameters. I adjusted the example from your documentation to reproduce the issue:

from torch.nn import CrossEntropyLoss, Flatten, Linear, Sequential

from backpack import backpack, extend
from backpack.extensions import BatchGrad
from backpack.utils.examples import load_one_batch_mnist

X, y = load_one_batch_mnist(batch_size=512)

model = Sequential(Flatten(), Linear(784, 128), Linear(128, 10))  # I added an additional layer here
lossfunc = CrossEntropyLoss()

model = extend(model)
lossfunc = extend(lossfunc)

loss = lossfunc(model(X), y)
with backpack(BatchGrad()):
    loss.backward(inputs=list(model[-1].parameters()))  # I am trying to get the gradient with respect to the last submodule

for name, param in model[-1].named_parameters():  # I only loop over the parameters in the last submodule
    print(name)
    print(".grad.shape:             ", param.grad.shape)
    print(".grad_batch.shape:       ", param.grad_batch.shape)

With backpack-for-pytorch==1.4.0, this gives:

AttributeError: 'Parameter' object has no attribute 'grad_batch'

With backpack-for-pytorch==1.3.0, this prints the expected output:

weight
.grad.shape:              torch.Size([10, 128])
.grad_batch.shape:        torch.Size([512, 10, 128])
bias
.grad.shape:              torch.Size([10])
.grad_batch.shape:        torch.Size([512, 10])

I tried going through the git history of this repository to identify what changed between these two versions, but I have not managed to pin down the change that caused this. I was wondering whether this is intentional or a bug.

cwognum (Author) commented Nov 6, 2021

Changing list(model[-1].parameters()) to list(model.parameters())[2:] (which is effectively the same set of parameters) does work as expected. So the issue seems to be specifically caused by referencing a sub-module of the main module.
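
For reference, these two selections do pick out the same Parameter objects. A minimal plain-PyTorch check (not from the thread; the assertions are just illustrative):

from torch.nn import Flatten, Linear, Sequential

model = Sequential(Flatten(), Linear(784, 128), Linear(128, 10))

# Flatten has no parameters, so skipping the first two entries of
# model.parameters() (weight and bias of the first Linear) leaves exactly
# the parameters of the last Linear, in the same order.
via_submodule = list(model[-1].parameters())
via_slice = list(model.parameters())[2:]

assert len(via_submodule) == len(via_slice) == 2
assert all(a is b for a, b in zip(via_submodule, via_slice))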

f-dangel (Owner) commented Nov 8, 2021

Hi Cas,

thanks for your detailed description and the code snippet. One main difference between 1.3.0 and 1.4.0 is that we replaced backward_hooks with full_backward_hooks. One explanation for the behavior you observe is that, somehow, the full_backward_hook is not triggered with list(model[-1].parameters()), but is triggered with list(model.parameters())[2:].

  • Can you turn on debug=True in the calls to extend and backpack and post the output here? This should reveal which hooks are triggered.
  • Also, I'd be interested in the PyTorch version you're using.

Best,
Felix

f-dangel (Owner) commented Nov 8, 2021

As BackPACK was originally designed to work with loss.backward() without any arguments, you can try circumventing your issue by setting requires_grad=False for all parameters except those you are interested in, then running loss.backward() without specifying inputs=....
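
A minimal sketch of this workaround (assuming the extended model, loss function, and data from the snippet in the opening post; freezing everything but the last layer is just an example):

# Freeze all parameters except those of the last layer, then call
# backward() without inputs=... so that BackPACK's hooks run as usual.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

loss = lossfunc(model(X), y)  # forward pass after changing requires_grad
with backpack(BatchGrad()):
    loss.backward()

for name, param in model[-1].named_parameters():
    print(name, param.grad_batch.shape)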

cwognum (Author) commented Nov 8, 2021

Hi Felix,

Thank you for the quick response and the proposed workaround.

First of all, I think I made an error when I tried changing list(model[-1].parameters()) to list(model.parameters())[2:]: I can no longer reproduce this discrepancy. With version 1.4.0 both give the same AttributeError for me. With regard to your questions:

I am using torch==1.10.0

With backpack-for-pytorch==1.3.0, the debug information is:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7ffa5cb79070> on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7ffa5cb79070> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7ffa5cb79070> on Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)

With backpack-for-pytorch==1.4.0, the debug information is:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fb880926dc0> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()

The hooks do not seem to be called at all for any of the modules in this case. If I change list(model[-1].parameters()) to simply list(model.parameters()), it gives:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f4cb80b0df0> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f4cb80b0df0> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension hook on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f4cb80b0df0> on Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Running extension hook on Linear(in_features=784, out_features=128, bias=True)

You mentioned that BackPACK was originally designed to work with loss.backward() without any arguments. Could it be that, when extending the model, the first level of recursion gets "special treatment"? Is there any change between the two versions that would explain this behavior? And would you consider this expected behavior?

f-dangel (Owner) commented Nov 9, 2021

Hi,
thanks for your clarifications.

Could it be that when extending the model, the first level of recursion gets "special treatment"?

There's no special treatment of the first hierarchy level when extending a model. extend is called recursively on the submodules, as indicated by the DEBUG messages you posted.

Any change between the two versions that would explain this behavior?

From the DEBUG messages, I still believe the different behavior results from full_backward_hook (1.4.0) versus backward_hook (1.3.0). I don't know how to narrow down the cause further, but maybe backward works differently when inputs=... is specified.

I would recommend trying the workaround above. Let me know if it works.
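
For anyone who wants to probe the hook behavior without BackPACK, here is a small plain-PyTorch sketch (the module names and printout are illustrative): it registers a full backward hook on every submodule and prints which hooks fire, so an unrestricted loss.backward() can be compared with loss.backward(inputs=...).

import torch
from torch.nn import Flatten, Linear, Sequential

model = Sequential(Flatten(), Linear(784, 128), Linear(128, 10))

# Register a full backward hook on every (sub)module and report when it fires.
for name, module in model.named_modules():
    module.register_full_backward_hook(
        lambda mod, grad_in, grad_out, name=name: print(
            f"full_backward_hook fired on '{name}' ({type(mod).__name__})"
        )
    )

X = torch.randn(8, 1, 28, 28)
loss = model(X).sum()

# Hooks firing for the restricted call ...
loss.backward(inputs=list(model[-1].parameters()), retain_graph=True)
# ... versus hooks firing for an unrestricted backward pass.
loss.backward()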

ngonthier commented:

The above workaround doesn't seem to work.

f-dangel (Owner) commented:

Hi @ngonthier,

can you describe in more detail how/why the workaround does not seem to work?

ngonthier commented Jan 4, 2022

Hi,
Even if I set requires_grad=False for all parameters except the one I am interested in (namely Var1) and then run loss.backward(), the gradient is computed for all the parameters of the model and not only for Var1.
I am using version 1.4.0.

f-dangel (Owner) commented Jan 6, 2022

Hi,

that indeed sounds like unintended behavior.
Could you provide a minimum working example that reproduces this issue?

cwognum (Author) commented Jan 13, 2022

Hi @ngonthier and @f-dangel,

Sorry for not replying sooner. I believe @ngonthier's observation is correct. See the minimal working example below:

from torch.nn import CrossEntropyLoss, Flatten, Linear, Sequential

from backpack import backpack, extend
from backpack.extensions import BatchGrad
from backpack.utils.examples import load_one_batch_mnist

X, y = load_one_batch_mnist(batch_size=512)

l1 = Linear(784, 128)
l1.requires_grad = False
l2 = Linear(128, 10)

model = Sequential(Flatten(), l1, l2)
lossfunc = CrossEntropyLoss()

model = extend(model, debug=True)
lossfunc = extend(lossfunc, debug=True)

loss = lossfunc(model(X), y)
with backpack(BatchGrad(), debug=True):
    loss.backward()

# This should fail for the first layer, right? It doesn't!
for name, param in model.named_parameters():
    print(name)
    print(".grad.shape:             ", param.grad.shape)
    print(".grad_batch.shape:       ", param.grad_batch.shape)

This is the DEBUG output:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fe05e8db0d0> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fe05e8db0d0> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension hook on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7fe05e8db0d0> on Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Running extension hook on Linear(in_features=784, out_features=128, bias=True)
1.weight
.grad.shape:              torch.Size([128, 784])
.grad_batch.shape:        torch.Size([512, 128, 784])
1.bias
.grad.shape:              torch.Size([128])
.grad_batch.shape:        torch.Size([512, 128])
2.weight
.grad.shape:              torch.Size([10, 128])
.grad_batch.shape:        torch.Size([512, 10, 128])
2.bias
.grad.shape:              torch.Size([10])
.grad_batch.shape:        torch.Size([512, 10])

f-dangel (Owner) commented:

Hi,

thanks for providing a script to reproduce the issue.

I think you're setting requires_grad incorrectly: it's an attribute of the module's parameters, not of the module itself (correct me if I'm wrong).

The correct way to disable gradients is

for p in l1.parameters():
    p.requires_grad = False

instead of

l1.requires_grad = False
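
As an aside (not mentioned in the thread): if you prefer a module-level call, nn.Module.requires_grad_() applies the flag recursively to all of the module's parameters. A minimal sketch:

from torch.nn import Linear

l1 = Linear(784, 128)
l1.requires_grad_(False)  # sets requires_grad=False on l1.weight and l1.bias

assert all(not p.requires_grad for p in l1.parameters())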

cwognum (Author) commented Jan 13, 2022

You're right! I was under the impression that this would recursively disable grad for all parameters... 👀

With the suggested change it does work. I also checked whether, once seeded, the output is the same for these two methods in version 1.3.0, and that is indeed the case:

[DEBUG] Extending Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=128, bias=True)
  (2): Linear(in_features=128, out_features=10, bias=True)
)
[DEBUG] Extending Flatten(start_dim=1, end_dim=-1)
[DEBUG] Extending Linear(in_features=784, out_features=128, bias=True)
[DEBUG] Extending Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Extending CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f070e920f10> on CrossEntropyLoss()
[DEBUG] Running extension hook on CrossEntropyLoss()
[DEBUG] Running extension <backpack.extensions.firstorder.batch_grad.BatchGrad object at 0x7f070e920f10> on Linear(in_features=128, out_features=10, bias=True)
[DEBUG] Running extension hook on Linear(in_features=128, out_features=10, bias=True)
weight
.grad.shape:              torch.Size([10, 128])
.grad_batch.shape:        torch.Size([512, 10, 128])
bias
.grad.shape:              torch.Size([10])
.grad_batch.shape:        torch.Size([512, 10])

I think that leaves me with one last question before closing the issue: should there be a more informative error/warning on BackPACK's side when using the inputs argument?

f-dangel (Owner) commented:

Should there be a more informative error/warning on BackPACK's side when using the inputs argument?

I'm not sure how one would detect that backward was called with the inputs argument from within BackPACK. Do you have an idea how to do that?

cwognum (Author) commented Jan 14, 2022

No, I'm not sure. I'm not familiar enough with the BackPACK codebase, I'm afraid... I'll close this issue then. Thank you for thinking along these last couple of weeks. 🙂 👍

cwognum closed this as completed Jan 14, 2022