This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Initialize Differentiable Optimizer with non-leaf tensor #133

Open
andrearosasco opened this issue Aug 8, 2022 · 1 comment

Comments


andrearosasco commented Aug 8, 2022

Hello, I used higher a while ago, and if I remember correctly you could create a differentiable optimizer starting from a normal one.
Now I need to optimize non-leaf tensors (the weights of my model g are generated by another model f). The problem is that, apparently, I cannot optimize them because they are not leaf tensors.

Technically I could create new leaf tensors from them, but then I wouldn't be able to backpropagate back to the model f that generated them.
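
For context, here is a minimal sketch of the failure mode (the shapes and the stand-in f below are purely illustrative):

```python
import torch
import torch.nn as nn

# Illustrative only: "f" generates the weights of a tiny "g".
f = nn.Linear(4, 6)
generated = f(torch.randn(4))      # non-leaf: f is in its history
g_weight = generated.view(2, 3)    # would-be weight matrix of g

print(g_weight.is_leaf)            # False

# A standard torch.optim optimizer rejects non-leaf tensors:
# torch.optim.SGD([g_weight], lr=0.1)
#   -> ValueError: can't optimize a non-leaf Tensor

# Detaching yields a leaf tensor, but severs the graph back to f,
# so f can no longer receive gradients through g's weights:
g_leaf = g_weight.detach().requires_grad_()
print(g_leaf.is_leaf)              # True, but disconnected from f
```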

Does anyone have a solution?
Thanks


ChrisGeishauser commented Aug 15, 2022

Hi there!

I am not sure it is exactly the problem I faced, but I will just drop what I did:

My use case is that I have a model g that should be updated using another model f. After n update steps of g (using f), I want to update f with a meta-update. Like you said, if you only put the model g into the higher context, it will not track the dependence on f, and accordingly you can't backpropagate through the n updates to get the meta-gradient. It is therefore important that the model you put into the higher context contains the parameters of both g and f, so that everything is tracked throughout the updates.

I have the following toy example below. The model x is your g model and the model y is your f model that is used to update x.
I combine the two models into a model z that now has the parameters of both models in it. I then pass z to the higher context. Note that the differentiable optimizer only optimizes the parameters of z.model1.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from copy import deepcopy

import higher


# "Model" was not defined in the original snippet; a plain linear layer
# without bias is assumed here so the example runs end to end.
class Model(nn.Linear):
    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features, bias=False)


class ComBModel(nn.Module):
    """Wraps g (model1) and f (model2) so that both parameter sets live in
    the single module that is handed to the higher context."""

    def __init__(self, model_1, model_2):
        super().__init__()
        self.model1 = model_1
        self.model2 = model_2

    def forward(self, input):
        # The inner-loop loss depends on both models, so the update of
        # model1 carries a dependence on model2's parameters.
        w = self.model2(input)
        return self.model1(input) * w

    def forward_only_1(self, input):
        return self.model1(input)


x = Model(2, 1)          # your g
y = Model(2, 1)          # your f, used to update g
z = ComBModel(x, y)
z_copy = deepcopy(z)

print("X parameters", list(x.parameters()))
print("Y parameters", list(y.parameters()))

in_ = torch.Tensor([1, 1])

# print("Y output", y(in_))

# Manual sanity check on a throwaway copy: what a single SGD-style step
# driven by y would do to model1's parameters.
for p in z_copy.model1.parameters():
    p.data = p.data - y(in_) * in_
    print("P DATA NEW", p.data)

z_optimizer = optim.SGD(z.model1.parameters(), lr=1.0)
y_optimizer = optim.SGD(z.model2.parameters(), lr=1.0)

# copy_initial_weights=False ties the initial fast weights to z's actual
# parameters, so the meta-gradient can reach z.model2.
with higher.innerloop_ctx(z, z_optimizer, copy_initial_weights=False) as (fnet, diffopt):

    for p in z.model2.parameters():
        print("GRADS MODEL 2 BEFORE")
        print(p.grad)

    # Inner loop: differentiable updates of model1 only.
    for i in range(1):
        loss = fnet(in_)
        diffopt.step(loss)

    for p in fnet.model1.parameters():
        print("FNET PARAMETERS", p)

    # Copy the updated inner-loop weights of g back into z.
    z.model1.load_state_dict(fnet.model1.state_dict())

    for p in z.model1.parameters():
        print("MODEL 1 PARAMS", p)

    for p in z.model2.parameters():
        print("GRADS MODEL 2 AFTER")
        print(p.grad)

    # Meta-loss uses the updated model1; backprop reaches model2 because
    # the inner update depended on it.
    meta_loss = fnet.forward_only_1(in_)
    meta_loss.backward()

    for p in z.model2.parameters():
        print("parameters", p)
        print("grads", p.grad)

    y_optimizer.step()
    for p in z.model2.parameters():
        print("parameters", p)
        print("grads", p.grad)
```
