This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Initialize Differentiable Optimizer with non-leaf tensor #133

Open
andrearosasco opened this issue Aug 8, 2022 · 1 comment

Comments


andrearosasco commented Aug 8, 2022

Hello, I used higher a while ago, and if I remember correctly you could create a differentiable optimizer starting from a normal one.
Now I need to optimize non-leaf tensors (the weights of my model g are generated by another model f). The problem is that, apparently, I cannot optimize them because they are not leaf tensors.

Technically I could create new leaf tensors from them, but then I wouldn't be able to backpropagate back to the model f that generated them.
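
For context, here is a minimal sketch of the failure mode (the shapes and the stand-in f below are purely illustrative):

```python
import torch
import torch.nn as nn

# Illustrative only: "f" generates the weights of a tiny "g".
f = nn.Linear(4, 6)
generated = f(torch.randn(4))      # non-leaf: f is in its history
g_weight = generated.view(2, 3)    # would-be weight matrix of g

print(g_weight.is_leaf)            # False

# A standard torch.optim optimizer rejects non-leaf tensors:
# torch.optim.SGD([g_weight], lr=0.1)
#   -> ValueError: can't optimize a non-leaf Tensor

# Detaching yields a leaf tensor, but severs the graph back to f,
# so f can no longer receive gradients through g's weights:
g_leaf = g_weight.detach().requires_grad_()
print(g_leaf.is_leaf)              # True, but disconnected from f
```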

Does anyone have a solution?
Thanks


ChrisGeishauser commented Aug 15, 2022

Hi there!

I am not sure it is exactly the problem I faced, but I will just drop what I did:

My use case is that I have a model g that should be updated using another model f. After n update steps of g (using f), I want to update f with a meta-update. Like you said, if you only put the model g into the higher context, it will not track the dependence on f, and accordingly you can't backpropagate through the n updates to get the meta-gradient. It is therefore important that the model you put into the higher context contains the parameters of both g and f, so that everything is tracked throughout the updates.

I have the following toy example below. The model x is your g model and the model y is your f model that is used to update x.
I combine the two models into a model z that now has the parameters of both models in it. I then pass z to the higher context. Note that the differentiable optimizer only optimizes the parameters of z.model1.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from copy import deepcopy

import higher


# "Model" was not defined in the original snippet; a plain linear layer
# without bias is assumed here so the example runs end to end.
class Model(nn.Linear):
    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features, bias=False)


class ComBModel(nn.Module):
    """Wraps g (model1) and f (model2) so that both parameter sets live in
    the single module that is handed to the higher context."""

    def __init__(self, model_1, model_2):
        super().__init__()
        self.model1 = model_1
        self.model2 = model_2

    def forward(self, input):
        # The inner-loop loss depends on both models, so the update of
        # model1 carries a dependence on model2's parameters.
        w = self.model2(input)
        return self.model1(input) * w

    def forward_only_1(self, input):
        return self.model1(input)


x = Model(2, 1)          # your g
y = Model(2, 1)          # your f, used to update g
z = ComBModel(x, y)
z_copy = deepcopy(z)

print("X parameters", list(x.parameters()))
print("Y parameters", list(y.parameters()))

in_ = torch.Tensor([1, 1])

# print("Y output", y(in_))

# Manual sanity check on a throwaway copy: what a single SGD-style step
# driven by y would do to model1's parameters.
for p in z_copy.model1.parameters():
    p.data = p.data - y(in_) * in_
    print("P DATA NEW", p.data)

z_optimizer = optim.SGD(z.model1.parameters(), lr=1.0)
y_optimizer = optim.SGD(z.model2.parameters(), lr=1.0)

# copy_initial_weights=False ties the initial fast weights to z's actual
# parameters, so the meta-gradient can reach z.model2.
with higher.innerloop_ctx(z, z_optimizer, copy_initial_weights=False) as (fnet, diffopt):

    for p in z.model2.parameters():
        print("GRADS MODEL 2 BEFORE")
        print(p.grad)

    # Inner loop: differentiable updates of model1 only.
    for i in range(1):
        loss = fnet(in_)
        diffopt.step(loss)

    for p in fnet.model1.parameters():
        print("FNET PARAMETERS", p)

    # Copy the updated inner-loop weights of g back into z.
    z.model1.load_state_dict(fnet.model1.state_dict())

    for p in z.model1.parameters():
        print("MODEL 1 PARAMS", p)

    for p in z.model2.parameters():
        print("GRADS MODEL 2 AFTER")
        print(p.grad)

    # Meta-loss uses the updated model1; backprop reaches model2 because
    # the inner update depended on it.
    meta_loss = fnet.forward_only_1(in_)
    meta_loss.backward()

    for p in z.model2.parameters():
        print("parameters", p)
        print("grads", p.grad)

    y_optimizer.step()
    for p in z.model2.parameters():
        print("parameters", p)
        print("grads", p.grad)
```
