Hello, I used higher a while ago and, if I remember correctly, you could create a differentiable optimizer starting from a normal one.
Now I need to optimize non-leaf tensors (my model g's weights are generated by another model f). The problem is that, apparently, I cannot optimize them because they are not leaf tensors.
Technically I could create new leaf tensors from them, but then I wouldn't be able to backpropagate back to the model f that generated them.
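For concreteness, here is a minimal sketch of what I mean (the linear layer and shapes are just placeholders); a standard optimizer rejects a tensor produced by another module:

```python
import torch

# Placeholder "f": a module whose output is used as a weight of "g".
f = torch.nn.Linear(4, 4)
generated_weight = f(torch.randn(1, 4)).view(2, 2)

print(generated_weight.is_leaf)  # False: it was produced by f, not created directly

# A regular optimizer refuses it:
torch.optim.SGD([generated_weight], lr=0.1)
# ValueError: can't optimize a non-leaf Tensor
```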
Does anyone have a solution?
Thanks
I am not sure it is exactly the problem I faced, but I will just drop what I did:
My use case is that I have a model g that should be updated using another model f. After n update steps of g (using f), I want to update f with a meta-update. As you said, if you only put the model g into the higher context, the dependence on f is not tracked, so you can't backpropagate through the n updates to get the meta-gradient. It is therefore important that the model you put into the higher context contains the parameters of both g and f, so that everything is tracked throughout the updates.
I have the following toy example below. The model x plays the role of your g, and the model y plays the role of your f that is used to update x.
I combine the two models into a model z that holds the parameters of both, and I pass z to the higher context. Note that the differentiable optimizer only optimizes the parameters of z.model1.
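To make the snippet below self-contained, here is a rough sketch of how the pieces could be set up; the class name, layer sizes, learning rates and the two loss functions are made up for illustration, and only z, z_optimizer, y_optimizer and in_ are referenced afterwards:

```python
import torch

# Hypothetical combined module: model1 plays the role of g/x (updated in the
# inner loop), model2 plays the role of f/y (updated by the meta-step).
class CombinedModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model1 = torch.nn.Linear(4, 4)
        self.model2 = torch.nn.Linear(4, 4)

    def forward(self, x):
        # Inner-loop loss; it depends on both model1 and model2, so the
        # inner updates of model1 carry a dependence on model2.
        return (self.model1(x) - self.model2(x)).pow(2).mean()

    def forward_only_1(self, x):
        # Meta-loss that only evaluates the (updated) model1.
        return self.model1(x).pow(2).mean()

z = CombinedModel()
in_ = torch.randn(8, 4)

# The differentiable optimizer only covers model1's parameters...
z_optimizer = torch.optim.SGD(z.model1.parameters(), lr=0.1)
# ...while model2 gets its own optimizer for the meta-update.
y_optimizer = torch.optim.SGD(z.model2.parameters(), lr=0.01)
```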
```python
import higher

# z, z_optimizer (over z.model1 only), y_optimizer (over z.model2) and the
# input batch in_ are set up as above.
with higher.innerloop_ctx(z, z_optimizer, copy_initial_weights=False) as (fnet, diffopt):
    for p in z.model2.parameters():
        print("GRADS MODEL 2 BEFORE")
        print(p.grad)

    # Inner loop: n differentiable updates of model1 (here n = 1).
    for i in range(1):
        loss = fnet(in_)
        diffopt.step(loss)

    for p in fnet.model1.parameters():
        print("FNET PARAMETERS", p)

    # Copy the updated inner-loop weights back into the original model1.
    z.model1.load_state_dict(fnet.model1.state_dict())
    for p in z.model1.parameters():
        print("MODEL 1 PARAMS", p)

    for p in z.model2.parameters():
        print("GRADS MODEL 2 AFTER")
        print(p.grad)

    # Meta-loss on the updated model1; backpropagation reaches model2 because
    # both sets of parameters live inside the same higher context.
    meta_loss = fnet.forward_only_1(in_)
    meta_loss.backward()

    for p in z.model2.parameters():
        print("parameters", p)
        print("grads", p.grad)

    # Meta-update of model2 (your f model).
    y_optimizer.step()
    for p in z.model2.parameters():
        print("parameters", p)
        print("grads", p.grad)
```