
How to apply several SGD steps within the inner loop? #3

Open
davidjimenezphd opened this issue Sep 20, 2019 · 13 comments

Comments

@davidjimenezphd

Hi @mari-linhares, thanks for the repo!
We are building on your code to implement a somewhat more general version of MAML that uses a batch of tasks in the inner loop and several gradient-descent steps with respect to each task's parameters. However, we are stuck on how to add several SGD steps within your code using TensorFlow 2.0. Do you have any idea of how to do that?

@Alekxos

Alekxos commented Dec 2, 2019

I've also been trying to build off this repo, but have run into the same issue. It seems that updating the weights manually, as done here, makes them non-trainable. @davidjimenezphd, have you found a workaround? Without multiple inner-loop SGD steps, this repo doesn't actually run the full version of MAML.

@davidjimenezphd
Author

Hi @Alekxos. Yes, we found a solution based on "watch"-ing some variables in the gradient tape. Give me some time and I'll try to upload the solution.

@shufflebyte

It's definitely a bug in TensorFlow. We worked around it by doing the following:

  • Build a copy of the meta-network and train it for one step (inner training); after this, the weights of the copy are no longer trainable (N=1).
    Here is where the patch begins:
  • Make a new copy of the meta-model and initialize it.
  • Manually set the weights of the new copy (you need to iterate through the layers) to the weights of the trained copy; now you can use this copy and train it again (N>1). You need to repeat this for every training step...

This is a bit hacky and needs some extra computation (for copying and forwarding through the net), but TensorFlow has so many open issues that we use this as long as the bug exists ;-) - and I think it will be there for a while..

See our Tensorflow issue: tensorflow/tensorflow#34335
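Under the assumption of a simple two-layer stand-in for the meta-network (the names `build_model` and `fresh_copy` are illustrative, not from the repo), the copy-and-reset-weights workaround might be sketched like this:

```python
import tensorflow as tf

def build_model():
    # Hypothetical stand-in for the meta-network; layer sizes are illustrative.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

def fresh_copy(src, x):
    """Build a new copy of the model, run a forward pass so its weights
    exist, then overwrite them with the source model's current values."""
    dst = build_model()
    dst(x)  # forward pass materialises the layer weights
    dst.set_weights(src.get_weights())
    return dst

x = tf.random.normal((4, 2))
meta = build_model()
meta(x)                        # build the meta-model's weights
copy1 = fresh_copy(meta, x)    # train copy1 for one inner step (N=1) ...
copy2 = fresh_copy(copy1, x)   # ... then repeat the copy for the next step (N>1)
```

Each inner step then trains the current copy and produces the next one, which is what makes this approach costly: every step pays for a fresh build, a forward pass, and a weight copy.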

@llan-ml

llan-ml commented Feb 21, 2020

Hi @shufflebyte

This is actually not a tensorflow bug.

def copy_model(model, x):
    copied_model = MetaModel()
    copied_model.forward(x)
    copied_model.set_weights(model.get_weights())
    return copied_model

In this function, Model.get_weights actually returns numpy arrays, and Model.set_weights overwrites the weight values from those numpy arrays rather than replacing the trainable variables with another set of variables. Therefore, in effect, this function does not copy a model in the way we expect.

This is not problematic in this repo because we do manual replacement:

    k = 0
    model_copy = copy_model(model, x)
    for j in range(len(model_copy.layers)):
        model_copy.layers[j].kernel = tf.subtract(model.layers[j].kernel,
                                                  tf.multiply(lr_inner, gradients[k]))
        model_copy.layers[j].bias = tf.subtract(model.layers[j].bias,
                                                tf.multiply(lr_inner, gradients[k + 1]))
        k += 2
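That behaviour of get_weights/set_weights is easy to verify directly; a minimal sketch (the two-unit Dense layer is arbitrary):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model(tf.zeros((1, 3)))  # forward pass builds the weights

weights = model.get_weights()
# get_weights returns plain numpy arrays, detached from the graph
print(all(isinstance(w, np.ndarray) for w in weights))  # True

model2 = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model2(tf.zeros((1, 3)))
var_before = model2.trainable_variables[0]
model2.set_weights(weights)
# set_weights overwrote the values in place; the variable objects are unchanged
print(model2.trainable_variables[0] is var_before)  # True
```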

@HilbertXu

Hi @llan-ml
I have been stuck on this issue for a while.
Surely we can update the parameters of the copied model manually, but if we need to add several SGD steps in our inner loop to update the copied model several times, we need to compute the gradients on the copied model. Since there are no trainable variables in the copied model, GradientTape cannot compute the gradients.

Actually, I tried directly applying a tf.keras.optimizers.SGD() to update the fast weights; this keeps the variables in the copied model trainable.

@HilbertXu

HilbertXu commented Feb 24, 2020

Hi @davidjimenezphd

Have you found out how to add a batch and several SGD steps?

I have been stuck on this problem for some days. I tried to use two tapes to watch the whole batch process, with the stop_recording() function during the batch process to control it. It seems I can add several SGD steps to update the fast weights several times, but I failed to compute the gradients of the whole batch; it returns a list of None. Could you please tell me how you solved this problem?

@llan-ml

llan-ml commented Feb 24, 2020

Hi @HilbertXu

In the case of multiple inner gradient steps, you need to manually watch the weight tensors (they are no longer tf.Variables), and then the tape can compute their gradients.
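A minimal sketch of that idea, with a single linear weight standing in for a full model (all names and values here are illustrative):

```python
import tensorflow as tf

# Toy regression task: y = 2x, starting from a meta-parameter w = 0.5.
lr_inner = 0.1
x = tf.constant([[1.0], [2.0]])
y = tf.constant([[2.0], [4.0]])

w = tf.Variable([[0.5]])  # meta-parameter (a tf.Variable)

def loss_fn(w_):
    return tf.reduce_mean((tf.matmul(x, w_) - y) ** 2)

with tf.GradientTape() as outer_tape:
    fast_w = w + 0.0            # "fast" weights start as a plain tf.Tensor
    for _ in range(2):          # several inner SGD steps
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(fast_w)  # a Tensor is not auto-watched
            inner_loss = loss_fn(fast_w)
        grad = inner_tape.gradient(inner_loss, fast_w)
        fast_w = fast_w - lr_inner * grad  # manual SGD update
    outer_loss = loss_fn(fast_w)

# The outer tape differentiates through both inner updates back to w.
meta_grad = outer_tape.gradient(outer_loss, w)
```

The key point from the comment above is `inner_tape.watch(fast_w)`: after the first manual update, `fast_w` is an ordinary tensor rather than a tf.Variable, so the inner tape has to be told to track it explicitly, while the outer tape still reaches the original variable `w`.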

@HilbertXu

Hi @llan-ml

Thanks for your help, I will try it later.

@llan-ml

llan-ml commented Feb 24, 2020

Hi @HilbertXu

I wrote a toy MAML-like script, which may be helpful for you. Please let me know if you find that the implementation is correct and works in more practical situations.

@HilbertXu

HilbertXu commented Feb 24, 2020

Hi @llan-ml

It says I don't have access to your files. Could you please help me with this?

Maybe we can chat on WeChat or by email? My SS server has been blocked, so it's hard for me to access Colab.

@llan-ml

llan-ml commented Feb 24, 2020

I forgot to enable sharing of that link; it should be accessible now. Also, feel free to contact me by email (address in my profile).

@Runist

Runist commented Jul 15, 2020

> Hi @shufflebyte
> This is actually not a tensorflow bug. [...] Model.get_weights actually returns some numpy arrays, and Model.set_weights is used to overwrite weight values from numpy arrays rather than replace the trainable variables with another set of variables. [...] This is not problematic in this repo because we do manual replacement [...]

But I also hit an error: why does model.get_weights() return an empty list?

    with tf.GradientTape() as support_tape:
        support_tape.watch(model.trainable_variables)
        y_pred = model.forward(x1[i])
        support_loss = compute_loss(y1, y_pred)

    gradients = support_tape.gradient(support_loss, model.trainable_variables)
    # inner_optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    k = 0
    for j in range(len(model.layers)):
        model.layers[j].kernel = tf.subtract(model.layers[j].kernel, tf.multiply(lr_inner, gradients[k]))
        model.layers[j].bias = tf.subtract(model.layers[j].bias, tf.multiply(lr_inner, gradients[k + 1]))
        k += 2
    print(model.get_weights())

@Runist

Runist commented Jul 15, 2020

> I forgot to enable sharing of that link, and now it should be accessible. Also, feel free to access me by email in my profile.

with tf.GradientTape() as outer_tape:
  copied_model = model
  for _ in range(2):
    with tf.GradientTape(watch_accessed_variables=False) as inner_tape:
      inner_tape.watch(copied_model.inner_weights)
      inner_loss = compute_loss(copied_model, x, y)
    inner_grads = inner_tape.gradient(inner_loss, copied_model.inner_weights)
    # print(inner_grads)
    # print("================")
    copied_model = MetaModel.copy_from(copied_model, inner_grads)
  outer_loss = compute_loss(copied_model, x, y)
outer_grads = outer_tape.gradient(outer_loss, model.inner_weights)
optimizer.apply_gradients(zip(outer_grads, model.inner_weights))

And I tried your code: model and copied_model are the same object. When you update copied_model, it also updates model.
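That first line, `copied_model = model`, is plain Python name binding, not a copy; a minimal illustration:

```python
class Model:
    def __init__(self):
        self.w = 1.0

model = Model()
copied_model = model          # binds a second name to the SAME object
copied_model.w = 5.0          # "updating the copy" ...
print(model.w)                # 5.0 -- ... also changes the original
print(copied_model is model)  # True
```

In the script above, `copied_model` is rebound to a new object by `MetaModel.copy_from(...)` at the end of each loop iteration; only before that first rebinding do the two names share one object, which appears to be intentional (the first inner step starts from the meta-weights).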
