
How to apply several SGD steps within the inner loop? #3

Open
davidjimenezphd opened this issue Sep 20, 2019 · 13 comments

Comments

@davidjimenezphd

Hi @mari-linhares, thanks for the repo!
We are building on your code to implement a somewhat more general version of MAML that uses a batch of tasks in the inner loop and several gradient-descent steps with respect to each task's parameters. However, we are stuck on how to add several SGD steps within your code using TensorFlow 2.0. Do you have any idea of how to do that?

@Alekxos

Alekxos commented Dec 2, 2019

I've also been trying to build off this repo, but have run into the same issue. It seems that updating the weights manually, as done here, makes them non-trainable. @davidjimenezphd, have you found a workaround? Without multiple inner-loop SGD steps, this repo doesn't actually run the full version of MAML.

@davidjimenezphd
Author

Hi @Alekxos. Yes, we found a solution based on "watch"-ing some variables in the gradient tape. Give me some time and I'll try to upload the solution.

@shufflebyte

It's definitely a bug in TensorFlow. We worked around it by doing the following:

  • Build a copy of the meta-network and train it for one step (inner training); after this, the weights of the copy are no longer trainable (N=1).
    Here is where the patch begins:
  • Make a new copy of the meta-model and initialize it.
  • Manually set the weights of the new copy (you need to iterate through the layers) to the weights of the trained copy; now you can use this copy and train it again (N>1). You need to repeat this for every training step...

This is a bit hacky and needs some extra computation (for copying and forwarding through the net), but TensorFlow has so many open issues that we use this as long as the bug exists ;-) - and I think it will be there for a while..

See our Tensorflow issue: tensorflow/tensorflow#34335
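Under the assumption of a simple two-layer stand-in for the meta-network (the names `build_model` and `fresh_copy` are illustrative, not from the repo), the copy-and-reset-weights workaround might be sketched like this:

```python
import tensorflow as tf

def build_model():
    # Hypothetical stand-in for the meta-network; layer sizes are illustrative.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

def fresh_copy(src, x):
    """Build a new copy of the model, run a forward pass so its weights
    exist, then overwrite them with the source model's current values."""
    dst = build_model()
    dst(x)  # forward pass materialises the layer weights
    dst.set_weights(src.get_weights())
    return dst

x = tf.random.normal((4, 2))
meta = build_model()
meta(x)                        # build the meta-model's weights
copy1 = fresh_copy(meta, x)    # train copy1 for one inner step (N=1) ...
copy2 = fresh_copy(copy1, x)   # ... then repeat the copy for the next step (N>1)
```

Each inner step then trains the current copy and produces the next one, which is what makes this approach costly: every step pays for a fresh build, a forward pass, and a weight copy.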

@llan-ml

llan-ml commented Feb 21, 2020

Hi @shufflebyte

This is actually not a tensorflow bug.

def copy_model(model, x):
    copied_model = MetaModel()
    copied_model.forward(x)
    copied_model.set_weights(model.get_weights())
    return copied_model

In this function, Model.get_weights actually returns numpy arrays, and Model.set_weights overwrites the weight values from those numpy arrays rather than replacing the trainable variables with another set of variables. Therefore, in effect, this function does not copy a model in the way we expect.

This is not problematic in this repo because we do manual replacement:

    k = 0
    model_copy = copy_model(model, x)
    for j in range(len(model_copy.layers)):
        model_copy.layers[j].kernel = tf.subtract(model.layers[j].kernel,
                                                  tf.multiply(lr_inner, gradients[k]))
        model_copy.layers[j].bias = tf.subtract(model.layers[j].bias,
                                                tf.multiply(lr_inner, gradients[k + 1]))
        k += 2
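That behaviour of get_weights/set_weights is easy to verify directly; a minimal sketch (the two-unit Dense layer is arbitrary):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model(tf.zeros((1, 3)))  # forward pass builds the weights

weights = model.get_weights()
# get_weights returns plain numpy arrays, detached from the graph
print(all(isinstance(w, np.ndarray) for w in weights))  # True

model2 = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model2(tf.zeros((1, 3)))
var_before = model2.trainable_variables[0]
model2.set_weights(weights)
# set_weights overwrote the values in place; the variable objects are unchanged
print(model2.trainable_variables[0] is var_before)  # True
```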

@HilbertXu

Hi @llan-ml
I have been stuck on this issue for a while.
Surely we can update the parameters of the copied model manually, but if we need to add several SGD steps in our inner loop to update the copied model several times, we need to compute the gradients on the copied model. Since there are no trainable variables in the copied model, GradientTape cannot compute the gradients.

Actually, I tried directly applying a tf.keras.optimizers.SGD() to update the fast weights; this keeps the variables in the copied model trainable.

@HilbertXu

HilbertXu commented Feb 24, 2020

Hi @davidjimenezphd

Have you found out how to add a batch and several SGD steps?

I have been stuck on this problem for some days. I tried to use two tapes to watch the whole batch process, with the stop_recording() function during the batch process to control it. It seems I can add several SGD steps to update the fast weights several times, but I failed to compute the gradients of the whole batch; it returns a list of None. Could you please tell me how you solved this problem?

@llan-ml

llan-ml commented Feb 24, 2020

Hi @HilbertXu

In the case of multiple inner gradient steps, you need to manually watch the weight tensors (they are no longer tf.Variables), and then the tape can compute their gradients.
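A minimal sketch of that idea, with a single linear weight standing in for a full model (all names and values here are illustrative):

```python
import tensorflow as tf

# Toy regression task: y = 2x, starting from a meta-parameter w = 0.5.
lr_inner = 0.1
x = tf.constant([[1.0], [2.0]])
y = tf.constant([[2.0], [4.0]])

w = tf.Variable([[0.5]])  # meta-parameter (a tf.Variable)

def loss_fn(w_):
    return tf.reduce_mean((tf.matmul(x, w_) - y) ** 2)

with tf.GradientTape() as outer_tape:
    fast_w = w + 0.0            # "fast" weights start as a plain tf.Tensor
    for _ in range(2):          # several inner SGD steps
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(fast_w)  # a Tensor is not auto-watched
            inner_loss = loss_fn(fast_w)
        grad = inner_tape.gradient(inner_loss, fast_w)
        fast_w = fast_w - lr_inner * grad  # manual SGD update
    outer_loss = loss_fn(fast_w)

# The outer tape differentiates through both inner updates back to w.
meta_grad = outer_tape.gradient(outer_loss, w)
```

The key point from the comment above is `inner_tape.watch(fast_w)`: after the first manual update, `fast_w` is an ordinary tensor rather than a tf.Variable, so the inner tape has to be told to track it explicitly, while the outer tape still reaches the original variable `w`.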

@HilbertXu

Hi @llan-ml

Thanks for your help, I will try it later.

@llan-ml

llan-ml commented Feb 24, 2020

Hi @HilbertXu

I wrote a toy MAML-like script, which may be helpful for you. Please let me know if you find that the implementation is correct and works in more practical situations.

@HilbertXu

HilbertXu commented Feb 24, 2020

Hi @llan-ml

It says I don't have access to your files. Could you please help me with this?

Maybe we can chat on WeChat or by email? My SS server has been blocked, so it's hard for me to access Colab.

@llan-ml

llan-ml commented Feb 24, 2020

I forgot to enable sharing of that link; it should be accessible now. Also, feel free to contact me by email (address in my profile).

@Runist

Runist commented Jul 15, 2020

> Hi @shufflebyte
> This is actually not a tensorflow bug. [...] Model.get_weights actually returns some numpy arrays, and Model.set_weights is used to overwrite weight values from numpy arrays rather than replace the trainable variables with another set of variables. [...] This is not problematic in this repo because we do manual replacement [...]

But I also hit an error: why does model.get_weights() return an empty list?

    with tf.GradientTape() as support_tape:
        support_tape.watch(model.trainable_variables)
        y_pred = model.forward(x1[i])
        support_loss = compute_loss(y1, y_pred)

    gradients = support_tape.gradient(support_loss, model.trainable_variables)
    # inner_optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    k = 0
    for j in range(len(model.layers)):
        model.layers[j].kernel = tf.subtract(model.layers[j].kernel, tf.multiply(lr_inner, gradients[k]))
        model.layers[j].bias = tf.subtract(model.layers[j].bias, tf.multiply(lr_inner, gradients[k + 1]))
        k += 2
    print(model.get_weights())

@Runist

Runist commented Jul 15, 2020

> I forgot to enable sharing of that link, and now it should be accessible. Also, feel free to access me by email in my profile.

with tf.GradientTape() as outer_tape:
  copied_model = model
  for _ in range(2):
    with tf.GradientTape(watch_accessed_variables=False) as inner_tape:
      inner_tape.watch(copied_model.inner_weights)
      inner_loss = compute_loss(copied_model, x, y)
    inner_grads = inner_tape.gradient(inner_loss, copied_model.inner_weights)
    # print(inner_grads)
    # print("================")
    copied_model = MetaModel.copy_from(copied_model, inner_grads)
  outer_loss = compute_loss(copied_model, x, y)
outer_grads = outer_tape.gradient(outer_loss, model.inner_weights)
optimizer.apply_gradients(zip(outer_grads, model.inner_weights))

And I tried your code: model and copied_model are the same object. When you update copied_model, it also updates model.
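That first line, `copied_model = model`, is plain Python name binding, not a copy; a minimal illustration:

```python
class Model:
    def __init__(self):
        self.w = 1.0

model = Model()
copied_model = model          # binds a second name to the SAME object
copied_model.w = 5.0          # "updating the copy" ...
print(model.w)                # 5.0 -- ... also changes the original
print(copied_model is model)  # True
```

In the script above, `copied_model` is rebound to a new object by `MetaModel.copy_from(...)` at the end of each loop iteration; only before that first rebinding do the two names share one object, which appears to be intentional (the first inner step starts from the meta-weights).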
