
Model weights are not de-duplicated on save #18419

Open · mattdangerw opened this issue Aug 8, 2023 · 7 comments

mattdangerw (Member) commented Aug 8, 2023

When a model shares a single variable across many layers, model.weights is deduplicated: the variable appears only once, no matter how many layers reference it.

However, when the model is saved, every layer that holds a reference to the variable saves its own copy. Weights are not deduplicated on save.

This can lead to a confusing flow where things appear to be working, but the saved weights are actually duplicated on disk, and a single variable is assigned multiple times during load. This colab demonstrates the issue:
https://colab.research.google.com/gist/mattdangerw/2fabb09d53bd19cb6407d219e34155ab/weight-sharing-weirdness.ipynb
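
For reference, a minimal sketch of the kind of sharing that triggers this, assuming the Keras 3 API. `SharedKernelDense` is an illustrative custom layer, not the code from the colab:

```python
import numpy as np
import keras


class SharedKernelDense(keras.layers.Layer):
    # Illustrative layer: it uses a kernel variable created outside of it,
    # rather than creating its own variable in build().
    def __init__(self, kernel, **kwargs):
        super().__init__(**kwargs)
        self.kernel = kernel  # the same Variable is handed to several layers

    def call(self, x):
        return keras.ops.matmul(x, self.kernel)


shared = keras.Variable(np.ones((4, 4), dtype="float32"), name="shared_kernel")
inputs = keras.Input(shape=(4,))
x = SharedKernelDense(shared)(inputs)
outputs = SharedKernelDense(shared)(x)
model = keras.Model(inputs, outputs)

# In memory the variable is deduplicated: it shows up once in model.weights.
print(len(model.weights))
# On save, however, each layer that references it writes its own copy.
model.save_weights("shared.weights.h5")
```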

mattdangerw (Member, Author) commented Aug 8, 2023

This seems like a bug, and is at the very least confusing. I'm unsure what the fix should be, particularly with backwards-compatibility concerns in mind.

fchollet (Member) commented Aug 9, 2023

Sharing is supported at the layer-instance level: you may reuse a layer instance multiple times. However, you are not supposed to share a variable instance across multiple independent layer instances; each variable must be owned by exactly one layer.

What happens if you try to share a layer instance instead?
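
A small sketch of that layer-instance sharing pattern, using the functional API (layer names and shapes here are illustrative):

```python
import keras

# The supported pattern: reuse the layer instance, so its variables are
# owned by exactly one layer and saved exactly once.
shared_dense = keras.layers.Dense(4, name="shared_dense")

inputs = keras.Input(shape=(4,))
x = shared_dense(inputs)      # first use
outputs = shared_dense(x)     # second use of the same instance
model = keras.Model(inputs, outputs)

model.save_weights("shared_layer.weights.h5")
```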

mattdangerw (Member, Author) commented Aug 9, 2023

Doing this at the layer level does look like it works in keras-core, though not fully in tf.keras. Sounds like overall this is "works as intended."

I do worry that there are no guardrails for this, though. It's confusing that shared weights "just work" for training, with proper deduplication, and only start misbehaving on save.

Can we just warn if we see duplicate weights on a save call? Is there ever a valid reason for it?

fchollet (Member) commented

> Can we just warn if we see duplicate weights on a save call?

How would we detect duplicate weights?

abuelnasr0 (Contributor) commented Aug 11, 2023

I have some comments on this issue. There are cases where layers share only a single weight while each layer also has its own other weights. An example is the Transformer-XL model, where each decoder layer has its own weights but the relative position bias can be shared between them.

I think saving duplicate weights can be avoided by checking the id of a weight before saving it: if a weight with the same id has already been saved, we skip it. I have implemented this locally and I think it works fine.

Maybe there is something I am missing here, but this is a solution that might help.
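
A minimal sketch of that id-based check; `dedup_weights_for_saving` is a hypothetical helper for illustration, not the actual local implementation, and a real fix would need to hook into the saving code:

```python
def dedup_weights_for_saving(layers):
    # Collect each variable object only once, keyed by id(), so a variable
    # shared by several layers is written to disk a single time.
    seen_ids = set()
    unique_weights = []
    for layer in layers:
        for weight in layer.weights:
            if id(weight) in seen_ids:
                continue  # already collected via another layer sharing it
            seen_ids.add(id(weight))
            unique_weights.append(weight)
    return unique_weights
```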

innat commented Aug 30, 2023

Hi @mattdangerw,
I've just tried to run the above colab gist.

Considering those scenarios, have you encountered any impact on model performance from model.save and model.save_weights? In my case, when reloading the saved model with the keras.models.load_model API, memory usage is much higher than when manually creating the model and loading the weights with the .load_weights API. Could the reason be that weights are not deduped at model saving time (model.save)?

innat commented Sep 6, 2023

Possibly a related case:
keras-team/tf-keras#139

fchollet transferred this issue from keras-team/keras-core on Sep 22, 2023
nhuet added a commit to nhuet/decomon that referenced this issue Dec 14, 2023
This is now, a priori, not possible by design in Keras 3 once the layer is built.
See for instance this issue: keras-team/keras#18419 (comment),
where the advice is to embed the layer whose weights we want to share.

In our use case (reproducing a given model by splitting the activations
into separate layers while keeping the weights synchronized with the
original model), this is not a solution.

We implement this workaround, even though it uses a private method.
A feature request has been filed on the Keras 3 repo:
keras-team/keras#18821
Further commits in nhuet/decomon, airbus/decomon, and ducoffeM/decomon referenced this issue with the same message (Jan 8–17, 2024).