
Model weights are not de-duplicated on save #18419

Open · mattdangerw opened this issue Aug 8, 2023 · 7 comments

mattdangerw (Member) commented Aug 8, 2023

When a model shares a single variable across many layers, model.weights is deduplicated: the variable appears only once, no matter how many layers reference it.

However, when the model is saved, every layer that holds a reference to the variable saves its own copy. Weights are not deduplicated on save.

This can lead to a confusing flow where things appear to be working, but the saved weights are actually duplicated on disk, and a single variable is assigned multiple times during load. This colab demonstrates the issue:
https://colab.research.google.com/gist/mattdangerw/2fabb09d53bd19cb6407d219e34155ab/weight-sharing-weirdness.ipynb
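
For reference, a minimal sketch of the kind of sharing that triggers this, assuming the Keras 3 API. `SharedKernelDense` is an illustrative custom layer, not the code from the colab:

```python
import numpy as np
import keras


class SharedKernelDense(keras.layers.Layer):
    # Illustrative layer: it uses a kernel variable created outside of it,
    # rather than creating its own variable in build().
    def __init__(self, kernel, **kwargs):
        super().__init__(**kwargs)
        self.kernel = kernel  # the same Variable is handed to several layers

    def call(self, x):
        return keras.ops.matmul(x, self.kernel)


shared = keras.Variable(np.ones((4, 4), dtype="float32"), name="shared_kernel")
inputs = keras.Input(shape=(4,))
x = SharedKernelDense(shared)(inputs)
outputs = SharedKernelDense(shared)(x)
model = keras.Model(inputs, outputs)

# In memory the variable is deduplicated: it shows up once in model.weights.
print(len(model.weights))
# On save, however, each layer that references it writes its own copy.
model.save_weights("shared.weights.h5")
```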

mattdangerw (Member, Author) commented Aug 8, 2023

This seems like a bug, and is at the very least confusing. I'm unsure what the fix should be, particularly with backwards-compatibility concerns in mind.

fchollet (Member) commented Aug 9, 2023

Sharing is supported at the layer-instance level: you may reuse a layer instance multiple times. However, you are not supposed to share a variable instance across multiple independent layer instances; each variable must be owned by exactly one layer.

What happens if you try to share a layer instance instead?
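
A small sketch of that layer-instance sharing pattern, using the functional API (layer names and shapes here are illustrative):

```python
import keras

# The supported pattern: reuse the layer instance, so its variables are
# owned by exactly one layer and saved exactly once.
shared_dense = keras.layers.Dense(4, name="shared_dense")

inputs = keras.Input(shape=(4,))
x = shared_dense(inputs)      # first use
outputs = shared_dense(x)     # second use of the same instance
model = keras.Model(inputs, outputs)

model.save_weights("shared_layer.weights.h5")
```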

mattdangerw (Member, Author) commented Aug 9, 2023

Doing this at the layer level does look like it works in keras-core, though not fully in tf.keras. Sounds like overall this is "works as intended."

I do worry that there are no guardrails for this, though. It's confusing that shared weights "just work" for training, with proper deduplication, and only start misbehaving on save.

Can we just warn if we see duplicate weights on a save call? Is there ever a valid reason for it?

fchollet (Member) commented

> Can we just warn if we see duplicate weights on a save call?

How would we detect duplicate weights?

abuelnasr0 (Contributor) commented Aug 11, 2023

I have some comments on this issue. There are cases where layers share only a single weight while each layer also has its own other weights. An example is the Transformer-XL model, where each decoder layer has its own weights but the relative position bias can be shared between them.

I think saving duplicate weights can be avoided by checking the id of a weight before saving it: if a weight with the same id has already been saved, we skip it. I have implemented this locally and I think it works fine.

Maybe there is something I am missing here, but this is a solution that might help.
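
A minimal sketch of that id-based check; `dedup_weights_for_saving` is a hypothetical helper for illustration, not the actual local implementation, and a real fix would need to hook into the saving code:

```python
def dedup_weights_for_saving(layers):
    # Collect each variable object only once, keyed by id(), so a variable
    # shared by several layers is written to disk a single time.
    seen_ids = set()
    unique_weights = []
    for layer in layers:
        for weight in layer.weights:
            if id(weight) in seen_ids:
                continue  # already collected via another layer sharing it
            seen_ids.add(id(weight))
            unique_weights.append(weight)
    return unique_weights
```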

innat commented Aug 30, 2023

Hi @mattdangerw,
I've just tried to run the above colab gist.

Considering those scenarios, have you encountered any impact on model performance from model.save and model.save_weights? In my case, when reloading the saved model with the keras.models.load_model API, memory usage is much higher than when manually creating the model and loading the weights with the .load_weights API. Could the reason be that weights are not deduped at model saving time (model.save)?

innat commented Sep 6, 2023

Possibly a related case:
keras-team/tf-keras#139

fchollet transferred this issue from keras-team/keras-core on Sep 22, 2023
nhuet added a commit to nhuet/decomon that referenced this issue Dec 14, 2023
This is now, a priori, not possible by design in Keras 3 once the layer is built.
See for instance this issue: keras-team/keras#18419 (comment),
where the advice is to embed the layer whose weights we want to share.

In our use case (reproducing a given model by splitting the activations
into separate layers while keeping the weights synchronized with the
original model), this is not a solution.

We implement this workaround, even though it uses a private method.
A feature request has been filed on the Keras 3 repo:
keras-team/keras#18821
Further commits in nhuet/decomon, airbus/decomon, and ducoffeM/decomon referenced this issue with the same message (Jan 8–17, 2024).