Deal with weight tying in transformers >=5 #2922
Merged
While we already implemented forward compatibility with the way transformers>=5 handles weight tying, there was an issue with weight tying of trainable tokens wrappers.
Previously, we simply got fixed strings naming the modules that are tied to the embeddings, e.g. `"lm_head"`. This never changed since it was just a static attribute of the respective `PreTrainedModel` class. However, with the way `get_tied_weights_keys` is now implemented, the names of the tied-to-embeddings modules change when those modules are moved around. So if we wrap the `lm_head` once in a trainable tokens wrapper, it becomes `lm_head.token_adapter.base_layer` instead of `lm_head`. That means the check for whether we already wrapped the tied layer needs to look at the grand-parent instead of the target layer itself. This obviously assumes that we always have a nesting level of two, which is true for `TrainableTokensWrapper`.