fix(camembert): add tie_word_embeddings=True to CamembertConfig#44931

Merged
Cyrilvallez merged 1 commit into huggingface:main from r266-tech:fix/camembert-tie-word-embeddings on Mar 23, 2026

Conversation

@r266-tech
Contributor

What does this PR do?

Fixes a v5 regression where CamembertForMaskedLM (and all CamemBERT masked-LM tasks) produces near-zero, near-uniform logits, making the model completely non-functional.

Root cause

In v5, modeling_utils.get_expanded_tied_weights_keys() gates all weight tying on config.tie_word_embeddings:

tie_word_embeddings = getattr(self.config, "tie_word_embeddings", False)
if not tie_word_embeddings:
    return {}   # <-- skips ALL tying when attribute is missing

CamembertConfig was missing tie_word_embeddings: bool = True, so the method returned {} and lm_head.decoder.weight was randomly initialized instead of being tied to roberta.embeddings.word_embeddings.weight.

Sibling configs RobertaConfig and BertConfig already declare tie_word_embeddings: bool = True. CamemBERT is RoBERTa-based and its modeling code already defines _tied_weights_keys mapping lm_head.decoder.weight → roberta.embeddings.word_embeddings.weight — but that mapping was silently ignored.
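The gating behavior described above can be reproduced in miniature (illustrative class and variable names, not the real transformers objects):

```python
# Minimal sketch of the v5 gate: a *missing* tie_word_embeddings attribute
# disables tying exactly like an explicit False would, silently discarding
# the tie map the modeling code declares.

class ConfigWithoutFlag:
    pass  # like CamembertConfig before this PR: no tie_word_embeddings

class ConfigWithFlag:
    tie_word_embeddings = True  # like RobertaConfig, or the fixed config

# The mapping CamemBERT's modeling code already declares:
TIED_WEIGHTS_KEYS = {
    "lm_head.decoder.weight": "roberta.embeddings.word_embeddings.weight"
}

def get_expanded_tied_weights_keys(config, tied_weights_keys):
    # getattr defaults to False, so the absent attribute falls through
    # to the early return and no weights are tied at all.
    if not getattr(config, "tie_word_embeddings", False):
        return {}
    return dict(tied_weights_keys)

print(get_expanded_tied_weights_keys(ConfigWithoutFlag(), TIED_WEIGHTS_KEYS))  # {}
print(get_expanded_tied_weights_keys(ConfigWithFlag(), TIED_WEIGHTS_KEYS))
```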

Before fix (transformers v5.3.0)

>>> pipeline("fill-mask", model="camembert-base")("Le camembert est un délicieux fromage <mask>.")
# score=0.000108  totalité    ← near-uniform logits (broken)
# score=0.000106  Mat
# score=0.000104  Populaire

The LOAD REPORT also showed lm_head.decoder.weight: MISSING (randomly initialized).

After fix

>>> pipeline("fill-mask", model="camembert-base")("Le camembert est un délicieux fromage <mask>.")
# score=0.1819  suisse    ← matches v4 exactly ✅
# score=0.0937  français
# score=0.0495  italien

Change

One line added to CamembertConfig:

tie_word_embeddings: bool = True
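As a toy numeric illustration of why an untied, freshly initialized head produces the near-uniform scores seen in the bug report (shapes and values are made up; zeros stand in for a fresh init, and this is not the real model):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

vocab_size, hidden = 5, 4
# "Trained" embedding rows, one per vocab token.
embeddings = [[float(i == j) for j in range(hidden)] for i in range(vocab_size)]
hidden_state = [1.0, 0.0, 0.0, 0.0]  # strongly aligned with token 0

def lm_logits(decoder_weight, h):
    # Logits are just decoder_weight @ h.
    return [sum(w * x for w, x in zip(row, h)) for row in decoder_weight]

# Tied: the LM head reuses the embedding matrix -> a confident prediction.
tied = softmax(lm_logits(embeddings, hidden_state))
# Untied and left at init (zeros here) -> perfectly uniform probabilities,
# the "score ~ 0.0001 for every token" symptom from the issue.
untied = softmax(lm_logits([[0.0] * hidden for _ in range(vocab_size)], hidden_state))
```

With these numbers, `max(tied)` is e/(e+4), roughly 0.405, while every untied score is exactly 1/5.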

Fixes #44671

Before submitting

Who can review?

@ArthurZucker @Cyrilvallez

In v5, `modeling_utils.get_expanded_tied_weights_keys()` checks
`config.tie_word_embeddings` and returns an empty dict (skipping all
weight tying) when the attribute is absent or False.

`CamembertConfig` was missing `tie_word_embeddings: bool = True`,
causing `lm_head.decoder.weight` to be randomly initialized instead of
being tied to `roberta.embeddings.word_embeddings.weight`. This
produced near-uniform, near-zero logits for fill-mask (and all
masked-LM tasks), making the model completely non-functional in v5.

Sibling configs `RobertaConfig` and `BertConfig` already declare
`tie_word_embeddings: bool = True` — this commit brings CamemBERT in
line with them.

Fixes huggingface#44671
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: camembert

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44931&sha=c35409

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member

run-slow: camembert

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/camembert"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 7f72b3d2 workflow commit (merge commit)
PR c354098a branch commit (from PR)
main 55cc1a7f base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@Cyrilvallez
Member

cc @tarekziade!! This has been forgotten in a previous refactor for sure. Could we add the following rule to our modeling format linter: if _tied_weights_keys is present and non-empty in modeling -> Config MUST contain the tie_word_embeddings field?
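A rough sketch of such a rule (purely illustrative; function name and the string-based config check are assumptions, not the actual linter implementation):

```python
import ast

def violates_tie_rule(modeling_src: str, config_src: str) -> bool:
    """True if the modeling code declares a non-empty _tied_weights_keys
    but the paired config never mentions tie_word_embeddings."""
    defines_tie_map = any(
        isinstance(node, ast.Assign)
        # Plain assignments only for brevity; a real rule would also
        # handle annotated assignments (ast.AnnAssign).
        and any(getattr(t, "id", None) == "_tied_weights_keys" for t in node.targets)
        # Treat a literal empty dict as "no tie map".
        and not (isinstance(node.value, ast.Dict) and not node.value.keys)
        for node in ast.walk(ast.parse(modeling_src))
    )
    config_has_flag = "tie_word_embeddings" in config_src
    return defines_tie_map and not config_has_flag
```

Run against a CamembertConfig-like config without the flag, this flags exactly the situation fixed by this PR.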

@Cyrilvallez (Member) left a comment

Indeed we need the flag! Thanks!

@Cyrilvallez
Member

This was forgotten in #41541 @zucchini-nlp! With the linter rule, @tarekziade will be able to determine if we missed it at other locations as well!

@Cyrilvallez Cyrilvallez merged commit 9dc8d8a into huggingface:main Mar 23, 2026
21 of 23 checks passed
@tarekziade
Collaborator

cc @tarekziade!! This has been forgotten in a previous refactor for sure. Could we add the following rule to our modeling format linter: if _tied_weights_keys is present and non-empty in modeling -> Config MUST contain the tie_word_embeddings field?

#44988



Development

Successfully merging this pull request may close these issues.

CamemBERT produces incorrect masked LM predictions in v5

5 participants