fix(camembert): add tie_word_embeddings=True to CamembertConfig (#44931)
In v5, `modeling_utils.get_expanded_tied_weights_keys()` checks `config.tie_word_embeddings` and returns an empty dict (skipping all weight tying) when the attribute is absent or False. `CamembertConfig` was missing `tie_word_embeddings: bool = True`, causing `lm_head.decoder.weight` to be randomly initialized instead of being tied to `roberta.embeddings.word_embeddings.weight`. This produced near-uniform, near-zero logits for fill-mask (and all masked-LM tasks), making the model completely non-functional in v5. Sibling configs `RobertaConfig` and `BertConfig` already declare `tie_word_embeddings: bool = True` — this commit brings CamemBERT in line with them. Fixes huggingface#44671
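A hedged sketch of the gate described above (a simplified stand-in, not the actual `modeling_utils` implementation; the class and function names below are illustrative): when the config lacks the flag, or it is False, the helper returns an empty mapping and no tying happens.

```python
# Simplified stand-in for the v5 tying gate (illustrative only): when the
# config lacks tie_word_embeddings (or it is False), every tied-weight
# mapping is dropped and the LM head stays randomly initialized.

def get_expanded_tied_weights_keys_sketch(config, tied_weights_keys):
    if not getattr(config, "tie_word_embeddings", False):
        return {}  # skip all weight tying
    return dict(tied_weights_keys)

class BuggyCamembertConfig:      # missing the flag, as in the regression
    pass

class PatchedCamembertConfig:    # with the one-line fix from this PR
    tie_word_embeddings: bool = True

mapping = {"lm_head.decoder.weight": "roberta.embeddings.word_embeddings.weight"}
print(get_expanded_tied_weights_keys_sketch(BuggyCamembertConfig(), mapping))   # {}
print(get_expanded_tied_weights_keys_sketch(PatchedCamembertConfig(), mapping))
```

With the buggy config the mapping is silently discarded even though the modeling code declares it, which is exactly why the fix lives in the config rather than the model.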
[For maintainers] Suggested jobs to run (before merge): run-slow: camembert
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44931&sha=c35409
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: camembert
This comment contains models: ["models/camembert"]
cc @tarekziade!! This was surely forgotten in a previous refactor. Could we add the following rule to our modeling format linter: if
Cyrilvallez left a comment
Indeed we need the flag! Thanks!
This was forgotten in #41541 @zucchini-nlp! With the linter rule, @tarekziade will be able to determine if we missed it at other locations as well!
What does this PR do?
Fixes a v5 regression where `CamembertForMaskedLM` (and all CamemBERT masked-LM tasks) produces near-zero, near-uniform logits, making the model completely non-functional.

Root cause

In v5, `modeling_utils.get_expanded_tied_weights_keys()` gates all weight tying on `config.tie_word_embeddings`. `CamembertConfig` was missing `tie_word_embeddings: bool = True`, so the method returned `{}` and `lm_head.decoder.weight` was randomly initialized instead of being tied to `roberta.embeddings.word_embeddings.weight`.

Sibling configs `RobertaConfig` and `BertConfig` already declare `tie_word_embeddings: bool = True`. CamemBERT is RoBERTa-based and its modeling code already defines `_tied_weights_keys` mapping `lm_head.decoder.weight → roberta.embeddings.word_embeddings.weight`, but that mapping was silently ignored.

Before fix (transformers v5.3.0)

The LOAD REPORT also showed `lm_head.decoder.weight: MISSING` (randomly initialized).

After fix

Change

One line added to `CamembertConfig`:

Fixes #44671
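A hedged sketch of the change and its effect, using toy stand-ins rather than the real transformers classes (all names below are illustrative): with the flag present, the masked-LM head aliases the embedding weights at load time; without it, the head keeps its own randomly initialized weights, which is the regression described above.

```python
import random

class OldCamembertConfig:
    pass  # the v5 bug: no tie_word_embeddings attribute at all

class FixedCamembertConfig:
    tie_word_embeddings: bool = True  # the one-line fix from this PR

class ToyMaskedLM:
    """Toy model mimicking the tie/no-tie load paths (not real transformers code)."""
    def __init__(self, config):
        # Stands in for roberta.embeddings.word_embeddings.weight.
        self.word_embeddings = [random.random() for _ in range(4)]
        if getattr(config, "tie_word_embeddings", False):
            # Tied: the head shares the embedding weights.
            self.decoder_weight = self.word_embeddings
        else:
            # Untied: the head is randomly initialized -> near-uniform logits.
            self.decoder_weight = [random.random() for _ in range(4)]

broken = ToyMaskedLM(OldCamembertConfig())
fixed = ToyMaskedLM(FixedCamembertConfig())
print(broken.decoder_weight is broken.word_embeddings)  # False: the regression
print(fixed.decoder_weight is fixed.word_embeddings)    # True: weights tied
```

The identity check (`is`) is the point: tying means the head and the embeddings are the same tensor, not two tensors with equal values.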
Before submitting
Who can review?
@ArthurZucker @Cyrilvallez