Fix T5 v1.1 detection #43681
Merged
ArthurZucker merged 11 commits into huggingface:main on Feb 5, 2026
Conversation
PR huggingface#41541 refactored `tie_word_embeddings` handling (among other things), which subtly broke the detection of T5 v1.1 configs vs. original T5. As a consequence, decoder output scaling was always applied, regardless of T5 version. This is resolved by using the correct value of `tie_word_embeddings`.

**Testing:** This regression was not caught by the tests, because the tests instantiate the config once and then modify attributes on it directly. That is problematic since all the version-detection logic runs in `T5Config.__init__`. This was addressed by adding a dedicated `get_config_v1_1` method that initializes the config as if it came from a v1.1 model (e.g., flan-t5).
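A minimal sketch of the behavior described above, not the actual transformers source (class and function names here are hypothetical): original T5 ties the input and output embeddings and rescales decoder outputs by `d_model ** -0.5` before the LM head, while T5 v1.1 unties the embeddings and must skip that scaling.

```python
# Hypothetical sketch of the version-gated scaling described in the PR.
# The regression meant the tied/untied decision always evaluated as "tied",
# so the rescale was applied even for v1.1-style configs.

class T5ConfigSketch:
    def __init__(self, d_model=512, tie_word_embeddings=True):
        self.d_model = d_model
        self.tie_word_embeddings = tie_word_embeddings

def scale_decoder_output(sequence_output, config):
    """Apply the original-T5 rescale only when embeddings are tied."""
    if config.tie_word_embeddings:
        return [x * config.d_model ** -0.5 for x in sequence_output]
    return sequence_output

v1_0 = T5ConfigSketch(d_model=4, tie_word_embeddings=True)   # original T5
v1_1 = T5ConfigSketch(d_model=4, tie_word_embeddings=False)  # v1.1 / flan-t5

print(scale_decoder_output([2.0], v1_0))  # scaled: [1.0]
print(scale_decoder_output([2.0], v1_1))  # unscaled: [2.0]
```

With the bug, both configs would have taken the scaled branch; the fix restores the untied path for v1.1.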
zucchini-nlp approved these changes — Feb 2, 2026
zucchini-nlp reviewed — Feb 3, 2026
[For maintainers] Suggested jobs to run (before merge): run-slow: mt5, t5
tarekziade pushed a commit that referenced this pull request — Feb 5, 2026
* Fix T5 v1.1 detection

  PR #41541 refactored `tie_word_embeddings` handling (among other things), which subtly broke the detection of T5 v1.1 configs vs. original T5. As a consequence, decoder output scaling was always applied, regardless of T5 version. This is resolved by using the correct value of `tie_word_embeddings`.

  **Testing:** This regression was not caught by the tests, because the tests instantiate the config once and then modify attributes on it directly. That is problematic since all the version-detection logic runs in `T5Config.__init__`. This was addressed by adding a dedicated `get_config_v1_1` method that initializes the config as if it came from a v1.1 model (e.g., flan-t5).

* Make repo consistent
* Make repo consistent
* mt5 isn't copied from t5 anymore

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: raushan <raushan@huggingface.co>
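The testing fix mentioned above can be sketched as follows (hypothetical names, not the actual test suite): instead of mutating attributes on an already-built config, which bypasses decisions made in `T5Config.__init__`, the tester builds a fresh config from v1.1-style constructor arguments.

```python
# Hypothetical sketch of the get_config_v1_1 testing fix; the real
# T5Config and model tester live in the transformers repo.

class T5ConfigSketch:  # redefined here so the sketch is self-contained
    def __init__(self, tie_word_embeddings=True, feed_forward_proj="relu"):
        # All version-detection logic happens here, at construction time,
        # which is why mutating attributes afterwards misses it.
        self.tie_word_embeddings = tie_word_embeddings
        self.is_gated_act = feed_forward_proj.startswith("gated-")

class T5ModelTesterSketch:
    def get_config(self):
        # Original-T5 defaults: tied embeddings, plain ReLU FFN.
        return T5ConfigSketch()

    def get_config_v1_1(self):
        # v1.1-style (flan-t5) arguments passed through __init__,
        # so constructor-time detection actually runs.
        return T5ConfigSketch(tie_word_embeddings=False,
                              feed_forward_proj="gated-gelu")

tester = T5ModelTesterSketch()
assert tester.get_config().tie_word_embeddings is True
cfg = tester.get_config_v1_1()
assert cfg.tie_word_embeddings is False and cfg.is_gated_act
```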
ArthurZucker pushed a commit that referenced this pull request — Feb 5, 2026