
Fix T5 v1.1 detection#43681

Merged
ArthurZucker merged 11 commits into huggingface:main from githubnemo:issue/broken-t5-v1.1-detection
Feb 5, 2026
Conversation

@githubnemo
Contributor

PR #41541 refactored tie_word_embeddings handling (among other things), which subtly broke the detection of T5 v1.1 vs. the original T5. As a consequence, decoder output scaling was always applied, regardless of T5 version.

This is resolved by using the correct value for tie_word_embeddings.

Testing:

This was not covered by the tests because they instantiate the config once and then modify attributes on it. This is problematic since all the decision logic happens in T5Config.__init__. This was addressed by adding a dedicated get_config_v1_1 method that initializes the config as if it came from a v1.1 model (e.g., flan-t5).
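To illustrate the two points above, here is a minimal, hypothetical sketch (not the actual transformers implementation; class and attribute names other than `tie_word_embeddings`, `d_model`, and `feed_forward_proj` are invented for illustration). T5 v1.1 checkpoints untie the input embeddings from the LM head, so `tie_word_embeddings=False` is the signal that decoder output scaling should be skipped, and the decision must be made inside `__init__`, which is why mutating attributes after construction bypasses it:

```python
class T5ConfigSketch:
    """Toy stand-in for a T5-style config; not the real T5Config."""

    def __init__(self, d_model=512, tie_word_embeddings=True,
                 feed_forward_proj="relu"):
        self.d_model = d_model
        self.tie_word_embeddings = tie_word_embeddings
        self.feed_forward_proj = feed_forward_proj
        # Decision made at init time: only the original T5 (tied
        # embeddings) rescales decoder outputs before the LM head.
        # Flipping tie_word_embeddings on the instance afterwards
        # would NOT update this flag -- the failure mode that hid
        # the bug from the tests.
        self.scale_decoder_output = tie_word_embeddings


def get_config_v1_1():
    # Mirrors the PR's testing approach: build the config the way a
    # v1.1 checkpoint (e.g. flan-t5) would, instead of mutating an
    # already-constructed config.
    return T5ConfigSketch(tie_word_embeddings=False,
                          feed_forward_proj="gated-gelu")
```

With this sketch, `T5ConfigSketch().scale_decoder_output` is `True` while `get_config_v1_1().scale_decoder_output` is `False`, whereas setting `tie_word_embeddings = False` on an existing instance leaves `scale_decoder_output` unchanged.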

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp zucchini-nlp left a comment


Nice, thanks! I guess the slow tests are just broken, which is why we didn't notice the bug earlier.

@githubnemo githubnemo force-pushed the issue/broken-t5-v1.1-detection branch from 448f5b0 to a44b3c1 Compare February 2, 2026 14:45
@githubnemo githubnemo force-pushed the issue/broken-t5-v1.1-detection branch from a44b3c1 to 54f31a2 Compare February 2, 2026 15:05
@zucchini-nlp zucchini-nlp enabled auto-merge (squash) February 3, 2026 11:14
@github-actions
Contributor

github-actions bot commented Feb 5, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: mt5, t5

@ArthurZucker ArthurZucker disabled auto-merge February 5, 2026 11:01
@ArthurZucker ArthurZucker merged commit b8a1c69 into huggingface:main Feb 5, 2026
16 of 25 checks passed
tarekziade pushed a commit that referenced this pull request Feb 5, 2026
* Fix T5 v1.1 detection

PR #41541 refactored `tie_word_embeddings` handling (among other things)
which subtly broke detection of T5 v1.1 vs. original detection. As a
consequence, decoder output scaling was always applied, regardless of
T5 version.

This is resolved by using the correct value for `tie_word_embeddings`.

**Testing:**

This was not covered by the tests since the tests instantiate the config
once and modify attributes on the config. This is problematic since all
the decision logic is happening in `T5Config.__init__`. This was addressed
by having a specific `get_config_v1_1` method that initializes the
config as if it were coming from a v1.1 model (e.g., flan-t5).

* Make repo consistent

* Make repo consistent

* mt5 isn't copied from t5 anymore

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: raushan <raushan@huggingface.co>
tarekziade pushed a commit that referenced this pull request Feb 5, 2026
ArthurZucker pushed a commit that referenced this pull request Feb 5, 2026