
🚨 T5Gemma2 model structure#43633

Merged
zucchini-nlp merged 14 commits into huggingface:main from zucchini-nlp:attn-impl-resursive-setter
Feb 4, 2026

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Jan 30, 2026

What does this PR do?

Makes sure that the attn implementation is set on all sub-configs. The config.encoder.text_config was not getting its attn implementation set because we don't pass it to PreTrainedModel.__init__. We can't change the model structure without breaking backward compatibility, so I manually re-added a call to self._check_and_adjust_attn_implementation in the modeling code.

Also deleted __setattr__; I'm not sure what the reason was for having it. Composite configs usually don't need to magically force-set the same attribute on all sub-configs.
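
As a quick illustration of the user-visible effect, here is a minimal sketch with a placeholder checkpoint id (not taken from this PR); the config.encoder.text_config path is the one described above.

```python
# Minimal sketch of the behavior this PR fixes; the checkpoint id is a placeholder.
from transformers import AutoModel

model = AutoModel.from_pretrained("org/some-t5gemma2-checkpoint", attn_implementation="sdpa")

print(model.config._attn_implementation)                      # "sdpa"
# Before this fix the nested text config silently kept its default implementation;
# now it receives the requested one as well.
print(model.config.encoder.text_config._attn_implementation)  # "sdpa" with this PR
```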

Comment on lines 807 to 811
# Set attn implementation manually because `text_config` is never passed to `super()`
self.text_config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
    self.text_config._attn_implementation, is_init_check=True
)

Member Author

@tomaarsen here is why the attn implementation was not being set

Member

Just making sure: this is not also necessary for the vision_config, right?
Will review in more detail next week.

Member Author

Nope, the vision config is used a few lines above to init a PreTrainedModel, and thus it is passed to PreTrainedModel.__init__.
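
To make the structural difference concrete, here is a toy, plain-Python illustration (stand-in classes only, not the actual T5Gemma2 modeling code):

```python
# Toy stand-ins only: a sub-config gets its attn implementation adjusted only if it is
# passed through an __init__ that performs the adjustment, which is the case for the
# vision config but not for the text config.
class ToyPreTrainedModel:
    def __init__(self, config):
        # stand-in for the adjustment PreTrainedModel.__init__ performs on its config
        config.attn_implementation = "sdpa"


class ToyVisionTower(ToyPreTrainedModel):
    pass


class ToyEncoder(ToyPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        # vision_config is passed to another ToyPreTrainedModel, so it gets adjusted here
        self.vision_tower = ToyVisionTower(config.vision_config)
        # text_config is only read to build layers; nothing adjusts it automatically
        self.hidden_size = config.text_config.hidden_size


class ToyConfig:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


config = ToyConfig(vision_config=ToyConfig(), text_config=ToyConfig(hidden_size=16))
ToyEncoder(config)
print(getattr(config.vision_config, "attn_implementation", None))  # "sdpa"
print(getattr(config.text_config, "attn_implementation", None))    # None -> the gap fixed above
```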

Contributor

Hmm, could we refactor this with the conversion mapping / checkpoint conversion mapping? IIUC, if we refactor the text-specific things into their own module, then we won't have this issue.

I think it needs the conversion mapping because you also want to use the encoder as a standalone model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be a huge change though, let me see

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@tomaarsen tomaarsen left a comment

Just ran some more extensive tests, this seems to work for me now. The attn_implementation is set automatically and my ST training works as expected.
This PR should supersede parts of #43559 now, which can be closed once this is merged.

@zucchini-nlp
Member Author

Will finish up in a while and ask for review

@tomaarsen
Member

Somewhat related: If I train a T5GemmaEncoder, I end up with a repository like: https://huggingface.co/tomaarsen/t5gemma2-270m-gooaq-cmnrl/tree/main
Note that the model_type is t5gemma2_encoder, which I can't load with AutoConfig like other architectures. Perhaps I should still expand on #43559 to turn t5gemma2_encoder into an architecture with AutoConfig and AutoModel support? Otherwise I still can't conveniently load https://huggingface.co/tomaarsen/t5gemma2-270m-gooaq-cmnrl.

  • Tom Aarsen

@zucchini-nlp
Member Author

@tomaarsen yes, it's expected that AutoConfig will not work on the encoder part. If it's needed for ST, we need the other PR you had as well. I was assuming ST would load directly with T5GemmaEncoderConfig, in a similar fashion to the old T5 family.

@tomaarsen
Member

If it's needed for ST, we need the other PR you had as well.

Agreed, will re-add it.

I was assuming ST will load directly with T5GemmaEncoderConfig in a similar fashion to old T5 family

I will, but the old T5 family is also loaded with:
load AutoConfig -> check the config type -> check for edge cases (T5, MT5, T5Gemma, T5Gemma2), otherwise AutoModel (see the sketch after this comment)

And then I need to be able to use

from transformers import AutoConfig

config = AutoConfig.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
  • Tom Aarsen
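
A rough sketch of that dispatch flow, roughly how an ST-style loader might look (illustrative only; the helper name and the encoder-only handling are assumptions, not Sentence Transformers' actual code):

```python
# Illustrative dispatch sketch, not Sentence Transformers' actual loader.
from transformers import AutoConfig, AutoModel, MT5Config, T5Config


def load_backbone(model_name_or_path: str):
    # Step 1: this call is what currently fails for model_type "t5gemma2_encoder"
    config = AutoConfig.from_pretrained(model_name_or_path)
    # Step 2: check the config type and handle encoder-only edge cases
    if isinstance(config, (T5Config, MT5Config)):
        # T5Gemma / T5Gemma2 would need analogous branches once their encoder
        # configs resolve via AutoConfig
        raise NotImplementedError("load only the encoder stack here")
    # Step 3: everything else goes through AutoModel
    return AutoModel.from_config(config)
```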

@zucchini-nlp
Member Author

Failing test is flaky, ready for review

Contributor

@vasqu vasqu left a comment

The config changes are good!

I just think we could maybe convert the model instead via the conversion mapping or similar? This is a quick-and-dirty workaround, so I'd be in favor of properly converting this into its own encoder text module if possible.


@zucchini-nlp
Member Author

run-slow: t5gemma2

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/t5gemma2"]
quantizations: []

@zucchini-nlp
Member Author

run-slow: t5gemma2

@zucchini-nlp zucchini-nlp changed the title from T5Gemma2 to 🚨 T5Gemma2 on Feb 3, 2026
@zucchini-nlp zucchini-nlp changed the title from 🚨 T5Gemma2 to 🚨 T5Gemma2 model structure on Feb 3, 2026
@zucchini-nlp
Member Author

run-slow: t5gemma2

@zucchini-nlp zucchini-nlp requested a review from vasqu February 3, 2026 13:01
@github-actions
Contributor

github-actions bot commented Feb 3, 2026

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 705fc028 merge commit
PR e98f4bc0 branch commit
main b6a202f8 base commit

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

💔 This comment contains run-slow, but unknown error occurred and the workflow run aborted!

@zucchini-nlp
Member Author

run-slow: t5gemma2

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/t5gemma2"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 3aa9ea02 merge commit
PR 958df6e0 branch commit
main affcf459 base commit

✅ No failing test specific to this PR 🎉 👏 !

Contributor

@vasqu vasqu left a comment

LGTM, thanks a lot. Much better this way! Just one nit on the test: maybe add cleanup in setUp as well.
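
For reference, a minimal sketch of what that nit could look like, assuming the test relies on the cleanup helper from transformers.testing_utils (the class name below is illustrative, not the actual test in this PR):

```python
# Illustrative test skeleton only; the real T5Gemma2 integration test differs.
import unittest

from transformers.testing_utils import cleanup, torch_device


class ToyT5Gemma2IntegrationTest(unittest.TestCase):
    def setUp(self):
        # free accelerator memory before each test, not only after it
        cleanup(torch_device, gc_collect=True)

    def tearDown(self):
        cleanup(torch_device, gc_collect=True)
```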

@zucchini-nlp
Member Author

Checked with rebase, everything is still working and tests are passing. Will merge after CI turns green

@github-actions
Contributor

github-actions bot commented Feb 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: t5gemma, t5gemma2

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) February 4, 2026 14:16
@zucchini-nlp zucchini-nlp merged commit d75266f into huggingface:main Feb 4, 2026
25 checks passed