
[feat] Allow loading T5Gemma2Encoder with AutoModel #43559

Merged
tomaarsen merged 5 commits into huggingface:main from tomaarsen:feat/t5gemma2_encoder
Feb 3, 2026

Conversation

@tomaarsen (Member) commented Jan 28, 2026

What does this PR do?

  • Allow the encoder of T5Gemma2 to be loaded standalone

Details

This is valuable for Sentence Transformers, which may want to load the encoder only (see huggingface/sentence-transformers#3604). Here, we grab and train the encoder only, resulting in e.g.: https://huggingface.co/tomaarsen/t5gemma2-270m-gooaq-cmnrl

Usage:

from transformers import T5Gemma2Encoder, AutoTokenizer
import torch

model_name = "tomaarsen/t5gemma2-270m-gooaq-cmnrl"
model = T5Gemma2Encoder.from_pretrained(model_name)
processor = AutoTokenizer.from_pretrained(model_name)

queries = [
    "Which planet is known as the Red Planet?",
]
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

query_inputs = processor(text=queries, truncation=True, padding=True, return_tensors="pt")
document_inputs = processor(text=documents, truncation=True, padding=True, return_tensors="pt")

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, masking out padding tokens via the attention mask
    token_embeddings = model_output.last_hidden_state
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

with torch.no_grad():
    query_embeddings = mean_pooling(model(**query_inputs), query_inputs["attention_mask"])
    document_embeddings = mean_pooling(model(**document_inputs), document_inputs["attention_mask"])

similarities = torch.nn.functional.cosine_similarity(
    query_embeddings.unsqueeze(1), document_embeddings.unsqueeze(0), dim=-1
)
print(similarities.tolist())
# [[0.37183186411857605, 0.8092442750930786, 0.6081508994102478, 0.7218592762947083]]
# As expected: The second document is most similar to the query.

I've not added the decoder as I only have weights for the encoder.

P.s., equivalent in Sentence Transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
queries = [
    "Which planet is known as the Red Planet?",
]
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
query_embedding = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

similarities = model.similarity(query_embedding, document_embeddings)
print(similarities)
# tensor([[0.3722, 0.8101, 0.6088, 0.7216]])

This Sentence Transformers usage also relies on this PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

cc @Cyrilvallez @zucchini-nlp

P.s. let me know if you'd like to see new tests or docs for this.

  • Tom Aarsen

This is valuable for Sentence Transformers, which may want to load the encoder only. I've added the decoder only to mirror the changes I need for the encoder.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member) left a comment

IIUC we want to be able to load a complete T5Gemma or only its encoder module in ST, so we can't do the same as in T5 with self._load_t5_module?

In any case, I think there is no strong objection to keeping the module private, and we can make it available through the Auto API. Let's also see if the core maintainers agree.
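
For reference, exposing an otherwise-private class through the Auto API can also be done from user code. A rough sketch of that idea (the import paths and the "t5gemma2_encoder" model_type string are assumptions here, not taken from this PR's diff):

from transformers import AutoConfig, AutoModel
from transformers.models.t5gemma2.configuration_t5gemma2 import T5Gemma2EncoderConfig
from transformers.models.t5gemma2.modeling_t5gemma2 import T5Gemma2Encoder

# Register the encoder config and model so the Auto classes can resolve them.
# The model_type string is assumed to match T5Gemma2EncoderConfig.model_type.
AutoConfig.register("t5gemma2_encoder", T5Gemma2EncoderConfig, exist_ok=True)
AutoModel.register(T5Gemma2EncoderConfig, T5Gemma2Encoder, exist_ok=True)

# Afterwards, checkpoints whose config declares that model_type load via AutoModel:
model = AutoModel.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")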

@tomaarsen marked this pull request as draft on January 28, 2026 15:56
@tomaarsen (Member Author)

IIUC we want to be able to load a complete T5Gemma or only its encoder module in ST, so we can't do the same as in T5 with self._load_t5_module?

Hmm, looks like I might be able to change some things for T5Gemma2 on ST's side. With T5 I can import T5EncoderModel, as it's exported in __all__:

from transformers import T5EncoderModel

model = T5EncoderModel.from_pretrained("sentence-transformers/gtr-t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>

T5EncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"]
model = T5EncoderModel.from_pretrained("google-t5/t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>

If I can import T5Gemma2Encoder from transformers, then perhaps I can load it directly like that without having to update AutoModel. I'll run some tests. It seems that T5Gemma also wasn't nicely supported in ST.

Will send more details when I know what'll work best.

  • Tom Aarsen

@tomaarsen (Member Author) commented Jan 28, 2026

Update: I can get it working like so for the various T5 variants:

from transformers import T5EncoderModel, T5Gemma2Encoder, T5GemmaEncoderModel, AutoConfig

# T5:
T5EncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"]
# Encoder only:
model = T5EncoderModel.from_pretrained("sentence-transformers/gtr-t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>

# Encoder-decoder:
model = T5EncoderModel.from_pretrained("google-t5/t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>

# T5Gemma:
config = AutoConfig.from_pretrained("google/t5gemma-s-s-prefixlm")
config.is_encoder_decoder = False
T5GemmaEncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"]
# Encoder only (still training)
# model = T5GemmaEncoderModel.from_pretrained("tomaarsen/t5gemma-s-gooaq-cmnrl")
model = T5GemmaEncoderModel.from_pretrained(r"C:\code\sentence-transformers\models\t5gemma-s-gooaq-cmnrl\checkpoint-27")
print(type(model))
# <class 'transformers.models.t5gemma.modeling_t5gemma.T5GemmaEncoderModel'>

# Encoder-decoder
model = T5GemmaEncoderModel.from_pretrained("google/t5gemma-s-s-prefixlm", config=config)
print(type(model))
# <class 'transformers.models.t5gemma.modeling_t5gemma.T5GemmaEncoderModel'>



# T5Gemma2:
T5Gemma2Encoder._keys_to_ignore_on_load_unexpected = ["decoder.*"]
T5Gemma2Encoder.base_model_prefix = "model.encoder"
# Encoder only:
model = T5Gemma2Encoder.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>

# Encoder-decoder:
model = T5Gemma2Encoder.from_pretrained("google/t5gemma-2-270m-270m")
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>

This even works on main if I import T5Gemma2Encoder like

from transformers.models.t5gemma2.modeling_t5gemma2 import T5Gemma2Encoder

In short, I reverted the t5gemma2_encoder changes on this PR.


However, one issue does remain: T5Gemma2Config's __setattr__ is responsible for tying some attributes between the text and vision configs in T5Gemma2EncoderConfig, but it would be much preferable if T5Gemma2EncoderConfig were responsible for this itself. Without this fix, config.text_config._attn_implementation ends up as None, because the config (T5Gemma2EncoderConfig) is updated but the change isn't correctly propagated to the subconfigs.

from transformers.models.t5gemma2.modeling_t5gemma2 import T5Gemma2Encoder

encoder = T5Gemma2Encoder.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
print(f"{encoder.config._attn_implementation=}")
print(f"{encoder.config.text_config._attn_implementation=}")
print(f"{encoder.config.vision_config._attn_implementation=}")

Main:

Loading weights: 100%|████████████████████████████████| 676/676 [00:00<00:00, 4255.35it/s, Materializing param=vision_tower.vision_model.post_layernorm.weight]
encoder.config._attn_implementation='sdpa'
encoder.config.text_config._attn_implementation=None
encoder.config.vision_config._attn_implementation='sdpa'

This PR:

Loading weights: 100%|████████████████████████████████| 676/676 [00:00<00:00, 4324.80it/s, Materializing param=vision_tower.vision_model.post_layernorm.weight]
encoder.config._attn_implementation='sdpa'
encoder.config.text_config._attn_implementation='sdpa'
encoder.config.vision_config._attn_implementation='sdpa'

(I do feel like there should be a more fundamental solution to this; nested configs are pretty common, and it seems important to propagate these attributes correctly.)

I think this is ready for review again, although I'm getting awkward issues with modular_model_converter.py:

  File "/mnt/c/code/transformers/./utils/modular_model_converter.py", line 365, in leave_FunctionDef
    original_modeling_method_body = self.original_modeling_methods[func_name].body.body
                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: '__setattr__'

Will have to look into it later.

  • Tom Aarsen

@tomaarsen marked this pull request as ready for review on January 28, 2026 16:36
@zucchini-nlp (Member)

The attn implementation usually gets propagated in PreTrainedConfig, but we only do it for one level of nested configs. It doesn't run recursively until all subconfigs are updated, so IMO we need to fix that part of the code.
Also, I'm surprised to see __setattr__ overridden; in other nested models we don't do such a thing, because each config is responsible for its own "model" and thus has its own set of fields. Lemme check why it appeared in the first place.

@_attn_implementation.setter
def _attn_implementation(self, value: str | dict | None):
    """We set it recursively on the sub-configs as well"""
    # Set it for current config
    current_attn = getattr(self, "_attn_implementation", None)
    attn_implementation = value if not isinstance(value, dict) else value.get("", current_attn)
    self._attn_implementation_internal = attn_implementation
    # Set it recursively on the subconfigs
    for subconfig_key in self.sub_configs:
        subconfig = getattr(self, subconfig_key, None)
        if subconfig is not None:
            current_subconfig_attn = getattr(subconfig, "_attn_implementation", None)
            sub_implementation = (
                value if not isinstance(value, dict) else value.get(subconfig_key, current_subconfig_attn)
            )
            subconfig._attn_implementation = sub_implementation

@tomaarsen (Member Author)

Any luck @zucchini-nlp?
I'll start preparing Sentence Transformers for this config-propagation issue being resolved, so that I can import transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder and use it in ST.

  • Tom Aarsen

@tomaarsen (Member Author) commented Feb 2, 2026

#43633 has superseded part of this PR. I'll instead focus on allowing t5gemma2_encoder to work with AutoConfig and AutoModel.

  • Tom Aarsen

@tomaarsen force-pushed the feat/t5gemma2_encoder branch from 45bc000 to deecb86 on February 2, 2026 13:36
@tomaarsen (Member Author)

I've removed some commits that mirrored #43633. Now, all this PR does is allow for:

from transformers import T5Gemma2Encoder, AutoModel, AutoConfig

model_name = "tomaarsen/t5gemma2-270m-gooaq-cmnrl"
config = AutoConfig.from_pretrained(model_name)
print(type(config))
# <class 'transformers.models.t5gemma2.configuration_t5gemma2.T5Gemma2EncoderConfig'>
model = AutoModel.from_pretrained(model_name)
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>
model = T5Gemma2Encoder.from_pretrained(model_name)
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>

Especially the first one (loading via AutoConfig) is required in Sentence Transformers.
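
For context, the change itself boils down to new entries in the auto mappings. A rough sketch of the effect (the real change edits the mapping definitions in the auto module; this is not the literal diff):

from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
from transformers.models.auto.modeling_auto import MODEL_MAPPING_NAMES

# Map the "t5gemma2_encoder" model_type to its config class and to the
# standalone encoder class, so AutoConfig and AutoModel can resolve it.
CONFIG_MAPPING_NAMES["t5gemma2_encoder"] = "T5Gemma2EncoderConfig"
MODEL_MAPPING_NAMES["t5gemma2_encoder"] = "T5Gemma2Encoder"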

  • Tom Aarsen

@zucchini-nlp (Member) left a comment

LGTM!

@vasqu (Contributor) left a comment

I think t5gemma2 is a special case, so it's fine. However, I'm a bit concerned about whether this will become a recurring pattern and whether we should instead update things in a general way for all encoder-decoder models, e.g. Bart, T5.

If this is indeed a unique one, I'm fine with making an exception - just wanna hear your opinion on this and whether we should focus on generalizing instead

Comment on lines 1628 to 1629
"T5Gemma2Decoder",
"T5Gemma2Encoder",
Contributor

We allow both encoder and decoder; is this intentional? Looking at the auto mappings, it is only focusing on the encoder.

Member Author

I can also exclusively allow importing the Encoder; that's also fine, but I imagined it might be useful to allow importing the decoder as well? That's not really my field, for the most part, so I can't say for sure.

Contributor

Wouldn't we need to update the auto mappings for the decoder as well?

@tomaarsen (Member Author) Feb 3, 2026

Only if we want to support loading a decoder with AutoModel/AutoConfig as well, and I'm not sure that happens. I do know that it happens for encoders, so I added deecb86 so the mappings aren't updated for decoders. I'm fine with excluding or including them either way.

Contributor

Let's keep it encoder only then. We should not add more than we need to

Member Author

Agreed, done in e3e1f0f

@tomaarsen (Member Author) commented Feb 2, 2026

I think t5gemma2 is a special case, so it's fine. However, I'm a bit concerned about whether this will become a recurring pattern and whether we should instead update things in a general way for all encoder-decoder models, e.g. Bart, T5.

If this is indeed a unique one, I'm fine with making an exception - just wanna hear your opinion on this and whether we should focus on generalizing instead

Looking into this now. My understanding was that T5Gemma2 was, at this time, the only architecture whose configs have different model_types, but apparently that's not the case: T5Gemma also has a subconfig with model_type = "t5_gemma_module". I think T5Gemma would then have the same issue that this PR is fixing for T5Gemma2.

That does make things a bit more awkward; perhaps there are more encoder-decoder architectures whose encoders can't be separately loaded with the Auto classes because of this. I think I'll need to do more research re. T5Gemma and its model_type = "t5_gemma_module".

  • Tom Aarsen

@tomaarsen (Member Author) commented Feb 3, 2026

Okay, I've figured out why T5Gemma works fine without the changes in this PR:
With T5Gemma I'm loading a T5GemmaEncoderModel, a class that accepts a T5GemmaConfig, and then initializes a T5GemmaEncoder(config.encoder). For Sentence Transformers, I can AutoConfig.from_pretrained the config, recognize that it's T5GemmaConfig, and choose to load with T5GemmaEncoderModel instead of AutoModel. This gives me the encoder nicely.

With T5Gemma2 I'm loading the T5Gemma2Encoder directly, a class that accepts the T5Gemma2EncoderConfig. There is no T5Gemma2EncoderModel class that nicely wraps the T5Gemma2Encoder and accepts a T5Gemma2Config. For Sentence Transformers, I cannot use AutoConfig.from_pretrained to get the config, because it isn't registered.
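
To make the contrast concrete, a rough sketch of the two loading paths described above (simplified, not verbatim Sentence Transformers code; checkpoints as used earlier in this thread):

from transformers import AutoConfig, T5GemmaConfig, T5GemmaEncoderModel

# T5Gemma: the top-level T5GemmaConfig is known to AutoConfig, and the exported
# T5GemmaEncoderModel accepts it, so ST can branch on the config type.
config = AutoConfig.from_pretrained("google/t5gemma-s-s-prefixlm")
if isinstance(config, T5GemmaConfig):
    config.is_encoder_decoder = False
    # Silence warnings about the unused decoder weights, as in the earlier snippet
    T5GemmaEncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"]
    model = T5GemmaEncoderModel.from_pretrained("google/t5gemma-s-s-prefixlm", config=config)

# T5Gemma2: a standalone encoder checkpoint carries a T5Gemma2EncoderConfig, which
# AutoConfig cannot resolve without this PR, and there is no T5Gemma2EncoderModel
# wrapper accepting a T5Gemma2Config, hence registering the encoder itself.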

In short, I think T5Gemma2 is a bit of an edge case, and it should be fine to register the encoder (and decoder?) as well.

  • Tom Aarsen

@vasqu (Contributor) left a comment

LGTM, should we update the title too? It's only about the config now

Also, let's wait for #43633 first? Kinda dependent on that one 👀 but feel free to merge if not


@github-actions bot commented Feb 3, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, t5gemma2

@tomaarsen (Member Author)

LGTM, should we update the title too? It's only about the config now

The PR title is still correct as it stands: this PR currently allows loading AutoConfig, AutoProcessor, and AutoModel for t5gemma2_encoder:

>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
Loading weights: 100%|████████████████████████████████| 676/676 [00:00<00:00, 2783.63it/s, Materializing param=vision_tower.vision_model.post_layernorm.weight]
>>> type(model)
<class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>

Also, let's wait for #43633 first? Kinda dependent on that one 👀 but feel free to merge if not

They're related, but I think it doesn't matter which is merged first.
Seems fine to wait, though. P.s. I can't merge PRs with failing tests (anymore?)

  • Tom Aarsen

@vasqu (Contributor) commented Feb 3, 2026

No worries, rerunning CI (it's a flaky test). And you should not be able to merge without green CI 😬

@tomaarsen (Member Author)

Makes sense! Especially with the weekly releases. I'll wait for #43633 as it grew quite a bit larger and I'll have to test to see whether it still works with my T5Gemma2 integration from huggingface/sentence-transformers#3644

  • Tom Aarsen

@tomaarsen merged commit 8099619 into huggingface:main on Feb 3, 2026
25 checks passed