[feat] Allow loading T5Gemma2Encoder with AutoModel #43559
tomaarsen merged 5 commits into huggingface:main from …
Conversation
This is valuable for Sentence Transformers, which may want to load the encoder only. I've added the decoder purely to mirror the changes I need for the encoder.
IIUC, we want to be able to load either a complete T5Gemma or only its encoder module in ST, so we can't do the same as in T5 with `self._load_t5_module`?
In any case, I think there is no strong objection to keeping the module private, and we can make it available through the Auto-API. Let's also see if the core maintainers agree.
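For downstream readers, a minimal sketch of what "available through the Auto-API" amounts to, using the runtime registration hooks rather than the static mappings this PR edits. The `"t5gemma2_encoder"` model type string is an assumption, not taken from the PR:

```python
# Hedged sketch: runtime Auto-API registration, roughly what this PR does
# statically via the auto mappings. Assumes T5Gemma2EncoderConfig.model_type
# is "t5gemma2_encoder"; adjust if the actual value differs.
from transformers import AutoConfig, AutoModel
from transformers.models.t5gemma2.configuration_t5gemma2 import T5Gemma2EncoderConfig
from transformers.models.t5gemma2.modeling_t5gemma2 import T5Gemma2Encoder

AutoConfig.register("t5gemma2_encoder", T5Gemma2EncoderConfig)
AutoModel.register(T5Gemma2EncoderConfig, T5Gemma2Encoder)
```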
Hmm, looks like I might be able to change some things for T5Gemma2 on ST's side. With T5 I can do:

```python
from transformers import T5EncoderModel

# Encoder-only checkpoint:
model = T5EncoderModel.from_pretrained("sentence-transformers/gtr-t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>

# Encoder-decoder checkpoint, ignoring the decoder weights:
T5EncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"]
model = T5EncoderModel.from_pretrained("google-t5/t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>
```

If I can import … Will send more details when I know what'll work best.
Update: I can get it working like so for the various T5 variants:

```python
from transformers import T5EncoderModel, T5Gemma2Encoder, T5GemmaEncoderModel, AutoConfig

# T5:
T5EncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"]

# Encoder only:
model = T5EncoderModel.from_pretrained("sentence-transformers/gtr-t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>

# Encoder-decoder:
model = T5EncoderModel.from_pretrained("google-t5/t5-base")
print(type(model))
# <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>

# T5Gemma:
config = AutoConfig.from_pretrained("google/t5gemma-s-s-prefixlm")
config.is_encoder_decoder = False
T5GemmaEncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"]

# Encoder only (still training):
# model = T5GemmaEncoderModel.from_pretrained("tomaarsen/t5gemma-s-gooaq-cmnrl")
model = T5GemmaEncoderModel.from_pretrained(r"C:\code\sentence-transformers\models\t5gemma-s-gooaq-cmnrl\checkpoint-27")
print(type(model))
# <class 'transformers.models.t5gemma.modeling_t5gemma.T5GemmaEncoderModel'>

# Encoder-decoder:
model = T5GemmaEncoderModel.from_pretrained("google/t5gemma-s-s-prefixlm", config=config)
print(type(model))
# <class 'transformers.models.t5gemma.modeling_t5gemma.T5GemmaEncoderModel'>

# T5Gemma2, encoder only:
T5Gemma2Encoder._keys_to_ignore_on_load_unexpected = ["decoder.*"]
T5Gemma2Encoder.base_model_prefix = "model.encoder"
model = T5Gemma2Encoder.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>

# T5Gemma2, encoder-decoder:
T5Gemma2Encoder._keys_to_ignore_on_load_unexpected = ["decoder.*"]
T5Gemma2Encoder.base_model_prefix = "model.encoder"
model = T5Gemma2Encoder.from_pretrained("google/t5gemma-2-270m-270m")
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>
```

This even works with `from transformers.models.t5gemma2.modeling_t5gemma2 import T5Gemma2Encoder`. In short, I reverted the … However, one issue does remain: the …

```python
from transformers.models.t5gemma2.modeling_t5gemma2 import T5Gemma2Encoder

encoder = T5Gemma2Encoder.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
print(f"{encoder.config._attn_implementation=}")
print(f"{encoder.config.text_config._attn_implementation=}")
print(f"{encoder.config.vision_config._attn_implementation=}")
```

Main: …
This PR: …

(I do feel like there should be a more fundamental solution to this; multi-configs are pretty common and it seems important to propagate them correctly.) I think this is ready for review again. I'm getting awkward issues with …; will have to look into it later.
Attn implementation usually gets propagated in `src/transformers/configuration_utils.py`, lines 327 to 343 (at d79e85d).
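To illustrate the kind of propagation that comment refers to, here is a minimal sketch with a hypothetical helper; the real logic lives in `configuration_utils.py` as linked above:

```python
# Hypothetical sketch of top-level -> sub-config propagation; NOT the
# actual transformers code referenced in the comment above.
def propagate_attn_implementation(config, attn_implementation: str) -> None:
    """Set `_attn_implementation` on a config and recurse into known sub-configs."""
    config._attn_implementation = attn_implementation
    for name in ("text_config", "vision_config"):
        sub_config = getattr(config, name, None)
        if sub_config is not None:
            propagate_attn_implementation(sub_config, attn_implementation)
```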
Any luck @zucchini-nlp?
#43633 has superseded part of this PR. I'll instead focus on allowing …
Force-pushed from 45bc000 to deecb86.
I've removed some commits that mirrored #43633. Now, all this PR does is allow for:

```python
from transformers import T5Gemma2Encoder, AutoModel, AutoConfig

model_name = "tomaarsen/t5gemma2-270m-gooaq-cmnrl"

config = AutoConfig.from_pretrained(model_name)
print(type(config))
# <class 'transformers.models.t5gemma2.configuration_t5gemma2.T5Gemma2EncoderConfig'>

model = AutoModel.from_pretrained(model_name)
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>

model = T5Gemma2Encoder.from_pretrained(model_name)
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>
```

Especially the first one is required in Sentence Transformers.
vasqu left a comment
I think t5gemma2 is a special case, so it's fine. However, I'm a bit concerned about whether this will become a recurring pattern and whether we should instead update all encoder-decoder models in a general way, e.g. Bart, T5.
If this is indeed a unique one, I'm fine with making an exception. I just want to hear your opinion on this and whether we should focus on generalizing instead.
| "T5Gemma2Decoder", | ||
| "T5Gemma2Encoder", |
We allow both the encoder and the decoder; is this intentional? Looking at the auto mappings, they only focus on the encoder.
I can also exclusively allow importing the encoder, that's fine too, but I imagined it might be useful to allow importing the decoder as well? That's not really my field, for the most part, so I can't say for sure.
Wouldn't we need to update the auto mappings for the decoder as well?
Only if we want to support loading a decoder with AutoModel/AutoConfig as well, but I'm not sure that happens in practice. I do know that it happens for encoders, so I added deecb86 to not update the mappings for decoders. I'm fine with excluding or including them either way.
Let's keep it encoder only then. We should not add more than we need to.
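For reference, encoder-only auto support boils down to a mapping entry along these lines in `src/transformers/models/auto/modeling_auto.py`. The mapping name is real, but the exact key here is an assumption, not a quote from this PR's diff:

```python
# Hedged sketch of the kind of entry this PR adds to the auto mappings;
# the "t5gemma2_encoder" key is assumed for illustration.
from collections import OrderedDict

MODEL_MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("t5gemma2_encoder", "T5Gemma2Encoder"),
    ]
)
```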
Looking into this now. My understanding was that T5Gemma2 was, at this time, the only architecture with different configs that have different … That does make things a bit more awkward; perhaps there are more encoder-decoder architectures whose encoders can't be separately loaded with Auto… due to this. I think I'll need to do more research re. T5Gemma and its …
Okay, I've figured out why T5Gemma works fine without the changes in this PR: with T5Gemma2 I'm loading the … In short, I think T5Gemma2 is a bit of an edge case, and it should be fine to register the encoder (and decoder?) as well.
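For context, this relates to the workaround shown earlier in the thread; the exact wording of the elided part above is lost, so this merely restates that snippet:

```python
# Restating the earlier workaround: a full T5Gemma2 checkpoint appears to
# store the encoder weights under "model.encoder.", so the standalone
# encoder class needs its base_model_prefix adjusted before loading.
from transformers.models.t5gemma2.modeling_t5gemma2 import T5Gemma2Encoder

T5Gemma2Encoder._keys_to_ignore_on_load_unexpected = ["decoder.*"]
T5Gemma2Encoder.base_model_prefix = "model.encoder"
model = T5Gemma2Encoder.from_pretrained("google/t5gemma-2-270m-270m")
```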
| "T5Gemma2Decoder", | ||
| "T5Gemma2Encoder", |
There was a problem hiding this comment.
Let's keep it encoder only then. We should not add more than we need to
[For maintainers] Suggested jobs to run (before merge): `run-slow: auto, t5gemma2`
PR title is still correct as it stands; this PR currently allows for loading the AutoConfig, AutoProcessor, AutoModel for …:

```python
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
Loading weights: 100%|████████████████████████████████| 676/676 [00:00<00:00, 2783.63it/s, Materializing param=vision_tower.vision_model.post_layernorm.weight]
>>> type(model)
<class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>
```
They're related, but I think it doesn't matter which is merged first.
No worries, rerunning CI (it's a flaky test). And you should not be able to merge without green CI 😬
Makes sense! Especially with the weekly releases. I'll wait for #43633, as it grew quite a bit larger, and I'll have to test whether it still works with my T5Gemma2 integration from huggingface/sentence-transformers#3644.
What does this PR do?
Details
This is valuable for Sentence Transformers, which may want to load the encoder only (see huggingface/sentence-transformers#3604). Here, we grab and train the encoder only, resulting in e.g.: https://huggingface.co/tomaarsen/t5gemma2-270m-gooaq-cmnrl
Usage:
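(The snippet itself appears to have been elided here; based on the thread above, it is presumably along these lines:)

```python
# Presumed usage, reconstructed from the examples earlier in this thread.
from transformers import AutoModel

model = AutoModel.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
print(type(model))
# <class 'transformers.models.t5gemma2.modeling_t5gemma2.T5Gemma2Encoder'>
```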
I've not added the decoder as I only have weights for the encoder.
P.S. The equivalent in Sentence Transformers:
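(This snippet was also elided; a hedged sketch of the likely equivalent follows, where the exact API in huggingface/sentence-transformers#3644 may differ:)

```python
# Hedged sketch; the actual kwargs in the sentence-transformers
# integration may differ from this.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
embeddings = model.encode(["Which is the best T5 variant for retrieval?"])
print(embeddings.shape)
```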
This also relies on this PR.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
cc @Cyrilvallez @zucchini-nlp
P.S. Let me know if you'd like to see new tests or docs for this.