🚨 T5Gemma2 model structure #43633
```python
# Set attn implementation manually because `text-config` is never passed to `super()`
self.text_config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
    self.text_config._attn_implementation, is_init_check=True
)
```
@tomaarsen here is why the attn implementation was not being set
Just making sure: this is not also necessary for the vision_config, right?
Will review in more detail next week.
Nope, the vision config is used a few lines above to init a `PreTrainedModel`, and thus it is passed to `PreTrainedModel.__init__`.
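To make the thread easier to follow, here is a minimal sketch of the asymmetry being discussed, mirroring the diff above. The class name, the SigLIP vision tower, and the layer attributes are simplified assumptions for illustration only, not the actual T5Gemma2 modeling code:

```python
from torch import nn
from transformers import PreTrainedModel, SiglipVisionModel


class ToyCompositeModel(PreTrainedModel):  # simplified illustration, not the real modeling code
    def __init__(self, config):
        # super().__init__ resolves the attn implementation for `config` itself ...
        super().__init__(config)

        # ... and the vision sub-config is handed to another PreTrainedModel, whose
        # __init__ resolves `vision_config._attn_implementation` as a side effect.
        self.vision_tower = SiglipVisionModel(config.vision_config)

        # The text sub-config is only used to build plain nn.Modules, so nothing
        # ever resolves its attn implementation -- hence the manual call in this PR.
        self.text_config = config.text_config
        self.text_config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
            self.text_config._attn_implementation, is_init_check=True
        )
        self.embed_tokens = nn.Embedding(self.text_config.vocab_size, self.text_config.hidden_size)
```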
Hmm, could we refactor with the conversion mapping / checkpoint conversion mapping? IIUC, if we refactor the text-specific things into their own module, then we won't have this issue.
I think it needs a conversion mapping because you also want to use the encoder as a standalone model.
will be a huge change though, let me see
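For readers unfamiliar with the mechanism being suggested: a checkpoint conversion mapping in transformers is a class-level dict of regex patterns that renames old state-dict keys onto a refactored module layout at load time. The class name and key patterns below are purely illustrative assumptions, not the actual T5Gemma2 parameter layout:

```python
from transformers import PreTrainedModel


class T5Gemma2EncoderModel(PreTrainedModel):  # hypothetical refactored standalone encoder class
    # Illustrative sketch only -- the regex patterns and targets are assumptions.
    _checkpoint_conversion_mapping = {
        # remap weights saved under the old composite layout onto the new module names
        "^encoder.layers": "model.layers",
        "^encoder.embed_tokens": "model.embed_tokens",
    }
```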
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Just ran some more extensive tests; this seems to work for me now. The `attn_implementation` is set automatically and my ST (Sentence Transformers) training works as expected.
This PR should supersede parts of #43559 now, which can be closed once this is merged.
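A quick way to sanity-check the behavior described here (the repo id below is a placeholder, not a real checkpoint):

```python
from transformers import AutoModel

# Placeholder repo id -- substitute any T5Gemma2 checkpoint you have access to.
model = AutoModel.from_pretrained("org/t5gemma2-checkpoint", attn_implementation="sdpa")

# With this PR, the choice should propagate to the nested text config as well.
print(model.config.encoder.text_config._attn_implementation)  # expected: "sdpa"
```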
Will finish up in a while and ask for review.
Somewhat related: If I train a T5GemmaEncoder, I end up with a repository like: https://huggingface.co/tomaarsen/t5gemma2-270m-gooaq-cmnrl/tree/main
@tomaarsen yes, it's expected that
Agreed, will re-add it.
I will, but the old T5 family is also loaded with: And then I need to be able to use:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tomaarsen/t5gemma2-270m-gooaq-cmnrl")
```
Failing test is flaky, ready for review.
vasqu left a comment:
The config changes are good!
I just think we could maybe convert the model instead via `conversion_mapping` or similar? This is a quick-and-dirty workaround, so I would be in favor of properly converting this into its own encoder text module if possible.
run-slow: t5gemma2
This comment contains models: ["models/t5gemma2"]
vasqu left a comment:
LGTM, thanks a lot. Much better this way! Just one nit on the test: maybe add the cleanup on setup as well.
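A minimal sketch of what that nit usually amounts to (the test class name is a generic assumption, not the actual test file): run the same cleanup in `setUp` as in `tearDown`, so a crashed or interrupted test cannot leak GPU memory into the next one.

```python
import gc
import unittest

import torch


class T5Gemma2IntegrationTest(unittest.TestCase):  # hypothetical test class name
    def _cleanup(self):
        # free cached tensors left over from a previous test
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

    def setUp(self):
        self._cleanup()

    def tearDown(self):
        self._cleanup()
```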
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Checked with rebase, everything is still working and tests are passing. Will merge after CI turns green.
[For maintainers] Suggested jobs to run (before merge): run-slow: t5gemma, t5gemma2
What does this PR do?
Makes sure that the attn implementation is set on all sub-configs. The `config.encoder.text_config` was not getting its attn set because we aren't passing it to `PreTrainedModel.__init__`. We can't change the model structure without breaking things, so I manually re-added a call to `self.adjust_attn_implementation` in the modeling code.

Also deleted `__setattr__`, not sure what the reason behind having them is. Composite configs usually don't need to force-set the same attr in all subconfigs magically.
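For illustration, this is roughly the kind of composite-config `__setattr__` override being deleted; the class and attribute names below are simplified assumptions, not the actual T5Gemma2 config code. Every attribute write on the parent config is mirrored into each sub-config:

```python
# Simplified sketch of the removed pattern -- not the real T5Gemma2 config.
class ToyCompositeConfig:
    sub_configs = ("text_config", "vision_config")

    def __init__(self, text_config, vision_config):
        self.text_config = text_config
        self.vision_config = vision_config

    def __setattr__(self, key, value):
        super().__setattr__(key, value)
        if key in self.sub_configs:
            return
        # "Magically" force-set the same attribute on every sub-config as well.
        for name in self.sub_configs:
            sub = self.__dict__.get(name)
            if sub is not None:
                setattr(sub, key, value)
```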