Fix dtype in Paligemma #40912
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Let's just remove the attr when not used!
self.pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else -1
self.text_config_dtype = self.config.get_text_config().dtype or self.dtype
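A minimal sketch of the fallback the quoted line implements, using dummy classes (names assumed, not the actual transformers objects): prefer the dtype declared on the text sub-config, otherwise fall back to the dtype of the composite model itself.

```python
import torch


class DummyTextConfig:
    # e.g. torch.bfloat16 when set explicitly in the checkpoint config, else None
    dtype = None


class DummyModel:
    # dtype the composite model was instantiated with
    dtype = torch.float32

    def __init__(self, text_config):
        # `or` picks the sub-config dtype when it is set, else the model dtype
        self.text_config_dtype = text_config.dtype or self.dtype


model = DummyModel(DummyTextConfig())
print(model.text_config_dtype)  # torch.float32, since the sub-config left dtype unset
```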
Are those artefacts of modular in the two Gemma3? If so, let's quickly remove it with a `del`, as it's not used anywhere.
It is the modular copy, I guess. Yeah, we can `del` it explicitly.
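A hedged sketch (class names hypothetical, not the real transformers modules) of what explicitly deleting the modular-copied attribute looks like: the subclass inherits the copied `__init__`, then drops the attribute it never uses.

```python
class BaseVLMModel:
    def __init__(self, config):
        self.config = config
        # attribute set by the copied base __init__
        self.text_config_dtype = getattr(config, "dtype", None)


class Gemma3StyleModel(BaseVLMModel):
    def __init__(self, config):
        super().__init__(config)
        # explicitly remove the unused, modular-copied attribute
        del self.text_config_dtype


model = Gemma3StyleModel(config=object())
assert not hasattr(model, "text_config_dtype")
```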
[For maintainers] Suggested jobs to run (before merge): run-slow: colpali, colqwen2, gemma3, gemma3n, paligemma
you're the best @zucchini-nlp 💗
* fix dtypes
* fix copies
* delete unused attr
What does this PR do?
Fixes #40875 and makes sure we use the same dtype before operations. The attention mask is used by the LM and has to be in the same dtype. We also need to cast the VLM outputs, because the VLM and the projection can be configured with different dtypes.
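A minimal sketch of the dtype alignment described above (tensor names and shapes assumed for illustration): cast the vision features and the attention mask to the language model's dtype before they are combined with the text embeddings.

```python
import torch

lm_dtype = torch.bfloat16                                         # dtype of the language model
image_features = torch.randn(1, 256, 2048, dtype=torch.float16)   # vision tower / projector output
attention_mask = torch.ones(1, 272, dtype=torch.float32)          # mask consumed inside the LM

image_features = image_features.to(lm_dtype)   # align vision outputs with the LM dtype
attention_mask = attention_mask.to(lm_dtype)   # mask must match before masked ops

assert image_features.dtype == attention_mask.dtype == lm_dtype
```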
I still wonder if we're supposed to load the models with auto dtype when calling `from_pretrained`; I found that this behavior changed between 4.53 and 4.54 for now.
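For reference, a hedged example of what "loading with auto dtype" means here (the checkpoint name is only for illustration; recent versions may also accept `dtype="auto"` in place of `torch_dtype`): with `torch_dtype="auto"` the weights keep the dtype stored in the checkpoint config instead of being upcast to the default float32.

```python
from transformers import PaliGemmaForConditionalGeneration

model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224",  # illustrative checkpoint
    torch_dtype="auto",            # keep the dtype declared in the checkpoint
)
print(model.dtype)
```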