Member

@zucchini-nlp zucchini-nlp commented Sep 16, 2025

What does this PR do?

Fixes #40875 and makes sure tensors share the same dtype before they are combined in an operation. The attention mask is consumed by the LM, so it has to be in the same dtype as the LM. We also need to cast the VLM outputs, because the VLM and the projection layer can be set to different dtypes in the configs.
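
As a rough illustration of the two casts described above (not the actual diff in this PR), here is a minimal PyTorch sketch. The names `project`, `additive_mask`, `vision_features` and `projector` are hypothetical stand-ins, and the float32/bfloat16 split is an assumed configuration. Without the cast, feeding float32 features into a bfloat16 linear layer typically fails with a dtype-mismatch RuntimeError.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a float32 "vision" output and a bfloat16 projection layer.
vision_features = torch.randn(1, 4, 8, dtype=torch.float32)
projector = nn.Linear(8, 16).to(torch.bfloat16)


def project(features, layer):
    # Cast the input to the layer's weight dtype so the matmul sees a single dtype.
    return layer(features.to(layer.weight.dtype))


def additive_mask(bool_mask, dtype):
    # True means "attend" (0.0); False is filled with the most negative
    # finite value representable in the target dtype.
    mask = torch.zeros(bool_mask.shape, dtype=dtype)
    return mask.masked_fill(~bool_mask, torch.finfo(dtype).min)


projected = project(vision_features, projector)
# Build the mask directly in the dtype of the tensors it will be added to.
mask = additive_mask(torch.tensor([[True, True, False, False]]), projected.dtype)
```

The design choice in this sketch is to derive the target dtype from the consumer (the layer weights, or the hidden states the mask is added to) rather than from a separately stored attribute.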

I still wonder whether we're supposed to load the models with auto-dtype when calling from_pretrained. I found that this behavior changed between 4.53 and 4.54.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title Fix dtypes in Paligemma Fix dtype in Paligemma Sep 16, 2025
Member

@Cyrilvallez Cyrilvallez left a comment

Let's just remove the attr when not used!

Comment on lines 778 to 779
self.pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else -1
self.text_config_dtype = self.config.get_text_config().dtype or self.dtype
Member

Are those artefacts of modular in the two Gemma3 models? If so, let's quickly remove the attribute with a del, as it's not used anywhere.

Member Author

It is the modular copy, I guess. Yeah, we can 'del' it explicitly.
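
For readers unfamiliar with the pattern, the 'del' discussed here can be sketched in plain Python as below. The classes `BaseModel` and `ChildModel` are hypothetical and only mimic the shape of the reviewed lines. This is an illustration of dropping an inherited, unused attribute after the parent constructor runs, not the code merged in this PR.

```python
class BaseModel:
    """Hypothetical parent whose __init__ sets an attribute the child never reads."""

    def __init__(self, pad_token_id=None, dtype="float32"):
        self.pad_token_id = pad_token_id if pad_token_id is not None else -1
        self.text_config_dtype = dtype


class ChildModel(BaseModel):
    def __init__(self, pad_token_id=None, dtype="float32"):
        super().__init__(pad_token_id, dtype)
        # Drop the inherited attribute explicitly, since nothing in this class uses it.
        del self.text_config_dtype


parent = BaseModel()
child = ChildModel()
```

After construction, `child` keeps `pad_token_id` but no longer carries `text_config_dtype`, while `parent` is unaffected.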

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) September 16, 2025 15:52
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: colpali, colqwen2, gemma3, gemma3n, paligemma

@zucchini-nlp zucchini-nlp merged commit cccef4b into huggingface:main Sep 16, 2025
17 checks passed
@merveenoyan
Contributor

you're the best @zucchini-nlp 💗

ErfanBaghaei pushed a commit to ErfanBaghaei/transformers that referenced this pull request Sep 25, 2025
* fix dtypes

* fix copies

* delete unused attr
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
* fix dtypes

* fix copies

* delete unused attr
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
* fix dtypes

* fix copies

* delete unused attr
Development

Successfully merging this pull request may close these issues.

ColPaliForRetrieval errors out when loaded in half precision dtypes
4 participants