🚨 Allow check_model_inputs in core VLMs #40342
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks, looks very nice! Did you check whether that's breaking in terms of the actually returned hidden_states? They might be different, especially for the last item in the list. Still worth unbloating, but we might need to add 🚨 to the title.
Checked the
run-slow: aya_vision, got_ocr2, internvl, llava, mistral3, perception_lm, vipllava
This comment contains run-slow, running the specified jobs: models: ['models/aya_vision', 'models/got_ocr2', 'models/internvl', 'models/llava', 'models/mistral3', 'models/perception_lm', 'models/vipllava']
Thanks! I still see output_attentions/output_hidden_states/return_dict in modeling files, but I suppose they should be gone entirely once you enable check_model_inputs. Am I missing something?
```python
collected_outputs[key] = collected_outputs[key][:-1]
if hasattr(outputs, "vision_hidden_states"):
    collected_outputs[key] = collected_outputs[key][:-1]
    collected_outputs[key] += (outputs.vision_hidden_states,)
elif hasattr(outputs, "last_hidden_state"):
    collected_outputs[key] = collected_outputs[key][:-1]
```
Not all models have a last_hidden_state. In some multimodals we return only vision_tower.hidden_states or similar, and thus cropping the last hidden state is not correct. We want to crop it only if it is being replaced right away with the hidden state after the final layer_norm (e.g. llama).
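That rule — crop the last collected hidden state only when a replacement actually exists — can be sketched as a small guard. This is a hypothetical helper (the names `collected_outputs` and `outputs` follow the snippet above, but the function itself is illustrative, not the merged code):

```python
from types import SimpleNamespace


def merge_final_hidden_state(collected_outputs, outputs, key="hidden_states"):
    """Swap the last collected hidden state for its post-layernorm
    replacement only when such a replacement exists; otherwise keep
    everything, since cropping would drop a real layer output."""
    if hasattr(outputs, "vision_hidden_states"):
        # vision towers: the replacement is the vision hidden states
        collected_outputs[key] = collected_outputs[key][:-1] + (outputs.vision_hidden_states,)
    elif getattr(outputs, "last_hidden_state", None) is not None:
        # text decoders (e.g. llama): the last entry is superseded by the
        # hidden state after the final layer_norm
        collected_outputs[key] = collected_outputs[key][:-1] + (outputs.last_hidden_state,)
    return collected_outputs


# toy demo with placeholder hidden states
merged = merge_final_hidden_state(
    {"hidden_states": ("h0", "h1", "h2")},
    SimpleNamespace(last_hidden_state="post_norm"),
)
untouched = merge_final_hidden_state(
    {"hidden_states": ("h0", "h1")},
    SimpleNamespace(),  # neither attribute: nothing is cropped
)
```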
run-slow: aimv2, aya_vision, blip, blip_2, got_ocr2, idefics, idefics2, idefics3, instructblip, instructblipvideo, internvl, janus, llava, mistral3, ovis2, phi4_multimodal
This comment contains run-slow, running the specified jobs: models: ['models/aimv2', 'models/aya_vision', 'models/blip', 'models/blip_2', 'models/got_ocr2', 'models/idefics', 'models/idefics2', 'models/idefics3', 'models/instructblip', 'models/instructblipvideo', 'models/internvl', 'models/janus', 'models/llava', 'models/mistral3', 'models/ovis2', 'models/phi4_multimodal']
Should be ready now, updated all VLMs (+ had to update Siglip, from which some models copy). In base models which consist of backbones, I didn't add any. Tests are very flaky, so CI will be red
run-slow: aimv2, aya_vision, blip, blip_2, got_ocr2, idefics, idefics2, idefics3, instructblip, instructblipvideo, internvl, janus, llava, mistral3, ovis2, smolvlm
This comment contains run-slow, running the specified jobs: models: ['models/aimv2', 'models/aya_vision', 'models/blip', 'models/blip_2', 'models/got_ocr2', 'models/idefics', 'models/idefics2', 'models/idefics3', 'models/instructblip', 'models/instructblipvideo', 'models/internvl', 'models/janus', 'models/llava', 'models/mistral3', 'models/ovis2', 'models/smolvlm']
Very nice! I did not review every line, but the main idea is to double-check return type hints to make sure they match with the changed outputs
Thanks for the update, looks good to me!
This one might be relevant to merge to make sure users are aware if they request output_attentions=True
Also, for Siglip (and later for CLIP), it might be worthwhile to keep hidden_states collection explicit because that's something natural for vision models to output for many downstream tasks, such as segmentation. As far as I understand, CLIP hidden_states are used in diffusers as well. I did it partially in
can rebase on your version if it would be merged earlier
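The suggestion to keep hidden_states collection explicit in vision backbones might look like this minimal sketch (a toy encoder loop under assumed names, not Siglip's actual module code):

```python
def encode(layers, hidden_state, output_hidden_states=False):
    """Run an encoder stack while optionally collecting every
    intermediate hidden state, including the input embedding."""
    all_hidden_states = (hidden_state,) if output_hidden_states else None
    for layer in layers:
        hidden_state = layer(hidden_state)
        if output_hidden_states:
            all_hidden_states += (hidden_state,)
    return hidden_state, all_hidden_states


# toy demo with stand-in "layers"
last, collected = encode(
    [lambda x: x + 1, lambda x: x * 2], 1, output_hidden_states=True
)
```

Keeping the collection in the loop (rather than behind a capture hook) makes the per-layer outputs directly available to downstream tasks such as segmentation.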
Oke, ig all loading issues are resolved in main, so rebase will help with flakiness once and for all
Why is it flaky, it downloads with hf hub 😭
[For maintainers] Suggested jobs to run (before merge) run-slow: aimv2, altclip, aya_vision, blip, blip_2, clip, clipseg, git, got_ocr2, idefics, idefics2, idefics3, instructblip, instructblipvideo, internvl, janus
What does this PR do?
Unblocks #39722 so we can use check_model_inputs in VLMs copied from llava. Otherwise #39722 would need to re-define model classes to change the forward pass and delete the output_xxx kwargs.
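For context, the general pattern behind a decorator like check_model_inputs can be sketched as follows. This is a toy illustration of kwargs-defaulting under assumed names (DummyConfig, DummyModel), not the actual transformers implementation:

```python
import functools


def check_model_inputs(forward):
    """Toy decorator: pull flags like output_hidden_states out of **kwargs,
    falling back to config defaults, so the forward body no longer needs
    explicit output_xxx parameters in its signature."""
    @functools.wraps(forward)
    def wrapper(self, *args, **kwargs):
        for flag in ("output_attentions", "output_hidden_states", "return_dict"):
            kwargs.setdefault(flag, getattr(self.config, flag, False))
        return forward(self, *args, **kwargs)
    return wrapper


class DummyConfig:
    output_hidden_states = True


class DummyModel:
    config = DummyConfig()

    @check_model_inputs
    def forward(self, x, **kwargs):
        # the decorator guarantees the flag is present in kwargs
        return x, kwargs["output_hidden_states"]
```

This is why centralizing the check lets PRs like this one drop the repeated output_attentions/output_hidden_states/return_dict boilerplate from every forward pass.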