Fix dtype mismatch in modeling_llava_next #42979
Godkunn wants to merge 1 commit into huggingface:main
Conversation
Ensure logits are computed with the correct dtype.
[For maintainers] Suggested jobs to run (before merge): run-slow: llava_next
I don't think this is the right fix for this issue. It could actually come from the loading logic:

```python
>>> from transformers import AutoModelForImageTextToText
>>> model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", dtype="auto")
>>> model.language_model.dtype, model.vision_tower.dtype
(torch.bfloat16, torch.float16)
```

but we expect both backbones to have the same dtype.
This is probably the issue: https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/blob/main/config.json#L43
Yep, this works:

```python
import torch
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", dtype="auto", revision="refs/pr/46")
input_ids = torch.randint(0, model.config.text_config.vocab_size, (2, 8))
model(input_ids=input_ids)
```
zucchini-nlp left a comment:
Thanks @Godkunn, though this is a general issue with how dtypes are dispatched during loading. It should not be fixed in each model class, but rather in the base modeling class. I will make a fix; we have discussed internally and decided to deprecate having different dtypes per backbone.
Thanks @zucchini-nlp and @qgallouedec for the deep dive! That makes total sense regarding the dispatching logic in the base modeling class. Since you've identified the root cause and will be implementing a central fix (and deprecating the mixed backbone dtypes), I'll close this PR so it doesn't clutter the queue. Glad I could help bring attention to the issue! Happy to help verify the fix once it's live if needed.
What does this PR do?
Fixes #42968
This PR fixes a `RuntimeError` when running `LlavaNextForConditionalGeneration` with mixed precision (e.g., BFloat16 vs. Float16).

The Fix:

I added a cast `.to(self.lm_head.weight.dtype)` to `hidden_states` before passing it to `lm_head`. This ensures the input tensor matches the linear layer's weight dtype.

Verification:
Verified locally on T4 GPU using the reproduction script from the issue. The forward pass now completes without crashing.
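The cast can be sanity-checked without the full model. A minimal sketch (a toy `nn.Linear` in place of the real `lm_head` — an assumption for illustration) showing that casting the activations to the head's weight dtype lets the projection go through:

```python
import torch
import torch.nn as nn

# Toy lm_head (assumption for illustration, not the actual model's head).
lm_head = nn.Linear(8, 16).to(torch.bfloat16)
hidden_states = torch.randn(2, 8, dtype=torch.float16)  # e.g. from a float16 backbone

# The patch in miniature: match the input dtype to the weight dtype
# before the projection, mirroring .to(self.lm_head.weight.dtype).
logits = lm_head(hidden_states.to(lm_head.weight.dtype))
print(logits.dtype)  # torch.bfloat16
```

Note the cast silently converts activations, which is why the maintainers prefer fixing the dtype dispatch at load time instead.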
Who can review?
@zucchini-nlp

