Llava onevision: output align for tests and add image_sizes input param (#43678)
vasqu merged 9 commits into huggingface:main
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
@vasqu pls help review, thx!
Here I revert the changes from #43403, since the `image_sizes` input param turns out to be a common param for VLM models: both lighton_ocr and llava_onevision need it.
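For reference, a minimal sketch of why both models expose this input: their processors return an `image_sizes` entry alongside `pixel_values`, so a shared helper that forwards processor outputs has to accept it. The checkpoint name and prompt below are assumptions for illustration, not taken from this PR.

```python
from PIL import Image
from transformers import AutoProcessor

# Hedged sketch: the llava_onevision processor is expected to return an
# `image_sizes` entry alongside `pixel_values`. The checkpoint name is an
# assumption for illustration only.
processor = AutoProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-0.5b-ov-hf")

image = Image.new("RGB", (640, 480))
prompt = "<image>\nWhat do you see in this image?"
inputs = processor(images=image, text=prompt, return_tensors="pt")

print(sorted(inputs.keys()))  # expected to include "image_sizes" and "pixel_values"
```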
Gotcha, fair enough then if it's not unique anymore. Just see my comment about adding examples to that comment.
)
EXPECTED_DECODED_TEXT = EXPECTED_DECODED_TEXTS.get_expectation()
# fmt: on
EXPECTED_DECODED_TEXT = [
Interesting, so the results are the same again 👀
| ("xpu", 3): 'user\n\nWhat do you see in this image?\nassistant\nThe image is a radar chart that compares the performance of different models in a specific task, likely related to natural language processing or machine learning. The chart is divided into several axes, each representing a different model or method. The models are color-coded and labeled with their respective names. The axes are labeled with terms such as "VQA," "GQA," "MQA," "VIZ," "TextVQA," "SQA-IMG," and "MQE." The radar chart shows', | ||
| ("cuda", 7): 'user\n\nWhat do you see in this image?\nassistant\nThe image is a radar chart that compares the performance of different models in a specific task, likely related to natural language processing or machine learning. The chart is divided into several axes, each representing a different model or method. The models are color-coded and labeled with their respective names. The axes are labeled with terms such as "VQA," "GQA," "MQA," "VQAv2," "MM-Vet," "LLaVA-Bench," "LLaVA-1', |
It looks like xpu and cuda (8) are the same again. Only cuda 7 differs, but iirc that's for the old T4 GPUs, so IMO it could be merged as well since T4 GPUs are no longer used on our CI.
cc @ydshieh wdyt?
Yes, I upgraded PyTorch to 2.10, and the outputs for XPU and A100 are the same now. But I do not have a cuda 7 device, so I keep the existing result for cuda 7 here.
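For context, a minimal self-contained sketch of the device-keyed expectation pattern shown in the diff above. The lookup/fallback rules here are an assumption for illustration, not the actual `Expectations` helper used in the test utilities.

```python
# Hedged sketch of how EXPECTED_DECODED_TEXTS.get_expectation() can resolve a
# device-specific expected output. This is NOT the transformers implementation;
# the fallback rules are assumptions for illustration only.
class Expectations(dict):
    """Maps (device_type, major_version) -> expected decoded text."""

    def get_expectation(self, device: str = "cuda", major: int = 8):
        # Prefer an exact (device, major) match, then any entry for the same
        # device, then an arbitrary remaining entry as a last resort.
        if (device, major) in self:
            return self[(device, major)]
        for (dev, _), value in self.items():
            if dev == device:
                return value
        return next(iter(self.values()))


EXPECTED_DECODED_TEXTS = Expectations(
    {
        # xpu 3 and cuda 8 (A100) now produce the same text; cuda 7 (old T4
        # GPUs) stays separate because it was not re-verified in this PR.
        ("xpu", 3): "shared decoded text",
        ("cuda", 8): "shared decoded text",
        ("cuda", 7): "T4-specific decoded text",
    }
)

print(EXPECTED_DECODED_TEXTS.get_expectation(device="cuda", major=7))  # T4-specific decoded text
```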
[For maintainers] Suggested jobs to run (before merge): run-slow: lighton_ocr, llava_onevision
vasqu left a comment:
LGTM, just running the slow test to be sure
run-slow: lighton_ocr, llava_onevision

This comment contains models: ["models/lighton_ocr", "models/llava_onevision"]

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
In this PR, we do several things for the llava_onevision model:
- Align the expected outputs in the tests.
- Add the `image_sizes` param in the `flash_attn_inference_equivalence` func to support VLM models like lighton_ocr and llava_onevision (a sketch follows below).
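A minimal sketch of what forwarding an optional `image_sizes` input through a shared attention-equivalence check could look like. The helper name, argument handling, and tolerances below are assumptions for illustration, not the actual transformers test code.

```python
import torch


# Hedged sketch: thread an optional `image_sizes` input through an
# eager-vs-flash-attention equivalence check. Names and tolerances are
# assumptions for illustration, not the real flash_attn_inference_equivalence helper.
def check_inference_equivalence(model_eager, model_fa2, inputs: dict):
    model_kwargs = {
        k: v
        for k, v in inputs.items()
        if k in ("input_ids", "attention_mask", "pixel_values")
    }
    # VLMs such as lighton_ocr and llava_onevision also need `image_sizes`;
    # models that don't use it simply never receive the kwarg.
    if "image_sizes" in inputs:
        model_kwargs["image_sizes"] = inputs["image_sizes"]

    with torch.no_grad():
        logits_eager = model_eager(**model_kwargs).logits
        logits_fa2 = model_fa2(**model_kwargs).logits

    # The two attention implementations should produce (nearly) identical outputs.
    torch.testing.assert_close(logits_eager, logits_fa2, rtol=1e-3, atol=1e-3)
```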