
Llava onevision: output align for tests and add image_sizes input param #43678

Merged
vasqu merged 9 commits into huggingface:main from kaixuanliu:llava_onevision_output on Feb 3, 2026


Conversation

@kaixuanliu (Contributor)

In this PR, we do several things for the llava_onevision model:

  1. Skip the torch_exportable tests, as the model does not support torch.export.
  2. Unify the expected outputs for CUDA and XPU.
  3. Add an image_sizes param to the flash_attn_inference_equivalence func to support VLM models like lighton_ocr and llava_onevision (see the sketch after this list).
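
For context, a minimal, hypothetical sketch of what item 3 amounts to. The real flash_attn_inference_equivalence helper in the transformers common test suite differs in detail; the signature, helper body, and tolerances below are illustrative, not the PR's actual diff:

```python
# Illustrative sketch only: the real shared helper is more involved.
# The point is that `image_sizes`, when the processor produces it, is
# forwarded to the model just like `pixel_values`.
import torch

def flash_attn_inference_equivalence(model_eager, model_fa2, inputs):
    model_kwargs = {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
    }
    # VLMs such as lighton_ocr and llava_onevision need the per-image
    # resolutions (`image_sizes`) to unpad/reshape their vision features.
    for optional_key in ("pixel_values", "image_sizes"):
        if optional_key in inputs:
            model_kwargs[optional_key] = inputs[optional_key]

    with torch.no_grad():
        logits_eager = model_eager(**model_kwargs).logits
        logits_fa2 = model_fa2(**model_kwargs).logits

    # Eager and flash-attention outputs should agree within tolerance.
    torch.testing.assert_close(logits_eager, logits_fa2, atol=4e-2, rtol=4e-2)
```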

@kaixuanliu (Contributor, Author)

@vasqu please help review, thanks!

@kaixuanliu (Contributor, Author) commented Feb 2, 2026

Here I revert the changes from #43403, since image_sizes seems to be a common input param for VLM models: both lighton_ocr and llava_onevision need it.

@vasqu (Contributor)

Gotcha, fair enough then if it's not unique anymore. Just see my comment about adding examples to those comments.

@kaixuanliu (Contributor, Author)

Done.

@kaixuanliu changed the title from "Llava onevision output" to "Llava onevision: output align for tests and add image_sizes input param" on Feb 2, 2026
@vasqu (Contributor) left a review:

LGTM, see my smaller comments + I'd like to wait for @ydshieh on the cuda 7 comment (if we still use it)


The diff context for this thread:

```python
)
EXPECTED_DECODED_TEXT = EXPECTED_DECODED_TEXTS.get_expectation()
# fmt: on
EXPECTED_DECODED_TEXT = [
```
@vasqu (Contributor)

Interesting, so the results are the same again 👀
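
For readers outside the test suite, a minimal sketch of the device-keyed expectation pattern this snippet uses. The placeholder strings and the exact key-matching rules are illustrative; double-check the actual behavior of transformers.testing_utils.Expectations:

```python
# Sketch of the pattern: expected strings keyed by accelerator, with
# get_expectation() picking the entry for the machine the test runs on.
from transformers.testing_utils import Expectations

EXPECTED_DECODED_TEXTS = Expectations(
    {
        # Keys are (device type, major version / compute capability).
        ("xpu", 3): "user\n\n... radar chart ... (XPU output)",
        ("cuda", 7): "user\n\n... radar chart ... (T4-class output)",
        ("cuda", 8): "user\n\n... radar chart ... (A100-class output)",
    }
)
# Selects the entry matching the current accelerator.
EXPECTED_DECODED_TEXT = EXPECTED_DECODED_TEXTS.get_expectation()
```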

Comment on lines +360 to 361:

```python
("xpu", 3): 'user\n\nWhat do you see in this image?\nassistant\nThe image is a radar chart that compares the performance of different models in a specific task, likely related to natural language processing or machine learning. The chart is divided into several axes, each representing a different model or method. The models are color-coded and labeled with their respective names. The axes are labeled with terms such as "VQA," "GQA," "MQA," "VIZ," "TextVQA," "SQA-IMG," and "MQE." The radar chart shows',
("cuda", 7): 'user\n\nWhat do you see in this image?\nassistant\nThe image is a radar chart that compares the performance of different models in a specific task, likely related to natural language processing or machine learning. The chart is divided into several axes, each representing a different model or method. The models are color-coded and labeled with their respective names. The axes are labeled with terms such as "VQA," "GQA," "MQA," "VQAv2," "MM-Vet," "LLaVA-Bench," "LLaVA-1',
```
@vasqu (Contributor)

It looks like xpu and cuda (8) are the same again. Only cuda 7 differs, but IIRC that's for the old T4 GPUs, so IMO it could be merged too, since T4 GPUs are no longer used in our CI.

cc @ydshieh wdyt?
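
If the cuda 7 (T4) entry were retired as suggested, the table could collapse to a single shared string. A hypothetical sketch of that merge; SHARED_RADAR_CHART_TEXT and T4_SPECIFIC_TEXT are placeholders, not names from the PR:

```python
from transformers.testing_utils import Expectations

# Placeholder for the decoded text that XPU and current CUDA runners share.
SHARED_RADAR_CHART_TEXT = "user\n\n... radar chart ..."

EXPECTED_DECODED_TEXTS = Expectations(
    {
        ("xpu", 3): SHARED_RADAR_CHART_TEXT,
        ("cuda", 8): SHARED_RADAR_CHART_TEXT,
        # ("cuda", 7): T4_SPECIFIC_TEXT,  # only needed while T4 GPUs remain on CI
    }
)
```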

@kaixuanliu (Contributor, Author)

Yes, I upgraded PyTorch to 2.10, and the outputs for XPU and A100 are the same now. But I do not have a cuda 7 device, so I keep the cuda 7 result here.

@github-actions (bot) commented Feb 3, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: lighton_ocr, llava_onevision

@vasqu (Contributor) left a review:

LGTM, just running the slow tests to be sure

@vasqu (Contributor) commented Feb 3, 2026

run-slow: lighton_ocr, llava_onevision

@github-actions (bot) commented Feb 3, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/lighton_ocr", "models/llava_onevision"]
quantizations: []

@github-actions (bot) commented Feb 3, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit   | Description   |
|---------|----------|---------------|
| RUN     | 804d5548 | merge commit  |
| PR      | 7bf284b0 | branch commit |
| main    | b6a202f8 | base commit   |

✅ No failing test specific to this PR 🎉 👏 !

@vasqu enabled auto-merge (squash) on February 3, 2026 at 14:21
@vasqu merged commit 71956a8 into huggingface:main on Feb 3, 2026
26 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
