
Conversation

lambertwjh
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@lambertwjh
Contributor Author

Implement a fast GLM-4.1V image processor with shape-based grouping, ensuring per-sub-batch size consistency and improving mixed-size batch efficiency.

@Rocketknight1
Member

cc @yonigozlan


@yonigozlan left a comment


Thanks for working on this @lambertwjh, very nice update! We just have to ensure we're not breaking backward compatibility.

stacked_images = self.resize(
    stacked_images,
    size=SizeDict(height=resized_height, width=resized_width),
)
Member


Looks like this might be breaking backward compatibility, as the resized size used to be computed as the max of all target sizes in the batch. I'm not exactly sure why this was the case in the first place, but let's make sure there are no edge cases here that would make this a breaking change. In particular, having the same resized size for all images in the batch ensured that we could stack the images at the end; I'm not sure that's still the case now.

Contributor Author


The "backward compatibility" concern is fundamentally invalid because the current behavior is already broken—it fails to process mixed-size batches entirely, while same-size batches remain completely unaffected since identical input dimensions produce identical output dimensions, and the Fast version has already proven the safety of this fix through successful implementation and testing.
The Fast version's proven approach:
group images by their original dimensions, process each group independently by applying smart_resize per group, maintain the original batch sequence order through proper reconstruction, and only stack dimensionally compatible tensors.
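For concreteness, here is a minimal sketch of the shape-based grouping idea; this is illustrative rather than the PR's actual code, smart_resize is stubbed, and the helper name is made up:

```python
from collections import defaultdict

import torch
import torch.nn.functional as F


def smart_resize(height, width, factor=28):
    # Stand-in for the real smart_resize: snap each side to a multiple of `factor`.
    return max(factor, round(height / factor) * factor), max(factor, round(width / factor) * factor)


def resize_by_shape_groups(images):
    """images: list of (C, H, W) float tensors, possibly with different H and W."""
    groups = defaultdict(list)  # (H, W) -> list of (original_index, image)
    for idx, img in enumerate(images):
        groups[tuple(img.shape[-2:])].append((idx, img))

    resized = [None] * len(images)
    for (h, w), members in groups.items():
        new_h, new_w = smart_resize(h, w)                 # one target size per shape group
        batch = torch.stack([img for _, img in members])  # same input shape, so stacking is safe
        batch = F.interpolate(batch, size=(new_h, new_w), mode="bicubic", antialias=True)
        for (idx, _), out in zip(members, batch):
            resized[idx] = out                            # restore the original batch order
    return resized
```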

Member


I'm not sure I follow you. When you say the current behavior is broken, do you mean it's incorrect because the images are not resized to the correct size, or that it crashes?
A big part of the issue is that there are no image processing tests for this model, for some reason. Adding a test file would make it clearer what works and what doesn't.
Would you mind adding this test file? You can look at the other test_image_processing_....py files to see how they should be written.
If you don't have the bandwidth for that, we can open a separate PR.
Thanks a lot!

@lambertwjh
Contributor Author

1. What exactly is broken?
Incorrectness vs. crash: it's a runtime concatenation failure on mixed-size batches. When images of different original sizes are processed, smart_resize produces a different (H, W) per shape group in the Fast processor. After patchification, that yields different sequence lengths (grid_t * grid_h * grid_w) across groups, so the final torch.cat along the batch dimension fails because the non-batch dimensions don't match (illustrated below).
Same-size batches are unaffected because all groups end up with identical resized (H, W), and therefore identical grid sizes.
2. Why did this not show up in the "standard" processor?
The standard processor computes the resize target from the first image and applies that same target size to the whole batch. That avoids the final stacking error, but it is semantically inconsistent for mixed-size batches (the first image dictates the grid for all images).

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm4v


@yonigozlan left a comment


Hi again @lambertwjh, thanks for the explanation. After reading the code more thoroughly, I agree with your changes. I added tests to make sure the slow and fast processors stay equivalent, so we don't break this in the future.
Thanks for contributing, LGTM!
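For reference, a rough sketch of what such a slow/fast equivalence check might look like; the class names, output keys, and tolerance here are assumptions rather than text copied from the merged tests:

```python
import numpy as np
from transformers import Glm4vImageProcessor, Glm4vImageProcessorFast


def test_slow_fast_equivalence_mixed_sizes():
    # A mixed-size batch: the case discussed in the review thread above.
    images = [
        np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8),
        np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8),
    ]
    out_slow = Glm4vImageProcessor()(images, return_tensors="pt")
    out_fast = Glm4vImageProcessorFast()(images, return_tensors="pt")

    # Output keys assumed to follow the Qwen2-VL-style convention.
    assert (out_slow.image_grid_thw == out_fast.image_grid_thw).all()
    assert np.allclose(
        out_slow.pixel_values.numpy(), out_fast.pixel_values.numpy(), atol=1e-1
    )
```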

@yonigozlan enabled auto-merge (squash) September 10, 2025 20:56
@yonigozlan merged commit dae1ccf into huggingface:main Sep 10, 2025
15 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
* fix_image_processing_fast_for_glm4v

* fix(format): auto-ruff format

* add test image processing glm4v

* fix quality

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025