fix_image_processing_fast_for_glm4v #40483
Implement a fast GLM-4.1V image processor with shape-based grouping, ensuring per-sub-batch size consistency and improving mixed-size batch efficiency.
cc @yonigozlan
Thanks for working on this @lambertwjh, very nice update! We just have to ensure we're not breaking backward compatibility.
)
stacked_images = self.resize(
    stacked_images,
    size=SizeDict(height=resized_height, width=resized_width),
Looks like this might break backward compatibility, as the resized size used to be computed as the max of all target sizes in the batch. I'm not exactly sure why this was the case in the first place, but let's make sure there are no edge cases here that would make this a breaking change. In particular, having the same resized size for all images in the batch ensured that we could stack the images at the end, and I'm not sure this is still the case now.
The "backward compatibility" concern does not apply here, because the current behavior is already broken: it fails to process mixed-size batches at all. Same-size batches are unaffected, since identical input dimensions produce identical output dimensions. The Fast version has already shown that this fix is safe through its implementation and testing.
The Fast version's approach:
group images by their original dimensions, apply smart_resize to each group independently, restore the original batch order when reassembling the results, and stack only tensors whose dimensions match.
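For readers following along, the grouping approach described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this PR: `toy_smart_resize`, `group_by_shape`, and `resize_targets_per_image` are hypothetical names, and the rounding rule (snap each side to a multiple of `factor`) is a simplified stand-in for the real `smart_resize`.

```python
from collections import defaultdict


def toy_smart_resize(height, width, factor=28):
    """Hypothetical stand-in for smart_resize: snap each side to a multiple of `factor`."""
    def snap(value):
        return max(factor, round(value / factor) * factor)
    return snap(height), snap(width)


def group_by_shape(shapes):
    """Map each distinct (height, width) to the batch indices that have that shape."""
    groups = defaultdict(list)
    for index, shape in enumerate(shapes):
        groups[shape].append(index)
    return dict(groups)


def resize_targets_per_image(shapes, factor=28):
    """Compute one target size per shape group, then write it back in batch order."""
    targets = [None] * len(shapes)
    for shape, indices in group_by_shape(shapes).items():
        # Every image in a group shares the same input shape, so it also shares
        # the same target size and the group can be stacked safely.
        target = toy_smart_resize(*shape, factor=factor)
        for index in indices:
            targets[index] = target
    return targets


shapes = [(224, 224), (100, 300), (224, 224)]
print(group_by_shape(shapes))
print(resize_targets_per_image(shapes))
```

In this sketch, images 0 and 2 land in the same group and get the same target size, while image 1 is resized on its own. The target sizes come back in the original batch order, and only images within one group would ever be stacked together.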
I am not sure I follow. When you say the current behavior is broken, do you mean it's incorrect because the images are not resized to the correct size, or because it crashes?
A big part of the issue is that there are no image processing tests for this model for some reason. Adding a test file would make it clearer what works and what doesn't.
Would you mind adding this test file? You can look at other test_image_processing_....py files to see how they should be written.
If you don't have the bandwidth for that, we can open a separate PR.
Thanks a lot!
1. What exactly is broken?
[For maintainers] Suggested jobs to run (before merge) run-slow: glm4v
Hi again @lambertwjh, thanks for the explanation. After reading the code more thoroughly, I agree with your changes. I added tests to make sure the slow and fast processors are equivalent and so that we don't break this in the future.
Thanks for contributing, LGTM!
* fix_image_processing_fast_for_glm4v
* fix(format): auto-ruff format
* add test image processing glm4v
* fix quality
---------
Co-authored-by: Your Name <you@example.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>