
Conversation

@zucchini-nlp (Member) commented Sep 11, 2025

What does this PR do?

It lays the groundwork for my next PR, where I am aiming to unify DefaultImagesKwargsFastInit and ImagesKwargs, since they are almost identical except for three kwargs (size_divisor, do_pad and pad_size).

This PR consolidates the naming of the above three kwargs. Specifically:

  • size_divisor is removed from the general kwargs, as it's used in only ~5 models
  • pad_size and do_pad are added to the default set of fast image processor kwargs, along with a default self.pad method (a usage sketch follows this list)
  • A few deprecations are added where the naming was inconsistent, e.g. size_divisibility instead of size_divisor
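
For illustration, here is a minimal usage sketch of how the consolidated kwargs could look from the caller's side. The checkpoint name, image sizes, and the extra do_resize=False flag are assumptions made for the example, not something this PR prescribes:

```python
from PIL import Image
from transformers import AutoImageProcessor

# Hypothetical example: any checkpoint whose fast image processor supports padding.
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)

images = [Image.new("RGB", (640, 480)), Image.new("RGB", (512, 512))]

# `do_pad` and `pad_size` now live in the default set of fast image processor kwargs,
# so they are spelled the same way across processors.
inputs = processor(
    images=images,
    do_resize=False,  # keep the original sizes so only padding changes the shape
    do_pad=True,
    pad_size={"height": 640, "width": 640},
    return_tensors="pt",
)
print(inputs["pixel_values"].shape)  # expected: both images padded to 640x640
```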

return pad(image, padding, data_format=data_format, input_data_format=input_data_format)

def pad(self, *args, **kwargs):
    logger.info("pad is deprecated and will be removed in version 4.27. Please use pad_image instead.")
Member Author:

4.27 🤯

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member Author):

Done, finally! @yonigozlan, I'd like to ask for a first review from you, as this touches the fast image processors. To keep you aligned: the next PR will remove DefaultFastImageKwargs and replace it with ImagesKwargs everywhere. I have a draft locally but haven't pushed the changes to all models yet.

@yonigozlan (Member) left a comment:

Thanks a lot for the huge work!
First quick review: I didn't check every model, but overall I'm completely on board with this, very much needed.
Can't wait to unify DefaultFastImageProcessorKwargs and ImagesKwargs.

Comment on lines +245 to +311
def pad(
    self,
    images: "torch.Tensor",
    pad_size: SizeDict = None,
    fill_value: Optional[int] = 0,
    padding_mode: Optional[str] = "constant",
    return_mask: Optional[bool] = False,
    disable_grouping: Optional[bool] = False,
    **kwargs,
) -> "torch.Tensor":
    """
    Pads images to `(pad_size["height"], pad_size["width"])` or to the largest size in the batch.
    Args:
        images (`torch.Tensor`):
            Images to pad.
        pad_size (`SizeDict`, *optional*):
            Dictionary in the format `{"height": int, "width": int}` specifying the size of the output image.
        fill_value (`int`, *optional*, defaults to `0`):
            The constant value used to fill the padded area.
        padding_mode (`str`, *optional*, defaults to `"constant"`):
            The padding mode to use. Can be any of the modes supported by
            `torch.nn.functional.pad` (e.g. constant, reflection, replication).
        return_mask (`bool`, *optional*, defaults to `False`):
            Whether to return a pixel mask to denote padded regions.
        disable_grouping (`bool`, *optional*, defaults to `False`):
            Whether to disable grouping of images by size.
    Returns:
        `torch.Tensor`: The padded images, and optionally the pixel mask when `return_mask=True`.
    """
    if pad_size is not None:
        if not (pad_size.height and pad_size.width):
            raise ValueError(f"Pad size must contain 'height' and 'width' keys only. Got pad_size={pad_size}.")
        pad_size = (pad_size.height, pad_size.width)
    else:
        # pad everything to the largest height/width found in the batch
        pad_size = get_max_height_width(images)

    grouped_images, grouped_images_index = group_images_by_shape(images, disable_grouping=disable_grouping)
    processed_images_grouped = {}
    processed_masks_grouped = {}
    for shape, stacked_images in grouped_images.items():
        image_size = stacked_images.shape[-2:]
        padding_height = pad_size[0] - image_size[0]
        padding_width = pad_size[1] - image_size[1]
        if padding_height < 0 or padding_width < 0:
            raise ValueError(
                f"Padding dimensions are negative. Please make sure that the `pad_size` is larger than the "
                f"image size. Got pad_size={pad_size}, image_size={image_size}."
            )
        if image_size != pad_size:
            # pad on the right and bottom only
            padding = (0, 0, padding_width, padding_height)
            stacked_images = F.pad(stacked_images, padding, fill=fill_value, padding_mode=padding_mode)
        processed_images_grouped[shape] = stacked_images

        if return_mask:
            # keep only one channel in the pixel mask; 1 marks real pixels, 0 marks padding
            stacked_masks = torch.zeros_like(stacked_images, dtype=torch.int64)[..., 0, :, :]
            stacked_masks[..., : image_size[0], : image_size[1]] = 1
            processed_masks_grouped[shape] = stacked_masks

    processed_images = reorder_images(processed_images_grouped, grouped_images_index)
    if return_mask:
        processed_masks = reorder_images(processed_masks_grouped, grouped_images_index)
        return processed_images, processed_masks

    return processed_images
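
As a side note for readers, this is a small standalone sketch (not the PR's code) of the same idea: pad every image in a batch to the batch's largest height and width, on the right and bottom only. It uses plain torch.nn.functional.pad rather than the torchvision F used above, and the helper name is made up for the example:

```python
import torch
import torch.nn.functional as nnF  # note: different from the torchvision `F` used in the PR


def pad_to_max_size(images: list[torch.Tensor], fill_value: float = 0.0) -> torch.Tensor:
    """Pad a list of (C, H, W) tensors to the batch's max H/W and stack them."""
    max_h = max(img.shape[-2] for img in images)
    max_w = max(img.shape[-1] for img in images)
    padded = []
    for img in images:
        pad_h = max_h - img.shape[-2]
        pad_w = max_w - img.shape[-1]
        # torch.nn.functional.pad takes (left, right, top, bottom) for the last two dims,
        # so this pads only on the right and bottom, mirroring the method above.
        padded.append(nnF.pad(img, (0, pad_w, 0, pad_h), value=fill_value))
    return torch.stack(padded)


batch = pad_to_max_size([torch.rand(3, 480, 640), torch.rand(3, 512, 512)])
print(batch.shape)  # torch.Size([2, 3, 512, 640])
```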
Member:

Nice!

Member:

I was also wondering if we should include the whole group-and-reorder process in other functions like resize, or rescale and normalize. But that means we can't have several processing functions under one grouping loop. And a new grouping is only needed if the previous function changed the shape of the images (it's not needed after a rescale and normalize, for example), and groupings do cost a bit of processing time, so it might be best to leave this to the _preprocess function. What do you think?
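
To make the trade-off concrete, here is a toy sketch of the group, process, reorder pattern being discussed. The helper names only mirror the spirit of group_images_by_shape / reorder_images; they are simplified stand-ins written for this example, not the library implementations:

```python
import torch


def group_by_shape(images: list[torch.Tensor]):
    """Stack images that share a shape so each group can be processed as one batch."""
    lists, index = {}, []
    for img in images:
        shape = tuple(img.shape)
        index.append((shape, len(lists.setdefault(shape, []))))
        lists[shape].append(img)
    return {shape: torch.stack(group) for shape, group in lists.items()}, index


def reorder(grouped: dict, index: list) -> list[torch.Tensor]:
    """Undo the grouping, restoring the original image order."""
    return [grouped[shape][i] for shape, i in index]


images = [torch.rand(3, 4, 4), torch.rand(3, 2, 2), torch.rand(3, 4, 4)]
grouped, index = group_by_shape(images)

# Rescale + normalize keep shapes unchanged, so one grouping pass can cover both;
# only a shape-changing step (e.g. resize) would force a re-grouping afterwards.
grouped = {shape: (batch / 255.0 - 0.5) / 0.5 for shape, batch in grouped.items()}

restored = reorder(grouped, index)
print([tuple(t.shape) for t in restored])  # original order and shapes preserved
```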

Member Author:

The grouping for padding was added only because it's supposed to be called outside of the loop when we want to pad all images to the same size. I see what you mean, and it's partially related to the above thread. We need to handle two types of padding. WDYT of adding an arg like group_images=False?

Member:

Oh yes, that makes sense then, if we're padding to the maximum size. I don't think we need to add anything; we can just override the function when a different type of padding is needed.

Member Author:

Yep, that is what most processors do. Though it's not really "overriding": they use a function called pad_to_square, which we can actually standardize and add to the base class as well in the next iteration.
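
For context, here is a hedged sketch of what a standardized pad_to_square could look like. Real processors differ in the details (e.g. some fill with the image mean or a background color), so treat this as an illustration of the idea only:

```python
import torch
import torch.nn.functional as nnF


def pad_to_square(image: torch.Tensor, fill_value: float = 0.0) -> torch.Tensor:
    """Pad a (C, H, W) image to a square, keeping the original content centered."""
    height, width = image.shape[-2:]
    side = max(height, width)
    pad_left = (side - width) // 2
    pad_top = (side - height) // 2
    # (left, right, top, bottom) padding for the last two dimensions
    padding = (pad_left, side - width - pad_left, pad_top, side - height - pad_top)
    return nnF.pad(image, padding, value=fill_value)


square = pad_to_square(torch.rand(3, 480, 640))
print(square.shape)  # torch.Size([3, 640, 640])
```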

@zucchini-nlp zucchini-nlp requested a review from qubvel September 15, 2025 16:39
@Cyrilvallez (Member) left a comment:

Very nice! Aligned with the standardization vision we have for v5! I'm just confused about the default value of None. Why not just set it to False so it's a proper bool, instead of relying on the truth value of None being False?

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) September 17, 2025 16:03
@zucchini-nlp (Member Author) commented Sep 17, 2025

Will merge to unblock myself for the next PR

EDIT: I don't know where my comment and its reply went. Anyway, the None default is simply for consistency with the other kwargs and with how fast image processors are implemented. We can batch-update the defaults in subsequent PRs.

Contributor:

[For maintainers] Suggested jobs to run (before merge)

run-slow: bridgetower, cohere2_vision, conditional_detr, convnext, deepseek_vl, deepseek_vl_hybrid, deformable_detr, depth_pro, detr, dinov3_vit, donut, dpt, fuyu, gemma3, got_ocr2, grounding_dino

@zucchini-nlp zucchini-nlp merged commit 8e837f6 into huggingface:main Sep 17, 2025
23 checks passed
ErfanBaghaei pushed a commit to ErfanBaghaei/transformers that referenced this pull request Sep 25, 2025
* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025