Consistent naming for images kwargs #40834
Conversation
return pad(image, padding, data_format=data_format, input_data_format=input_data_format)

def pad(self, *args, **kwargs):
    logger.info("pad is deprecated and will be removed in version 4.27. Please use pad_image instead.")
4.27 🤯
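For reference, the hunk above is the usual rename-and-deprecate shim; here is a minimal self-contained sketch of the pattern (the class wrapper and the forwarding return are assumptions, not shown in the diff):

import logging

logger = logging.getLogger(__name__)

class ProcessorSketch:
    def pad_image(self, image, padding, **kwargs):
        # New, preferred name; the real padding logic would live here.
        return image

    def pad(self, *args, **kwargs):
        # Old name kept as a thin shim so existing callers don't break.
        logger.info("pad is deprecated and will be removed in a future version. Please use pad_image instead.")
        return self.pad_image(*args, **kwargs)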
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Done, finally! @yonigozlan, I'd like to ask a first review from you, as it touches fast image processors. To keep you aligned: the next PR will remove `DefaultFastImageProcessorKwargs`.
Thanks a lot for the huge work!
First quick review. I didn't check every model, but overall I'm completely on board with this, very much needed.
Can't wait to unify `DefaultFastImageProcessorKwargs` and `ImagesKwargs`.
def pad(
    self,
    images: "torch.Tensor",
    pad_size: SizeDict = None,
    fill_value: Optional[int] = 0,
    padding_mode: Optional[str] = "constant",
    return_mask: Optional[bool] = False,
    disable_grouping: Optional[bool] = False,
    **kwargs,
) -> "torch.Tensor":
    """
    Pads images to `(pad_size["height"], pad_size["width"])` or to the largest size in the batch.

    Args:
        images (`torch.Tensor`):
            Images to pad.
        pad_size (`SizeDict`, *optional*):
            Dictionary in the format `{"height": int, "width": int}` specifying the size of the output image.
        fill_value (`int`, *optional*, defaults to `0`):
            The constant value used to fill the padded area.
        padding_mode (`str`, *optional*, defaults to `"constant"`):
            The padding mode to use. Can be any of the modes supported by
            `torch.nn.functional.pad` (e.g. constant, reflection, replication).
        return_mask (`bool`, *optional*, defaults to `False`):
            Whether to return a pixel mask to denote padded regions.
        disable_grouping (`bool`, *optional*, defaults to `False`):
            Whether to disable grouping of images by size.

    Returns:
        `torch.Tensor`: The padded images. If `return_mask=True`, a pixel mask is returned as well.
    """
    if pad_size is not None:
        if not (pad_size.height and pad_size.width):
            raise ValueError(f"Pad size must contain 'height' and 'width' keys only. Got pad_size={pad_size}.")
        pad_size = (pad_size.height, pad_size.width)
    else:
        pad_size = get_max_height_width(images)

    grouped_images, grouped_images_index = group_images_by_shape(images, disable_grouping=disable_grouping)
    processed_images_grouped = {}
    processed_masks_grouped = {}
    for shape, stacked_images in grouped_images.items():
        image_size = stacked_images.shape[-2:]
        padding_height = pad_size[0] - image_size[0]
        padding_width = pad_size[1] - image_size[1]
        if padding_height < 0 or padding_width < 0:
            raise ValueError(
                f"Padding dimensions are negative. Please make sure that the `pad_size` is larger than the "
                f"image size. Got pad_size={pad_size}, image_size={image_size}."
            )
        if image_size != pad_size:
            padding = (0, 0, padding_width, padding_height)
            stacked_images = F.pad(stacked_images, padding, fill=fill_value, padding_mode=padding_mode)
        processed_images_grouped[shape] = stacked_images

        if return_mask:
            # keep only one from the channel dimension in pixel mask
            stacked_masks = torch.zeros_like(stacked_images, dtype=torch.int64)[..., 0, :, :]
            stacked_masks[..., : image_size[0], : image_size[1]] = 1
            processed_masks_grouped[shape] = stacked_masks

    processed_images = reorder_images(processed_images_grouped, grouped_images_index)
    if return_mask:
        processed_masks = reorder_images(processed_masks_grouped, grouped_images_index)
        return processed_images, processed_masks

    return processed_images
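A small self-contained illustration of the padding geometry above: images stay anchored at the top-left and are padded on the right and bottom. The `fill=`/`padding_mode=` signature in the quoted code matches torchvision's functional `pad`, whose 4-tuple is (left, top, right, bottom); the sketch below uses plain `torch.nn.functional.pad`, which orders padding per dimension instead. Sizes are arbitrary.

import torch
import torch.nn.functional as F

image = torch.rand(3, 480, 640)                     # (C, H, W)
target_h, target_w = 512, 704
pad_h, pad_w = target_h - image.shape[-2], target_w - image.shape[-1]

# (w_left, w_right, h_top, h_bottom): pad only on the right and bottom
padded = F.pad(image, (0, pad_w, 0, pad_h), value=0.0)
assert padded.shape == (3, 512, 704)

# Matching pixel mask: 1 over original pixels, 0 over the padded region
mask = torch.zeros(padded.shape[-2:], dtype=torch.int64)
mask[: image.shape[-2], : image.shape[-1]] = 1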
Nice!
I was also wondering if we should include the whole group-and-reorder process in other functions like resize, or rescale-and-normalize. But that means we can't have several processing functions under one grouping loop. And a new grouping is only needed if the previous function changed the shape of the images (no need after a rescale-and-normalize, for example), and groupings do cost a bit of processing time, so it might be best to leave this to the `_preprocess` function. What do you think?
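For concreteness, here is a simplified, self-contained stand-in for the group/process/reorder pattern being discussed (hypothetical minimal versions of `group_images_by_shape` and `reorder_images`; the real helpers also handle `disable_grouping` and batched inputs):

import torch

def group_images_by_shape_sketch(images):
    # Stack same-shaped images into one batched tensor each, remembering where
    # every image came from so the original order can be restored later.
    grouped, index = {}, []
    for img in images:
        shape = tuple(img.shape)
        grouped.setdefault(shape, []).append(img)
        index.append((shape, len(grouped[shape]) - 1))
    return {s: torch.stack(imgs) for s, imgs in grouped.items()}, index

def reorder_images_sketch(processed_grouped, index):
    return [processed_grouped[shape][i] for shape, i in index]

# One grouping loop can host several ops (e.g. rescale + normalize) as long as
# they don't change shapes; a shape-changing op (resize, pad) forces a regroup.
images = [torch.rand(3, 480, 640), torch.rand(3, 512, 512), torch.rand(3, 480, 640)]
grouped, index = group_images_by_shape_sketch(images)
processed = {shape: batch * 2.0 for shape, batch in grouped.items()}  # shape-preserving op
restored = reorder_images_sketch(processed, index)
assert [tuple(t.shape) for t in restored] == [tuple(t.shape) for t in images]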
The grouping for padding was added only because it's supposed to be called outside of the loop when we want to pad all images to the same size. I see what you mean, and it's partially related to the thread above: we need to handle two types of padding. WDYT of adding an arg like `group_images=False`?
Oh yes, that makes sense then, if we're padding to the maximum size. I don't think we need to add anything; we can just override the function when a different type of padding is needed.
Yep, that is what most processors do. Though it's not really "overriding": they use a fn called `pad_to_square`, which we can actually standardize and add to the base class as well in the next iteration.
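For reference, such a `pad_to_square` helper boils down to something like the following (a minimal sketch; the function name comes from the comment above, but the signature and top-left anchoring are assumptions, and real implementations may center the image instead):

import torch
import torch.nn.functional as F

def pad_to_square_sketch(image: torch.Tensor, fill_value: float = 0.0) -> torch.Tensor:
    # Pad the shorter spatial side so that height == width.
    height, width = image.shape[-2:]
    if height == width:
        return image
    side = max(height, width)
    # torch.nn.functional.pad pairs: (w_left, w_right, h_top, h_bottom)
    return F.pad(image, (0, side - width, 0, side - height), value=fill_value)

assert pad_to_square_sketch(torch.rand(3, 480, 640)).shape == (3, 640, 640)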
Very nice! Aligned with the standardization vision we have for v5! I'm just confused about the default value of `None`. Why not just set it to `False` so it's a proper bool, instead of relying on the truth value of `None` being `False`?
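One possible reason for a `None` default (a hypothetical sketch, not necessarily what this PR intends): `None` can mean "unset", letting the processor derive a runtime-dependent default while an explicit `True`/`False` from the caller still wins.

from typing import Optional
import torch

def resolve_disable_grouping(disable_grouping: Optional[bool], images: list) -> bool:
    if disable_grouping is not None:
        return disable_grouping  # the caller made an explicit choice
    # Assumed heuristic: grouping pays off on GPU, less so on CPU.
    return images[0].device.type == "cpu"

assert resolve_disable_grouping(False, [torch.rand(3, 2, 2)]) is False
assert resolve_disable_grouping(None, [torch.rand(3, 2, 2)]) is True  # CPU tensor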
Will merge to unblock myself for the next PR. EDIT: I don't know where my comment and its reply went. Anyway, the …
[For maintainers] Suggested jobs to run (before merge): run-slow: bridgetower, cohere2_vision, conditional_detr, convnext, deepseek_vl, deepseek_vl_hybrid, deformable_detr, depth_pro, detr, dinov3_vit, donut, dpt, fuyu, gemma3, got_ocr2, grounding_dino
* use consistent naming for padding
* no validation on pad size
* add warnings
* fix
* fox copies
* another fix
* fix some tests
* fix more tests
* fix lasts tests
* fix copies
* better docstring
* delete print
What does this PR do?
It lays the groundwork for my next PR, where I am aiming to unify `DefaultImagesKwargsFastInit` and `ImagesKwargs`, since they are almost identical except for three kwargs (`size_divisor`, `do_pad` and `pad_size`). This PR consolidates the naming of the above three kwargs. Specifically:
* `size_divisor` is removed from the general kwargs, as it's used in only ~5 models
* `pad_size` and `do_pad` are added to the default set of fast image processor kwargs, along with a default `self.pad` method
* `size_divisibility` instead of `size_divisor`
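To illustrate the end state described above, a hedged usage sketch (the checkpoint and image sizes are arbitrary, and the exact accepted kwargs may differ per model; this assumes a fast processor that picks up the new default `do_pad`/`pad_size` kwargs):

from PIL import Image
from transformers import AutoImageProcessor

image_a = Image.new("RGB", (640, 480))
image_b = Image.new("RGB", (512, 512))

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)
inputs = processor(
    images=[image_a, image_b],
    do_resize=False,                            # keep original sizes so padding dominates
    do_pad=True,                                # now a default fast-processor kwarg
    pad_size={"height": 800, "width": 800},     # omit to pad to the largest image in the batch
    return_tensors="pt",
)
print(inputs["pixel_values"].shape)             # expected: torch.Size([2, 3, 800, 800])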