🚨 [unbloating] unify TypedDict usage in processing #40931
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
""" | ||
|
||
do_reduce_labels: Optional[bool] | ||
BeitFastImageProcessorKwargs = BeitImageProcessorKwargs |
kept for BC as a reference, ideally should be deleted. Maybe in v5?
Yes we can do it in v5, especially as I don't expect anyone importing this class! Let's add a clear comment to say it's scheduled to be deleted and is only there for BC in the meantime!
removed!
```python
# Some preprocessors define a set of accepted "valid_kwargs" (currently only vision).
# In those cases, we don't declare a `ModalityKwargs` attribute in the TypedDict.
# Instead, we dynamically obtain the kwargs from the preprocessor and merge them
# with the general kwargs set. This ensures consistency between preprocessor and
# processor classes, and helps prevent accidental mismatches.
modality_valid_kwargs = set(ModelProcessorKwargs.__annotations__[modality].__annotations__)
if modality in map_preprocessor_kwargs:
    preprocessor = getattr(self, map_preprocessor_kwargs[modality], None)
    preprocessor_valid_kwargs = (
        getattr(preprocessor, "valid_kwargs", None) if preprocessor is not None else None
    )
    modality_valid_kwargs.update(
        set(preprocessor_valid_kwargs.__annotations__ if preprocessor_valid_kwargs is not None else [])
    )
```
this allows us to not define `images_kwargs: SiglipImagesKwargs` in the processing file. Instead we check whether the image processor has `valid_kwargs` and obtain the kwarg names from it
```python
# For `common_kwargs`, just update all modality-specific kwargs with the same key/values
common_kwargs = kwargs.get("common_kwargs", {})
common_kwargs.update(ModelProcessorKwargs._defaults.get("common_kwargs", {}))
if common_kwargs:
    for kwarg in output_kwargs.values():
        kwarg.update(common_kwargs)
```
we don't return `out["common_kwargs"]` anymore and instead just merge the common kwargs into each modality. In all models the only common kwarg is `return_tensors`, so we don't really need a separate field for it
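A toy view of the merge (hypothetical values, not real model kwargs):

```python
output_kwargs = {
    "images_kwargs": {"do_resize": True},
    "text_kwargs": {"padding": "max_length"},
}
common_kwargs = {"return_tensors": "pt"}

# Instead of keeping a separate "common_kwargs" entry in the returned dict,
# fold the shared values into every modality.
for kwarg in output_kwargs.values():
    kwarg.update(common_kwargs)

assert output_kwargs["images_kwargs"] == {"do_resize": True, "return_tensors": "pt"}
assert output_kwargs["text_kwargs"] == {"padding": "max_length", "return_tensors": "pt"}
```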
```python
do_resize=True,
size=None,
do_normalize=True,
do_pad=False,
```
the processor does not use the `do_pad` key
"ImageProcessorFast": "image_processing*_fast", # "*" indicates where to insert the model name before the "_fast" suffix | ||
"ImageProcessorFast": "image_processing.*_fast", # "*" indicates where to insert the model name before the "_fast" suffix | ||
"VideoProcessor": "video_processing", | ||
"VideoProcessorInitKwargs": "video_processing", | ||
"FastImageProcessorKwargs": "image_processing*_fast", | ||
"ImageProcessorKwargs": "image_processing", | ||
"FastImageProcessorKwargs": "image_processing.*_fast", |
it was not working properly for fast image processors because `match_patterns` below does not match the file name. This fixes it
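Assuming the patterns are consumed as regular expressions (an assumption on my part; the matching code isn't shown here), the difference is easy to demonstrate:

```python
import re

filename = "image_processing_beit_fast"

# Old pattern: in a regex, a bare "*" means "zero or more of the preceding
# character" ("g" here), so the inserted model-name segment never matches.
print(re.fullmatch(r"image_processing*_fast", filename))  # None

# New pattern: ".*" matches the inserted model name, so the file name matches.
print(re.fullmatch(r"image_processing.*_fast", filename))  # <re.Match object ...>
```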
Hey! It's very nice that we try to unify those, but I'm overall still having a hard time understanding the logic.
Why do we need `valid_kwargs`? If I check for example the cohere2 fast image processor, the whole Kwargs class is completely redundant with the class attributes... This makes it hard to read and understand what the classes are used for. IMO, we should either drop the Kwargs class completely and use only the class attributes, or drop the class attributes and only use the Kwargs class. Having both is a struggle IMO
```python
annotations: Optional[Union[AnnotationType, list[AnnotationType]]] = None,
masks_path: Optional[Union[str, pathlib.Path]] = None,
```
Why do we lose this?
because they are in kwargs already (due to the `TypedDict`), and otherwise they would be duplicated between args and kwargs

The `valid_kwargs` field is used by the processing class. Previously we had to define these kwargs manually in each processing file. The class attributes define defaults for the model, while the `TypedDict` is for type hints and to show users (and us) what can be passed as a kwarg. You can look at it as having a config class, but in our case the `TypedDict` is not meant to hold defaults. It is not a dataclass. For why not use a dataclass, see #40793
Alright, let's merge then! I did not follow all the conversation on why a dataclass was not good, but we may think about a way to unify both in the future! Would simplify quite a bit!
This PR is still very nice though 🤗
[For maintainers] Suggested jobs to run (before merge) run-slow: aria, aya_vision, beit, blip, blip_2, bridgetower, chameleon, cohere2_vision, colpali, colqwen2, conditional_detr, convnext, csm, deepseek_vl, deepseek_vl_hybrid
* just squash commits into one
* fix style
What does this PR do?
This PR refactors how `TypedDict`s are used across processing classes to cut down duplication and avoid mismatches. Key updates:
* Previously the same kwargs `TypedDict` was defined twice (one in processing, one in fast image processing). They were identical, both defining the same kwargs. Now we keep a single copy and just import it where needed.
* We no longer manually define `ModelVideosKwargs`/`ModelImagesKwargs` in processing files. They should exist to properly merge kwargs with `ModelProcessingKwargs`, but this PR removes the manual step: we now dynamically obtain them at runtime from the preprocessor class attributes.
* Image processors expose a `valid_kwargs` attribute as a typed dict. With this, both slow and fast image processors share a consistent view of available kwargs.
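A condensed sketch of the resulting layout for one model (simplified; `ImagesKwargs` here stands in for the real base class):

```python
from typing import Optional, TypedDict


class ImagesKwargs(TypedDict, total=False):
    do_resize: Optional[bool]


# Single definition shared by the slow and fast image processors, instead of
# two identical TypedDicts in two files.
class BeitImageProcessorKwargs(ImagesKwargs, total=False):
    do_reduce_labels: Optional[bool]


# Backward-compatibility alias of the kind discussed in the review thread
# above (kept only for BC, scheduled for deletion).
BeitFastImageProcessorKwargs = BeitImageProcessorKwargs
```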