ProcessorMixin doesn't properly instantiate image processors #29414

Open

NielsRogge opened this issue Mar 3, 2024 · 3 comments
Labels: Feature request

Comments

NielsRogge (Contributor) commented Mar 3, 2024

System Info

Transformers dev version v4.38.2

Who can help?

@ArthurZucker @amyeroberts

Reproduction

Let's say you define your processor as follows:

from transformers.processing_utils import ProcessorMixin

class LlavaProcessor(ProcessorMixin):

    attributes = ["image_processor", "tokenizer"]
    image_processor_class = ("CLIPImageProcessor", "ViTImageProcessor")
    tokenizer_class = ("LlamaTokenizer", "LlamaTokenizerFast")

(this is mainly for demo purposes, since for PR #29012 I'd like to have LlavaProcessor work with 2 different image processor classes)

Then, even though you create a processor as follows:

from transformers import ViTImageProcessor, LlavaProcessor, LlamaTokenizer

image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

processor = LlavaProcessor(image_processor=image_processor, tokenizer=tokenizer)

processor.push_to_hub("nielsr/my-awesome-processor")

Reloading it:

from transformers import LlavaProcessor

processor = LlavaProcessor.from_pretrained("nielsr/my-awesome-processor")
print(type(processor.image_processor))

The printed type is still CLIPImageProcessor, even though we saved a ViTImageProcessor and want to load that back.

This is because of the way the class to load is decided here: if the image_processor_class attribute of the processor is defined as a tuple, the first class is always used.
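
For reference, here's a minimal sketch of the effect (a paraphrase, not the actual transformers source): the class is resolved from the processor's class attribute alone, and preprocessor_config.json is never consulted.

import transformers

def resolve_image_processor_class(image_processor_class):
    # Paraphrase of the effect in ProcessorMixin.from_pretrained: when the
    # class attribute is a tuple, a fixed entry is chosen (the tuple branch
    # was written for slow/fast tokenizer pairs); nothing here reads the
    # saved preprocessor_config.json.
    if isinstance(image_processor_class, tuple):
        class_name = image_processor_class[0]
    else:
        class_name = image_processor_class
    return getattr(transformers, class_name)

print(resolve_image_processor_class(("CLIPImageProcessor", "ViTImageProcessor")))
# CLIPImageProcessor, regardless of what was pushed to the Hub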

Expected behavior

The processor should reload the ViTImageProcessor instead of CLIPImageProcessor.

The current workaround is to do this:

from transformers.processing_utils import ProcessorMixin

class LlavaProcessor(ProcessorMixin):

    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "AutoImageProcessor"
    tokenizer_class = ("LlamaTokenizer", "LlamaTokenizerFast")

This correctly instantiates the image processor. However, in PR #29012, @amyeroberts suggested that the use of the Auto class is discouraged and it might be more appropriate to define the specific classes.

I remember from the past that Sylvain had no problem with the use of the Auto class, but I'm open to discussion. It's definitely a bit inconsistent that we define explicit classes for the tokenizers, but not for the image processor.

Looking at it, I think the Auto classes serve exactly the purpose we're trying to achieve: loading the proper image processor class based on preprocessor_config.json. Hence I'm wondering whether we shouldn't leverage the Auto classes by default for the image_processor_class and tokenizer_class attributes of multimodal processors.
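
To illustrate why the workaround behaves correctly: the preprocessor_config.json pushed above records an image_processor_type field, and AutoImageProcessor dispatches on it (reusing the demo repo name from above):

from transformers import AutoImageProcessor

# preprocessor_config.json contains "image_processor_type": "ViTImageProcessor",
# and AutoImageProcessor dispatches on that field.
image_processor = AutoImageProcessor.from_pretrained("nielsr/my-awesome-processor")
print(type(image_processor))  # ViTImageProcessor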

amyeroberts (Collaborator) commented Mar 3, 2024

@NielsRogge Two things:

  1. Can you link directly to the relevant comments, rather than the PR? If there are new commits and many comments - most of which are marked as resolved - it's very difficult to track what's being referred to.
  2. Wrt my comment, what I'm referring to is using the autoclasses, but specifying the exact classes expected in the docstring and performing checks on the types, as done in the linked PR #28706.

NielsRogge (Contributor, Author) commented Mar 3, 2024

Ok, I've updated the PR description to link to the original comment.

I see, so the use of autoclasses is allowed, as long as we raise an error? It would be great if we could make this verification general in the ProcessorMixin class, so that we don't need to manually define the ValueErrors (cf. my comment here).
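
A hypothetical sketch of what such a generic check could look like, assuming a new allowed_classes attribute on the processor (this attribute does not exist in transformers today):

from transformers.processing_utils import ProcessorMixin

class ValidatingProcessorMixin(ProcessorMixin):
    # Hypothetical per-attribute allow-list, e.g.
    # {"image_processor": ("CLIPImageProcessor", "ViTImageProcessor")}
    allowed_classes = {}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        for attribute_name, allowed in self.allowed_classes.items():
            received = type(getattr(self, attribute_name)).__name__
            if received not in allowed:
                raise ValueError(
                    f"`{attribute_name}` should be one of {allowed}, got {received}."
                )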

amyeroberts (Collaborator) commented Mar 4, 2024

Copying the comment I made on the PR here, as my reply is basically the same but it's more relevant here:

I think Auto classes are only used for two processors at the moment (neither merged)? If we enable using autoclasses and then add verification into the ProcessorMixin, then we're just building a more round-about alternative to simply being able to specify more than one class for feature_extractor_class, which I think would be preferable.

If you've saved a processor out with a specific image processor, then it should be possible to load with that one and then verify it's one of the "allowed" processors.

Part of the reason for having this protection at the moment is to try and make the limits of the processors clear: the LlavaProcessor isn't compatible with all image processors or all tokenizers, so implying otherwise in the docstring is misleading and unhelpful to users. Adding verification makes sure it fails early and warns users, instead of things quietly behaving weirdly and us ending up with issues being raised.
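
For reference, the per-processor protection being discussed looks roughly like this (a sketch of the pattern, not a verbatim copy from PR #28706):

from transformers import CLIPImageProcessor, ViTImageProcessor
from transformers.processing_utils import ProcessorMixin

class LlavaProcessor(ProcessorMixin):
    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "AutoImageProcessor"
    tokenizer_class = ("LlamaTokenizer", "LlamaTokenizerFast")

    def __init__(self, image_processor=None, tokenizer=None):
        # Fail early and loudly if the Auto class loaded something the
        # processor can't actually handle.
        if not isinstance(image_processor, (CLIPImageProcessor, ViTImageProcessor)):
            raise ValueError(
                "`image_processor` should be a CLIPImageProcessor or a ViTImageProcessor, "
                f"got {type(image_processor).__name__}."
            )
        super().__init__(image_processor=image_processor, tokenizer=tokenizer)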

@huggingface huggingface deleted a comment from github-actions bot Apr 3, 2024
@amyeroberts amyeroberts added the Feature request Request for a new feature label Apr 3, 2024