ProcessorMixin doesn't properly instantiate image processors #29414

Open

NielsRogge opened this issue Mar 3, 2024 · 3 comments
Labels: Feature request

Comments

NielsRogge (Contributor) commented Mar 3, 2024

System Info

Transformers dev version v4.38.2

Who can help?

@ArthurZucker @amyeroberts

Reproduction

Let's say you define your processor as follows:

from transformers.processing_utils import ProcessorMixin

class LlavaProcessor(ProcessorMixin):

    attributes = ["image_processor", "tokenizer"]
    image_processor_class = ("CLIPImageProcessor", "ViTImageProcessor")
    tokenizer_class = ("LlamaTokenizer", "LlamaTokenizerFast")

(this is mainly for demo purposes, since for PR #29012 I'd like to have LlavaProcessor work with 2 different image processor classes)

Then, even though you create a processor as follows:

from transformers import ViTImageProcessor, LlavaProcessor, LlamaTokenizer

image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

processor = LlavaProcessor(image_processor=image_processor, tokenizer=tokenizer)

processor.push_to_hub("nielsr/my-awesome-processor")

Reloading it:

from transformers import LlavaProcessor

processor = LlavaProcessor.from_pretrained("nielsr/my-awesome-processor")
print(type(processor.image_processor))

The printed type is still CLIPImageProcessor, even though we saved a ViTImageProcessor and want to load that back.

This is because of the way the class to load is decided here: if the image_processor_class attribute of the processor is defined as a tuple, the first class is always used.
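
For reference, here's a minimal sketch of the effect (a paraphrase, not the actual transformers source): the class is resolved from the processor's class attribute alone, and preprocessor_config.json is never consulted.

import transformers

def resolve_image_processor_class(image_processor_class):
    # Paraphrase of the effect in ProcessorMixin.from_pretrained: when the
    # class attribute is a tuple, a fixed entry is chosen (the tuple branch
    # was written for slow/fast tokenizer pairs); nothing here reads the
    # saved preprocessor_config.json.
    if isinstance(image_processor_class, tuple):
        class_name = image_processor_class[0]
    else:
        class_name = image_processor_class
    return getattr(transformers, class_name)

print(resolve_image_processor_class(("CLIPImageProcessor", "ViTImageProcessor")))
# CLIPImageProcessor, regardless of what was pushed to the Hub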

Expected behavior

The processor should reload the ViTImageProcessor instead of CLIPImageProcessor.

The current workaround is to do this:

from transformers.processing_utils import ProcessorMixin

class LlavaProcessor(ProcessorMixin):

    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "AutoImageProcessor"
    tokenizer_class = ("LlamaTokenizer", "LlamaTokenizerFast")

This correctly instantiates the image processor. However, in PR #29012, @amyeroberts suggested that the use of the Auto class is discouraged and it might be more appropriate to define the specific classes.

I remember from the past that Sylvain had no problem with the use of the Auto class, but I'm open to discussion. It's definitely a bit inconsistent that we define explicit classes for the tokenizers, but not for the image processor.

Looking at it, I think the Auto classes serve exactly the purpose we're trying to achieve: loading the proper image processor class based on preprocessor_config.json. Hence I'm wondering whether we shouldn't leverage the Auto classes by default for the image_processor_class and tokenizer_class attributes of multimodal processors.
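
To illustrate why the workaround behaves correctly: the preprocessor_config.json pushed above records an image_processor_type field, and AutoImageProcessor dispatches on it (reusing the demo repo name from above):

from transformers import AutoImageProcessor

# preprocessor_config.json contains "image_processor_type": "ViTImageProcessor",
# and AutoImageProcessor dispatches on that field.
image_processor = AutoImageProcessor.from_pretrained("nielsr/my-awesome-processor")
print(type(image_processor))  # ViTImageProcessor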

amyeroberts (Collaborator) commented Mar 3, 2024

@NielsRogge Two things:

  1. Can you link directly to the relevant comments, rather than the PR? If there are new commits and many comments - most of which are marked as resolved - it's very difficult to track what's being referred to.
  2. Wrt my comment, what I'm referring to is using the autoclasses, but specifying the exact classes expected in the docstring and performing checks on the types, as done in the linked PR #28706.

NielsRogge (Contributor, Author) commented Mar 3, 2024

Ok, I've updated the PR description to link to the original comment.

I see, so the use of autoclasses is allowed, as long as we raise an error? It would be great if we could make this verification general in the ProcessorMixin class, so that we don't need to manually define the ValueErrors (cf. my comment here).
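
A hypothetical sketch of what such a generic check could look like, assuming a new allowed_classes attribute on the processor (this attribute does not exist in transformers today):

from transformers.processing_utils import ProcessorMixin

class ValidatingProcessorMixin(ProcessorMixin):
    # Hypothetical per-attribute allow-list, e.g.
    # {"image_processor": ("CLIPImageProcessor", "ViTImageProcessor")}
    allowed_classes = {}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        for attribute_name, allowed in self.allowed_classes.items():
            received = type(getattr(self, attribute_name)).__name__
            if received not in allowed:
                raise ValueError(
                    f"`{attribute_name}` should be one of {allowed}, got {received}."
                )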

amyeroberts (Collaborator) commented Mar 4, 2024

Copying the comment I made on the PR here, as my reply is basically the same but it's more relevant here:

I think Auto classes are only used for two processors at the moment (neither merged)? If we enable using autoclasses and then add verification into the ProcessorMixin, then we're just building a more round-about alternative to simply being able to specify more than one class for feature_extractor_class, which I think would be preferable.

If you've saved a processor out with a specific image processor, then it should be possible to load with that one and then verify it's one of the "allowed" processors.

Part of the reason for having this protection at the moment is to try and make the limits of the processors clear: the LlavaProcessor isn't compatible with all image processors or all tokenizers, so implying otherwise in the docstring is misleading and unhelpful to users. Adding verification makes sure it fails early and warns users, instead of things quietly behaving weirdly and us ending up with issues being raised.
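
For reference, the per-processor protection being discussed looks roughly like this (a sketch of the pattern, not a verbatim copy from PR #28706):

from transformers import CLIPImageProcessor, ViTImageProcessor
from transformers.processing_utils import ProcessorMixin

class LlavaProcessor(ProcessorMixin):
    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "AutoImageProcessor"
    tokenizer_class = ("LlamaTokenizer", "LlamaTokenizerFast")

    def __init__(self, image_processor=None, tokenizer=None):
        # Fail early and loudly if the Auto class loaded something the
        # processor can't actually handle.
        if not isinstance(image_processor, (CLIPImageProcessor, ViTImageProcessor)):
            raise ValueError(
                "`image_processor` should be a CLIPImageProcessor or a ViTImageProcessor, "
                f"got {type(image_processor).__name__}."
            )
        super().__init__(image_processor=image_processor, tokenizer=tokenizer)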

@huggingface huggingface deleted a comment from github-actions bot Apr 3, 2024
@amyeroberts amyeroberts added the Feature request Request for a new feature label Apr 3, 2024