Error when passing a tensor of images to CLIPProcessor #21142

@AntreasAntoniou

Description

System Info

  • huggingface_hub version: 0.11.1
  • Platform: Linux-4.19.0-23-cloud-amd64-x86_64-with-glibc2.31
  • Python version: 3.10.8
  • Running in iPython ?: No
  • Running in notebook ?: No
  • Running in Google Colab ?: No
  • Token path ?: /root/.huggingface/token
  • Has saved token ?: False
  • Configured git credential helpers: !f()
  • FastAI: N/A
  • Tensorflow: 2.11.0
  • Torch: 1.13.1
  • Jinja2: 3.1.2
  • Graphviz: N/A
  • Pydot: N/A

Who can help?

@ArthurZucker @amyeroberts @NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run the following script:
from transformers import CLIPProcessor
import torch

model_name_or_path = "openai/clip-vit-large-patch14"
processor: CLIPProcessor = CLIPProcessor.from_pretrained(model_name_or_path)

dummy_input = torch.randn(10, 3, 224, 224)

dummy_output = processor(images=dummy_input, return_tensors="pt")
  2. Observe the error in the resulting traceback:
 ---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /opt/conda/envs/main/lib/python3.10/site-packages/PIL/Image.py:2953, in fromarray(obj, mode)
   2952 try:
-> 2953     mode, rawmode = _fromarray_typemap[typekey]
   2954 except KeyError as e:

KeyError: ((1, 1, 224, 224), '|u1')

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[2], line 10
      4 processor: CLIPProcessor = CLIPProcessor.from_pretrained(
      5             model_name_or_path
      6         )
      8 dummy_input = torch.randn(10, 3, 224, 224)
---> 10 dummy_output = processor(images=dummy_input, return_tensors="pt")

File /opt/conda/envs/main/lib/python3.10/site-packages/transformers/models/clip/processing_clip.py:85, in CLIPProcessor.__call__(self, text, images, return_tensors, **kwargs)
     82     encoding = self.tokenizer(text, return_tensors=return_tensors, **kwargs)
     84 if images is not None:
---> 85     image_features = self.feature_extractor(images, return_tensors=return_tensors, **kwargs)
     87 if text is not None and images is not None:
     88     encoding["pixel_values"] = image_features.pixel_values
...
-> 2955         raise TypeError("Cannot handle this data type: %s, %s" % typekey) from e
   2956 else:
   2957     rawmode = mode

TypeError: Cannot handle this data type: (1, 1, 224, 224), |u1

Expected behavior

The processor should accept the batched image tensor and return the preprocessed pixel_values for all ten images.
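
From the traceback, the feature extractor appears to treat the 4D tensor as a single image and pass it straight to PIL.Image.fromarray, which only supports 2D/3D arrays (note the typekey (1, 1, 224, 224) above). Until batched tensors are handled, a minimal workaround sketch, assuming the goal is simply to preprocess a batch, is to hand the processor a list of PIL images, an input type it accepts:

from PIL import Image
import numpy as np
import torch
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# rand (values in [0, 1]) instead of randn, so the uint8 conversion below is valid
dummy_input = torch.rand(10, 3, 224, 224)

# Convert the batched CHW float tensor into a list of HWC uint8 PIL images.
pil_images = [
    Image.fromarray((img.permute(1, 2, 0).numpy() * 255).astype(np.uint8))
    for img in dummy_input
]

dummy_output = processor(images=pil_images, return_tensors="pt")
print(dummy_output["pixel_values"].shape)  # torch.Size([10, 3, 224, 224])

Splitting the batch into per-image PIL objects sidesteps the fromarray call on the full 4D array; the processor then resizes and normalizes each image as usual.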
