Error when passing a tensor of images to CLIPProcessor #21142

@AntreasAntoniou

Description

System Info

  • huggingface_hub version: 0.11.1
  • Platform: Linux-4.19.0-23-cloud-amd64-x86_64-with-glibc2.31
  • Python version: 3.10.8
  • Running in iPython ?: No
  • Running in notebook ?: No
  • Running in Google Colab ?: No
  • Token path ?: /root/.huggingface/token
  • Has saved token ?: False
  • Configured git credential helpers: !f()
  • FastAI: N/A
  • Tensorflow: 2.11.0
  • Torch: 1.13.1
  • Jinja2: 3.1.2
  • Graphviz: N/A
  • Pydot: N/A

Who can help?

@ArthurZucker @amyeroberts @NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run the following script:
from transformers import CLIPProcessor
import torch

model_name_or_path = "openai/clip-vit-large-patch14"
processor: CLIPProcessor = CLIPProcessor.from_pretrained(model_name_or_path)

dummy_input = torch.randn(10, 3, 224, 224)

dummy_output = processor(images=dummy_input, return_tensors="pt")
  2. Observe the error in the resulting traceback:
 ---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /opt/conda/envs/main/lib/python3.10/site-packages/PIL/Image.py:2953, in fromarray(obj, mode)
   2952 try:
-> 2953     mode, rawmode = _fromarray_typemap[typekey]
   2954 except KeyError as e:

KeyError: ((1, 1, 224, 224), '|u1')

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[2], line 10
      4 processor: CLIPProcessor = CLIPProcessor.from_pretrained(
      5             model_name_or_path
      6         )
      8 dummy_input = torch.randn(10, 3, 224, 224)
---> 10 dummy_output = processor(images=dummy_input, return_tensors="pt")

File /opt/conda/envs/main/lib/python3.10/site-packages/transformers/models/clip/processing_clip.py:85, in CLIPProcessor.__call__(self, text, images, return_tensors, **kwargs)
     82     encoding = self.tokenizer(text, return_tensors=return_tensors, **kwargs)
     84 if images is not None:
---> 85     image_features = self.feature_extractor(images, return_tensors=return_tensors, **kwargs)
     87 if text is not None and images is not None:
     88     encoding["pixel_values"] = image_features.pixel_values
...
-> 2955         raise TypeError("Cannot handle this data type: %s, %s" % typekey) from e
   2956 else:
   2957     rawmode = mode

TypeError: Cannot handle this data type: (1, 1, 224, 224), |u1

Expected behavior

The processor should accept the batched image tensor and return the preprocessed pixel_values for all ten images.
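
From the traceback, the feature extractor appears to treat the 4D tensor as a single image and pass it straight to PIL.Image.fromarray, which only supports 2D/3D arrays (note the typekey (1, 1, 224, 224) above). Until batched tensors are handled, a minimal workaround sketch, assuming the goal is simply to preprocess a batch, is to hand the processor a list of PIL images, an input type it accepts:

from PIL import Image
import numpy as np
import torch
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# rand (values in [0, 1]) instead of randn, so the uint8 conversion below is valid
dummy_input = torch.rand(10, 3, 224, 224)

# Convert the batched CHW float tensor into a list of HWC uint8 PIL images.
pil_images = [
    Image.fromarray((img.permute(1, 2, 0).numpy() * 255).astype(np.uint8))
    for img in dummy_input
]

dummy_output = processor(images=pil_images, return_tensors="pt")
print(dummy_output["pixel_values"].shape)  # torch.Size([10, 3, 224, 224])

Splitting the batch into per-image PIL objects sidesteps the fromarray call on the full 4D array; the processor then resizes and normalizes each image as usual.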
