processor_nougat has wrong default data type #26597

Closed · 1 of 4 tasks · Fixed by #26608

NormXU opened this issue Oct 4, 2023 · 1 comment

NormXU (Contributor) commented Oct 4, 2023

System Info

  • transformers version: 4.34.0
  • Platform: Linux-6.2.0-26-generic-x86_64-with-glibc2.27
  • Python version: 3.8.0
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.3-post.1
  • Accelerate version: 0.22.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
  • Jax version: 0.4.13

Who can help?

@amyeroberts @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The Nougat processor fails to work. The test code I ran is pasted below:

import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

PRETRAINED_PATH_TO_NOUGAT = ""  # path to the pretrained Nougat checkpoint
processor = NougatProcessor.from_pretrained(PRETRAINED_PATH_TO_NOUGAT)
model = VisionEncoderDecoderModel.from_pretrained(PRETRAINED_PATH_TO_NOUGAT)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.to(device)

# prepare PDF image for the model
filepath = "/path/to/dummy/image.png"
image = Image.open(filepath)
pixel_values = processor(image, return_tensors="pt").pixel_values

# generate transcription (up to 512 new tokens)
outputs = model.generate(
    pixel_values.to(device),
    min_length=1,
    max_new_tokens=512,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
sequence = processor.post_process_generation(sequence, fix_markdown=False)

The error log is as follows:

Traceback (most recent call last):
  File "/home/ysocr/tests/test_generate.py", line 15, in <module>
    pixel_values = processor(image, return_tensors="pt").pixel_values
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/processing_nougat.py", line 91, in __call__
    inputs = self.image_processor(
  File "/home/venv/lib/python3.8/site-packages/transformers/image_processing_utils.py", line 546, in __call__
    return self.preprocess(images, **kwargs)
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/image_processing_nougat.py", line 505, in preprocess
    images = [
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/image_processing_nougat.py", line 506, in <listcomp>
    to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
  File "/home/venv/lib/python3.8/site-packages/transformers/image_transforms.py", line 78, in to_channel_dimension_format
    target_channel_dim = ChannelDimension(channel_dim)
  File "/usr/lib/python3.8/enum.py", line 304, in __call__
    return cls.__new__(cls, value)
  File "/usr/lib/python3.8/enum.py", line 595, in __new__
    raise exc
  File "/usr/lib/python3.8/enum.py", line 579, in __new__
    result = cls._missing_(value)
  File "/home/venv/lib/python3.8/site-packages/transformers/utils/generic.py", line 433, in _missing_
    raise ValueError(
ValueError: ChannelDimension.FIRST is not a valid ChannelDimension, please select one of ['channels_first', 'channels_last']

After checking the code, I found that it is the default value of data_format that leads to this error. I believe the expected type and default should be Optional[ChannelDimension] = ChannelDimension.FIRST rather than Optional["ChannelDimension"] = "ChannelDimension.FIRST". Besides, it is odd that resample and input_data_format are annotated with the forward-reference strings "PILImageResampling" and "ChannelDimension" respectively. See line 55, line 64 and line 65.

resample: "PILImageResampling" = None, # noqa: F821
do_thumbnail: bool = None,
do_align_long_axis: bool = None,
do_pad: bool = None,
do_rescale: bool = None,
rescale_factor: Union[int, float] = None,
do_normalize: bool = None,
image_mean: Optional[Union[float, List[float]]] = None,
image_std: Optional[Union[float, List[float]]] = None,
data_format: Optional["ChannelDimension"] = "ChannelDimension.FIRST", # noqa: F821
input_data_format: Optional[Union[str, "ChannelDimension"]] = None, # noqa: F821
text_pair: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
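
For context, here is a minimal sketch of why that string default fails (my own illustration, assuming transformers.image_utils.ChannelDimension, the str-backed enum the traceback points at):

from transformers.image_utils import ChannelDimension

# Passing the enum member or its value converts cleanly.
ChannelDimension(ChannelDimension.FIRST)   # -> ChannelDimension.FIRST
ChannelDimension("channels_first")         # -> ChannelDimension.FIRST

# Passing the literal string "ChannelDimension.FIRST" (the current default)
# is not a valid member value, so the enum's _missing_ hook raises:
# ValueError: ChannelDimension.FIRST is not a valid ChannelDimension, ...
ChannelDimension("ChannelDimension.FIRST")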

I noticed @ArthurZucker made these changes and added some comments. It could be a bug, or maybe it is just a design choice I am misunderstanding?

Expected behavior

Ensure the nougat example works.
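
As a possible interim workaround (a sketch only, based on the __call__ signature quoted above rather than something I have verified end to end), passing the enum explicitly should override the bad string default:

from transformers.image_utils import ChannelDimension

# Override the bad "ChannelDimension.FIRST" string default explicitly.
pixel_values = processor(
    image,
    return_tensors="pt",
    data_format=ChannelDimension.FIRST,
).pixel_values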

@ArthurZucker (Collaborator) commented:

Thanks for reporting, I'll open a PR for a fix asap.
