Add Image To Text Generation pipeline #18821

OlivierDehaene · 2022-08-30T16:32:28Z

What does this PR do?

Add Image To Text Generation pipeline. The pipeline currently defaults to nlpconnect/vit-gpt2-image-captioning.
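A minimal usage sketch, assuming the pipeline is exposed under the task name "image-to-text" (as in released versions of transformers; the name on this branch may differ) and that it returns a list of dicts with a generated_text key. The helper caption_image is hypothetical and only wraps the call for illustration.

```python
from typing import List

# Default checkpoint named in this PR description.
DEFAULT_MODEL = "nlpconnect/vit-gpt2-image-captioning"


def caption_image(image: str, model: str = DEFAULT_MODEL) -> List[str]:
    """Return generated captions for an image path or URL (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    # Assumption: the task is registered as "image-to-text".
    captioner = pipeline("image-to-text", model=model)
    outputs = captioner(image)
    # Assumption: each output is a dict with a "generated_text" entry.
    return [item["generated_text"] for item in outputs]
```

Calling caption_image with a local path or URL downloads the checkpoint on first use, so it needs network access and an installed backend (PyTorch or TensorFlow).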

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

This feature was requested by @Narsil.

src/transformers/models/auto/feature_extraction_auto.py

src/transformers/pipelines/__init__.py

tests/pipelines/test_pipelines_image2text_generation.py

src/transformers/pipelines/__init__.py

HuggingFaceDocBuilderDev · 2022-08-30T17:07:09Z

The documentation is not available anymore as the PR was closed or merged.

Narsil

Very nice first PR.

It looks very nice already.

@mishig25 I don't think we have a widget for image-to-text, do we? Do you think we could add one?

@sgugger for a second opinion on this pipeline.

src/transformers/pipelines/image2text_generation.py

tests/pipelines/test_pipelines_image2text_generation.py

src/transformers/pipelines/__init__.py

OlivierDehaene · 2022-08-31T17:31:38Z

src/transformers/pipelines/image2text_generation.py

+        #  parse inputs. In the Tensorflow version, `generate` raises an error if we don't use `input_ids` whereas
+        #  the PyTorch version matches it with `self.model.main_input_name` or `self.model.encoder.main_input_name`
+        #  in the `_prepare_model_inputs` method.
+        inputs = model_inputs.pop(self.model.main_input_name)


@Narsil, what do you think of this? The issue is that model_inputs is a dict with only a pixel_values entry. PyTorch's generate utilities can work out that pixel_values must be matched with the encoder input, but TensorFlow's cannot.

Narsil · 2022-09-01

It's a nice workaround.

It would be better if it weren't needed. I would ask @gante about this (who worked on TF generate quite extensively).
I expect it's a TF limitation that the naming needs to be consistent, hence the issue with input_ids.

But maybe there's cleaner code overall to be done in the lib.

gante · 2022-09-02

Hi @OlivierDehaene @Narsil 👋 This seems to be a limitation on the TF generation side. I've taken note of the issue, including the removal of this line :D
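The workaround discussed in this thread can be sketched without any framework. This is an illustration under stated assumptions, not the pipeline's actual code: StubEncoderDecoder and forward_with_positional_input are hypothetical names, and the stub's generate only mimics a generation method that does not resolve the main input from keyword arguments.

```python
from typing import Any, Dict


class StubEncoderDecoder:
    """Hypothetical stand-in for a vision encoder-decoder model."""

    # Name of the tensor the encoder consumes.
    main_input_name = "pixel_values"

    def generate(self, inputs: Any = None, **kwargs: Any) -> Dict[str, Any]:
        # Assumption: this generate() cannot find the main input among kwargs,
        # so it fails unless the main input is passed positionally.
        if inputs is None:
            raise ValueError("main input must be passed positionally")
        return {"inputs": inputs, "extra": kwargs}


def forward_with_positional_input(model: Any, model_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Pop the model's main input and hand it to generate() positionally."""
    remaining = dict(model_inputs)  # copy so the caller's dict is left untouched
    inputs = remaining.pop(model.main_input_name)
    return model.generate(inputs, **remaining)
```

Because the key is looked up through model.main_input_name rather than hard-coded, the same call works whichever name the model declares for its main input.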

tests/pipelines/test_pipelines_image2text_generation.py

sgugger

Thanks for adding this new pipeline!

Narsil

LGTM

NielsRogge · 2022-09-01T16:39:19Z

cc @mishig25, the inference widgets for models like TrOCR, Donut, image captioning can now be created! 🥳

* Add Image2TextGenerationPipeline to supported pipelines
* Add Flax and Tensorflow support
* Add Flax and Tensorflow small tests
* Add default model for Tensorflow
* Add docstring
* Fix doc style
* Add tiny models for pytorch and flax
* Remove flax from pipeline. Fix tests
* Use ydshieh/vit-gpt2-coco-en as a default for both PyTorch and Tensorflow
* Fix Tensorflow support

Co-authored-by: Olivier Dehaene <olivier@huggingface.co>

OlivierDehaene force-pushed the image-2-text-generation-pipeline branch 2 times, most recently from 7eaba04 to c687369 on August 31, 2022 13:28

OlivierDehaene marked this pull request as ready for review August 31, 2022 13:31

Narsil reviewed Aug 31, 2022

OlivierDehaene force-pushed the image-2-text-generation-pipeline branch from 8bdb156 to ac3dabc on August 31, 2022 17:28

sgugger approved these changes Aug 31, 2022

OlivierDehaene force-pushed the image-2-text-generation-pipeline branch from ac3dabc to c52c38c on September 1, 2022 07:55

OlivierDehaene added 10 commits September 1, 2022 12:15

Add Image2TextGenerationPipeline to supported pipelines (64ae3cd)
Add Flax and Tensorflow support (11dcb39)
Add Flax and Tensorflow small tests (4874550)
Add default model for Tensorflow (4252074)
Add docstring (4c9e726)
Fix doc style (789268a)
Add tiny models for pytorch and flax (aba5b2a)
Remove flax from pipeline. Fix tests (1fa2645)
Use ydshieh/vit-gpt2-coco-en as a default for both PyTorch and Tensorflow (112ea2f)
Fix Tensorflow support (dc0cc53)

OlivierDehaene force-pushed the image-2-text-generation-pipeline branch from c52c38c to dc0cc53 on September 1, 2022 10:16

Narsil approved these changes Sep 1, 2022

sgugger merged commit ddb69e5 into huggingface:main Sep 1, 2022

mishig25 mentioned this pull request Sep 2, 2022

Implement img2text widget huggingface/hub-docs#290
