Add Image To Text Generation pipeline #18821
Conversation
The documentation is not available anymore as the PR was closed or merged.
# parse inputs. In the Tensorflow version, `generate` raises an error if we don't use `input_ids`, whereas
# the PyTorch version matches it with `self.model.main_input_name` or `self.model.encoder.main_input_name`
# in the `_prepare_model_inputs` method.
inputs = model_inputs.pop(self.model.main_input_name)
@Narsil, what do you think of this? The issue is that `model_inputs` is a dict with only a `pixel_values` entry. While the PyTorch generate utils are able to understand that `pixel_values` must be matched with the encoder input, Tensorflow's aren't.
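To illustrate the difference, here is a simplified, hypothetical reimplementation of the input-matching idea behind PyTorch's `_prepare_model_inputs` (names and logic reduced for illustration; this is not the actual transformers code):

```python
def prepare_model_inputs(model_kwargs, main_input_name):
    # PyTorch's generate can match a tensor passed under the model's main
    # input name (e.g. `pixel_values`); TF's generate at this point only
    # understood `input_ids`, hence the workaround in the pipeline.
    if main_input_name in model_kwargs:
        return model_kwargs.pop(main_input_name)
    return model_kwargs.pop("input_ids", None)


print(prepare_model_inputs({"pixel_values": "image-tensor"}, "pixel_values"))
# image-tensor
```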
It's a nice workaround, though it would be better if it weren't needed. I would ask @gante about this (he has worked on TF generate quite extensively). I expect it's a TF limitation that the naming needs to be consistent, hence the issue about `input_ids`. But maybe there's cleaner code overall to be done in the lib.
Hi @OlivierDehaene @Narsil 👋 This seems to be a limitation on the TF generation side. I've taken note of the issue, including the removal of this line :D
Thanks for adding this new pipeline!
LGTM
cc @mishig25, the inference widgets for models like TrOCR, Donut, and image captioning can now be created! 🥳
* Add Image2TextGenerationPipeline to supported pipelines
* Add Flax and Tensorflow support
* Add Flax and Tensorflow small tests
* Add default model for Tensorflow
* Add docstring
* Fix doc style
* Add tiny models for pytorch and flax
* Remove flax from pipeline. Fix tests
* Use ydshieh/vit-gpt2-coco-en as a default for both PyTorch and Tensorflow
* Fix Tensorflow support

Co-authored-by: Olivier Dehaene <olivier@huggingface.co>
What does this PR do?
Adds an Image To Text Generation pipeline. The pipeline currently defaults to `nlpconnect/vit-gpt2-image-captioning`.
Who can review?
This feature was requested by @Narsil.