add Unified-IO #19081

thedarkzeno · 2022-09-17T02:39:28Z

Model description

I'd like to request the addition of the Unified-IO model. It is a multimodal model capable of visual question answering, image generation, and more.
Repo: https://github.com/allenai/unified-io-inference
Paper: Unified-IO: Sequential Modeling for Generally Applicable Vision Models

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

https://github.com/allenai/unified-io-inference

marinone94 · 2022-09-30T10:38:44Z

Hi, have you started working on the issue? Do you plan to integrate it yourself?

alceballosa · 2022-10-20T15:24:36Z

I'd like to work on this issue. Is there any documentation on adding new models that I should follow?

ChanBong · 2023-01-22T18:09:54Z

I would like to work on this one.

kumar-devesh · 2023-03-07T10:57:51Z

@NielsRogge @alaradirik If no one else is currently working on adding this model, I would like to work on it.

alceballosa · 2023-03-07T11:45:59Z

Hi @kumar-devesh, I'm working on it (I've made some progress toward getting a working version of the Discrete VAE in Torch), but @osanseviero told me it would be better to first check whether there's interest from the development team. If they're OK with it, we could work on it together.

osanseviero · 2023-03-07T19:17:05Z

cc @sgugger @amyeroberts

alaradirik · 2023-03-08T12:26:13Z

Hi @ChanBong @kumar-devesh @alceballosa, Unified-IO would be a great addition to the library.

If you are not familiar with contributing to transformers, you can refer to the guidelines to get started. I'd recommend checking if you can run the original repo without any issues and get the expected results first.

Here are some summarised points that might help with model addition:

Each model, including different checkpoints of the same model, has its own repo on the Hub (see DETR-ResNet-50 repo as an example). This is basically a git repo that stores the checkpoint-specific configuration, preprocessing configuration and the model weights.
The code added to transformers acts as boilerplate to initialise the model and load different checkpoints, e.g. Unified-IO trained on different datasets, at different resolutions, or with a larger or smaller architecture.
configuration_unifiedio.py should contain all the hyperparameters, the input image size and architectural details (e.g. number of hidden layers) to initialize the model.
Multi-modal models (e.g. CLIP, ALIGN) have a Processor class that encapsulates the Tokenizer and ImageProcessor classes, which preprocess the text and image inputs.
- image_processing_unifiedio.py should contain the ImageProcessor class that takes in the raw input image and preprocesses it to the format expected as input to the model (resizing to a fixed input size, normalization, cropping, etc.)
- tokenization_unifiedio.py should contain the Tokenizer class that preprocesses the raw input text.
- processing_unifiedio.py combines the two to preprocess image-text pair inputs.
modeling_unifiedio.py should contain the model definition.
The conversion script:
- Loads the pretrained original model and randomly initializes the HF implementation with the corresponding configuration
- Copies the pretrained parameters (weights and biases) of the original model to the corresponding parameters of the randomly initialized HF model (the conversion step)
- Forward propagates an arbitrary input (text + image in this case) through both the original model and converted HF model and checks if the outputs match
- Uploads the converted HF model to the hub
Each model, tokenizer, image processor and processor class is tested with scripts under tests/models/<MODEL_NAME>/; you can refer to other test files to see what tests to add.
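
To make the conversion step above a bit more concrete, here is a minimal sketch of the parameter-copying and output-checking logic. It is only an illustration under loud assumptions: the parameter names in RENAME_RULES are hypothetical (they are not Unified-IO's real names), plain Python lists stand in for tensors, and rename_key, convert_state_dict and outputs_match are made-up helpers rather than anything from transformers. A real script would load the actual checkpoint and compare tensors produced by both models.

```python
import math
import re

# Hypothetical rename rules: (pattern for the original name, HF-style replacement).
# These names are illustrative only and do not come from the Unified-IO checkpoint.
RENAME_RULES = [
    (r"^encoder/layers_(\d+)/attention/query/kernel$",
     r"encoder.layer.\1.attention.q_proj.weight"),
    (r"^encoder/layers_(\d+)/mlp/wi/kernel$",
     r"encoder.layer.\1.mlp.fc1.weight"),
    (r"^token_embedder/embedding$", r"shared.weight"),
]


def rename_key(original_key):
    """Return the HF-style name for an original key, or None if no rule matches."""
    for pattern, replacement in RENAME_RULES:
        if re.match(pattern, original_key):
            return re.sub(pattern, replacement, original_key)
    return None


def convert_state_dict(original_params, target_keys):
    """Copy every original parameter into a new dict under its HF-style name.

    Raises if an original key has no rule or a target key is never filled,
    so a partial conversion cannot pass silently.
    """
    converted = {}
    for key, value in original_params.items():
        new_key = rename_key(key)
        if new_key is None:
            raise KeyError(f"no rename rule for {key!r}")
        converted[new_key] = value
    missing = set(target_keys) - set(converted)
    if missing:
        raise KeyError(f"target parameters never filled: {sorted(missing)}")
    return converted


def outputs_match(expected, actual, atol=1e-4):
    """Compare two flat output vectors element-wise within an absolute tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, abs_tol=atol) for e, a in zip(expected, actual)
    )
```

Failing loudly on unmapped or unfilled parameters is a design choice for this sketch: it surfaces naming mistakes before the output comparison, where they would be much harder to trace.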

Once you are done, you would need to run the following commands to check the PR passes all CI tests:

make style
make quality
make repo-consistency

RUN_SLOW=TRUE pytest tests/models/unifiedio/test_modeling_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_image_processing_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_tokenization_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_processor_unifiedio.py
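
As a rough idea of the kind of check such a test file might contain, below is a small, self-contained sketch of a configuration round-trip test. Everything in it is an assumption for illustration: ToyUnifiedIOConfig is a stand-in dataclass, its field names and default values are made up, and the real configuration would presumably subclass the library's PretrainedConfig and reuse the shared test helpers from the existing test files instead.

```python
import json
import unittest
from dataclasses import asdict, dataclass


@dataclass
class ToyUnifiedIOConfig:
    """Stand-in config class; field names and defaults are invented for this sketch."""

    image_size: int = 384
    num_hidden_layers: int = 12
    hidden_size: int = 768

    def to_json_string(self):
        # Serialise all hyperparameters so a checkpoint repo could store them.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json_string(cls, text):
        # Rebuild the config from its serialised form.
        return cls(**json.loads(text))


class ToyConfigTest(unittest.TestCase):
    def test_json_round_trip(self):
        config = ToyUnifiedIOConfig(image_size=256, num_hidden_layers=6)
        restored = ToyUnifiedIOConfig.from_json_string(config.to_json_string())
        self.assertEqual(config, restored)

    def test_overrides_are_kept(self):
        config = ToyUnifiedIOConfig(hidden_size=1024)
        self.assertEqual(config.hidden_size, 1024)
```

Because the test class derives from unittest.TestCase, pytest collects it like the commands above would for the real test files.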

We can do an in-depth review or create a Slack channel to address questions and issues once there is a draft PR.

Hope this helps!

thedarkzeno added the New model label Sep 17, 2022
