add Unified-IO #19081

Open
2 tasks done
thedarkzeno opened this issue Sep 17, 2022 · 7 comments

Comments

@thedarkzeno
Contributor

Model description

I'd like to request the addition of the Unified-IO model. It is a multimodal model capable of visual question answering, image generation, and more.
Repo: https://github.com/allenai/unified-io-inference
Paper: Unified-IO: Sequential Modeling for Generally Applicable Vision Models

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

https://github.com/allenai/unified-io-inference

@marinone94

Hi, have you started working on the issue? Do you plan to integrate it yourself?

@alceballosa
Contributor

I'd like to work on this issue. Is there any documentation on adding new models that I should follow?

@ChanBong

I would like to work on this one.

@kumar-devesh

@NielsRogge @alaradirik If no one else is currently working on adding this model, I would like to work on it.

@alceballosa
Contributor

Hi @kumar-devesh , I'm working on it (made some progress toward getting a working version of the Discrete VAE in Torch) but @osanseviero told me that it would be better to verify if there's interest from the development team. If they're ok with it then we could work on it together.

@osanseviero
Member

cc @sgugger @amyeroberts

@alaradirik
Contributor

alaradirik commented Mar 8, 2023

Hi @ChanBong @kumar-devesh @alceballosa, Unified-IO would be a great addition to the library.

If you are not familiar with contributing to transformers, you can refer to the guidelines to get started. I'd recommend checking if you can run the original repo without any issues and get the expected results first.

Here are some summarised points that might help with model addition:

  • Each model, including different checkpoints of the same model, has its own repo on the Hub (see the DETR-ResNet-50 repo as an example). This is basically a git repo that stores the checkpoint-specific configuration, the preprocessing configuration and the model weights.
  • The code added to transformers acts as boilerplate to initialise the model and load different checkpoints, for example Unified-IO trained on different datasets, at different resolutions, or with a larger or smaller architecture.
  • configuration_unifiedio.py should contain all the hyperparameters, the input image size and architectural details (e.g. number of hidden layers) to initialize the model.
  • Multi-modal models (e.g. CLIP, ALIGN) have a Processor class that encapsulates the Tokenizer and ImageProcessor classes, which preprocess the text and image inputs respectively.
    • image_processing_unifiedio.py should contain the ImageProcessor class that takes in the raw input image and preprocesses it to the format expected as input to the model (resizing to a fixed input size, normalization, cropping, etc.)
    • tokenization_unifiedio.py should contain the Tokenizer class that preprocesses the raw input text.
    • processing_unifiedio.py combines the two to preprocess image-text pair inputs.
  • modeling_unifiedio.py should contain the model definition.
  • The conversion script:
    • Loads the pretrained original model and randomly initializes the HF implementation with the corresponding configuration
    • Copies the pretrained parameters (weights and biases) of the original model to the corresponding parameters of the randomly initialized HF model (the conversion step)
    • Forward propagates an arbitrary input (text + image in this case) through both the original model and converted HF model and checks if the outputs match
    • Uploads the converted HF model to the hub
  • Each model, tokenizer, image processor and processor class is tested with scripts under tests/models/<MODEL_NAME>/. You can refer to other models' test files to see what tests to add.
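
The conversion steps listed above (rename and copy parameters, then check that outputs match) can be sketched roughly as follows. This is only an illustration under loud assumptions: the parameter names, the `RENAME_RULES` table and the helper functions are all hypothetical, and plain Python lists stand in for tensors. A real script would load the actual checkpoint, may need to transpose or reshape some tensors, and would typically compare outputs with something like `torch.allclose`.

```python
import math

# Hypothetical rename rules: the real Unified-IO and HF parameter names will differ.
RENAME_RULES = [
    ("encoder/layers_", "encoder.layer."),
    ("/kernel", ".weight"),
    ("/bias", ".bias"),
    ("/", "."),
]


def rename_key(original_key):
    """Map an original parameter name to the (assumed) HF-style name."""
    new_key = original_key
    for old, new in RENAME_RULES:
        new_key = new_key.replace(old, new)
    return new_key


def convert_state_dict(original_state, target_keys):
    """Rename every parameter and fail loudly if the key sets do not line up."""
    converted = {rename_key(k): v for k, v in original_state.items()}
    missing = sorted(set(target_keys) - set(converted))
    unexpected = sorted(set(converted) - set(target_keys))
    if missing or unexpected:
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    return converted


def outputs_match(expected, actual, atol=1e-4):
    """Element-wise comparison of two flat output vectors within a tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, abs_tol=atol) for e, a in zip(expected, actual)
    )


# Toy stand-ins for the original checkpoint and the HF model's parameter names.
original = {
    "encoder/layers_0/attention/kernel": [0.1, 0.2],
    "encoder/layers_0/attention/bias": [0.0],
}
hf_keys = ["encoder.layer.0.attention.weight", "encoder.layer.0.attention.bias"]
converted = convert_state_dict(original, hf_keys)
```

Raising on missing or unexpected keys is a deliberate choice here: a silently skipped parameter would leave part of the HF model randomly initialised, and the output comparison is the only other place that would catch it.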

Once you are done, you would need to run the following commands to check the PR passes all CI tests:

make style
make quality
make repo-consistency

RUN_SLOW=TRUE pytest tests/models/unifiedio/test_modeling_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_image_processing_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_tokenization_unifiedio.py
RUN_SLOW=TRUE pytest tests/models/unifiedio/test_processor_unifiedio.py
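
For context on the RUN_SLOW prefix: tests that load real pretrained checkpoints are usually marked as slow and skipped unless that environment variable is set. The sketch below is a simplified, standard-library stand-in for that idea, not the library's own decorator; `parse_flag`, `slow` and the test class are hypothetical names used for illustration only.

```python
import os
import unittest


def parse_flag(name, default=False):
    """Read a boolean-like environment variable such as RUN_SLOW=TRUE."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "t", "yes", "y", "on"}


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is set (checked at decoration time)."""
    return unittest.skipUnless(parse_flag("RUN_SLOW"), "test is slow")(test_case)


class UnifiedIOIntegrationSketch(unittest.TestCase):
    def test_fast_shape_check(self):
        # Always runs: a cheap check with no checkpoint download.
        self.assertEqual(len([0.0] * 4), 4)

    @slow
    def test_pretrained_outputs(self):
        # Placeholder: a real test would load a checkpoint from the Hub and
        # compare model outputs against values from the original implementation.
        self.assertTrue(True)
```

Running this file with pytest and no RUN_SLOW set would report the second test as skipped; with RUN_SLOW=TRUE both tests run.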

We can do an in-depth review or create a Slack channel to address questions and issues once there is a draft PR.

Hope this helps!
