This is the official code repository for the ICCV 2023 paper "Do DALL-E and Flamingo Understand Each Other?" Should you have any inquiries or require further assistance, please do not hesitate to reach out to us.
The code base has been verified with Python 3.9, CUDA 11.7, and CUDA driver 515.65.01. To get started, follow the steps below:
- Create a new conda environment with Python 3.9:
conda create -n pytorch python=3.9
conda activate pytorch
- Install the required dependencies:
git clone git@github.com:hangligit/DalleFlamingo.git
cd DalleFlamingo
pip install -r requirements.txt
pip install -e transformers
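After installing the dependencies, a quick sanity check can confirm the interpreter and CUDA setup. This is a minimal sketch, not part of the repository; the PyTorch check assumes the package was installed via requirements.txt:

```python
import sys

# The code base is verified with Python 3.9; print the interpreter version.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")

# PyTorch/CUDA check -- only meaningful once requirements.txt has been installed.
try:
    import torch
    print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not found; run `pip install -r requirements.txt` first")
```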
You can find our finetuned weights of BLIP and SD available for download:
BLIP-Base | BLIP-Large | BLIP2 | SD-w/-Base | SD-w/-Large
To access the training data, please download the dataset from here and organize it in the structure below. The "captions" directory contains symbolic links to the actual image files, which should be stored in the "images" folder; the images themselves can be downloaded from the COCO website.
/root_dir/datasets/coco/
--captions
  --train
  --val
  --test
--images
  --train2014
  --val2014
  --test2014
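The layout above can be created ahead of time with a short helper. This is an illustrative sketch (the function name is ours, not from the repository); replace "/root_dir" with your actual root directory:

```python
from pathlib import Path

def make_coco_layout(root_dir):
    """Create the expected COCO directory skeleton under root_dir."""
    coco = Path(root_dir) / "datasets" / "coco"
    # Caption splits hold symbolic links to the image files.
    for split in ("train", "val", "test"):
        (coco / "captions" / split).mkdir(parents=True, exist_ok=True)
    # Image splits hold the actual files downloaded from the COCO website.
    for split in ("train2014", "val2014", "test2014"):
        (coco / "images" / split).mkdir(parents=True, exist_ok=True)
    return coco

# Example: make_coco_layout("/root_dir") creates the folders shown above.
```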
Once the environment and data are set up, use the following script to finetune the BLIP and SD models:
python train.py
After training completes, you can evaluate performance using the following two scripts:
python evaluate_blip.py
python evaluate_sd.py
If you find our work valuable, please consider citing our research as follows:
@InProceedings{Li_2023_ICCV,
author = {Li, Hang and Gu, Jindong and Koner, Rajat and Sharifzadeh, Sahand and Tresp, Volker},
title = {Do DALL-E and Flamingo Understand Each Other?},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {1999-2010}
}