This is the official code repository for the ICCV 2023 paper "Do DALL-E and Flamingo Understand Each Other?" Should you have any inquiries or require further assistance, please do not hesitate to reach out to us.
The code base has been verified with Python 3.9, CUDA 11.7, and CUDA driver 515.65.01. To get started, follow the steps below:
- Create a new conda environment with Python 3.9:
conda create -n pytorch python=3.9
conda activate pytorch
- Install the required dependencies:
git clone git@github.com:hangligit/DalleFlamingo.git
cd DalleFlamingo
pip install -r requirements.txt
pip install -e transformers
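After installing the dependencies, a quick sanity check can confirm the interpreter and CUDA setup. This is a minimal sketch, not part of the repository; the PyTorch check assumes the package was installed via requirements.txt:

```python
import sys

# The code base is verified with Python 3.9; print the interpreter version.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")

# PyTorch/CUDA check -- only meaningful once requirements.txt has been installed.
try:
    import torch
    print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not found; run `pip install -r requirements.txt` first")
```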
You can find our finetuned weights of BLIP and SD available for download:
BLIP-Base | BLIP-Large | BLIP2 | SD-w/-Base | SD-w/-Large
To access the training data, please download the dataset from here and organize it in the structure below. The "captions" directory contains symbolic links to the actual image files, which should be stored in the "images" folder; the images themselves can be downloaded from the COCO website.
/root_dir/datasets/coco/
--captions
  --train
  --val
  --test
--images
  --train2014
  --val2014
  --test2014
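The layout above can be created ahead of time with a short helper. This is an illustrative sketch (the function name is ours, not from the repository); replace "/root_dir" with your actual root directory:

```python
from pathlib import Path

def make_coco_layout(root_dir):
    """Create the expected COCO directory skeleton under root_dir."""
    coco = Path(root_dir) / "datasets" / "coco"
    # Caption splits hold symbolic links to the image files.
    for split in ("train", "val", "test"):
        (coco / "captions" / split).mkdir(parents=True, exist_ok=True)
    # Image splits hold the actual files downloaded from the COCO website.
    for split in ("train2014", "val2014", "test2014"):
        (coco / "images" / split).mkdir(parents=True, exist_ok=True)
    return coco

# Example: make_coco_layout("/root_dir") creates the folders shown above.
```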
Once the environment and data are set up, use the following script to finetune the BLIP and SD models:
python train.py
After training completes, you can evaluate performance using the following two scripts:
python evaluate_blip.py
python evaluate_sd.py
If you find our work valuable, please consider citing our research as follows:
@InProceedings{Li_2023_ICCV,
author = {Li, Hang and Gu, Jindong and Koner, Rajat and Sharifzadeh, Sahand and Tresp, Volker},
title = {Do DALL-E and Flamingo Understand Each Other?},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {1999-2010}
}