opendatalab/VIGC

VIGC: Visual Instruction Generation and Correction

We propose Visual Instruction Generation and Correction (VIGC), a framework capable of autonomously generating high-quality image-text instruction fine-tuning datasets.



Getting Started

Installation

  1. (Optional) Create a conda environment

    conda create -n vigc python=3.8
    conda activate vigc
  2. Install mmpretrain

    Follow the mmpretrain installation tutorial.

  3. Build VIGC from source

    git clone https://gitlab.pjlab.org.cn/fdc/mllm/vigc.git
    cd vigc
    pip install -e .
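
After the three installation steps, a quick import check can confirm that the environment is usable before any weights are downloaded. The following is a minimal sketch, not part of VIGC: the helper name check_environment is hypothetical, and the import names torch, mmpretrain, and vigc are assumptions inferred from the commands in this README.

```python
import importlib.util
import sys


def check_environment(required=("torch", "mmpretrain", "vigc")):
    """Report the Python version and whether each package can be located.

    The default package names are assumptions based on this README;
    adjust them to match your installation.
    """
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in required:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated vigc environment; any package reported as False still needs to be installed.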

Prepare Models

  1. Obtain the Vicuna model

    Vicuna is an open-source LLaMA-based LLM with performance close to ChatGPT. We currently use the v1.1 versions of Vicuna-13B and Vicuna-7B. If you already have Vicuna weights of the correct version, set llm_model in the Model Config to the folder that contains them. Otherwise, follow this instruction to obtain them, and remember to update the config file as well.

  2. Download the pretrained model

    We support loading from two kinds of pretrained checkpoints: MiniGPT-4 and InstructBLIP. You can download them from the table below; for more details, please visit their original repositories (MiniGPT-4 and InstructBLIP).

    Model Type      Checkpoint pretrained with Vicuna 7B    Checkpoint pretrained with Vicuna 13B
    MiniGPT-4       Download                                Download
    InstructBLIP    Download                                Download

    After downloading the pretrained checkpoints, set pretrained in the Model Config to the folder that contains the pretrained weights.

  3. Download the finetuned VIGC model

    Download the finetuned VIGC checkpoint that matches your finetuning dataset and the Vicuna model you prepared.

    Finetuned Dataset    Checkpoint finetuned with Vicuna 7B    Checkpoint finetuned with Vicuna 13B
    LLaVA                Download                               Download
    OKVQA                Download                               /
    A-OKVQA              Download                               /

Launching Demo

To launch a demo locally:

  1. Download the pretrained and finetuned weights of MiniGPT-4 and InstructBLIP to your local machine.

  2. Update MODEL_CKPT in line 9 of vigc_demo.py.

  3. Run python vigc_demo.py, then follow the printed prompts to open the demo in a browser. The arguments are:

    • device0: The GPU id of the first model

    • device1: The GPU id of the second model
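
The list above names the two arguments but does not show how they are passed. The sketch below illustrates one plausible shape, assuming they are ordinary command-line flags (--device0, --device1) holding integer GPU ids. The parser, its defaults, and the helper names are hypothetical; check vigc_demo.py for the actual interface.

```python
import argparse


def parse_demo_args(argv=None):
    """Hypothetical parser mirroring the two documented demo arguments."""
    parser = argparse.ArgumentParser(description="Sketch of VIGC demo arguments")
    # Defaults of 0 and 1 are assumptions, not values taken from vigc_demo.py.
    parser.add_argument("--device0", type=int, default=0,
                        help="GPU id of the first model")
    parser.add_argument("--device1", type=int, default=1,
                        help="GPU id of the second model")
    return parser.parse_args(argv)


def device_strings(args):
    """Turn the GPU ids into PyTorch-style device names such as 'cuda:0'."""
    return f"cuda:{args.device0}", f"cuda:{args.device1}"


if __name__ == "__main__":
    print(device_strings(parse_demo_args()))
```

Placing the two models on different GPUs splits their memory use across devices.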

You can also try the VIGC online demo on OpenXLab.

Tutorials

Generate QA

  1. Generate QA based on COCO2017 for LLaVA

    1. First, download the finetuned VIGC model.
    2. Then set finetuned in the corresponding Inference Config to the path of the checkpoint file.
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_conv.yaml   # generate conversation data for Llava using MiniGPT4-vicuna7b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_detail.yaml   # generate detail description data for Llava using MiniGPT4-vicuna7b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_complex.yaml   # generate complex reasoning data for Llava using MiniGPT4-vicuna7b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_conv.yaml   # generate conversation data for Llava using MiniGPT4-vicuna13b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_detail.yaml   # generate detail description data for Llava using MiniGPT4-vicuna13b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_complex.yaml   # generate complex reasoning data for Llava using MiniGPT4-vicuna13b
  2. Generate QA based on Object365 for LLaVA

    1. First, download the finetuned VIGC model.
    2. Then set finetuned in the corresponding Inference Config to the path of the checkpoint file.
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_conv.yaml   # generate conversation data for Llava using MiniGPT4-vicuna7b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_detail.yaml  # generate detail description data for Llava using MiniGPT4-vicuna7b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_complex.yaml   # generate complex reasoning data for Llava using MiniGPT4-vicuna7b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_conv.yaml   # generate conversation data for Llava using MiniGPT4-vicuna13b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_detail.yaml   # generate detail description data for Llava using MiniGPT4-vicuna13b
    
    torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_complex.yaml   # generate complex reasoning data for Llava using MiniGPT4-vicuna13b
  3. Generate QA based on COCO2017 for A-OKVQA or OKVQA

    1. First, download the finetuned VIGC model.

    2. Then set pretrained in the corresponding Inference Config to the path of the checkpoint file.

    3. Generate the questions first:

      torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/a-okvqa/generate_question.yaml   # generate questions for A-OKVQA using instruct-blip-vicuna7b
      
      torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/okvqa/generate_question.yaml   # generate questions for OKVQA using instruct-blip-vicuna7b
    4. Set annotation in generate_answer.yaml to the path of the questions generated in the previous step, then generate the answers:

      torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/a-okvqa/generate_answer.yaml   # generate answers for A-OKVQA using instruct-blip-vicuna7b
      
      torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/okvqa/generate_answer.yaml   # generate answers for OKVQA using instruct-blip-vicuna7b
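
The generation commands above follow a regular path pattern, so they can be assembled instead of copied one by one. The following is a minimal sketch that only builds the command lines listed in this section; the function names are hypothetical and nothing here is part of VIGC itself.

```python
import shlex


def torchrun_cmd(cfg_path, script="evaluate.py", nproc=8):
    """Build one torchrun command as an argument list."""
    return ["torchrun", f"--nproc_per_node={nproc}", script, "--cfg-path", cfg_path]


def llava_generation_cmds(llm="7b", object365=False, nproc=8):
    """Commands for conversation, detail, and complex-reasoning data.

    llm is "7b" or "13b"; object365=True selects the Object365 configs
    instead of the COCO2017 ones.
    """
    prefix = "generate_llava_qa_object365_" if object365 else "generate_llava_qa_"
    base = f"vigc/projects/mini_gpt4_vicuna{llm}/generate_qa/llava-150k"
    return [torchrun_cmd(f"{base}/{prefix}{kind}.yaml", nproc=nproc)
            for kind in ("conv", "detail", "complex")]


def okvqa_generation_cmds(dataset="a-okvqa", nproc=8):
    """Question command followed by answer command for "a-okvqa" or "okvqa".

    Between the two runs, generate_answer.yaml must be edited by hand so
    that annotation points at the generated questions.
    """
    base = f"vigc/projects/instruct_blip_vicuna7b/generate_qa/{dataset}"
    return [torchrun_cmd(f"{base}/generate_{stage}.yaml", nproc=nproc)
            for stage in ("question", "answer")]


if __name__ == "__main__":
    # Print the commands for review rather than executing them.
    for cmd in llava_generation_cmds("13b", object365=True):
        print(shlex.join(cmd))
    for cmd in okvqa_generation_cmds("okvqa"):
        print(shlex.join(cmd))
```

The nproc parameter maps to torchrun's --nproc_per_node, which is normally set to the number of GPUs on the machine; the README uses 8 throughout.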

Train VIGC Model

  1. Finetune the VIGC model on the A-OKVQA dataset

    1. Download our formatted A-OKVQA JSON files.

    2. Download the images by following the original repo; skip this step if you already have them.

    3. Set images and annotation in these configs (train config, val config) to their actual paths.

    4. Run the finetuning script:

      torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/instruct_blip_vicuna7b/vigc/a-okvqa/normal_vigc.yaml
  2. Finetune the VIGC model on the OKVQA dataset

    1. Download our formatted OKVQA JSON files.
    2. Download the images by following the original repo; skip this step if you already have them.
    3. Set images and annotation in these configs (train config, val config) to their actual paths.
    4. Run the finetuning script:
      torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/instruct_blip_vicuna7b/vigc/okvqa/normal_vigc.yaml
  3. Finetune the VIGC model on the LLaVA-150k dataset

    1. Download our formatted LLaVA JSON files.
    2. Download the images by following the original repo; skip this step if you already have them.
    3. Set images and annotation in these configs (conversation config, detail config, complex config, val config) to their actual paths.
    4. Run the finetuning script:
      torchrun --nproc_per_node=8 train.py  --cfg-path vigc/projects/mini_gpt4_vicuna7b/vigc/llava-150k/normal_vigc.yaml  # using Mini-GPT4 Vicuna7b
      
      torchrun --nproc_per_node=8 train.py  --cfg-path vigc/projects/mini_gpt4_vicuna13b/vigc/llava-150k/normal_vigc.yaml  # using Mini-GPT4 Vicuna13b
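
The four finetuning commands above differ only in the config path, so a small lookup can select the right one and adjust the process count for machines with fewer than 8 GPUs. This is a minimal sketch: the config paths are copied from this section, while TRAIN_CONFIGS and train_cmd are hypothetical names that do not exist in VIGC.

```python
import shlex

# (dataset, Vicuna size) -> training config path, as listed in this README.
TRAIN_CONFIGS = {
    ("a-okvqa", "7b"): "vigc/projects/instruct_blip_vicuna7b/vigc/a-okvqa/normal_vigc.yaml",
    ("okvqa", "7b"): "vigc/projects/instruct_blip_vicuna7b/vigc/okvqa/normal_vigc.yaml",
    ("llava-150k", "7b"): "vigc/projects/mini_gpt4_vicuna7b/vigc/llava-150k/normal_vigc.yaml",
    ("llava-150k", "13b"): "vigc/projects/mini_gpt4_vicuna13b/vigc/llava-150k/normal_vigc.yaml",
}


def train_cmd(dataset, llm="7b", nproc=8):
    """Build the torchrun finetuning command for a dataset and Vicuna size."""
    try:
        cfg = TRAIN_CONFIGS[(dataset, llm)]
    except KeyError:
        raise ValueError(f"no training config listed for {dataset} with Vicuna {llm}")
    return ["torchrun", f"--nproc_per_node={nproc}", "train.py", "--cfg-path", cfg]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(train_cmd("llava-150k", "13b", nproc=4)))
```

Lowering nproc reduces the number of worker processes, which may change the effective batch size unless the config compensates for it.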

Acknowledgement

  • BLIP2. The model architecture of VIGC follows BLIP-2. Be sure to check out this great open-source work if you are not already familiar with it!
  • InstructBLIP and MiniGPT-4. The pretrained models of VIGC come from InstructBLIP and MiniGPT-4.
  • LAVIS. This repository is built upon LAVIS!
  • Vicuna. The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
  • LLaVA, A-OKVQA, OKVQA. The VIGC models are finetuned on these datasets.

Paper and Citing VIGC

You can find more details in our paper.

If you're using VIGC in your research or applications, please cite using this BibTeX:

@article{wang2023vigc, 
      title={VIGC: Visual Instruction Generation and Correction},
      author={Wang, Bin and Wu, Fan and Han, Xiao and Peng, Jiahui and Zhong, Huaping and Zhang, Pan and Dong, Xiaoyi and Li, Weijia and Li, Wei and Wang, Jiaqi and He, Conghui},
      journal={arXiv preprint arXiv:2308.12714},
      year={2023}
}

Contact us

If you have any questions, comments or suggestions, please do not hesitate to contact us at wangbin@pjlab.org.cn or wufan@pjlab.org.cn.

License

Apache License 2.0

About

AAAI 2024: Visual Instruction Generation and Correction
