Public 3rd, Private 2nd
[Topic] Developing an image-based question answering (VQA) AI model
[Description] The dataset provides the actual images together with CSV files containing each image's ID and a question about that image. The goal is to develop an AI model that correctly answers the questions based on the information visible in the images.
[Link] https://dacon.io/competitions/official/236118/overview/description
data
├─ image
│ ├─ train : 107,231 images
│ │ ├─ train_000000.png
│ │ ├─ train_000001.png
│ │ └─ ...
│ └─ test : 11,915 images
│ ├─ test_00000.png
│ ├─ test_00001.png
│ └─ ...
├─ train.csv
| ├─ ID : question ID
| ├─ image_id : image ID
| ├─ question : question about the image
| └─ answer : answer to the question
├─ test.csv
| ├─ ID : question ID
| ├─ image_id : image ID
| └─ question : question about the image
└─ sample_submission.csv
 ├─ ID : question ID
 └─ *answer : answer to the question
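To illustrate how the CSV columns map onto the image files above, here is a minimal sketch. The column names and path layout come from the tree above; the row values are made-up stand-ins, and pandas is assumed to be available:

```python
import pandas as pd

# Tiny stand-in for train.csv (real columns: ID, image_id, question, answer)
train = pd.DataFrame({
    "ID": ["TRAIN_000000"],
    "image_id": ["train_000000"],
    "question": ["What color is the car?"],
    "answer": ["red"],
})

# Each row's image lives at data/image/train/<image_id>.png
train["image_path"] = "data/image/train/" + train["image_id"] + ".png"
print(train.loc[0, "image_path"])  # data/image/train/train_000000.png
```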
- For Colab Pro or Pro+ users only
- Set the runtime to an A100 GPU
!git clone https://github.com/haotian-liu/LLaVA.git
%cd /content/LLaVA
!pip install --upgrade pip
!pip install -e .
!pip install ninja
!pip install flash-attn --no-build-isolation
!git clone https://huggingface.co/lmsys/vicuna-7b-v1.3
# Download directly
!gdown "https://drive.google.com/u/0/uc?id=1a9XB3r83ZCFWLOHBp8ooz3zQFl9rEIei&export=download"
- This gives you the 'output.json' and 'test.jsonl' files
- If the download fails, clone our repository and run the preprocessing script below from your '/content' directory
%cd /content
!git clone https://github.com/geuk-hub/-Dacon-Multimodal-vqa.git
%cd /content/-Dacon-Multimodal-vqa
!python preprocessing.py
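preprocessing.py presumably converts the train.csv rows into LLaVA's conversation-style training JSON. A hedged sketch of that conversion, not the actual script: the output field names (`id`, `image`, `conversations` with `human`/`gpt` turns and an `<image>` token) follow LLaVA's documented training data schema, and `rows` stands in for the CSV contents:

```python
import json

# Stand-in for rows read from train.csv
rows = [
    {"ID": "TRAIN_000000", "image_id": "train_000000",
     "question": "What color is the car?", "answer": "red"},
]

def to_llava_record(row):
    """Map one CSV row to LLaVA's training JSON schema:
    an id, an image file name, and one human/gpt conversation pair."""
    return {
        "id": row["ID"],
        "image": row["image_id"] + ".png",
        "conversations": [
            {"from": "human", "value": "<image>\n" + row["question"]},
            {"from": "gpt", "value": row["answer"]},
        ],
    }

records = [to_llava_record(r) for r in rows]
with open("output.json", "w") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```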
- To record training runs with wandb
- enter your API key when prompted
%cd /content/LLaVA
!pip install wandb
!wandb login
- Train
!python /content/LLaVA/llava/train/train_mem.py \
--model_name_or_path /content/LLaVA/vicuna-7b-v1.3 \
--version v1 \
--data_path /content/output.json \
--image_folder /content/data/image/train \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end \
--bf16 True \
--output_dir "/content/drive/MyDrive/LLaVA checkpoint" \
--num_train_epochs 1 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 128 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
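For reference, the effective batch size implied by the flags above (assuming a single A100, as in the Colab setup):

```python
per_device_train_batch_size = 16
gradient_accumulation_steps = 16
num_gpus = 1  # single A100 in Colab

# Gradients are accumulated over 16 micro-batches of 16 samples each
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # 256
```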
- Your model_name_or_path must contain 'vicuna'
- output_dir must point to a folder that contains a 'checkpoint-*' subfolder
- num_train_epochs must be set to 2 or more to continue training
!python /content/LLaVA/llava/train/train_mem.py \
--model_name_or_path /content/LLaVA/vicuna-7b-v1.3 \
--version v1 \
--data_path /content/output.json \
--image_folder /content/data/image/train \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end \
--bf16 True \
--output_dir "/content/drive/MyDrive/LLaVA checkpoint" \
--num_train_epochs 2 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.00 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 128 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
- Rename the output_dir folder from 'checkpoint-*' to 'llava-*' (e.g. 'LLaVA-7B-v1.3')
- Results may differ depending on whether the folder name contains 'llava'
%cd /content/LLaVA
!python /content/-Dacon-Multimodal-vqa/eval/model_vqa.py \
--model-path "/content/drive/MyDrive/LLaVA checkpoint/LLaVA-7B-v1.3" \
--model-base lmsys/vicuna-7b-v1.3 \
--question-file /content/-Dacon-Multimodal-vqa/test.jsonl \
--image-folder /content/data/image/test \
--answers-file /content/result.jsonl
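model_vqa.py expects the question file to contain one JSON object per line. Should you need to regenerate test.jsonl yourself, here is a minimal sketch; the key names (`question_id`, `image`, `text`) follow LLaVA's eval-script convention, and `rows` stands in for the test.csv contents:

```python
import json

# Stand-in for rows read from test.csv (columns: ID, image_id, question)
rows = [
    {"ID": "TEST_00000", "image_id": "test_00000",
     "question": "How many people are there?"},
]

# Write one JSON object per line, as model_vqa.py expects
with open("test.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps({
            "question_id": row["ID"],
            "image": row["image_id"] + ".png",
            "text": row["question"],
        }) + "\n")
```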
%cd /content/-Dacon-Multimodal-vqa
!python submission.py
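submission.py presumably maps the model's answers in result.jsonl onto the sample_submission.csv format (ID, answer). A hedged sketch of that step, not the actual script; the `question_id`/`text` field names are assumed from LLaVA's answers-file format, and `result_lines` stands in for the file contents:

```python
import json
import pandas as pd

# Stand-in for lines of result.jsonl produced by model_vqa.py
result_lines = [
    '{"question_id": "TEST_00000", "text": "red"}',
]

preds = [json.loads(line) for line in result_lines]

# Rebuild the submission layout: ID (question ID) and answer columns
submission = pd.DataFrame({
    "ID": [p["question_id"] for p in preds],
    "answer": [p["text"] for p in preds],
})
submission.to_csv("submission.csv", index=False)
```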