Public 3rd, Private 2nd
[Topic] Developing an image-based question answering (VQA) AI model
[Description] The dataset provides the actual images together with CSV files containing each image's ID and a question about that image. The goal is to develop an AI model that correctly answers the questions based on the information visible in the images.
[Link] https://dacon.io/competitions/official/236118/overview/description
data
├─ image
│ ├─ train : 107,231 images
│ │ ├─ train_000000.png
│ │ ├─ train_000001.png
│ │ └─ ...
│ └─ test : 11,915 images
│ ├─ test_00000.png
│ ├─ test_00001.png
│ └─ ...
├─ train.csv
| ├─ ID : question ID
| ├─ image_id : image ID
| ├─ question : question about the image
| └─ answer : answer to the question
├─ test.csv
| ├─ ID : question ID
| ├─ image_id : image ID
| └─ question : question about the image
└─ sample_submission.csv
 ├─ ID : question ID
 └─ *answer : answer to the question
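To illustrate how the CSV columns map onto the image files above, here is a minimal sketch. The column names and path layout come from the tree above; the row values are made-up stand-ins, and pandas is assumed to be available:

```python
import pandas as pd

# Tiny stand-in for train.csv (real columns: ID, image_id, question, answer)
train = pd.DataFrame({
    "ID": ["TRAIN_000000"],
    "image_id": ["train_000000"],
    "question": ["What color is the car?"],
    "answer": ["red"],
})

# Each row's image lives at data/image/train/<image_id>.png
train["image_path"] = "data/image/train/" + train["image_id"] + ".png"
print(train.loc[0, "image_path"])  # data/image/train/train_000000.png
```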
- For Colab Pro or Pro+ users only
- Set the runtime to an A100 GPU
!git clone https://github.com/haotian-liu/LLaVA.git
%cd /content/LLaVA
!pip install --upgrade pip
!pip install -e .
!pip install ninja
!pip install flash-attn --no-build-isolation
!git clone https://huggingface.co/lmsys/vicuna-7b-v1.3
# Download directly
!gdown "https://drive.google.com/u/0/uc?id=1a9XB3r83ZCFWLOHBp8ooz3zQFl9rEIei&export=download"
- This gives you the 'output.json' and 'test.jsonl' files
- If the download fails, clone our repository and run the preprocessing script below from your '/content' directory
%cd /content
!git clone https://github.com/geuk-hub/-Dacon-Multimodal-vqa.git
%cd /content/-Dacon-Multimodal-vqa
!python preprocessing.py
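preprocessing.py presumably converts the train.csv rows into LLaVA's conversation-style training JSON. A hedged sketch of that conversion, not the actual script: the output field names (`id`, `image`, `conversations` with `human`/`gpt` turns and an `<image>` token) follow LLaVA's documented training data schema, and `rows` stands in for the CSV contents:

```python
import json

# Stand-in for rows read from train.csv
rows = [
    {"ID": "TRAIN_000000", "image_id": "train_000000",
     "question": "What color is the car?", "answer": "red"},
]

def to_llava_record(row):
    """Map one CSV row to LLaVA's training JSON schema:
    an id, an image file name, and one human/gpt conversation pair."""
    return {
        "id": row["ID"],
        "image": row["image_id"] + ".png",
        "conversations": [
            {"from": "human", "value": "<image>\n" + row["question"]},
            {"from": "gpt", "value": row["answer"]},
        ],
    }

records = [to_llava_record(r) for r in rows]
with open("output.json", "w") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```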
- To record training runs with wandb
- enter your API key when prompted
%cd /content/LLaVA
!pip install wandb
!wandb login
- Train
!python /content/LLaVA/llava/train/train_mem.py \
--model_name_or_path /content/LLaVA/vicuna-7b-v1.3 \
--version v1 \
--data_path /content/output.json \
--image_folder /content/data/image/train \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end \
--bf16 True \
--output_dir "/content/drive/MyDrive/LLaVA checkpoint" \
--num_train_epochs 1 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 128 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
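For reference, the effective batch size implied by the flags above (assuming a single A100, as in the Colab setup):

```python
per_device_train_batch_size = 16
gradient_accumulation_steps = 16
num_gpus = 1  # single A100 in Colab

# Gradients are accumulated over 16 micro-batches of 16 samples each
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # 256
```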
- Your model_name_or_path must contain 'vicuna'
- output_dir must point to a folder that contains a 'checkpoint-*' subfolder
- num_train_epochs must be set to 2 or more to continue training
!python /content/LLaVA/llava/train/train_mem.py \
--model_name_or_path /content/LLaVA/vicuna-7b-v1.3 \
--version v1 \
--data_path /content/output.json \
--image_folder /content/data/image/train \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end \
--bf16 True \
--output_dir "/content/drive/MyDrive/LLaVA checkpoint" \
--num_train_epochs 2 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.00 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 128 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
- Rename the output_dir folder from 'checkpoint-*' to 'llava-*' (e.g. 'LLaVA-7B-v1.3')
- Results may differ depending on whether the folder name contains 'llava'
%cd /content/LLaVA
!python /content/-Dacon-Multimodal-vqa/eval/model_vqa.py \
--model-path "/content/drive/MyDrive/LLaVA checkpoint/LLaVA-7B-v1.3" \
--model-base lmsys/vicuna-7b-v1.3 \
--question-file /content/-Dacon-Multimodal-vqa/test.jsonl \
--image-folder /content/data/image/test \
--answers-file /content/result.jsonl
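model_vqa.py expects the question file to contain one JSON object per line. Should you need to regenerate test.jsonl yourself, here is a minimal sketch; the key names (`question_id`, `image`, `text`) follow LLaVA's eval-script convention, and `rows` stands in for the test.csv contents:

```python
import json

# Stand-in for rows read from test.csv (columns: ID, image_id, question)
rows = [
    {"ID": "TEST_00000", "image_id": "test_00000",
     "question": "How many people are there?"},
]

# Write one JSON object per line, as model_vqa.py expects
with open("test.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps({
            "question_id": row["ID"],
            "image": row["image_id"] + ".png",
            "text": row["question"],
        }) + "\n")
```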
%cd /content/-Dacon-Multimodal-vqa
!python submission.py
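submission.py presumably maps the model's answers in result.jsonl onto the sample_submission.csv format (ID, answer). A hedged sketch of that step, not the actual script; the `question_id`/`text` field names are assumed from LLaVA's answers-file format, and `result_lines` stands in for the file contents:

```python
import json
import pandas as pd

# Stand-in for lines of result.jsonl produced by model_vqa.py
result_lines = [
    '{"question_id": "TEST_00000", "text": "red"}',
]

preds = [json.loads(line) for line in result_lines]

# Rebuild the submission layout: ID (question ID) and answer columns
submission = pd.DataFrame({
    "ID": [p["question_id"] for p in preds],
    "answer": [p["text"] for p in preds],
})
submission.to_csv("submission.csv", index=False)
```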