
Complete code for Vision Language Model fine-tuning under TIL AI BrainHack - Advanced Track




Minimal setup for fine-tuning GroundingDINO (VLM)

  1. Modify config/dataset_OD.json, config/label_map.json and config/annotations.json
  2. Download BERT & the GroundingDINO SwinT / SwinB checkpoints (a download sketch follows this list)
  3. Commands to fine-tune & run inference with GroundingDINO across various configurations
    • Fine-tune: bash train_dist.sh 1 config/<CFG_FILE>.py config/dataset_OD.json logs
    • Inference: python tools/inference_on_a_image.py -c tools/<SWIN T/B>.py -p logs/<CHKPT>.pth -i <IMG_PATH>.jpg -t "<CLASS_NAMES>" -o output
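A minimal download sketch for step 2, assuming the official IDEA-Research release checkpoints and a bert-base-uncased text backbone cached through Hugging Face transformers (installed via requirements.txt); the weights/ directory is a placeholder and may differ from the paths your config files expect:

# Pre-trained GroundingDINO checkpoints (official IDEA-Research releases)
mkdir -p weights && cd weights
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
cd ..

# Cache the BERT text encoder locally
python -c "from transformers import BertModel, BertTokenizer; BertModel.from_pretrained('bert-base-uncased'); BertTokenizer.from_pretrained('bert-base-uncased')"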

Setup (tested on Python 3.7)

# Install training dependencies
pip install -r requirements.txt
# Build the multi-scale deformable attention CUDA ops
cd models/GroundingDINO/ops
python3 setup.py build install --user
cd ../../..
# Install the upstream GroundingDINO package for the inference utilities
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install --no-build-isolation -e .
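A quick sanity check after the build (a sketch; assumes a CUDA-enabled PyTorch and that the editable install above succeeded, which exposes the groundingdino package):

# PyTorch version, the CUDA it was built against, and GPU visibility
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# The editable install should make the groundingdino package importable
python -c "import groundingdino; print('groundingdino import OK')"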

Potential Errors

The detected CUDA version (11.8) mismatches the version that was used to compile PyTorch (12.1)

This is raised while building the CUDA ops when the local CUDA toolkit (nvcc) does not match the CUDA version your PyTorch wheel was built against. Install a PyTorch build that matches the local toolkit, or point CUDA_HOME at a matching toolkit, before re-running the ops build.
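A short diagnostic sketch for this mismatch; the cu118 index URL below is an example for a CUDA 11.8 toolkit and should be adjusted to whatever nvcc reports:

# Compare the CUDA version PyTorch was built against with the local toolkit
python -c "import torch; print(torch.version.cuda)"
nvcc --version
# Example: reinstall a wheel built for CUDA 11.8 (adjust the tag to your toolkit)
pip install torch --index-url https://download.pytorch.org/whl/cu118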
