SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Tongtian Yue^1,3*, Jie Cheng^2,3*, Longteng Guo^1,3*, Xingyuan Dai^2,3, Zijia Zhao^1,3, Xingjian He^1,3, Gang Xiong^2,3, Yisheng Lv^2,3, Jing Liu^1,3†
¹Laboratory of Cognition and Decision Intelligence for Complex Systems, CASIA
²State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA
³School of Artificial Intelligence, University of Chinese Academy of Sciences

CVPR, 2024

Requirements

Installation

Create a conda environment and install dependencies:

conda create -n sc_tune python=3.10
conda activate sc_tune
pip install -r requirements.txt

Data

Download the Qwen-VL-Chat checkpoint (10 *.bin files in total) into Qwen-VL-Chat/, and download the Object365 images.
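
A partial download is easy to miss, so it can help to confirm that all 10 *.bin files are in place before training. The following is a minimal sketch, not part of this repo: the helper names are hypothetical, and it assumes the *.bin files sit directly inside Qwen-VL-Chat/ rather than in a subfolder.

```python
from pathlib import Path

EXPECTED_SHARDS = 10  # the README states 10 *.bin files in total


def list_checkpoint_shards(ckpt_dir):
    """Return the sorted *.bin files found directly inside ckpt_dir."""
    return sorted(Path(ckpt_dir).glob("*.bin"))


def checkpoint_is_complete(ckpt_dir, expected=EXPECTED_SHARDS):
    """Return True if ckpt_dir holds exactly `expected` *.bin files."""
    return len(list_checkpoint_shards(ckpt_dir)) == expected


if __name__ == "__main__":
    ckpt_dir = "Qwen-VL-Chat"  # download location named in this README
    found = list_checkpoint_shards(ckpt_dir)
    print(f"{len(found)} of {EXPECTED_SHARDS} *.bin files found in {ckpt_dir}/")
```

The check only counts files; it does not verify their contents or sizes.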

Note

We modified the code in Qwen-VL-Chat/visual.py. If necessary, replace the original file with the version in this repo.
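
One way to decide whether the replacement is necessary is to compare the two files byte by byte and copy only when they differ. This is a sketch under assumptions: `sync_visual_py` is a hypothetical helper, and it presumes you kept a copy of this repo's visual.py somewhere the checkpoint download cannot overwrite (the path `backup/visual.py` in the comment is illustrative only).

```python
import filecmp
import shutil
from pathlib import Path


def sync_visual_py(repo_file, target_file):
    """Copy repo_file over target_file when they differ.

    Returns True if a copy was made, False if the files were already identical.
    """
    repo_file, target_file = Path(repo_file), Path(target_file)
    # shallow=False forces a content comparison instead of trusting file metadata.
    if target_file.exists() and filecmp.cmp(repo_file, target_file, shallow=False):
        return False
    shutil.copyfile(repo_file, target_file)
    return True


# Example call (the first path is a hypothetical backup location):
# sync_visual_py("backup/visual.py", "Qwen-VL-Chat/visual.py")
```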

Get Started

Configs

Set the path to the Object365 images in scripts/finetune_ds.sh. The other hyperparameters are also set in this file.
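
Before launching a long run, it may be worth checking that the image path you configured exists and actually contains images. The sketch below is illustrative only: `count_images` is a hypothetical helper, the set of file suffixes is an assumption about how the images are stored, and the script does not read scripts/finetune_ds.sh, so you pass the same path by hand.

```python
from pathlib import Path

# Assumption: images are stored with one of these suffixes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(image_root, suffixes=IMAGE_SUFFIXES):
    """Count image files under image_root, searching subdirectories too."""
    root = Path(image_root)
    if not root.is_dir():
        raise FileNotFoundError(f"image directory not found: {root}")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

A result of zero usually means the path points at the wrong directory level.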

Running

sh scripts/finetune_ds.sh

Main code

The main code implementing the SC-Tune method is in transformers/trainer.py and transformers/trainer_utils.py.

Acknowledgement

This repo benefits from Qwen-VL, TRL, and MOSS. Thanks for their wonderful work.

Citation

@article{yue2024sc,
  title={SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models},
  author={Yue, Tongtian and Cheng, Jie and Guo, Longteng and Dai, Xingyuan and Zhao, Zijia and He, Xingjian and Xiong, Gang and Lv, Yisheng and Liu, Jing},
  journal={arXiv preprint arXiv:2403.13263},
  year={2024}
}
