Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment
In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AI-generated image quality assessment (AGIQA), dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts into the language and vision branches of the CLIP model, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts.
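The plain-PyTorch sketch below illustrates this idea at a high level: learnable prompt vectors in both branches, a quality-regression head, and an auxiliary text-to-image alignment head. All names and shapes here (PromptedEncoder, quality_head, align_head, embed_dim, n_ctx) are illustrative assumptions and do not reflect the repository's actual implementation.

    # Minimal conceptual sketch of multi-modal prompt learning with an
    # auxiliary text-to-image alignment head (assumptions, not the real code).
    import torch
    import torch.nn as nn

    class PromptedEncoder(nn.Module):
        """Stand-in for a frozen CLIP branch that consumes [prompts; tokens]."""
        def __init__(self, embed_dim=512, n_ctx=8):
            super().__init__()
            # Learnable prompt (context) vectors prepended to the token sequence.
            self.prompts = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)
            self.backbone = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)

        def forward(self, tokens):                      # tokens: (B, L, D)
            ctx = self.prompts.expand(tokens.size(0), -1, -1)
            x = self.backbone(torch.cat([ctx, tokens], dim=1))
            return x.mean(dim=1)                        # pooled feature: (B, D)

    class CLIPAGIQASketch(nn.Module):
        def __init__(self, embed_dim=512):
            super().__init__()
            self.visual = PromptedEncoder(embed_dim)    # vision branch with visual prompts
            self.textual = PromptedEncoder(embed_dim)   # language branch with textual prompts
            self.quality_head = nn.Linear(embed_dim, 1)         # main task: quality score
            self.align_head = nn.Linear(2 * embed_dim, 1)       # auxiliary: text-image alignment

        def forward(self, image_tokens, text_tokens):
            img = self.visual(image_tokens)
            txt = self.textual(text_tokens)
            quality = self.quality_head(img)
            alignment = self.align_head(torch.cat([img, txt], dim=-1))
            return quality.squeeze(-1), alignment.squeeze(-1)

    if __name__ == "__main__":
        model = CLIPAGIQASketch()
        q, a = model(torch.randn(2, 16, 512), torch.randn(2, 12, 512))
        print(q.shape, a.shape)   # torch.Size([2]) torch.Size([2])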
First, download the AGIQA3k and AGIQA2023 datasets.
Second, update the dataset paths defined in train_test_clip_auxiliary.py:
path = {
    'AGIQA3k': '/home/fujun/datasets/iqa/AGIQA-3K',
    'AGIQA2023': '/home/fujun/datasets/iqa/AIGC2023/DATA/'
}
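As a hypothetical illustration, the snippet below shows how one of these path entries might be consumed. The annotation file name (data.csv) and its column names are assumptions; check the downloaded dataset and train_test_clip_auxiliary.py for the actual layout.

    # Hypothetical usage of the AGIQA3k path entry above (file and column
    # names are assumptions, not the repository's actual loading code).
    import os
    import pandas as pd

    root = '/home/fujun/datasets/iqa/AGIQA-3K'          # path['AGIQA3k'] from the dict above
    labels = pd.read_csv(os.path.join(root, 'data.csv'))  # assumed annotation file
    print(labels.head())                                 # image name, prompt, MOS scores, ...
    print(os.path.join(root, labels.iloc[0]['name']))    # full path to the first image (assumed column)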
Third, train and test the model using the following command:
python train_test_clip_auxiliary.py --dataset AGIQA3k --model AGIQA
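Conceptually, this command optimizes a joint objective: the main quality-regression loss plus the auxiliary text-to-image alignment loss whose consistency knowledge guides the prompts. The sketch below is only an illustrative assumption; the loss weight, the use of MSE, and the name joint_loss are not taken from the paper or the repository code.

    # Sketch of a joint objective for the main and auxiliary tasks
    # (assumed form; not the repository's actual loss function).
    import torch.nn.functional as F

    def joint_loss(pred_quality, pred_align, mos_quality, mos_align, lam=1.0):
        # Main task: regress the perceptual quality MOS of the AI-generated image.
        quality_loss = F.mse_loss(pred_quality, mos_quality)
        # Auxiliary task: predict how well the image matches its generation prompt;
        # this vision-language consistency signal guides the multi-modal prompts.
        align_loss = F.mse_loss(pred_align, mos_align)
        return quality_loss + lam * align_loss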
Finally, check the results in the ./log folder.
This project is based on MaPLe, DBCNN, and CLIP-IQA. Thanks to the authors of these awesome works.
Please cite the following paper if you use this repository in your research.
@article{fu2024vision,
  title={Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment},
  author={Fu, Jun and Zhou, Wei and Jiang, Qiuping and Liu, Hantao and Zhai, Guangtao},
  journal={IEEE Signal Processing Letters},
  year={2024},
  publisher={IEEE}
}
For any questions, feel free to contact: fujun@mail.ustc.edu.cn