[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Mamadou-Keita/VLM-DETECT

Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

This repository is an official implementation of the ICASSP 2024 paper "Harnessing the Power of Large Vision Language Models for Synthetic Image Detection".

☀️ If you find this work useful for your research, please kindly star our repo and cite our paper! ☀️

assets/approach.png

Low Rank Adaptation

assets/LowRank.png
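
For readers unfamiliar with the idea, here is a minimal sketch of a generic low-rank adaptation (LoRA) forward pass in plain Python. It is not taken from this repository's code: the names `lora_forward`, `W0`, `A`, `B`, and `alpha`, and the toy dimensions, are assumptions chosen for illustration. The frozen weight `W0` is left untouched, and only the two small matrices `A` and `B` would be trained.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W0, A, B, x, alpha):
    """Compute W0 x + (alpha / r) * B (A x), where r is the rank (rows of A)."""
    r = len(A)
    base = matvec(W0, x)               # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy dimensions (hypothetical, far smaller than a real model layer).
d_in, d_out, rank = 8, 8, 2
random.seed(0)
W0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: the update starts at zero

x = [float(i) for i in range(1, d_in + 1)]
y = lora_forward(W0, A, B, x, alpha=16)

trainable_params = rank * (d_in + d_out)  # entries in A and B
full_params = d_in * d_out                # entries in W0
print(trainable_params, full_params)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one, and the number of trainable entries grows with the rank rather than with the full weight matrix.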

Requirements

pip install -r requirements.txt

SOTA Detection Methods

For the baseline detection methods, we use the code released with each corresponding paper.

Training (Optional)

This step can be skipped, and you can directly test the model in the following section with a pre-trained model.

To train your own model:

python blip2_detect.py --dataset ./data/train.csv --epochs 20 --lr 5e-5 

Evaluation

To run the evaluation, use the following command:

python blip2_test.py --model_path ./SaveFineTune --dataset ./data/test.csv

Performance

After training for 20 epochs, you should obtain accuracy and F1 scores (in %) close to the values below, listed as accuracy/F1 for each generative model:

{'LDM' : 99.12/99.13, 'ADM' : 85.24/82.97, 'DDPM' : 98.47/98.47, 'IDDPM' : 97.02/96.97, 'PNDM' : 99.22/99.23, 'SD v1.4' : 77.68/71.79, 'GLIDE' : 97.09/97.05}
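
As a rough illustration of how an accuracy/F1 pair like those above can be computed from binary predictions, here is a minimal sketch in plain Python. The function name `accuracy_and_f1`, the label convention (1 = synthetic, 0 = real), and the toy labels are assumptions for illustration; the repository's evaluation script may compute or average these metrics differently.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) as percentages for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return 100 * accuracy, 100 * f1

# Toy labels (hypothetical): 1 = synthetic, 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"{acc:.2f}/{f1:.2f}")
```

The two numbers can diverge when false positives and false negatives are unbalanced, which is one reason to report both.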

Dataset

The dataset used in this project comes from the paper "Towards the Detection of Diffusion Model Deepfakes" and is available from the original dataset repository.

📖 Citation

If you make use of our work, please cite our papers:

@article{keita2024harnessing,
  title={Harnessing the Power of Large Vision Language Models for Synthetic Image Detection},
  author={Keita, Mamadou and Hamidouche, Wassim and Bougueffa, Hassen and Hadid, Abdenour and Taleb-Ahmed, Abdelmalik},
  journal={arXiv preprint arXiv:2404.02726},
  year={2024}
}
@article{keita2024bi,
  title={Bi-LORA: A Vision-Language Approach for Synthetic Image Detection},
  author={Keita, Mamadou and Hamidouche, Wassim and Eutamene, Hessen Bougueffa and Hadid, Abdenour and Taleb-Ahmed, Abdelmalik},
  journal={arXiv preprint arXiv:2404.01959},
  year={2024}
}
