This 🖥️📦 repository accompanies our 📚📄 paper on biases in datasets for AI-generated image detection. As detailed in the paper, all experiments are conducted on the GenImage dataset.
To use our unbiased GenImage dataset, first download the original GenImage dataset and our metadata CSV, which contains information about the JPEG quality factor (QF), size, and content class of each image. This CSV is required by our training and validation code.
⬇️ We provide an easy download of GenImage (plus the metadata CSV) here (~500 GB): DOWNLOAD. We also removed the corrupted files that are present in the Baidu GenImage download.
Since the web interface doesn't allow downloading all files at once, use our download script like this:

```bash
python download_genimage.py [--continue] [--destination {path}]
```

- `--continue`: Optional. Skip files that already exist. By default, a new download is started.
- `--destination {path}`: Optional. Custom directory for the downloaded files. Default: `./GenImage_download`.
Then reassemble the split archive into the final zip file:

```bash
cat GenImage.z* > ../GenImage_restored.zip
```
ℹ️ NOTE: There is now an easy GenImage download on Google Drive. We recommend downloading the GenImage dataset from there and only fetching the metadata.csv from our dataverse. ℹ️
As shown in our training code for the detectors (-> get_data.py and get_transform.py), you can create our unbiased GenImage dataset by selecting the subset of images in a specific size range (or by content classes). Then align the JPEG QF using jpeg_augment.py.
Example: creating the (by size and compression) unbiased Wukong (512x512 px) subset:
```python
import pandas as pd

df = pd.read_csv("metadata.csv")

# Real images: roughly 512x512 px and JPEG quality factor 96
df_unbiased_natural = df[
    (df["generator"] == "nature")
    & (df["width"] >= 450) & (df["width"] <= 550)
    & (df["height"] >= 450) & (df["height"] <= 550)
    & (df["compression_rate"] == 96)
]

# AI-generated images from the Wukong generator (512x512 px)
df_unbiased_ai = df[df["generator"] == "wukong"]

df_unbiased = pd.concat([df_unbiased_natural, df_unbiased_ai])
```
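The subsequent QF alignment step could look roughly like the following. This is a minimal sketch assuming in-memory JPEG re-encoding; the function name and defaults are our illustration, not the actual API of jpeg_augment.py:

```python
from io import BytesIO

from PIL import Image


def align_jpeg_qf(image: Image.Image, qf: int = 96) -> Image.Image:
    """Re-encode an image as JPEG with a fixed quality factor (sketch)."""
    buf = BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=qf)
    buf.seek(0)
    return Image.open(buf)


# Example: align a synthetic test image to QF 96
img = Image.new("RGB", (512, 512), color=(128, 64, 32))
aligned = align_jpeg_qf(img, qf=96)
```

Re-encoding every selected image to the same QF removes the compression shortcut a detector could otherwise exploit.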
We provide code for training and validating ResNet50 and Swin-T detectors. This aims to show that:
- Detectors trained on the raw GenImage dataset actually learn from existing biases in compression and image size.
- Mitigating these biases leads to significantly improved cross-generator performance and robustness to JPEG compression, achieving state-of-the-art results.
As in the original GenImage paper, we use forks of timm and Swin-Transformer. We only changed the dataset (create_dataset.py) to better suit our experiments. This dataset uses get_data.py to select the right data from the CSV file and get_transform.py for transformations such as JPEG compression, which are applied before the original transformations/augmentations. More details on how to start experiments can be found in the corresponding detector folders.
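The ordering described above (JPEG compression first, then the usual augmentations) can be sketched as a callable composed in front of a transform pipeline. `JpegCompress` and `compose` here are hypothetical stand-ins for what get_transform.py and torchvision's `Compose` provide, not the repository's code:

```python
from io import BytesIO

from PIL import Image


class JpegCompress:
    """Callable transform: re-encodes a PIL image as JPEG with a fixed QF."""

    def __init__(self, quality: int = 96):
        self.quality = quality

    def __call__(self, img: Image.Image) -> Image.Image:
        buf = BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=self.quality)
        buf.seek(0)
        return Image.open(buf)


def compose(*transforms):
    """Minimal stand-in for torchvision.transforms.Compose."""
    def run(img):
        for t in transforms:
            img = t(img)
        return img
    return run


# JPEG compression runs before the standard augmentations (here: a resize)
pipeline = compose(JpegCompress(quality=96), lambda im: im.resize((224, 224)))
out = pipeline(Image.new("RGB", (512, 512), (200, 100, 50)))
```

Placing the compression step first ensures the augmented training images share the same compression history as the aligned dataset.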
To run inference on your own datasets, you have to create a CSV file and slightly adjust get_data.py, as we did for the FFHQ dataset.
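For illustration, building such a CSV for a custom image folder might look like the sketch below. The column names mirror the metadata snippet above, but the exact schema expected by get_data.py is an assumption and should be checked against the repository:

```python
import os
import tempfile

import pandas as pd
from PIL import Image


def build_metadata_csv(image_dir: str, generator_label: str, out_csv: str) -> pd.DataFrame:
    """Scan a folder, collect per-image sizes, and write a metadata CSV (hypothetical schema)."""
    rows = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        try:
            with Image.open(path) as img:
                width, height = img.size
        except OSError:
            continue  # skip files PIL cannot read
        rows.append({"path": path, "generator": generator_label,
                     "width": width, "height": height})
    df = pd.DataFrame(rows)
    df.to_csv(out_csv, index=False)
    return df


# Hypothetical usage: index a small demo folder of "real" images
demo_dir = tempfile.mkdtemp()
Image.new("RGB", (512, 512)).save(os.path.join(demo_dir, "example.jpg"))
df_meta = build_metadata_csv(demo_dir, "nature", os.path.join(demo_dir, "metadata.csv"))
```

From there, the selection and QF-alignment steps shown earlier apply unchanged.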
*Figure: Cross-generator performance when training ResNet50 on the constrained dataset.*

*Figure: Difference to training on the raw dataset.*

*Figure: Cross-generator performance when training Swin-T on the constrained dataset.*