# AutoGluon Experiment - Authentic vs. Synthetic Face Image Detection
- Author: Petr Kuska
- Note: This is a version of the script used in my master's thesis to train an AutoGluon model on data stored on Kaggle (private datasets). A more streamlined version for using the trained model will be provided later.

## Setup
Install the Kaggle API client to download the datasets.

In [None]:
!pip install kaggle --upgrade

On the Kaggle website, go to the Account tab of your user profile and select Create API Token. This downloads kaggle.json, a file containing your API credentials. Place this file at ~/.kaggle/kaggle.json.

If you uploaded kaggle.json through JupyterLab, copy it from the working directory into ~/.kaggle/ (and restrict its permissions to your user) before downloading the datasets.
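One way to do the copy from inside the notebook is a small helper like the one below. This is a sketch, not part of the original workflow: `install_kaggle_credentials` is a hypothetical name, and it assumes kaggle.json was uploaded to the current working directory. Restricting the file to owner read/write mirrors `chmod 600`, which the Kaggle CLI suggests when the key is readable by other users.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(src="kaggle.json", kaggle_dir=None):
    """Copy kaggle.json into the Kaggle config directory with owner-only permissions.

    Hypothetical helper: `src` defaults to an uploaded file in the working
    directory, and `kaggle_dir` defaults to ~/.kaggle.
    """
    kaggle_dir = Path(kaggle_dir) if kaggle_dir is not None else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    dest = kaggle_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    # Owner read/write only, equivalent to `chmod 600`
    dest.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return dest


# Usage after uploading kaggle.json:
# install_kaggle_credentials("kaggle.json")
```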

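Then download the training, validation, and test datasets (run the commands in a terminal without the leading `!`, or directly in a notebook cell as shown):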

In [None]:
!kaggle datasets download -d petrkuska/training
!kaggle datasets download -d petrkuska/validation
!kaggle datasets download -d petrkuska/test-data

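Install the zip and unzip utilities (`-y` answers the apt-get confirmation prompt, which cannot be answered interactively from a notebook cell):

In [None]:
!apt-get install -y zip unzip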

Unzip the downloaded datasets:

In [5]:
!unzip -qq training.zip
!unzip -qq validation.zip
!unzip -qq test-data.zip
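If the `unzip` utility is not available, Python's standard `zipfile` module can extract the archives instead. A minimal sketch (the helper name `extract_archives` is hypothetical), assuming the three archives sit in the working directory:

```python
import zipfile


def extract_archives(names, dest="."):
    """Extract each zip archive in `names` into `dest`; returns the extracted member names."""
    extracted = []
    for name in names:
        with zipfile.ZipFile(name) as zf:
            zf.extractall(dest)  # creates `dest` and any subfolders as needed
            extracted.extend(zf.namelist())
    return extracted


# Pure-Python equivalent of the `unzip -qq` cell above:
# extract_archives(["training.zip", "validation.zip", "test-data.zip"])
```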

Install AutoGluon (more info at: https://auto.gluon.ai/stable/index.html):

In [2]:
!pip install autogluon


## Data Preparation

In [1]:
import pandas as pd

In [24]:
training_df = pd.read_csv("training.csv")
training_df.head(5)

| Unnamed: 0 | img_path | label |
|---|---|---|
| 0 | /home/training/synthetic/ifakefacedb_tpdne_012... | synthetic |
| 1 | /home/training/synthetic/stylegan_007402.png | synthetic |
| 2 | /home/training/authentic/celeba_132993.jpg | authentic |
| 3 | /home/training/synthetic/stylegan2_032262.png | synthetic |
| 4 | /home/training/authentic/celeba_078422.jpg | authentic |


In [25]:
validation_df = pd.read_csv("validation.csv")
validation_df.head(5)

| Unnamed: 0 | img_path | label |
|---|---|---|
| 0 | /home/validation/synthetic/generated_photos_v3... | synthetic |
| 1 | /home/validation/synthetic/stylegan2_038466.png | synthetic |
| 2 | /home/validation/synthetic/ifakefacedb_100f_09... | synthetic |
| 3 | /home/validation/authentic/ffhq_34648.png | authentic |
| 4 | /home/validation/synthetic/stylegan2_030897.png | synthetic |
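The CSVs store absolute image paths (/home/training/..., /home/validation/...), so AutoGluon can only load the images if the archives were extracted where those paths resolve. A quick sanity check before training could look like the sketch below; `check_image_paths` is a hypothetical helper, and the column names are the ones shown in the tables above. The `Unnamed: 0` column is just the row index saved into the CSV; dropping it (or reading with `pd.read_csv(..., index_col=0)`) keeps it from being picked up as an input feature.

```python
import os

import pandas as pd


def check_image_paths(df, path_col="img_path", label_col="label"):
    """Return per-label counts and the rows whose image file is missing on disk."""
    exists = df[path_col].map(os.path.exists)
    missing = df.loc[~exists, [path_col, label_col]]
    counts = df[label_col].value_counts()
    return counts, missing


# Usage with the DataFrames loaded above:
# training_df = training_df.drop(columns=["Unnamed: 0"], errors="ignore")
# counts, missing = check_image_paths(training_df)
# print(counts)
# print(f"{len(missing)} missing files")
```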


## Modelling

In [None]:
from autogluon.multimodal import MultiModalPredictor

model_path = "./authentic_vs_synthetic_face_model_2"
predictor = MultiModalPredictor(label="label", path=model_path)
predictor.fit(
    train_data=training_df,  # image paths; a DataFrame of raw image bytes also works
    tuning_data=validation_df,  # held-out validation split
    time_limit=2 * 60 * 60,  # 2 hours, in seconds
)

## Test Model

In [28]:
test_df = pd.read_csv("test.csv")
test_df.head(5)

| Unnamed: 0 | img_path | label |
|---|---|---|
| 0 | /home/test/synthetic/ifakefacedb_100f_083149.jpg | synthetic |
| 1 | /home/test/synthetic/generated_photos_v3_05096... | synthetic |
| 2 | /home/test/synthetic/stylegan2_030290.png | synthetic |
| 3 | /home/test/authentic/ffhq_02705.png | authentic |
| 4 | /home/test/synthetic/sdxl_a profile photo of a... | synthetic |


In [None]:
scores = predictor.evaluate(test_df, metrics=["accuracy"])  # "roc_auc" can also be requested for this binary task

In [30]:
scores

{'accuracy': 0.9993333333333333}

In [34]:
predictions = predictor.predict(test_df)

In [36]:
test_df["prediction"] = predictions

In [37]:
test_df.head()

| Unnamed: 0 | img_path | label | prediction |
|---|---|---|---|
| 0 | /home/test/synthetic/ifakefacedb_100f_083149.jpg | synthetic | synthetic |
| 1 | /home/test/synthetic/generated_photos_v3_05096... | synthetic | synthetic |
| 2 | /home/test/synthetic/stylegan2_030290.png | synthetic | synthetic |
| 3 | /home/test/authentic/ffhq_02705.png | authentic | authentic |
| 4 | /home/test/synthetic/sdxl_a profile photo of a... | synthetic | synthetic |


In [38]:
test_df.to_csv("test_predictions.csv", index=False)
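With the predictions saved, the accuracy can be recomputed and the few errors inspected without reloading the model. A minimal sketch using only pandas; `summarize_predictions` is a hypothetical helper, and the column names `label` and `prediction` are the ones written to test_predictions.csv above.

```python
import pandas as pd


def summarize_predictions(df, label_col="label", pred_col="prediction"):
    """Return accuracy, a confusion matrix, and the misclassified rows."""
    correct = df[label_col] == df[pred_col]
    accuracy = correct.mean()
    # Rows are true labels, columns are predicted labels
    confusion = pd.crosstab(df[label_col], df[pred_col], rownames=["true"], colnames=["predicted"])
    errors = df.loc[~correct]
    return accuracy, confusion, errors


# Usage with the file written above:
# results = pd.read_csv("test_predictions.csv")
# accuracy, confusion, errors = summarize_predictions(results)
# print(accuracy)
# print(confusion)
# print(errors[["img_path", "label", "prediction"]])
```

The misclassified file names show which image source (e.g. stylegan2_, ffhq_, sdxl_) the errors come from.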

## Load Model

In [None]:
from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor.load("authentic_vs_synthetic_face_model_2")

In [9]:
predictor
# loading works after setting env_strategy to None

<autogluon.multimodal.predictor.MultiModalPredictor at 0x7f059dd32c20>

In [None]:
!pip uninstall torchaudio -y  # workaround: removing torchaudio resolved an error when running the loaded model in this environment

## Make Predictions

In [60]:
import numpy as np
from IPython.display import Image, display

image = "midjourney_woman_2.png"

# Optional inline preview (IPython Image object, not a PIL image)
img_preview = Image(filename=image)
# display(img_preview)

# Print probabilities in fixed-point instead of scientific notation
np.set_printoptions(formatter={'float_kind': '{:f}'.format})

label = predictor.predict({'img_path': [image]})
print(label)

probabilities = predictor.predict_proba({'img_path': [image]})
print(probabilities)

['synthetic']

[[0.000010 0.999990]]
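To classify a whole folder instead of one file, the same `{'img_path': [...]}` input can be built from a directory listing, and the probability rows can be turned into labels with a stricter threshold than the default argmax. This is a sketch: `collect_images`, `label_from_proba`, the extension set, and the class order `("authentic", "synthetic")` are assumptions (the order is inferred from the output above, where the predicted 'synthetic' matches the second column).

```python
from pathlib import Path

# Assumed extensions; the datasets above contain .jpg and .png files
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_images(folder):
    """Build the {'img_path': [...]} input for every image file in `folder`."""
    paths = sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    return {"img_path": paths}


def label_from_proba(row, classes=("authentic", "synthetic"), threshold=0.5):
    """Return 'synthetic' if its probability is at or above `threshold`, else 'authentic'.

    The class order is an assumption inferred from the predict_proba output above.
    """
    probs = dict(zip(classes, row))
    return "synthetic" if probs["synthetic"] >= threshold else "authentic"


# Usage (hypothetical folder name):
# batch = collect_images("my_images")
# probabilities = predictor.predict_proba(batch)
# labels = [label_from_proba(row, threshold=0.9) for row in probabilities]
```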


In [32]:
!zip -r autogluon_model_2.zip authentic_vs_synthetic_face_model_2

  adding: authentic_vs_synthetic_face_model_2/ (stored 0%)
  adding: authentic_vs_synthetic_face_model_2/config.yaml (deflated 58%)
  adding: authentic_vs_synthetic_face_model_2/df_preprocessor.pkl (deflated 75%)
  adding: authentic_vs_synthetic_face_model_2/data_processors.pkl (deflated 47%)
  adding: authentic_vs_synthetic_face_model_2/assets.json (deflated 47%)
  adding: authentic_vs_synthetic_face_model_2/events.out.tfevents.1700950758.71756cb5e45f.163.2 (deflated 78%)
  adding: authentic_vs_synthetic_face_model_2/hparams.yaml (deflated 37%)
  adding: authentic_vs_synthetic_face_model_2/model.ckpt (deflated 7%)
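The archive bundles everything `MultiModalPredictor.load` reads from the model folder (config, preprocessors, checkpoint). If the `zip` utility is unavailable, the standard library can produce an equivalent archive. A minimal sketch; `archive_model` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def archive_model(model_dir, archive_name):
    """Zip `model_dir` into `<archive_name>.zip` and return the archive path.

    Keeps the model folder as the top-level entry, like `zip -r archive.zip model_dir`.
    """
    model_dir = Path(model_dir).resolve()
    return shutil.make_archive(
        archive_name, "zip", root_dir=model_dir.parent, base_dir=model_dir.name
    )


# Pure-Python equivalent of the `zip -r` cell above:
# archive_model("authentic_vs_synthetic_face_model_2", "autogluon_model_2")
```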
