# Neural Networks Final Project
### Reimplementation of the study: <br> ***"DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image GenerationModels"* <br> from Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang**

**Name**: *Laura Papi*

**Matricola**: *1760732*

# Project Description

The above cited study focuses on the growing concerns about the possible misuse of AI generated images, and assesses the necessity for a tool to detect and attribute these fake images.<br>
In particular, it points out the lack of research on the particular case of images generated by a text prompt.
<br>

<br>
Therefore, this research proposes methods to answer the following 3 research questions [RQ]:

- **RQ1**. Detection of images generated by text-to-image generation models

- **RQ2**. Attribution of the fake images to their source model

- **RQ3**. Analysis of the likelihood that different text prompts have to generate authentic images

<br><br>
The following sections contain instructions on how to build, train and evaluate models to answer the proposed researched questions.<br><br>
The complete code of this project can be found in the source directory of the GitHub repository __[Source Code](https://github.com/parwal-lp/De-Fake_nn_final_project)__


In [None]:
# -- Declare the variables to be used globally in this notebook

path_to_ld = "../latent-diffusion/" # set here the path to the LD directory cloned from GitHub
proj_dir = "../De-Fake_nn_final_project" # set here the path to the root of the current project (De-Fake)

SD_generated_temp_dir = "data/generated/SD+MSCOCO/"
GLIDE_generated_temp_dir = "data/generated/GLIDE+MSCOCO/"
LD_generated_temp_dir = path_to_ld + "outputs/txt2img-samples/"

In [2]:
# -- Declare all the imports needed in this notebook

# External libraries imports
import sys
import os
import torch
import torchvision

# References to other files of this project
# Functions for the management of data
from src.data_collector import fetchImagesFromMSCOCO
from src.dataset_generator import SD_generation, LD_generation, GLIDE_generation
from src.format_dataset import format_dataset_binaryclass, formatIntoTrainTest, format_dataset_multiclass
from src.encoder import get_multiclass_dataset_loader, get_dataset_loader
# Functions for building and training the models
from src.imageonly_detector.model import train_imageonly_detector, eval_imageonly_detector
from src.imageonly_attributor.model import train_imageonly_attributor, eval_imageonly_attributor
from src.hybrid_detector.hybrid_detector import TwoLayerPerceptron, train_hybrid_detector, eval_hybrid_detector
from src.hybrid_attributor.model import MultiClassTwoLayerPerceptron, train_hybrid_attributor


## RQ1. Detection of images generated by text-to-image generation models

The study proposes two detector models:

1. **Image-only detector**<br>binary classifier that decides whether an input image is fake or real.

2. **Hybrid detector**<br>binary classifier that is able to tell if an image is fake or real, based on the input image and its corresponding text prompt.


### 1. Image-only detector
This model is implemented as a two-layer perceptron, to be used for binary classification.

#### 1.1 Dataset
All the datasets are constitueted by a set of N real images (labeled 1), and a set of N corresponding fake generated images (labeled 0).

Training (on a single dataset):
- real images fetched from MSCOCO (class 1)
- fake images generated by Stable Diffusion (SD) (class 0)

Evaluation (on three different datasets):
- real images always fetched from MSCOCO (class 1)
- fake images generated respectively by Stable Diffusion (SD), Latent Diffusion (LD) and GLIDE (class 0)

The data is structured as follows:

imageonly_detector_data/<br>
&emsp;&emsp;├── train/<br>
&emsp;&emsp;&emsp;&emsp;├── class_0/<br>
&emsp;&emsp;&emsp;&emsp;│   └── *fake images generated by SD*<br>
&emsp;&emsp;&emsp;&emsp;├── class_1/<br>
&emsp;&emsp;&emsp;&emsp;│   └── *real images fetched by MSCOCO*<br>
&emsp;&emsp;├── val/<br>
&emsp;&emsp;&emsp;&emsp;├── class_0/<br>
&emsp;&emsp;&emsp;&emsp;│   └── *fake images generated by SD*<br>
&emsp;&emsp;&emsp;&emsp;├── class_1/<br>
&emsp;&emsp;&emsp;&emsp;│   └── *real images fetched by MSCOCO*<br>
&emsp;&emsp;├── val_LD/<br>
&emsp;&emsp;&emsp;&emsp;├── class_0/<br>
&emsp;&emsp;&emsp;&emsp;│   └── *fake images generated by LD*<br>
&emsp;&emsp;&emsp;&emsp;├── class_1/<br>
&emsp;&emsp;&emsp;&emsp;│   └── *real images fetched by MSCOCO*<br>
&emsp;&emsp;├── val_GLIDE/<br>
&emsp;&emsp;&emsp;&emsp;├── ...<br>


In [None]:
#import the path to the scripts needed for this section
sys.path.insert(10, '/home/parwal/Documents/GitHub/De-Fake_nn_final_project/src/imageonly_detector')
#TODO capire a chi serve questo import e metterlo nel posto giusto

# TODO TEST

#SD+MSCOCO
# The dataset generated using SD will be divided into train and test later on
fetchImagesFromMSCOCO("data/MSCOCO_for_SD/images", "data/MSCOCO_for_SD", 100)

#LD+MSCOCO --------------------------------------------------------------------------
# real images and captions needed as input for the LD model
fetchImagesFromMSCOCO("data/imageonly_detector_data/val_LD/class_1", "data/imageonly_detector_data/val_LD", 50)

#GLIDE+MSCOCO -----------------------------------------------------------------------
# real images and captions needed as input for the GLIDE model
fetchImagesFromMSCOCO("data/imageonly_detector_data/val_GLIDE/class_1", "data/imageonly_detector_data/val_GLIDE", 50)

In [None]:
# TODO TEST

#SD+MSCOCO --------------------------------------------------------------------------
#use stable-diffusion API to generate 100 fake images from the 100 captions collected before
#prima di eseguire il file ho cambiato le directory
#%run src/imageonly_detector/SD_MSCOCO_data_generation.py
SD_generation("data/MSCOCO_for_SD/mscoco_captions.csv", SD_generated_temp_dir)

#LD+MSCOCO --------------------------------------------------------------------------
#resetto la directory corrente a quella del progetto de-fake, altrimenti il file da eseguire non viene trovato
#questo è necessario perché LD_MSCOCO_data_generation.py cambia la directory a quella di latent-diffusion
#os.chdir("/home/parwal/Documents/GitHub/De-Fake_nn_final_project")
#%run src/imageonly_detector/LD_MSCOCO_data_generation.py
LD_generation("data/imageonly_detector_data/val_LD/mscoco_captions.csv")

#GLIDE+MSCOCO -----------------------------------------------------------------------
#%run src/imageonly_detector/GLIDE_MSCOCO_data_generation.ipynb
GLIDE_generation("data/imageonly_detector_data/val_GLIDE/mscoco_captions.csv", GLIDE_generated_temp_dir)

In [None]:
#transform the collected data in the previously described structure

# TODO TEST

#SD+MSCOCO
#this function generates a pair of datasets (train and val), starting from data from the Stable Diffusion generation
#the data generated from SD contains 100 images, this original dataset is split in half (50 for train, 50 for test)
formatIntoTrainTest("data/MSCOCO_for_SD/images", SD_generated_temp_dir, "data/imageonly_detector_data")
print("ok SD")

#LD+MSCOCO --------------------------------------------------------------------------
format_dataset_binaryclass(LD_generated_temp_dir, "data/imageonly_detector_data/val_LD")
print("ok LD")

#GLIDE+MSCOCO -----------------------------------------------------------------------
format_dataset_binaryclass(GLIDE_generated_temp_dir, "data/imageonly_detector_data/val_GLIDE")
print("ok GLIDE")

#### 1.2 Model

In the next block we build and train the actual model, a two-layer perceptron for binary classification, with the following steps:
- **Build the model** starting from a pre-trained version of ResNet18<br><br>
- **Create Dataset and DataLoader** objects starting from the row data fetched at 2.1 (jpg images)<br>Each item of this dataset is transformed -- TO DO WHY TRANSFORM CAPIRE --<br><br>
- **Train the model** using a custom train function and the DataLoader from the previous step

In [3]:
 # Build the model
print("Building the model...")
model = torchvision.models.resnet18(weights='IMAGENET1K_V1')

# Build the datasets
print("Building the dataset...")
data_transforms = {
    'train': torchvision.transforms.Compose([
        torchvision.transforms.RandomResizedCrop(224),
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val_LD': torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val_GLIDE': torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data/imageonly_detector_data'
image_datasets = {x: torchvision.datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val', 'val_LD', 'val_GLIDE']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4, shuffle=True, num_workers=4) for x in ['train', 'val', 'val_LD', 'val_GLIDE']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val', 'val_LD', 'val_GLIDE']}

# Train the model
print("Training starts")
trained_model = train_imageonly_detector(model, dataloaders, dataset_sizes, num_epochs=50)

#Build a the dataloaders and test the model on each of them
print("Evaluation starts")
print("loading model with trained weights...")
test_model = torchvision.models.resnet18(weights='IMAGENET1K_V1')
test_model.load_state_dict(torch.load('trained_models/imageonly_detector.pth'))
eval_imageonly_detector(test_model, dataloaders, dataset_sizes) #valuta se serve, in fondo il test lo fai durante il training... viene uguale

Building the model...
Building the dataset...
Training starts
Epoch 0/49
----------
train Loss: 3.1478 Acc: 0.5100
val Loss: 0.4255 Acc: 0.8265

Epoch 1/49
----------
train Loss: 0.5736 Acc: 0.8400
val Loss: 0.2364 Acc: 0.9388

Epoch 2/49
----------
train Loss: 0.6277 Acc: 0.8300
val Loss: 0.4284 Acc: 0.8571

Epoch 3/49
----------
train Loss: 0.5500 Acc: 0.8400
val Loss: 0.5303 Acc: 0.8469

Epoch 4/49
----------
train Loss: 0.7450 Acc: 0.8100
val Loss: 1.0299 Acc: 0.7143

Epoch 5/49
----------
train Loss: 0.6322 Acc: 0.8100
val Loss: 0.3712 Acc: 0.8673

Epoch 6/49
----------
train Loss: 0.2620 Acc: 0.8900
val Loss: 0.6136 Acc: 0.8469

Epoch 7/49
----------
train Loss: 0.5283 Acc: 0.8300
val Loss: 0.3662 Acc: 0.8571

Epoch 8/49
----------
train Loss: 0.4382 Acc: 0.8400
val Loss: 0.3725 Acc: 0.8571

Epoch 9/49
----------
train Loss: 0.2583 Acc: 0.9100
val Loss: 0.3593 Acc: 0.8776

Epoch 10/49
----------
train Loss: 0.4544 Acc: 0.8700
val Loss: 0.3310 Acc: 0.8265

Epoch 11/49
----------
t

### 2. Hybrid detector
For this problem we again implement a two-layer perceptron for binary classification, but in this case it will take as input not only the images but also their captions.

#### 2.1 Dataset

The data is first fetched and generated in the exact same way as the dataset for the image-only detector.<br>

In [None]:
# ------------------- COLLECT REAL IMAGES FROM MSCOCO -------------------- #

# TODO TEST

#SD+MSCOCO
fetchImagesFromMSCOCO("data/MSCOCO_for_SD_hybrid/images", "data/MSCOCO_for_SD_hybrid", 100)

#LD+MSCOCO --------------------------------------------------------------------------
fetchImagesFromMSCOCO("data/hybrid_detector_data/val_LD/class_1", "data/hybrid_detector_data/val_LD", 50)

#GLIDE+MSCOCO -----------------------------------------------------------------------
fetchImagesFromMSCOCO("data/hybrid_detector_data/val_GLIDE/class_1", "data/hybrid_detector_data/val_GLIDE", 50)

In [None]:
# ------------------- GENERATE FAKE IMAGES USING SD, LD, GLIDE -------------------- #
#SD+MSCOCO --------------------------------------------------------------------------
#use stable-diffusion API to generate 100 fake images from the 100 captions collected before
#%run src/imageonly_detector/SD_MSCOCO_data_generation.py
SD_generation("data/MSCOCO_for_SD_hybrid/mscoco_captions.csv", SD_generated_temp_dir)

#LD+MSCOCO --------------------------------------------------------------------------
# N.B.
# prima di lanciare questo comando, aggiungere il file src/imageonly_detector/txt2img_batch.py alla directory latent-diffusion/scripts/
#resetto la directory corrente a quella del progetto de-fake, altrimenti il file da eseguire non viene trovato
#questo è necessario perché LD_MSCOCO_data_generation.py cambia la directory a quella di latent-diffusion
#os.chdir("/home/parwal/Documents/GitHub/De-Fake_nn_final_project")
#%run src/imageonly_detector/LD_MSCOCO_data_generation_batch.py
LD_generation("data/hybrid_detector_data/val_LD/mscoco_captions.csv")

#GLIDE+MSCOCO -----------------------------------------------------------------------
#NON HO MAI PROVATO A RUNNARLO, altrimenti rigenera il modello (3gb)
#provare a runnarlo proprio alla fine di tutto per sicurezza
#%run src/imageonly_detector/GLIDE_MSCOCO_data_generation.ipynb #TODO
GLIDE_generation("data/hybrid_detector_data/val_GLIDE/mscoco_captions.csv", GLIDE_generated_temp_dir)

In [None]:
# ------------------- FORMAT THE DATA INTO THE STRUCTURE NEEDED FOR TRAINING/TESTING -------------------- #
os.chdir("/home/parwal/Documents/GitHub/De-Fake_nn_final_project")

#transform the collected data in the previously described structure

#SD+MSCOCO --------------------------------------------------------------------------
#this function generates a pair of datasets (train and val), starting from data from the Stable Diffusion generation
#the data generated from SD contains 100 images, this original dataset is split in half (50 for train, 50 for test)
formatIntoTrainTest("data/MSCOCO_for_SD_hybrid/images", SD_generated_temp_dir, "data/hybrid_detector_data")
print("ok SD")

#LD+MSCOCO --------------------------------------------------------------------------
format_dataset_binaryclass(LD_generated_temp_dir, "data/hybrid_detector_data/val_LD")
print("ok LD")

#GLIDE+MSCOCO -----------------------------------------------------------------------
format_dataset_binaryclass(GLIDE_generated_temp_dir, "data/hybrid_detector_data/val_GLIDE")
print("ok GLIDE")

#### 2.2 Model

In the next block we build and train the actual model, a two-layer perceptron for binary classification, with the following steps:
- **Build the model** using a custom implemented module<br>A two-layer perceptron that outputs 0 (fake) or 1 (real) for each sample<br><br>
- **Create Dataset and DataLoader** objects starting from the row data fetched at 2.1 (jpg images and string captions)<br>Each item of this dataset is composed by the encoding of an image concatenated with the encoding of its caption,<br>the encodings are generated using the CLIP model<br><br>
- **Train the model** using a custom train function and the DataLoader from the previous step

In [3]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

#Build the model
print('Building the model...')
hybrid_detector = TwoLayerPerceptron(1024, 100, 1).to(device)

#Build the dataset
print('Building the dataset...')
captions_file = "data/hybrid_detector_data/mscoco_captions.csv"
real_img_dir = "data/hybrid_detector_data/train/class_1"
fake_img_dir = "data/hybrid_detector_data/train/class_0"
train_data_loader = get_dataset_loader(captions_file, real_img_dir, fake_img_dir)

#Train the model
print('Training starts')
train_hybrid_detector(hybrid_detector, train_data_loader, 10, 0.005) # few epochs are needed because we use transfer learning

Building the model...
Building the dataset...
Training starts:
EPOCH:  1/10  - MEAN ACCURACY:  tensor(0.5000)  - MEAN LOSS:  tensor(0.6976)
EPOCH:  2/10  - MEAN ACCURACY:  tensor(0.5400)  - MEAN LOSS:  tensor(0.6853)
EPOCH:  3/10  - MEAN ACCURACY:  tensor(0.5900)  - MEAN LOSS:  tensor(0.6740)
EPOCH:  4/10  - MEAN ACCURACY:  tensor(0.6200)  - MEAN LOSS:  tensor(0.6629)
EPOCH:  5/10  - MEAN ACCURACY:  tensor(0.7800)  - MEAN LOSS:  tensor(0.6523)
EPOCH:  6/10  - MEAN ACCURACY:  tensor(0.8200)  - MEAN LOSS:  tensor(0.6393)
EPOCH:  7/10  - MEAN ACCURACY:  tensor(0.8700)  - MEAN LOSS:  tensor(0.6267)
EPOCH:  8/10  - MEAN ACCURACY:  tensor(0.9000)  - MEAN LOSS:  tensor(0.6138)
EPOCH:  9/10  - MEAN ACCURACY:  tensor(0.8900)  - MEAN LOSS:  tensor(0.5987)
EPOCH:  10/10  - MEAN ACCURACY:  tensor(0.9300)  - MEAN LOSS:  tensor(0.5838)


Now that the model is trained, we can evaluate it on some test datasets.<br>
In particular we will evaluate it on:<br>
- Stable Diffusion (SD), dataset generated from the same image-to-text generator used for the train dataset.
- GLIDE
- Latent Diffusion

In [2]:
# Build the model with the weights we trained in the previous code block
print("loading model with trained weights...")
test_hybrid_detector = TwoLayerPerceptron(1024, 100, 1)
test_hybrid_detector.load_state_dict(torch.load('trained_models/hybrid_detector.pth'))

eval_dirs = {'SD': {
                'captions': "data/hybrid_detector_data/mscoco_captions.csv", 
                'real': "data/hybrid_detector_data/val/class_1", 
                'fake': "data/hybrid_detector_data/val/class_0"},
             'GLIDE': {
                 'captions': "data/hybrid_detector_data/val_GLIDE/mscoco_captions.csv",
                  'real': "data/hybrid_detector_data/val_GLIDE/class_1", 
                  'fake': "data/hybrid_detector_data/val_GLIDE/class_0"},
             'LD': {
                 'captions': "data/hybrid_detector_data/val_LD/mscoco_captions.csv", 
                 'real': "data/hybrid_detector_data/val_LD/class_1", 
                 'fake': "data/hybrid_detector_data/val_LD/class_0"}}

#Build a the dataloaders and test the model on each of them
print("Evaluation starts")
for dataset_name in eval_dirs:
    eval_data_loader = get_dataset_loader(eval_dirs[dataset_name]['captions'], eval_dirs[dataset_name]['real'], eval_dirs[dataset_name]['fake'])
    SDloss, SDacc = eval_hybrid_detector(test_hybrid_detector, eval_data_loader)
    print(f'Evaluation on {dataset_name} --> Accuracy: {SDacc} - Loss: {SDloss}')

Evaluation starts:
loading model with trained weights...
testing...
Evaluation on SD --> Accuracy: 0.8933333158493042 - Loss: 0.5959395170211792
Evaluation on GLIDE --> Accuracy: 0.7100000381469727 - Loss: 0.6429540514945984
Evaluation on LD --> Accuracy: 0.7100000977516174 - Loss: 0.6497437357902527


## RQ2. Attribution of the fake images to their source model

The study proposes two attributor models:

1. **Image-only attributor**<br>multi-class classifier that assigns each input image to its source generation model, given the image only.

2. **Hybrid attributor**<br>multi-class classifier that assigns each input image to its source generation model, based on the input image and its corresponding text prompt.


### 1. Image-only attributor

In this section we will build and train a model that is able to assign an image to the model that generated it, given only that image.<br><br>
The classes that this model will be able to address are the following:
- real image -> class 0
- fake image generated by SD -> class 1
- fake image generated by LD -> class 2
- fake image generated by GLIDE -> class 3

#### 1.1 Dataset

We generate two datasets, one for training and one for evaluating the model.<br>
The steps needed to generate the two datasets are the same:
- fetch real images and their captions from MSCOCO (class 0)
- generate fake images with SD using the captions of the real images (class 1)
- generate fake images with LD using the captions of the real images (class 2)
- generate fake images with GLIDE using the captions of the real images (class 3)
- move the real and generated images into a dataset directory, with the following structure:

imageonly_attributor_data/<br>
&emsp;&emsp;├── train/<br>
&emsp;&emsp;&emsp;&emsp;├── class_real/<br>
&emsp;&emsp;&emsp;&emsp;│   ├── ...<br>
&emsp;&emsp;&emsp;&emsp;│   └── *all the images fetched by MSCOCO*<br>
&emsp;&emsp;&emsp;&emsp;├── class_SD/<br>
&emsp;&emsp;&emsp;&emsp;│   ├── ...<br>
&emsp;&emsp;&emsp;&emsp;│   └── *all the images generated by SD*<br>
&emsp;&emsp;&emsp;&emsp;├── class_GLIDE/<br>
&emsp;&emsp;&emsp;&emsp;│   ├── ...<br>
&emsp;&emsp;&emsp;&emsp;│   └── *all the images generated by GLIDE*<br>
&emsp;&emsp;&emsp;&emsp;├── class_LD/<br>
&emsp;&emsp;&emsp;&emsp;│   ├── ...<br>
&emsp;&emsp;&emsp;&emsp;│   └── *all the images generated by LD*<br>
&emsp;&emsp;├── test/<br>
&emsp;&emsp;&emsp;&emsp;├── ...<br>

Generate the TRAIN dataset first:

In [None]:
os.chdir(proj_dir)

# fetch the images with their captions from MSCOCO (N=50)
fetchImagesFromMSCOCO("data/imageonly_attributor_data/train/class_real", "data/imageonly_attributor_data/train", 50)

# use the same 50 captions to generate images with SD
SD_generation("data/imageonly_attributor_data/train/mscoco_captions.csv", SD_generated_temp_dir)

# use the same 50 captions to generate images with GLIDE
GLIDE_generation("data/imageonly_attributor_data/train/mscoco_captions.csv", GLIDE_generated_temp_dir)

# use the same 50 captions to generate images with LD
LD_generation("data/imageonly_attributor_data/train/mscoco_captions.csv")

# move the generated images to the dataset dir
format_dataset_multiclass(SD_generated_temp_dir, LD_generated_temp_dir, GLIDE_generated_temp_dir, "data/imageonly_attributor_data/train")

Then generate the TEST dataset:

In [None]:
# Repeat the same procedure for the test dataset

# fetch the images with their captions from MSCOCO (N=50)
fetchImagesFromMSCOCO("data/imageonly_attributor_data/test/class_real", "data/imageonly_attributor_data/test", 50)

# use the same 50 captions to generate images with SD
SD_generation("data/imageonly_attributor_data/test/mscoco_captions.csv", SD_generated_temp_dir)

# use the same 50 captions to generate images with GLIDE
GLIDE_generation("data/imageonly_attributor_data/test/mscoco_captions.csv", GLIDE_generated_temp_dir)

# use the same 50 captions to generate images with LD
LD_generation("data/imageonly_attributor_data/test/mscoco_captions.csv")

# move the generated images to the dataset dir
format_dataset_multiclass(SD_generated_temp_dir, LD_generated_temp_dir, GLIDE_generated_temp_dir, "data/imageonly_attributor_data/test")

#### 1.2 Model

In the next block we build and train the actual model, a two-layer perceptron for multiclass classification, with the following steps:
- **Build the model** starting from a pre-trained version of ResNet18<br><br>
- **Create Dataset and DataLoader** objects starting from the row data fetched at 2.1 (jpg images)<br>Each item of this dataset is transformed -- TO DO WHY TRANSFORM CAPIRE --<br><br>
- **Train the model** using a custom train function and the DataLoader from the previous step

In [2]:
# Build the model
print("Building the model...")
model = torchvision.models.resnet18(weights='IMAGENET1K_V1')

# Build the datasets
print("Building the dataset...")
data_transforms = {
    'train': torchvision.transforms.Compose([
        torchvision.transforms.RandomResizedCrop(224),
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}

data_dir = 'data/imageonly_attributor_data'
image_datasets = {x: torchvision.datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4, shuffle=True, num_workers=4) for x in ['train', 'test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'test']}

# Train the model
print("Training starts:")
trained_model = train_imageonly_attributor(model, dataloaders, dataset_sizes, num_epochs=50)

# Evaluate the model
print("Evaluation starts:")
print("loading model with trained weights...")
test_model = torchvision.models.resnet18(weights='IMAGENET1K_V1')
test_model.load_state_dict(torch.load('trained_models/imageonly_detector.pth'))
eval_imageonly_attributor(test_model, dataloaders, dataset_sizes) #valuta se serve, in fondo il test lo fai durante il training... viene uguale


Building the model...
Building the dataset...
Training starts:
Epoch 0/49
----------
train Loss: 6.2529 Acc: 0.1900
test Loss: 6.1089 Acc: 0.3003
Epoch 1/49
----------
train Loss: 2.4839 Acc: 0.6250
test Loss: 2.8051 Acc: 0.5387
Epoch 2/49
----------
train Loss: 1.4246 Acc: 0.7500
test Loss: 2.4543 Acc: 0.6409
Epoch 3/49
----------
train Loss: 0.8603 Acc: 0.7950
test Loss: 2.5517 Acc: 0.5820
Epoch 4/49
----------
train Loss: 1.0108 Acc: 0.7550
test Loss: 1.8932 Acc: 0.6285
Epoch 5/49
----------
train Loss: 0.7282 Acc: 0.8350
test Loss: 1.6203 Acc: 0.6502
Epoch 6/49
----------
train Loss: 0.6359 Acc: 0.8350
test Loss: 1.5583 Acc: 0.6533
Epoch 7/49
----------
train Loss: 0.6271 Acc: 0.7900
test Loss: 1.2853 Acc: 0.6997
Epoch 8/49
----------
train Loss: 0.6136 Acc: 0.8250
test Loss: 1.0811 Acc: 0.7276
Epoch 9/49
----------
train Loss: 0.5413 Acc: 0.8500
test Loss: 0.8983 Acc: 0.7461
Epoch 10/49
----------
train Loss: 0.6380 Acc: 0.7850
test Loss: 0.9645 Acc: 0.7399
Epoch 11/49
----------


FileNotFoundError: [Errno 2] No such file or directory: 'trained_models/imageonly_detector.pth'

### 2. Hybrid attributor
In this section we will build and train a model similar to the model built in section 1.<br>
The difference is that instead of taking as input only the image, this model also considers its textual caption.

#### 2.1 Dataset

Train and test datasets are generated in the same way as in the image-only attributor case.<br>
For the dataset directory structure also refer to the previous section.

Generate the TRAIN dataset first:

In [None]:
# fetch the images with their captions from MSCOCO (N=50)
fetchImagesFromMSCOCO("data/hybrid_attributor_data/train/class_real", "data/hybrid_attributor_data/train", 50)

# use the same 50 captions to generate images with SD
SD_generation("data/hybrid_attributor_data/train/mscoco_captions.csv", SD_generated_temp_dir)

# use the same 50 captions to generate images with GLIDE
GLIDE_generation("data/hybrid_attributor_data/train/mscoco_captions.csv", GLIDE_generated_temp_dir)

# use the same 50 captions to generate images with LD OK
LD_generation("data/hybrid_attributor_data/train/mscoco_captions.csv")

# move the generated images to the dataset dir
format_dataset_multiclass(SD_generated_temp_dir, LD_generated_temp_dir, GLIDE_generated_temp_dir, "data/hybrid_attributor_data/train")

Then denerate the TEST dataset:

In [None]:
# fetch the images with their captions from MSCOCO (N=50)
fetchImagesFromMSCOCO("data/hybrid_attributor_data/test/class_real", "data/hybrid_attributor_data/test", 50)

# use the same 50 captions to generate images with SD
SD_generation("data/hybrid_attributor_data/test/mscoco_captions.csv", SD_generated_temp_dir)

# use the same 50 captions to generate images with GLIDE
GLIDE_generation("data/hybrid_attributor_data/test/mscoco_captions.csv", GLIDE_generated_temp_dir)

# use the same 50 captions to generate images with LD OK
LD_generation("data/hybrid_attributor_data/test/mscoco_captions.csv")

# move the generated images to the dataset dir
format_dataset_multiclass(SD_generated_temp_dir, LD_generated_temp_dir, GLIDE_generated_temp_dir, "data/hybrid_attributor_data/test")

#### 2.2 Model

In the next block we build and train the actual model, a two-layer perceptron for multiclass classification, with the following steps:
- **Build the model** using a custom implemented module<br>A two-layer perceptron that outputs the class predicted for each sample<br><br>
- **Create Dataset and DataLoader** objects starting from the row data fetched at 2.1 (jpg images and string captions)<br>Each item of this dataset is composed by the encoding of an image concatenated with the encoding of its caption,<br>the encodings are generated using the CLIP model<br><br>
- **Train the model** using a custom train function and the DataLoader from the previous step

In [1]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Build the model
print('Building the model...')
hybrid_attributor = MultiClassTwoLayerPerceptron(1024, 100, 4).to(device)

# Build the dataset (each sample in the dataset is the encoding of an image concatenated to the encoding of its caption - encodings generated using the CLIP model)
print('Building the dataset...')
captions_file = "data/hybrid_attributor_data/train/mscoco_captions.csv"
dataset_dir = "data/hybrid_attributor_data/train"
classes = {"class_real", "class_SD", "class_LD", "class_GLIDE"}

train_data_loader = get_multiclass_dataset_loader(captions_file, dataset_dir, classes)

# Train the model on the dataset just generated
print('Training starts:')
train_hybrid_attributor(hybrid_attributor, train_data_loader, 30, 0.005)

Building the model...
Building the dataset...
Training starts:
epoch: 1/30
EPOCH:  1  - MEAN ACCURACY:  tensor(0.2200)  - MEAN LOSS:  tensor(1.3872)
epoch: 2/30
EPOCH:  2  - MEAN ACCURACY:  tensor(0.3183)  - MEAN LOSS:  tensor(1.3617)
epoch: 3/30
EPOCH:  3  - MEAN ACCURACY:  tensor(0.4183)  - MEAN LOSS:  tensor(1.3365)
epoch: 4/30
EPOCH:  4  - MEAN ACCURACY:  tensor(0.5933)  - MEAN LOSS:  tensor(1.3108)
epoch: 5/30
EPOCH:  5  - MEAN ACCURACY:  tensor(0.6767)  - MEAN LOSS:  tensor(1.2796)
epoch: 6/30
EPOCH:  6  - MEAN ACCURACY:  tensor(0.7567)  - MEAN LOSS:  tensor(1.2472)
epoch: 7/30
EPOCH:  7  - MEAN ACCURACY:  tensor(0.7767)  - MEAN LOSS:  tensor(1.2103)
epoch: 8/30
EPOCH:  8  - MEAN ACCURACY:  tensor(0.7817)  - MEAN LOSS:  tensor(1.1686)
epoch: 9/30
EPOCH:  9  - MEAN ACCURACY:  tensor(0.8517)  - MEAN LOSS:  tensor(1.1257)
epoch: 10/30
EPOCH:  10  - MEAN ACCURACY:  tensor(0.7917)  - MEAN LOSS:  tensor(1.0752)
epoch: 11/30
EPOCH:  11  - MEAN ACCURACY:  tensor(0.8250)  - MEAN LOSS:  te

## RQ3. Analysis of the likelihood that different text prompts have to generate authentic images

### 1. Semantic Analysis

### 2. Structure Analysis

## Conclusions