# Naive CNN Identifier
_A Naive CNN Application to AI Generated Image Detection_

Wesley Jones - Digital Forensics, CprE 536 - Iowa State University

# Background

Convolutional neural networks (CNNs) are a deep learning architecture that learns image features by sliding small filters (kernels) across the input.

The advantage of using a CNN for image-related tasks is the model's natural ability to identify the relevant features on its own, _unsupervised_, instead of relying on hand-picked parameters.

<img src="images/2D_Convolution_Animation.gif" alt="drawing" width="300"/>

* _image source_: <https://en.wikipedia.org/wiki/Convolution#Discrete_convolution>
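
The sliding-window operation in the animation can be written out in a few lines. This is a minimal sketch in plain Python, not code from the project: `convolve2d_valid` and the sample image and kernel are illustrative names and values chosen here. Like most CNN layers, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            # Multiply the kernel with the image patch under it, element by element.
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Illustrative 3x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Illustrative kernel that responds to left-to-right intensity changes.
kernel = [
    [1, -1],
    [1, -1],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[0, -2, 0], [0, -2, 0]]
```

The non-zero column in the output marks where the edge sits. In a CNN the kernel values are learned during training rather than chosen by hand.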

## Naive Approach and Related Works

I'm not the first to try this. Common implementations:

* Focus on identifying specific parameters or artifacts in images
* Rely on lower-quality testing datasets
* Perform at chance when they take a naive approach

## Objective and Motivation


### Objective

My objective is to implement a naive CNN model that can classify a human face as real or synthetically generated. It is to be trained on high-quality images and should hold up in real-world use: it should identify images that would be passed off as real people in real-world settings.

* A naive approach means that image pre-processing, custom parameterization, and model filtering/domain training are **not** used.

### Motivation

The increased _sharing_ of AI generated materials and increased _development_ of high quality content generated with AI models means digital forensics will need the lowest overhead possible for quick and reliable identification.

## Methodology

1. Acquire datasets for training
2. Run and save the model
3. Develop a testing dataset
4. Gather predictions of testing dataset from the saved model

## Training

* I used roughly 160,000 images to train the model on two classes, "real" and "synth". The model was trained for 15 epochs, with validation on the training dataset reaching >95% accuracy at epoch 10 and 99% by epoch 15.
* Training was done with TensorFlow in Python, on an HPC cluster with NVIDIA A100 GPUs. Total training time was just under 14.5 hours.
* TensorFlow includes Keras, a high-level API for building models. This allowed me to save the trained model and reload it to quickly make predictions.

| Purpose | Image Count | Classes |
| --- | --- | --- |
| Training | 112,952 | 2 |
| Test | 16,136 | 2 |
| Validation | 32,273 | 2 |
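
As a quick check on the table, the three counts sum to 161,361 images and work out to roughly a 70/10/20 split. The arithmetic below uses only the numbers from the table; the variable names are illustrative.

```python
# Image counts copied from the table above.
counts = {"Training": 112_952, "Test": 16_136, "Validation": 32_273}

total = sum(counts.values())
fractions = {name: count / total for name, count in counts.items()}

print(f"Total images: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```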

## Generating Test Data

I used the following prompt in current AI image generators to generate images I would ask my naive CNN model to predict.

```plain
{race} {gender} looking directly at the camera for a professional headshot taken using a Sony A7 III camera with 1/250, f/1.4, ISO 200 - FE 35mm 1.4 ZA - Portrait Style and 6200 K
```

**Generating models used**
* Turbo SDXL
* Fast SDXL
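
The `{race}` and `{gender}` placeholders can be filled programmatically. The following is a small sketch of one way to do that, not the script used for the project: the template text is the prompt above, while `build_prompts` and the value lists are illustrative assumptions (the full lists actually used are not given here).

```python
from itertools import product

# Prompt template copied from the text above.
PROMPT_TEMPLATE = (
    "{race} {gender} looking directly at the camera for a professional "
    "headshot taken using a Sony A7 III camera with 1/250, f/1.4, ISO 200 "
    "- FE 35mm 1.4 ZA - Portrait Style and 6200 K"
)

# Illustrative values only; these lists are assumptions, not the project's.
RACES = ["Native American"]
GENDERS = ["woman", "man"]


def build_prompts(races, genders):
    """Return one prompt per (race, gender) combination."""
    return [
        PROMPT_TEMPLATE.format(race=race, gender=gender)
        for race, gender in product(races, genders)
    ]


prompts = build_prompts(RACES, GENDERS)
for prompt in prompts:
    print(prompt)
```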

### Issues with generated data

* The models rarely produced women-presenting faces unless specifically prompted to.
* Cultural issues with specific prompts:

```plain
Native American {gender} looking directly at the camera for a professional headshot taken using a Sony A7 III camera with 1/250, f/1.4, ISO 200 - FE 35mm 1.4 ZA - Portrait Style and 6200 K
```

<img src="images/na_woman.jpg" alt="Native American woman" width="120"/>
<img src="images/na_person_3.jpg" alt="Native American woman" width="120"/>


* Due to the nature of these generative models, the faces had _a lot_ of similarities.

## Results

* The model performed at chance across the board (~50%)
* Predictions on Turbo SDXL images were noticeably _wrong_; predictions on Fast SDXL images were noticeably _correct_
* (Not) shocking conclusion: there's an issue with pre-processing, sizing, or tensor correlation.

_All of the images tested were synthetic._

Overview of the test prediction results. Notice how similar the exact prediction values are: whether the model decides an image is real or synthetic, it is very confident in that result.

* _1 = synthetic_
* _0 = real_

![](images/selected.png)

Selected results focused on the middling predictions. Anything falling below 0.5 is considered a real image by the prediction.

![](images/zoom_selected.png)
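
Selecting the middling predictions amounts to keeping the scores that are not saturated near 0 or 1. A minimal sketch, assuming made-up scores and a hypothetical band width of 0.4 around the threshold (neither comes from the project's results); treating a score of exactly 0.5 as synthetic is also an assumption.

```python
THRESHOLD = 0.5  # from the text: scores below 0.5 are read as "real"


def label_for(score, threshold=THRESHOLD):
    """Map a prediction score to a class label."""
    return "real" if score < threshold else "synthetic"


def middling(scores, band=0.4, threshold=THRESHOLD):
    """Keep scores within `band` of the threshold (hypothetical band width)."""
    return [score for score in scores if abs(score - threshold) < band]


# Made-up scores for illustration only.
example_scores = [0.001, 0.12, 0.48, 0.63, 0.97, 0.999]

selected = middling(example_scores)
labels = [label_for(score) for score in selected]
print(list(zip(selected, labels)))
```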

This plot highlights the model's performance on images from each generator. The x-axis is not a time or training axis: it is the index number of an image, a counter that increases as subsequent images are tested. The horizontal line at 0.5 is the dividing line: prediction values < 0.5 are classified as real. Since every tested image was synthetic, a perfect test would result in all predictions being > 0.5.

![](images/selected_plot.png)
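
Because every test image is synthetic, accuracy is simply the fraction of scores on the synthetic side of the threshold. The sketch below shows how one generator scoring mostly wrong and another mostly right can average out to chance. The scores are made up for illustration and are not the project's results; `synthetic_detection_rate` is a hypothetical helper name.

```python
THRESHOLD = 0.5  # from the text: scores below 0.5 are read as "real"


def synthetic_detection_rate(scores, threshold=THRESHOLD):
    """Fraction of scores classified as synthetic (score >= threshold)."""
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for score in scores if score >= threshold)
    return hits / len(scores)


# Made-up scores; every underlying image is assumed to be synthetic.
scores_by_generator = {
    "Turbo SDXL": [0.02, 0.10, 0.95, 0.01],
    "Fast SDXL": [0.99, 0.97, 0.03, 0.98],
}

rates = {
    generator: synthetic_detection_rate(scores)
    for generator, scores in scores_by_generator.items()
}
all_scores = [s for scores in scores_by_generator.values() for s in scores]
overall = synthetic_detection_rate(all_scores)

print(rates)    # {'Turbo SDXL': 0.25, 'Fast SDXL': 0.75}
print(overall)  # 0.5
```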

## Discussion & Conclusion

### What went wrong?

* Pre-processing for sizing and tensor alignment
* Image quality didn't seem to play a factor in correct predictions

### Future Work

* Additional layers for CNNs to improve results
* Changing the scope of parameters to fix confidence in incorrect results

### Conclusion

* Meta announces AI face generator this week, <https://imagine.meta.com>
* Paper pre-release December 7, 2023: MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar
    * <https://yufan1012.github.io/MonoGaussianAvatar>

<img src="images/imagine_meta.png" alt="Meta image generated with faces" width="400"/>

## Resources

* See my codebase for details: <https://github.com/iamwpj/naive-cnn-identifier>

![](images/qr-iamwpj-github.png)