# RvF: Real vs Fake Face Detection

## Getting Started

To understand the theory behind CNNs, in addition to online resources, you can refer to the [CNN Crash Course (U-M Only)](https://docs.google.com/presentation/d/1p3EWFMfTNT773PEt3q16tlLxQ4FuD-JTwnTj1A_N4a0/edit?usp=sharing).

There is also a guide on [PyTorch for CNNs](https://github.com/MichiganDataScienceTeam/W24-RvF/blob/main/notebooks/pytorch_cnn.ipynb).

In [1]:
import torch
import torchvision.transforms.v2 as v2
from torch import nn, optim
from starter_code.dataset import RvFDataset, get_loaders
from starter_code.train import train_model, plot_performance, load_model


## Step 1: Define Preprocessing

Preprocessing has several advantages, depending on the task at hand. For our CNN-based RvF project, the two major benefits of preprocessing are:

### Normalization
Normalization brings features onto a similar scale, preventing features with larger magnitudes from dominating the learning process. By normalizing the data, we ensure that each feature contributes proportionally to learning, which helps training converge efficiently and helps the model generalize.
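
As a minimal sketch of what per-channel normalization does, the snippet below standardizes a single toy image by hand. The tensor, its size, and the channel scales are made up for illustration; in this notebook the statistics are computed over the whole training set rather than over one image.

```python
import torch

torch.manual_seed(0)

# Toy "image": 3 channels, 4x4 pixels, with a very different scale per channel (illustrative values)
image = torch.rand(3, 4, 4) * torch.tensor([1.0, 50.0, 255.0]).view(3, 1, 1)

# Per-channel statistics, computed over the height and width dimensions
mean = image.mean(dim=(1, 2))
std = ((image - mean.view(3, 1, 1)) ** 2).mean(dim=(1, 2)).sqrt()

# Subtract the mean and divide by the standard deviation, channel by channel
normalized = (image - mean.view(3, 1, 1)) / std.view(3, 1, 1)

print(normalized.mean(dim=(1, 2)))         # close to 0 for every channel
print((normalized ** 2).mean(dim=(1, 2)))  # close to 1 for every channel
```

After this step, all three channels live on the same scale even though their raw magnitudes differed by orders of magnitude.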

### Generalization
Preprocessing techniques that introduce variability into the training images help the model generalize to unseen data and prevent it from overfitting to the training set. Examples of such preprocessing include random cropping and random jitter.

Here is a more detailed guide on [Image Preprocessing](https://github.com/MichiganDataScienceTeam/W24-RvF/blob/main/notebooks/image_preprocessing.ipynb).

In [2]:
# load training dataset
train_dataset = RvFDataset("train", data_directory = "data/rvf10k")

In [3]:
# the normalization statistics (mean and std) are computed for you here and used in the next cell
mean = torch.zeros((3,))
variance = torch.zeros((3,))
tensor_converter = v2.ToTensor()

for image, _ in train_dataset:
    mean += tensor_converter(image).mean(dim=(1, 2))
mean /= len(train_dataset)  # divide once, after summing over every image
for image, _ in train_dataset:
    image = tensor_converter(image)
    variance += ((image - mean.view(3, 1, 1))**2).mean(dim=(1, 2))

std = torch.sqrt(variance / len(train_dataset))



### TODO1: Define Your Preprocessing

In [4]:
def preprocess(image) -> torch.Tensor:

    """
    Preprocesses an image by applying a series of transformations.

    Args:
        image (npt.ArrayLike): The input image to be preprocessed.

    Returns:
        torch.Tensor: The preprocessed image as a tensor.
    """

    tensor = torch.tensor(image, dtype = torch.float32).permute(2, 0, 1) # convert image to tensor

    tensor = v2.Normalize(mean = mean, std = std)(tensor)

    # TODO: Add more preprocessing steps to improve model performance.
    
    
    return tensor

## Step 2: Model Definition

Below is an example of a class definition in Python for a very simple convolutional neural network called BasicCNN. Let's break its components down.

In [5]:
class BasicCNN(nn.Module): # BasicCNN inherits from nn.Module
    def __init__(self):
        """Constructor for the neural network."""
        super(BasicCNN, self).__init__()        # Call superclass constructor
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=128, kernel_size=3, stride=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()              
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(3200, 10) 

    def forward(self, x):
        z1 = self.conv1(x)
        h1 = self.relu(z1)
        p1 = self.pool(h1)

        z2 = self.conv2(p1)
        h2 = self.relu(z2)
        p2 = self.pool(h2)

        flat = self.flatten(p2)
        z = self.fc(flat)

        return z

#### Subclass Inheritance

The first criterion is met by defining the subclass relationship between `BasicCNN` and `nn.Module`:
- When we write the first line of the class definition, we write `BasicCNN(nn.Module):` to indicate that `BasicCNN` is a subclass of `nn.Module`
- On line 4, we call the superclass constructor for this model:
  
  ```py
    super(BasicCNN, self).__init__() 
  ```

#### Layer Definition

For PyTorch to recognize that a layer is part of our model, we must add all of them as **member variables** of `BasicCNN`. This can be done in the class constructor `__init__()` by assigning them to attributes of `self`:

```py
self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1)
self.conv2 = nn.Conv2d(in_channels=16, out_channels=128, kernel_size=3, stride=1)
self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
self.relu = nn.ReLU()              
self.flatten = nn.Flatten()
self.fc = nn.Linear(3200, 10) 
```

This code defines 6 layers for our model:
- `conv1`: convolution layer that expects 1 channel and has 16 filters with a filter size of 3 pixels and a stride of 1
- `conv2`: convolution layer that expects 16 channels and has 128 filters with a filter size of 3 pixels and a stride of 1
- `pool`: max pooling layer that has a window size of 2 and a stride of 2. We will reuse this layer multiple times (since max pooling is stateless)
- `relu`: activation layer using the ReLU activation function. We will reuse this activation layer multiple times (since activation functions are stateless)
- `flatten`: layer that flattens the feature maps of each image into a single vector so they can be passed to the dense layer
- `fc`: a dense layer that expects a vector with 3200 components and returns a vector with 10 components (one for each of the 10 classes in the MNIST dataset)
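
To see where the 3200 in `fc` comes from, the sketch below traces the spatial size of a 28x28 MNIST image through the layers using the standard output-size formula for convolution and pooling. The helper `conv_output_size` is a hypothetical name introduced here for illustration; it is not part of PyTorch or the starter code.

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                               # MNIST images are 28x28 pixels
size = conv_output_size(size, kernel_size=3, stride=1)  # conv1 -> 26
size = conv_output_size(size, kernel_size=2, stride=2)  # pool  -> 13
size = conv_output_size(size, kernel_size=3, stride=1)  # conv2 -> 11
size = conv_output_size(size, kernel_size=2, stride=2)  # pool  -> 5

flattened = 128 * size * size  # 128 channels from conv2, each 5x5
print(flattened)  # 3200
```

If you change the image size, kernel sizes, strides, or the number of filters in the last convolution layer, the input size of the dense layer has to change accordingly.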

#### Defining the Forward Pass

The third criterion is trickier: we have to define a function called `forward()` that specifies _how_ to call each layer and make predictions for some input image. For the model above, this function is defined as follows:

```py
def forward(self, x):
   z1 = self.conv1(x)
   h1 = self.relu(z1)
   p1 = self.pool(h1)

   z2 = self.conv2(p1)
   h2 = self.relu(z2)
   p2 = self.pool(h2)

   flat = self.flatten(p2)
   z = self.fc(flat)

   return z
```

Let's break down the first few lines of this function:
1. The `forward()` function takes as input the parameters
   1. `self` - the self-pointer, equivalent to `this` in C++
   2. `x` - the input to the model - in this case an image of a handwritten digit.
2. The image `x` is immediately passed as input into the first convolution layer `conv1` to perform convolution. The output of this convolution layer is saved to the local variable `z1`.
   1. Note that in this case, `self.conv1` is actually a **functor** - it is an object that can be called like a function to produce an output
3. The convolution layer output `z1` is passed through the ReLU activation layer to get the activated outputs `h1`
4. Max pooling is applied to the activated output to downsample it, and the result is saved to `p1`.
5. The pooled output is passed as input to the second convolution layer `conv2` to perform another round of convolution. The output of this convolution layer is saved to the local variable `z2`
   1. 🚨 As `p1` has 16 channels, we MUST define `conv2` to accept 16 input channels. It is SUPER important to be careful to make sure that your input to your convolution layer has the correct number of channels, otherwise PyTorch will throw errors!
6. ...

And so forth! 
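
As a quick check of the walkthrough above, the sketch below runs `BasicCNN` on a random stand-in batch and then deliberately feeds it an input with the wrong number of channels. The class is repeated in compact form only so the snippet runs on its own; the batch size of 4 and the random inputs are arbitrary choices.

```python
import torch
from torch import nn

class BasicCNN(nn.Module):
    """Same layers as the BasicCNN above, repeated so this snippet is self-contained."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=128, kernel_size=3, stride=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(3200, 10)

    def forward(self, x):
        p1 = self.pool(self.relu(self.conv1(x)))
        p2 = self.pool(self.relu(self.conv2(p1)))
        return self.fc(self.flatten(p2))

model = BasicCNN()

# Random stand-in for a batch of 4 grayscale 28x28 images
batch = torch.randn(4, 1, 28, 28)
output = model(batch)  # calling the model like a function runs forward()
print(output.shape)    # torch.Size([4, 10])

# A 3-channel input does not match conv1's in_channels=1, so PyTorch raises an error
try:
    model(torch.randn(4, 3, 28, 28))
    mismatch_raised = False
except RuntimeError as error:
    mismatch_raised = True
    print("Channel mismatch:", error)
```

Running a dummy batch like this before training is a cheap way to catch channel and shape mistakes early.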

### TODO2: Define Your Model

Here are the restrictions:
- your first convolution layer must accept images that have only 3 channels
- your last convolution layer must accept images that have only 256 channels
- your CNN output should return a vector with 2 entries
- if you have difficulty understanding the model, visit [PyTorch and CNN](https://github.com/MichiganDataScienceTeam/W24-RvF/blob/main/notebooks/pytorch_cnn.ipynb)

Other than that, you have as much flexibility as you prefer for how you want to define your model!

In [None]:
class Model(torch.nn.Module):
    def __init__(self):
      """Constructor for the neural network."""
      super(Model, self).__init__()        # Call superclass constructor

      # A few optional preprocessing and regularization layers.
      # You are free to add more or remove any of them.
      self.batchnorm = torch.nn.BatchNorm2d(num_features = 3)
      self.padding = torch.nn.ZeroPad2d(padding = 2)
      self.dropout = torch.nn.Dropout(p = 0.50)

      # TODO: define your convolution layer, max pooling layer, activation layer, and dense layer

    # TODO: define your forward function below

## Step 3: Model Training


For model training, we use functions written in the `starter_code` directory, located under "Optional-Challenge/RvF". They are ready to use, and you can refer back to them at any time.

In [None]:
train_loader, val_loader = get_loaders(batch_size = 16, preprocessor = preprocess, data_directory = "data/rvf10k")

model = Model()

optimizer = torch.optim.Adam(model.parameters(), lr = 5e-4) # TODO: Change the optimizer to explore different options
criterion = torch.nn.CrossEntropyLoss() # TODO: Change the criterion to explore different options

history = train_model(model, criterion, optimizer, train_loader, val_loader)
plot_performance(history)

# Load the model from the training run
load_model(model, "checkpoints", 10) # you can modify the number of epochs, currently set at 10
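
The two TODOs above invite you to swap the optimizer and the criterion. The sketch below shows the basic pattern on a toy problem, assuming a stand-in `nn.Linear` model and random data instead of the real CNN and loaders; `SGD` with momentum is just one possible alternative to Adam, and its hyperparameters here are arbitrary. It is not how `train_model` is implemented.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Stand-in for a CNN: any module that maps an input to 2 raw scores (logits)
model = nn.Linear(8, 2)

# Fake batch: 16 feature vectors with class labels 0 or 1
inputs = torch.randn(16, 8)
labels = torch.randint(0, 2, (16,))

criterion = nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax internally
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # one alternative to Adam

losses = []
for _ in range(20):
    optimizer.zero_grad()             # clear gradients from the previous step
    logits = model(inputs)            # shape (16, 2), no softmax applied
    loss = criterion(logits, labels)  # labels are integer class indices
    loss.backward()                   # compute gradients
    optimizer.step()                  # update the weights
    losses.append(loss.item())

predictions = logits.argmax(dim=1)    # predicted class for each example
print(losses[0], losses[-1])
```

Note that `CrossEntropyLoss` works on the raw 2-entry output vector, so the model should not apply a softmax of its own before the loss.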

This is the end of the RvF challenge! Please save your file and submit your work.