# Convolutional Neural Networks - Smile Classifier

In this notebook, I implement a CNN to classify face images based on smiles. I also explore data augmentation techniques to enhance the model's performance.

The CNN I aim to build has four convolutional layers producing 32, 64, 128, and 256 feature maps, respectively. All of these convolutional layers use a kernel size of $3 \times 3$ with padding 1. The first three convolutional layers are each followed by a max-pooling layer $P_{2 \times 2}$. Two dropout layers are also included for regularization.

Let's jump right in and load the CelebA dataset.

# Loading the CelebA dataset

CelebFaces Attributes Dataset, or CelebA for short, is a large-scale image dataset of celebrity faces. It contains 202,599 face images, each annotated with five landmark locations and 40 binary attributes.

Though the dataset is available through PyTorch's `torchvision` module, the download link appears to be unstable. So, I downloaded the dataset manually using this [link](https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing).

In [1]:
import torch
import torchvision
from torchvision import transforms

image_path = './dataset'

#Load training partition of the dataset
celeba_train_dataset = torchvision.datasets.CelebA(
    root=image_path, split='train',
    target_type='attr', download=False
)

#Load validation partition of the dataset
celeba_valid_dataset = torchvision.datasets.CelebA(
    root=image_path, split='valid',
    target_type='attr', download=False
)

#Load testing partition of the dataset
celeba_test_dataset = torchvision.datasets.CelebA(
    root=image_path, split='test',
    target_type='attr', download=False
)

# Data augmentation

**Data augmentation** refers to a set of techniques for dealing with cases where the training data is limited. These techniques let us modify, or even synthesize, additional data to bring more variation into the dataset, which helps reduce overfitting.

To augment our dataset, we need to perform "transformations" on it. Recall that in `mpl-torch.ipynb` (folder 03), I said the following:

> I import the torchvision and **transforms** modules. The second module [transforms], as the name suggests, lets us perform common transformations on **image** data. According to the documentation, transforms are common image transformations available in the `torchvision.transforms` module.
>
>
> Another interesting feature is that transform operations can be **chained** together using `Compose`.

Here again, I will use the `transforms` module to perform the transformations and use `Compose` to chain them together.

Let's start with the set of transformations to perform on the training partition of the data.

**NOTE: Data augmentation is only applied to the training partition**.

In [2]:
transform_train = transforms.Compose([
    transforms.RandomCrop([178, 178]),
    transforms.RandomHorizontalFlip(),
    transforms.Resize([64, 64]),
    transforms.ToTensor(),
])

Let's continue by specifying the set of transformations to perform on both the validation and testing partitions of the dataset. Only training examples should be augmented.

**NOTE: No augmentation is applied here. The images are only center-cropped and then resized to the desired $64 \times 64$**.

In [3]:
transform = transforms.Compose([
    transforms.CenterCrop([178, 178]),
    transforms.Resize([64, 64]),
    transforms.ToTensor()
])

With all the transformations defined, let's *reload* the partitions of the dataset, this time applying the transformations defined in the previous cells.

Earlier in this notebook, I said that the dataset under consideration has 40 attributes for *each* training example. As proof, I print the shape of `celeba_train_dataset.attr`.

In [4]:
celeba_train_dataset.attr.shape

torch.Size([162770, 40])

There are 40 columns, one for each attribute. The same applies to each partition loaded earlier. For this model, I am interested in only one of them: the **smiling** attribute, which is the 32nd attribute (index 31, since indexing starts at 0).

So, I write the `get_smile` function, whose job is to extract the smiling attribute from the 40 attributes. The function is passed as the `target_transform` parameter when the dataset partitions are reloaded in the cells below.

When a sample is loaded, the function specified as `target_transform` receives that sample's attribute tensor (containing the target variables) and manipulates it; in our case, it grabs the 32nd entry.

In [5]:
get_smile = lambda attr: attr[31]
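
As a quick sanity check, here is a small sketch of what `get_smile` does to one sample's attribute vector. The tensor `attr` below is a made-up example (all zeros except position 31), not a row taken from the dataset.

```python
import torch

get_smile = lambda attr: attr[31]

# Hypothetical attribute vector for one image: 40 binary values,
# with only the entry at index 31 (the 32nd attribute) set to 1.
attr = torch.zeros(40, dtype=torch.long)
attr[31] = 1

label = get_smile(attr)
print(label.item())   # 1
print(label.dim())    # 0, i.e. a scalar tensor
```

Each target therefore becomes a single 0/1 value rather than a vector of 40 values, which is what a binary classifier expects.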

Okay, with `get_smile` out of the way, let's reload the partitions of the dataset.

In [6]:
#Reload training partition of the dataset
celeba_train_dataset = torchvision.datasets.CelebA(
    image_path, split='train',
    target_type='attr', download=False,
    transform=transform_train, target_transform=get_smile #extract smiling attribute
)

#Reload validation partition of the dataset
celeba_valid_dataset = torchvision.datasets.CelebA(
    root=image_path, split='valid',
    target_type='attr', download=False,
    transform=transform, target_transform=get_smile
)

#Reload testing partition of the dataset
celeba_test_dataset = torchvision.datasets.CelebA(
    root=image_path, split='test',
    target_type='attr', download=False,
    transform=transform, target_transform=get_smile
)

I can now create data loaders for the three partitions of the dataset.

In [7]:
from torch.utils.data import DataLoader

batch_size = 32
torch.manual_seed(1)

train_dl = DataLoader(celeba_train_dataset,
                      batch_size, shuffle=True)

valid_dl = DataLoader(celeba_valid_dataset,
                      batch_size, shuffle=False)

test_dl = DataLoader(celeba_test_dataset,
                     batch_size, shuffle=False)
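
To illustrate what a data loader yields, here is a minimal sketch that uses a synthetic `TensorDataset` in place of CelebA. The names `fake_images`, `fake_labels`, `fake_ds`, and `fake_dl` are made up for this illustration, and the 100 random samples are not real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(1)

# Hypothetical stand-in dataset: 100 random "images" with 0/1 labels.
fake_images = torch.rand(100, 3, 64, 64)
fake_labels = torch.randint(0, 2, (100,))
fake_ds = TensorDataset(fake_images, fake_labels)

fake_dl = DataLoader(fake_ds, batch_size=32, shuffle=True)

# One batch: images are stacked along a new leading dimension.
x_batch, y_batch = next(iter(fake_dl))
print(x_batch.shape)  # torch.Size([32, 3, 64, 64])
print(y_batch.shape)  # torch.Size([32])

# The last batch is smaller when the dataset size is not a multiple of 32.
batch_sizes = [len(y) for _, y in fake_dl]
print(batch_sizes)    # [32, 32, 32, 4]
```

The smaller final batch is worth keeping in mind when averaging per-batch losses over an epoch.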

# Implementing the model in PyTorch

I implement the model now using the `torch.nn` module.

In [8]:
import torch.nn as nn

model = nn.Sequential()

I proceed with adding the first convolutional layer, followed by the first `ReLU` activation layer, a max-pooling layer, and dropout layer.

*Note: This first convolutional layer outputs 32 feature maps.*

In [9]:
model.add_module(
    'conv1',
    nn.Conv2d(
        in_channels=3, out_channels=32,
        kernel_size=3, padding=1
    )
)

model.add_module('relu1', nn.ReLU())
model.add_module('pool1', nn.MaxPool2d(kernel_size=2))
model.add_module('dropout1', nn.Dropout(p=0.5))

I continue with adding the second convolutional layer, followed by the second `ReLU` activation layer, another max-pooling layer, and the second dropout layer.

*Note: This second convolutional layer outputs 64 feature maps.*

In [10]:
model.add_module(
    'conv2',
    nn.Conv2d(
        in_channels=32, out_channels=64,
        kernel_size=3, padding=1
    )
)

model.add_module('relu2', nn.ReLU())
model.add_module('pool2', nn.MaxPool2d(kernel_size=2))
model.add_module('dropout2', nn.Dropout(p=0.5))

I continue and add the third convolutional layer. I follow it with a `ReLU` activation layer and a max-pooling layer.

*Note: This third convolutional layer outputs 128 feature maps*

In [11]:
model.add_module(
    'conv3',
    nn.Conv2d(
        in_channels=64, out_channels=128,
        kernel_size=3, padding=1
    )
)

model.add_module('relu3', nn.ReLU())
model.add_module('pool3', nn.MaxPool2d(kernel_size=2))

Now, I add the fourth and final convolutional layer to the model. As before, I follow this convolutional layer with a `ReLU` activation layer, but this time I follow the ReLU layer with an average-pooling layer instead of a max-pooling layer.

*Note: This fourth layer outputs 256 feature maps*

In [12]:
model.add_module(
    'conv4',
    nn.Conv2d(
        in_channels=128, out_channels=256,
        kernel_size=3, padding=1
    )
)

model.add_module('relu4', nn.ReLU())
model.add_module('pool4', nn.AvgPool2d(kernel_size=8))

In [13]:
x = torch.ones((4, 3, 64, 64))
model(x).shape

torch.Size([4, 256, 1, 1])

The outputs of the fourth convolutional layer have 256 channels, where each feature map has height and width of $8$. In other words, outputs from the last convolutional layer have the shape $[N \times 256 \times 8 \times 8]$, where $N$ is the batch size.

This output is then fed into the average-pooling layer, which computes the average value of each feature map (i.e., channel), resulting in a single value per channel. The output of the average-pooling layer thus has the shape $[N \times 256 \times 1 \times 1]$, where $N$ is the batch size.

I then flatten the output of the average-pooling layer and feed it into a fully connected layer. After flattening, the shape of the output is $[N \times 256]$, where $N$ is the batch size.

In [14]:
model.add_module('flatten', nn.Flatten())
model.add_module('fc', nn.Linear(256, 1))
model.add_module('sigmoid', nn.Sigmoid()) #To get a final output between 0 and 1
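
The spatial sizes quoted above ($8 \times 8$ before average pooling, $1 \times 1$ after it) can be checked by hand with the standard output-size formulas for convolution and pooling. The sketch below is plain Python; the helper names `conv_out` and `pool_out` are my own and are not part of PyTorch.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output height/width of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel, stride=None):
    """Output height/width of a pooling layer (stride defaults to kernel)."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1

size = 64          # input images are 64 x 64
trace = [size]

# Blocks 1-3: 3x3 convolution with padding 1, then 2x2 max-pooling.
for _ in range(3):
    size = conv_out(size, kernel=3, padding=1)  # padding 1 keeps the size
    size = pool_out(size, kernel=2)             # pooling halves it
    trace.append(size)

after_conv4 = conv_out(size, kernel=3, padding=1)
after_avgpool = pool_out(after_conv4, kernel=8)

print(trace)          # [64, 32, 16, 8]
print(after_conv4)    # 8
print(after_avgpool)  # 1
```

Because the $3 \times 3$ convolutions use padding 1, only the pooling layers change the spatial size, halving it three times from 64 to 8.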

I have finished implementing the model. I now define the loss function and the optimizer for the model.

In [21]:
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
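
For reference, binary cross-entropy averages $-[y \log p + (1 - y) \log(1 - p)]$ over the batch, where $p$ is the sigmoid output and $y$ is the 0/1 label. Here is a minimal pure-Python sketch of that formula; the function `binary_cross_entropy` and the three probabilities and labels are made up for illustration.

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean binary cross-entropy; assumes every prob is strictly in (0, 1)."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Hypothetical predictions (sigmoid outputs) and true labels.
probs = [0.9, 0.2, 0.6]
targets = [1, 0, 0]

loss = binary_cross_entropy(probs, targets)
print(f'{loss:.4f}')
```

Confident and correct predictions (0.9 for a positive, 0.2 for a negative) contribute little to the loss, while the wrong-leaning prediction (0.6 for a negative) contributes most of it.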

I am ready to train the model. But before doing that, I would like to print the model to list all the layers I added.

In [18]:
model

Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout1): Dropout(p=0.5, inplace=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout2): Dropout(p=0.5, inplace=False)
  (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu3): ReLU()
  (pool3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv4): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu4): ReLU()
  (pool4): AvgPool2d(kernel_size=8, stride=8, padding=0)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
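
As a side note, the number of trainable parameters can be tallied from the layer sizes printed above. This is a back-of-the-envelope sketch in plain Python; the helpers `conv2d_params` and `linear_params` are my own names, and the counts assume every layer has a bias term, as in the printout.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Weights (in_ch * out_ch * k * k) plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output unit."""
    return in_features * out_features + out_features

layers = {
    'conv1': conv2d_params(3, 32, 3),
    'conv2': conv2d_params(32, 64, 3),
    'conv3': conv2d_params(64, 128, 3),
    'conv4': conv2d_params(128, 256, 3),
    'fc': linear_params(256, 1),
}

for name, count in layers.items():
    print(f'{name}: {count}')

total = sum(layers.values())
print(f'total: {total}')  # total: 388673
```

The ReLU, pooling, dropout, flatten, and sigmoid layers have no trainable parameters, so they do not appear in the tally.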

# Training the model

Like before, I would like to train this model on the GPU.

In [22]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

Below, the model is moved to the selected device (the GPU, in this case).

In [23]:
model = model.to(device)

Just like I did in `cnn-mnist.ipynb`, I now implement a `train` function where I write logic to train the model we implemented earlier.

In [25]:
def train(model, num_epochs, train_dl, valid_dl):
    #Loss and accuracy PER epoch during training
    loss_hist_train = [0] * num_epochs
    accuracy_hist_train = [0] * num_epochs

    #Loss and accuracy PER epoch during validation
    loss_hist_valid = [0] * num_epochs
    accuracy_hist_valid = [0] * num_epochs

    
    for epoch in range(num_epochs):
        model.train()
        for x_batch, y_batch in train_dl:
            x_batch, y_batch = x_batch.to(device), y_batch.to(device)

            #forward pass, then compute loss
            pred = model(x_batch)[:, 0] #grab numbers in first column
            loss = loss_fn(pred, y_batch.float())

            #Compute gradients using backprop, update weights, then reset gradients
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            
            #running loss for current batch. Accumulated for the entire epoch
            loss_hist_train[epoch] += loss.item() * y_batch.size(0)

            #Correct predictions for current batch. Accumulated for the entire epoch
            is_correct = ((pred >= 0.5).float() == y_batch).float()
            accuracy_hist_train[epoch] += is_correct.sum().item()

        #Average loss and accuracy for this epoch
        loss_hist_train[epoch] /= len(train_dl.dataset)
        accuracy_hist_train[epoch] /= len(train_dl.dataset)

        
        model.eval()
        with torch.no_grad():
            for x_batch, y_batch in valid_dl:
                x_batch, y_batch = x_batch.to(device), y_batch.to(device)
                pred = model(x_batch)[:, 0]
                loss = loss_fn(pred, y_batch.float())
                loss_hist_valid[epoch] += loss.item() * y_batch.size(0)
                is_correct = ((pred >= 0.5).float() == y_batch).float()
                accuracy_hist_valid[epoch] += is_correct.sum().item()

        loss_hist_valid[epoch] /= len(valid_dl.dataset)
        accuracy_hist_valid[epoch] /= len(valid_dl.dataset)

        print(f'Epoch {epoch + 1}: training accuracy: '
              f'{accuracy_hist_train[epoch]:.4f} validation accuracy: '
              f'{accuracy_hist_valid[epoch]:.4f}')

    
    return loss_hist_train, loss_hist_valid, accuracy_hist_train, accuracy_hist_valid
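
One detail in `train` deserves a short illustration: each batch loss is multiplied by the batch size before being accumulated, and the sum is divided by the dataset size at the end of the epoch. The sketch below shows why, using made-up batch sizes and loss values; none of these numbers come from an actual training run.

```python
# Hypothetical epoch: two full batches and a smaller final batch.
batch_sizes = [32, 32, 6]
batch_mean_losses = [0.70, 0.60, 0.30]  # mean loss reported for each batch

dataset_size = sum(batch_sizes)

# What train() does: weight each batch mean by its size, then normalize.
running_loss = 0.0
for mean_loss, n in zip(batch_mean_losses, batch_sizes):
    running_loss += mean_loss * n
epoch_loss = running_loss / dataset_size

# Naive alternative: average the batch means directly.
naive_loss = sum(batch_mean_losses) / len(batch_mean_losses)

print(f'weighted: {epoch_loss:.4f}')  # weighted: 0.6200
print(f'naive:    {naive_loss:.4f}')  # naive:    0.5333
```

The weighted version is the true per-example average for the epoch, whereas the naive version gives the 6 examples in the last batch as much influence as each group of 32.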

With the `train` routine done, let's train the model.

In [None]:
torch.manual_seed(1)

num_epochs = 20
history = train(model, num_epochs=num_epochs, train_dl=train_dl, valid_dl=valid_dl)

# Last words...