# Convolutional Neural Networks - Smile Classifier

In this notebook, I implement a CNN that classifies face images as smiling or not smiling. I also explore data augmentation techniques to improve the model's performance.

The CNN has four convolutional layers that produce 32, 64, 128, and 256 feature maps, respectively. Every convolutional layer uses a $3 \times 3$ kernel with padding 1 and is followed by a ReLU activation and a max-pooling layer $P_{2 \times 2}$. Two dropout layers, placed after the first and second pooling layers, are included for regularization.
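
To get a feel for the size of this architecture, here is a minimal sketch that counts the learnable parameters of the four convolutional layers. The helper `conv2d_params` is a hypothetical name introduced for illustration, and the 3 input channels assume RGB images; bias terms are assumed to be enabled, which is the default for `nn.Conv2d`.

```python
# Hypothetical helper: learnable parameters of a 2D convolution
# with a square kernel and one bias term per output channel.
def conv2d_params(in_channels, out_channels, kernel_size=3):
    weights = kernel_size * kernel_size * in_channels * out_channels
    biases = out_channels
    return weights + biases

# (in_channels, out_channels) of the four layers; 3 input channels assumes RGB.
channels = [(3, 32), (32, 64), (64, 128), (128, 256)]
param_counts = [conv2d_params(c_in, c_out) for c_in, c_out in channels]

for (c_in, c_out), n in zip(channels, param_counts):
    print(f"{c_in:>3} -> {c_out:<3}: {n:,} parameters")
print(f"total: {sum(param_counts):,}")
```

Padding and pooling do not add parameters, so the count depends only on the kernel size and the channel widths.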

Let's jump right in and load the CelebA dataset.

# Loading the CelebA dataset

CelebFaces Attributes Dataset, or CelebA for short, is a large-scale dataset of celebrity face images. It contains 202,599 face images, each annotated with five landmark locations and 40 binary attributes.

Though the dataset is available through PyTorch's `torchvision` module, the download link appears to be unstable, so I downloaded the dataset manually using this [link](https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing).
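
After a manual download, it can help to check that the files sit where `torchvision` will look for them. The sketch below is only an assumption-laden sanity check: `missing_celeba_entries` is a hypothetical helper, and the `celeba` sub-folder and the entry names are taken from my understanding of the official CelebA release, so verify them against your installed `torchvision` version. `torchvision` may also run additional integrity checks that this sketch does not reproduce.

```python
from pathlib import Path

# Assumed entry names (from the official CelebA release); verify them
# against the torchvision version you have installed.
EXPECTED_ENTRIES = [
    "img_align_celeba",
    "list_attr_celeba.txt",
    "list_eval_partition.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
]

def missing_celeba_entries(root, base_folder="celeba"):
    """Return the expected entries that are absent under <root>/<base_folder>."""
    base = Path(root) / base_folder
    return [name for name in EXPECTED_ENTRIES if not (base / name).exists()]

# './dataset' matches the image_path used in this notebook.
missing = missing_celeba_entries("./dataset")
print("missing entries:", missing if missing else "none")
```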

In [1]:
import torch
import torchvision
from torchvision import transforms

image_path = './dataset'

# Load the training partition of the dataset
celeba_train_dataset = torchvision.datasets.CelebA(
    root=image_path, split='train',
    target_type='attr', download=False
)

# Load the validation partition of the dataset
celeba_valid_dataset = torchvision.datasets.CelebA(
    root=image_path, split='valid',
    target_type='attr', download=False
)

# Load the testing partition of the dataset
# (stored under its own name so it does not overwrite the validation set)
celeba_test_dataset = torchvision.datasets.CelebA(
    root=image_path, split='test',
    target_type='attr', download=False
)

# Data augmentation

**Data augmentation** refers to a set of techniques for dealing with limited training data. These techniques let us modify, or even synthesize, additional examples so that the training set contains more variation, which helps reduce overfitting.

To augment our dataset, we need to perform "transformations" on it. Recall that in `mpl-torch.ipynb` (folder 03), I said the following:

> I import the torchvision and **transforms** modules. The second module (`transforms`), as the name suggests, lets us perform common transformations on **image** data. According to the documentation, transforms are common image transformations available in the `torchvision.transforms` module.
>
> Another interesting feature is that transform operations can be **chained** together using `Compose`.

Here again, I will use the `transforms` module to perform the transformations and use `Compose` to chain them.

Let's start with the set of transformations to perform on the training partition of the data.

**NOTE: Data augmentation is only applied to the training partition**.

In [2]:
transform_train = transforms.Compose([
    transforms.RandomCrop([178, 178]),
    transforms.RandomHorizontalFlip(),
    transforms.Resize([64, 64]),
    transforms.ToTensor(),
])

Next, let's specify the set of transformations to perform on both the validation and testing partitions of the dataset.

**NOTE: I am not augmenting these images; I only crop the center of each image and then resize it to the desired $64 \times 64$.**

In [3]:
transform = transforms.Compose([
    transforms.CenterCrop([178, 178]),
    transforms.Resize([64, 64]),
    transforms.ToTensor()
])

With all the transformations defined, let's *reload* the partitions of the dataset, this time applying the transformations defined in the previous cells.

In the introduction of this notebook, I said that the dataset has 40 attributes for *each* example. As proof, I print the shape of `celeba_train_dataset.attr`.

In [4]:
celeba_train_dataset.attr.shape

torch.Size([162770, 40])

There are 40 columns, one for each attribute. The same applies to every partition we loaded earlier. For this model, I am interested in only one of them: the **Smiling** attribute, which is the 32nd attribute (index 31).

So, I write the `get_smile` function, whose job is to extract the Smiling attribute from the 40 attributes. The function is passed as the `target_transform` parameter when the dataset partitions are reloaded in the cells below.

When a sample is loaded, the function specified as `target_transform` receives that sample's attribute tensor (the 40 target values) and transforms it; in our case, it grabs the 32nd element.

In [5]:
get_smile = lambda attr: attr[31]
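
As a sanity check on the index, the sketch below looks up `Smiling` in the list of attribute names. The list is reproduced from my recollection of the CelebA attribute file, so treat it as an assumption; if your `torchvision` version exposes `celeba_train_dataset.attr_names`, comparing against that is the more reliable check.

```python
# Attribute names as I recall them from the CelebA attribute file
# (an assumption; verify against the dataset itself).
ATTR_NAMES = [
    "5_o_Clock_Shadow", "Arched_Eyebrows", "Attractive", "Bags_Under_Eyes",
    "Bald", "Bangs", "Big_Lips", "Big_Nose", "Black_Hair", "Blond_Hair",
    "Blurry", "Brown_Hair", "Bushy_Eyebrows", "Chubby", "Double_Chin",
    "Eyeglasses", "Goatee", "Gray_Hair", "Heavy_Makeup", "High_Cheekbones",
    "Male", "Mouth_Slightly_Open", "Mustache", "Narrow_Eyes", "No_Beard",
    "Oval_Face", "Pale_Skin", "Pointy_Nose", "Receding_Hairline",
    "Rosy_Cheeks", "Sideburns", "Smiling", "Straight_Hair", "Wavy_Hair",
    "Wearing_Earrings", "Wearing_Hat", "Wearing_Lipstick",
    "Wearing_Necklace", "Wearing_Necktie", "Young",
]

smile_index = ATTR_NAMES.index("Smiling")

# Same function as in the cell above.
get_smile = lambda attr: attr[31]

# Fake attribute vector for one sample: only "Smiling" is set.
sample_attr = [0] * len(ATTR_NAMES)
sample_attr[smile_index] = 1

print(len(ATTR_NAMES), smile_index, get_smile(sample_attr))
```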

Okay, with `get_smile` out of the way, let's reload the partitions of the dataset.

In [6]:
# Reload the training partition with augmentation applied
celeba_train_dataset = torchvision.datasets.CelebA(
    root=image_path, split='train',
    target_type='attr', download=False,
    transform=transform_train, target_transform=get_smile  # extract the Smiling attribute
)

# Reload the validation partition (no augmentation)
celeba_valid_dataset = torchvision.datasets.CelebA(
    root=image_path, split='valid',
    target_type='attr', download=False,
    transform=transform, target_transform=get_smile
)

# Reload the testing partition (no augmentation)
# (stored under its own name so it does not overwrite the validation set)
celeba_test_dataset = torchvision.datasets.CelebA(
    root=image_path, split='test',
    target_type='attr', download=False,
    transform=transform, target_transform=get_smile
)

# Implementing the model in PyTorch

I now implement the model using the `torch.nn` module.

In [7]:
import torch.nn as nn

model = nn.Sequential()

I proceed by adding the first convolutional layer, followed by the first `ReLU` activation layer, a max-pooling layer, and a dropout layer.

*Note: This first convolutional layer outputs 32 feature maps.*

In [9]:
model.add_module(
    'conv1',
    nn.Conv2d(
        in_channels=3, out_channels=32,
        kernel_size=3, padding=1
    )
)

model.add_module('relu1', nn.ReLU())
model.add_module('pool1', nn.MaxPool2d(kernel_size=2))
model.add_module('dropout1', nn.Dropout(p=0.5))
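
The effect of this first block on the spatial size can be worked out by hand. The sketch below applies the usual output-size formula, $\lfloor (n + 2p - k) / s \rfloor + 1$, to a $64 \times 64$ input (the size produced by the transforms earlier). The two helper functions are hypothetical names, and the pooling helper assumes the `nn.MaxPool2d` defaults of no padding and a stride equal to the kernel size.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_output_size(size, kernel_size):
    """Spatial output size of max-pooling with stride = kernel_size, no padding."""
    return (size - kernel_size) // kernel_size + 1

input_size = 64  # images are resized to 64 x 64 by the transforms
after_conv1 = conv_output_size(input_size, kernel_size=3, padding=1)
after_pool1 = pool_output_size(after_conv1, kernel_size=2)

print(after_conv1, after_pool1)
```

With a $3 \times 3$ kernel and padding 1, the convolution preserves the spatial size, so only the pooling layer halves it. ReLU and dropout leave the shape unchanged.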

I continue by adding the second convolutional layer, followed by the second `ReLU` activation layer, another max-pooling layer, and the second dropout layer.

*Note: This second convolutional layer outputs 64 feature maps.*

In [10]:
model.add_module(
    'conv2',
    nn.Conv2d(
        in_channels=32, out_channels=64,
        kernel_size=3, padding=1
    )
)

model.add_module('relu2', nn.ReLU())
model.add_module('pool2', nn.MaxPool2d(kernel_size=2))
model.add_module('dropout2', nn.Dropout(p=0.5))
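
Since the model now has two dropout layers, it is worth recalling how `nn.Dropout` behaves in training and evaluation mode. The sketch below is a standalone illustration on a tensor of ones, not part of the model; the tensor size of 1000 and the seed are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p).
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout acts as the identity.
dropout.eval()
eval_out = dropout(x)

print(sorted(train_out.unique().tolist()))
print(torch.equal(eval_out, x))
```

This is why the model should be switched with `model.train()` and `model.eval()` at the appropriate times: the same layers behave differently in the two modes.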

I continue and add the third convolutional layer. I follow it with a `ReLU` activation layer and a max-pooling layer.

*Note: This third convolutional layer outputs 128 feature maps.*

In [11]:
model.add_module(
    'conv3',
    nn.Conv2d(
        in_channels=64, out_channels=128,
        kernel_size=3, padding=1
    )
)

model.add_module('relu3', nn.ReLU())
model.add_module('pool3', nn.MaxPool2d(kernel_size=2))

Now, I add the fourth and final convolutional layer to the model. As before, I follow it with a `ReLU` activation layer and a max-pooling layer.

*Note: This fourth convolutional layer outputs 256 feature maps.*

In [12]:
model.add_module(
    'conv4',
    nn.Conv2d(
        in_channels=128, out_channels=256,
        kernel_size=3, padding=1
    )
)

model.add_module('relu4', nn.ReLU())
model.add_module('pool4', nn.MaxPool2d(kernel_size=2))
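
To check the shape of the feature maps produced by the layers added so far, one option is to push a dummy batch through the model. The sketch below rebuilds the same stack in a loop so that it runs on its own; the batch size of 4 is an arbitrary assumption, and the $64 \times 64$ input size comes from the transforms defined earlier.

```python
import torch
import torch.nn as nn

# Rebuild the same stack as in the cells above, in a compact loop.
model = nn.Sequential()
channels = [3, 32, 64, 128, 256]
for i in range(1, 5):
    model.add_module(
        f'conv{i}',
        nn.Conv2d(
            in_channels=channels[i - 1], out_channels=channels[i],
            kernel_size=3, padding=1
        )
    )
    model.add_module(f'relu{i}', nn.ReLU())
    model.add_module(f'pool{i}', nn.MaxPool2d(kernel_size=2))
    if i <= 2:  # dropout only follows the first two blocks
        model.add_module(f'dropout{i}', nn.Dropout(p=0.5))

# Dummy batch: 4 RGB images of size 64 x 64.
x = torch.ones((4, 3, 64, 64))
out = model(x)
print(out.shape)
```

Each of the four pooling layers halves the spatial size (64, 32, 16, 8, 4), so the output has 256 feature maps of size $4 \times 4$ per image.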

# Training the model

# Last words...