# CE7454 2019 Project -- Group 3

**Add the full name here**

Please find all the models and data at [https://github.com/occia/ce7454-group3-project](https://github.com/occia/ce7454-group3-project)

In [None]:
import sys, os
import time
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from random import randint
import torchvision  # needed for torchvision.datasets.ImageFolder below
from torchvision import transforms

## 1. Project Description

This section briefly introduces the background and raises the 2 project questions:
- How accurate can modern neural network models be? (How does this relate to Identity Acquisition?)
- How well do current neural networks perform at age authentication (below/above 18)?

## 2. Data Preparation

### 2.1 Data Acquisition


For the project, we prepare 2 kinds of data:
- training data
- validation/testing data

Because training requires a large amount of labeled data, we merged 3 existing labeled benchmark datasets as our training data: [All-Age-Faces](https://github.com/JingchunCheng/All-Age-Faces-Dataset), [FGNET](https://yanweifu.github.io/FG_NET_data/index.html), and [UTK Face](https://susanqq.github.io/UTKFace).

In total, these 3 benchmarks provide 38000 labelled images, of which 32818 are used for training and 5182 for validation/testing.

// **And need to talk about the instagram data**
// **For instagram data, we need to show some code snippets about the data scraper**

### 2.2 Data Exploration

// **add the age distribution graph for 38000**

// **add the age distribution graph for 1900+ instagram data**

### 2.3 Data Validation


### 2.4 Data Preprocessing

// **this section should describe the bin-size splitting thing, section 3 will use that**
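
As a minimal sketch of the bin-size splitting, assuming ages run from 0 to 99 and bins start at age 0 (both are assumptions, and `age_to_bin` and `num_classes` are hypothetical helper names), each age maps to a class index by integer division, and the number of classes follows the ceiling division used in the pipeline of Section 3.3:

```python
MAX_AGE = 100  # assumed number of distinct ages (0..99)

def num_classes(binsize, max_age=MAX_AGE):
    # ceiling division: number of bins needed to cover all ages
    return (max_age + binsize - 1) // binsize

def age_to_bin(age, binsize):
    # class index of the bin that contains this age (bins start at age 0)
    return age // binsize

for binsize in [1, 6, 10]:
    print(binsize, num_classes(binsize), age_to_bin(18, binsize))
```

Under these assumptions, age 18 starts a new bin when the bin size is 6 (18 is divisible by 6), whereas with bin size 10 the bin covering ages 10 to 19 straddles the threshold of 18.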

## 3. Models and Training

In this section, we discuss the chosen models, the training configuration for each model, and the whole training pipeline. The output of this section is the saved trained weights for all models.

### 3.1 Model Selection

We targeted 3 representative models in face recognition and age prediction: MLP, VGG, and ResNet.

As there are many variants of these networks, the first step is to determine which variants are suitable for our project. 
We probed ResNet18, ResNet50, and ResNet152 using part of the training data (around 10,000 images) and found no large difference in performance. 
Thus we made the following selection:
- ResNet18, ResNet with 18 layers
- VGG19_bn, VGG with 19 layers and batch normalization
- MLP18, an 18-layer MLP

The ResNet and VGG models can be imported directly using the following statements:

In [None]:
from torchvision.models import resnet18
from torchvision.models import vgg19_bn

The MLP model is implemented by ourselves; you can find it in `./src/neural_network/mlp.py` in the [project github](https://github.com/occia/ce7454-group3-project).

For demo purposes, here is a smaller MLP implementation.

In [None]:
# this class is for demo use
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, hidden_size3, hidden_size4, output_size):
        super(MLP, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size1),
            nn.ReLU(),
            nn.Linear(hidden_size1, hidden_size2),
            nn.ReLU(),
            nn.Linear(hidden_size2, hidden_size3),
            nn.ReLU(),
            nn.Linear(hidden_size3, hidden_size4),
            nn.ReLU(),
            nn.Linear(hidden_size4, output_size)
        )
        
    def forward(self, x):
        # flatten each image: (batch, 3, 200, 200) --> (batch, 3*200*200)
        x = x.view(x.size(0), -1)
        x = self.layers(x)
        return x

### 3.2 Training Configuration

#### 3.2.1 Training Parameters Setup

We keep the following training configuration for all 3 chosen models:
- Learning rate, with the initial value set to `0.001`
- Optimizer, using **Adam** rather than **SGD**
- Criterion, using `torch.nn.CrossEntropyLoss()`
- Epochs, set to 50, which balances training time against the resulting accuracy
- Batch size, set to 256 (the demo cell below uses 128)
- Image size, set to `(3, 200, 200)`, where 3 is the number of channels (i.e., the RGB colors)
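
With this image size, the flattened input of the MLP has 3 * 200 * 200 features. The following sketch counts the trainable parameters of a demo-sized MLP, assuming the four 512-unit hidden layers used in the pipeline cell of Section 3.3 and 100 output classes (bin size 1). The helper names `linear_params` and `mlp_params` are hypothetical.

```python
def linear_params(n_in, n_out):
    # a fully connected layer has n_in * n_out weights plus n_out biases
    return n_in * n_out + n_out

def mlp_params(input_size, hidden_sizes, output_size):
    # sum the parameters of every consecutive pair of layer sizes
    sizes = [input_size, *hidden_sizes, output_size]
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))

input_size = 3 * 200 * 200          # channels * height * width
total = mlp_params(input_size, [512] * 4, 100)
print(input_size, total)
```

Almost all of these parameters sit in the first layer, which is a direct consequence of feeding raw pixels into a fully connected network.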

In [None]:
#
# training parameters setup for demo use
#

# device = torch.device("cuda")
device = torch.device("cpu")

channels = 3
img_pixels = (200,200)
lr = 0.001
num_epochs = 50
batch_size = 128

# loading dataset
def loading_dataset(train_dataset, test_dataset):
    transform = transforms.Compose([
        transforms.Resize(img_pixels),
        transforms.ToTensor()])

    img_data_train = torchvision.datasets.ImageFolder(root=train_dataset, transform=transform)
    data_loader_train = torch.utils.data.DataLoader(img_data_train, batch_size=batch_size,shuffle=True)

    img_data_val = torchvision.datasets.ImageFolder(root=test_dataset, transform=transform)
    data_loader_val = torch.utils.data.DataLoader(img_data_val, batch_size=batch_size,shuffle=True)

    dataloaders = {}
    dataloaders['train'] = data_loader_train
    dataloaders['val'] = data_loader_val
    
    return dataloaders
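
`ImageFolder` assigns class indices according to the sorted names of the subdirectories under `root`. Assuming each bin is stored in a folder named after its bin number (an assumption about the dataset layout; `class_to_idx` and `padded_name` are hypothetical helpers), unpadded names sort as strings, so the class index would not always equal the bin number. A sketch of the effect, with zero-padding as one possible way around it:

```python
def class_to_idx(folder_names):
    # mimic ImageFolder: class indices follow the sorted folder names
    return {name: idx for idx, name in enumerate(sorted(folder_names))}

def padded_name(bin_index, width=3):
    # zero-pad so that string order matches numeric order
    return str(bin_index).zfill(width)

unpadded = [str(b) for b in range(12)]
padded = [padded_name(b) for b in range(12)]

print(class_to_idx(unpadded)["2"])    # index assigned to bin 2 without padding
print(class_to_idx(padded)["002"])    # index assigned to bin 2 with padding
```

If the folders already use padded or otherwise order-preserving names, no change is needed; the mapping actually used can be inspected through the `class_to_idx` attribute of the `ImageFolder` dataset.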

#### 3.2.2 Model Training WorkFlow

The workflow is based on the template provided by the teacher in class, with some improvements.

The code is listed below.

In [None]:
#
# main training workflow
#
def train_model(model, dataloaders, criterion, optimizer, num_epochs=25):
    since = time.time()
    last = since
    time_elapsed = since

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            
            time_elapsed = time.time() - last
            last = time.time()
            
            print('{} Loss: {:.4f} Acc: {:.4f} Time: {:.0f}m {:.0f}s'.format(phase, epoch_loss, epoch_acc, time_elapsed // 60, time_elapsed % 60))

            # keep a deep copy of the best model weights seen so far
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

The weights of the networks are initialized randomly.
The images of the dataset are also reshuffled in every epoch.
The key differences between our implementation and the teacher's template are:
- we run both training and validation in every epoch
- based on the validation results, we keep the weights of the best epoch and return those instead of the weights from the final epoch

### 3.3 Training Pipeline

At this point, we know which models to train and how to train them. To answer the questions raised at the beginning, we need to train every combination of the selected models and the prepared datasets.

Thus, the next step is to build the training pipeline for all training combinations.

In [None]:
!mkdir -p ./saved_models

def training_and_save_model(net, num_epochs, model_save_name):
    net = net.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    net, _ = train_model(net, dataloaders, criterion, optimizer, num_epochs)
    torch.save(net.state_dict(), os.path.join("./saved_models/", model_save_name))

#
# whole training pipeline
#
for binsize in [1, 6, 10]:
    classes = (100 + binsize - 1) // binsize  # integer ceiling division; the class count must be an int
    
    dataloaders = loading_dataset("./dataset/merged_train_bin%d" % (binsize), "./dataset/merged_test_bin%d" % (binsize))
    
    for model in ["MLP", "ResNet", "VGG"]:
        if model == "MLP":
            net = MLP(channels * img_pixels[0] * img_pixels[1], 512, 512, 512, 512, classes)
        elif model == "ResNet":
            net = resnet18(num_classes=classes)
        else:
            net = vgg19_bn(num_classes=classes)
        
        print("[+] Training for %s with binsize %d dataset started" % (model, binsize))
        
        model_save_name = "%s_%s_merged_train_bin%d" % (num_epochs, net.__class__.__name__, binsize)
        training_and_save_model(net, num_epochs, model_save_name)
        
        print("[+] Training for %s with binsize %d dataset done" % (model, binsize))

        del net

As shown in the pipeline code, we save the weights of the best epoch for every model on every dataset.
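
For evaluation, the saved weights have to be loaded back into a network of the same architecture. The following is a minimal sketch of that round trip, not the project's own loading code: `reload_weights` is a hypothetical helper, and a tiny `nn.Linear` layer with a temporary file stands in for the real models and the `./saved_models/` directory.

```python
import os
import tempfile

import torch
import torch.nn as nn

def reload_weights(net, path, device="cpu"):
    # load a saved state_dict into an already constructed network
    state = torch.load(path, map_location=device)
    net.load_state_dict(state)
    net.eval()  # switch to evaluation mode for inference
    return net

# stand-in network; in the pipeline this would be the MLP, resnet18 or vgg19_bn
saved = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_weights")
    torch.save(saved.state_dict(), path)
    restored = reload_weights(nn.Linear(4, 2), path)

# the restored network carries exactly the saved parameters
same = all(torch.equal(a, b) for a, b in
           zip(saved.state_dict().values(), restored.state_dict().values()))
print(same)
```

The network passed to `reload_weights` must be built with the same `num_classes` as during training, otherwise `load_state_dict` raises a size mismatch error.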

## 4. Evaluation

### 4.1 Accuracy Comparison Among Models

### 4.2 Identity Acquisition Accuracy Across Ages

### 4.3 Age Authentication For 18
