# 1. Model Training Pipeline

The provided starter code contains only the hyperparameters and the dataloaders required for training the model.

First, we implemented the training loop based on the given hyperparameters. We set `batch_size` to 64 and trained for 32 epochs, although this is more than necessary because the total running loss stabilizes after about 15 epochs. We initially used the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.0001 and Cross Entropy Loss as the loss function. The model was trained on the CPU.

At first, the model reached 92% accuracy, which is already very good and which we attribute to the quality of the dataset. However, training was painfully slow: 20 epochs took almost an hour. Furthermore, the running loss remained quite high, which suggests that the model was not learning very well.

Therefore, we made several modifications to the first pipeline to improve performance:
- We switched to the GPU instead of the CPU whenever one is available, which massively speeds up training.
- We set the number of workers in the dataloader to 4 so that the data is loaded in parallel. Adding more workers improved performance on more powerful computers, but on others it used too much memory and crashed the training process, so we settled on 4 to be sure that it works for everyone.
- The optimizer was changed to **Adam** with a learning rate of **0.001** (the code below also sets `weight_decay=1e-4`).
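
The device selection and the two optimizer configurations described above can be sketched as follows. This is a minimal illustration rather than the project code: the `nn.Linear` model is a hypothetical stand-in for the `Net` class so that the snippet runs on its own, and the Adam call here omits the weight decay used in the training cell further down.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in for the project's `Net` class (assumption, for illustration only).
model = nn.Linear(4, 2)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Initial configuration: plain SGD with a learning rate of 0.0001.
sgd = optim.SGD(model.parameters(), lr=0.0001)

# Revised configuration: Adam with a learning rate of 0.001.
adam = optim.Adam(model.parameters(), lr=0.001)

print(device, sgd.param_groups[0]["lr"], adam.param_groups[0]["lr"])
```

Only one of the two optimizers would be passed to the training loop; they are created side by side here purely to compare the settings.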

With these modifications, the model reaches an accuracy of around 94%, and the running loss stabilizes at a low value between 1 and 4. We were also satisfied with the training speed. As the model has not yet reached the desired accuracy of 98%, we tried to improve it in other pipelines.


In [1]:
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
from torch.utils.data.sampler import SubsetRandomSampler
from net import Net

In [2]:
torch.cuda.empty_cache()

In [3]:
# Use CUDA if possible
device = torch.device("cpu")
if torch.cuda.is_available():
    device = torch.device("cuda")

In [4]:
train_dir = './train_images'    # folder containing training images
test_dir = './test_images'    # folder containing test images

transform = transforms.Compose(
    [transforms.Grayscale(),   # transforms to gray-scale (1 input channel)
     transforms.ToTensor(),    # transforms to Torch tensor (needed for PyTorch)
     transforms.Normalize(mean=(0.5,),std=(0.5,))]) # subtracts mean (0.5) and divides by standard deviation (0.5) -> resulting values in [-1, +1]


In [5]:
# Define two pytorch datasets (train/test) 
train_data = torchvision.datasets.ImageFolder(train_dir, transform=transform)
test_data = torchvision.datasets.ImageFolder(test_dir, transform=transform)

valid_size = 0.2   # proportion of validation set (80% train, 20% validation)
batch_size = 64

# Define randomly the indices of examples to use for training and for validation
num_train = len(train_data)
indices_train = list(range(num_train))
np.random.shuffle(indices_train)
split_tv = int(np.floor(valid_size * num_train))
train_new_idx, valid_idx = indices_train[split_tv:],indices_train[:split_tv]

# Define two "samplers" that will randomly pick examples from the training and validation set
train_sampler = SubsetRandomSampler(train_new_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

In [6]:
len(train_data)

91720
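
As a sanity check on the split logic, the sketch below reruns the same index arithmetic on the reported dataset size without loading any images. The fixed random seed is an assumption added here for reproducibility; the notebook itself does not set one.

```python
import numpy as np

num_train = 91720   # len(train_data) as reported by the notebook
valid_size = 0.2

np.random.seed(0)   # assumption: fixed seed so that the split is reproducible
indices_train = list(range(num_train))
np.random.shuffle(indices_train)
split_tv = int(np.floor(valid_size * num_train))
train_new_idx, valid_idx = indices_train[split_tv:], indices_train[:split_tv]

# The two index sets must not share any example.
assert set(train_new_idx).isdisjoint(valid_idx)
print(len(train_new_idx), len(valid_idx))
```

Under these settings the training subset holds 73376 examples and the validation subset 18344.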

In [7]:
# Dataloaders (take care of loading the data from disk, batch by batch, during training)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=train_sampler, num_workers=4)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=True, num_workers=4)

classes = ('noface','face')  # indicates that "1" means "face" and "0" non-face (only used for display)

In [8]:
net = Net()
net = net.to(device)
n_epochs = 32

optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

In [9]:
# Training
# loop over epochs: one epoch = one pass through the whole training dataset
for epoch in range(1, n_epochs+1):
    running_loss = 0.0   # sum of the batch losses over this epoch
    # loop over iterations: one iteration = 1 batch of examples
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad() # zero the gradient buffers
        output = net(data)
        loss = criterion(output, target)
        running_loss += loss.item() # accumulate a Python float instead of a tensor that keeps autograd history
        loss.backward()
        optimizer.step() # Does the update
    print('epoch: %d, running_loss: %5.7f' % (epoch, running_loss))

epoch: 1, running_loss: 125.2362366
epoch: 2, running_loss: 32.6978760
epoch: 3, running_loss: 22.5062962
epoch: 4, running_loss: 17.6517105
epoch: 5, running_loss: 13.6878967
epoch: 6, running_loss: 10.4754963
epoch: 7, running_loss: 9.5012617
epoch: 8, running_loss: 8.0409603
epoch: 9, running_loss: 6.7228460
epoch: 10, running_loss: 6.2631798
epoch: 11, running_loss: 4.6120424
epoch: 12, running_loss: 5.2147675
epoch: 13, running_loss: 4.7225890
epoch: 14, running_loss: 4.1333299
epoch: 15, running_loss: 3.9522016
epoch: 16, running_loss: 3.6043513
epoch: 17, running_loss: 4.1491671
epoch: 18, running_loss: 3.0875738
epoch: 19, running_loss: 4.0652013
epoch: 20, running_loss: 3.0250406
epoch: 21, running_loss: 2.5669188
epoch: 22, running_loss: 3.5642259
epoch: 23, running_loss: 2.5443280
epoch: 24, running_loss: 3.4569964
epoch: 25, running_loss: 3.1803613
epoch: 26, running_loss: 3.4557011
epoch: 27, running_loss: 1.7927408
epoch: 28, running_loss: 3.5596201
epoch: 29, running_los
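
The running loss printed above is a sum over all batches of an epoch, so its magnitude depends on the number of batches. A rough per-batch average is easier to compare across batch sizes. The sketch below derives the batch count from the reported dataset size, the 20% validation split, and `batch_size = 64`, then rescales a few logged values. It is an approximation: the last batch is smaller than the others, and `CrossEntropyLoss` averages within each batch by default.

```python
import math

num_train = 91720   # len(train_data) as reported by the notebook
valid_size = 0.2
batch_size = 64

# Number of training examples and batches per epoch under the 80/20 split.
n_train = num_train - int(math.floor(valid_size * num_train))
n_batches = math.ceil(n_train / batch_size)

# A few summed losses copied from the log above (epoch -> running_loss).
logged = {1: 125.2362366, 15: 3.9522016, 27: 1.7927408}
for epoch, running_loss in logged.items():
    mean_loss = running_loss / n_batches
    print('epoch: %d, mean batch loss: %.5f' % (epoch, mean_loss))
```

With 1147 batches per epoch, a summed loss between 1 and 4 corresponds to a mean batch loss of only a few thousandths.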

In [10]:
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %5.4f %%' % (
    100 * correct / total))

Accuracy of the network on the 10000 test images: 94.7824 %
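
The same accuracy computation can be wrapped in a helper so that it also applies to `valid_loader`, which the notebook defines but does not evaluate. The function name `accuracy_percent` is hypothetical, and the call to `model.eval()` is an addition that only matters if the network contains layers such as dropout or batch normalization. The toy data at the end exists only to make the sketch runnable without the image folders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy_percent(model, loader, device):
    """Hypothetical helper: percentage of examples whose arg-max prediction matches the label."""
    model.eval()  # switch layers such as dropout to inference behaviour
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total

# Toy check: an identity "model", so the inputs act directly as the class scores.
toy_inputs = torch.tensor([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0], [0.5, 0.2]])
toy_labels = torch.tensor([0, 1, 1, 0])
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=2)
toy_accuracy = accuracy_percent(nn.Identity(), toy_loader, torch.device("cpu"))
print(toy_accuracy)
```

In the notebook, the equivalent call would presumably be `accuracy_percent(net, valid_loader, device)`, which gives a way to monitor the model without touching the test set.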


In [11]:
# Save the trained model
torch.save(net.state_dict(), './saved_model.pth')
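
To reuse the saved weights later, the state dictionary has to be loaded into a freshly constructed model of the same architecture. The sketch below shows the round trip with a hypothetical `nn.Linear` stand-in for `Net` and a temporary file instead of `./saved_model.pth`, so that it runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the project's `Net` class (assumption, for illustration only).
model = nn.Linear(4, 2)

# Save only the parameters, as in the cell above.
path = os.path.join(tempfile.mkdtemp(), "saved_model.pth")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the parameters into it.
restored = nn.Linear(4, 2)
state = torch.load(path, map_location=torch.device("cpu"))
restored.load_state_dict(state)
restored.eval()  # inference mode before making predictions
```

Passing `map_location` lets weights that were saved on a GPU be loaded on a machine that only has a CPU.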