# 3. Imbalanced training dataset

In this part, we train the model on imbalanced data, that is, data in which the classes of the target variable are not equally represented. In our case, the training set contains many more non-face images than face images.
Imbalanced data can bias the decision boundary of a classification algorithm toward the majority class, which leads to poor prediction accuracy on the minority class. We will check whether the model struggles to recognize images showing a face, the minority class in our situation.

We start by importing the necessary libraries and modules.

In [3]:
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
from torch.utils.data.sampler import SubsetRandomSampler
from torchsampler import ImbalancedDatasetSampler

from net import Net

In [4]:
# Use CUDA if possible
device = torch.device("cpu")
if torch.cuda.is_available():
    device = torch.device("cuda")

We load the data from the same folders, with the same transformation to PyTorch tensors as in the previous parts.

In [5]:
train_dir = './train_images_imbalanced'    # folder containing training images
test_dir = './test_images'    # folder containing test images

transform = transforms.Compose(
    [transforms.Grayscale(),   # transforms to gray-scale (1 input channel)
     transforms.ToTensor(),    # transforms to Torch tensor (needed for PyTorch)
     transforms.Normalize(mean=(0.5,),std=(0.5,))]) # subtracts mean (0.5) and divides by standard deviation (0.5) -> resulting values in [-1, +1]

In [6]:
# Define two pytorch datasets (train/test) 
train_data = torchvision.datasets.ImageFolder(train_dir, transform=transform)
test_data = torchvision.datasets.ImageFolder(test_dir, transform=transform)

valid_size = 0.2   # proportion of validation set (80% train, 20% validation)
batch_size = 32    

# Randomly choose the indices of the examples used for training and for validation
num_train = len(train_data)
indices_train = list(range(num_train))
np.random.shuffle(indices_train)
split_tv = int(np.floor(valid_size * num_train))
train_new_idx, valid_idx = indices_train[split_tv:],indices_train[:split_tv]


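Here we keep the SubsetRandomSampler class. It draws uniformly from the given indices, so each batch reflects the imbalance of the training folder, with many more non-face images than face images. The ImbalancedDatasetSampler class imported above, which rebalances the classes when sampling, is not used in this part.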

In [27]:
# Define two samplers that pick examples at random from the training and validation indices
train_sampler = SubsetRandomSampler(train_new_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# Dataloaders (take care of loading the data from disk, batch by batch, during training)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=train_sampler, num_workers=4)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=True, num_workers=4)

classes = ('noface','face')  # indicates that "1" means "face" and "0" non-face (only used for display)

As shown below, face images make up only about 17% of the training samples (4349 out of 25904), far fewer than the non-face images.

In [28]:
print("number of train samples: ", len(train_sampler))

nb_faces = 0
for data, target in train_loader:
    nb_faces += target.sum().item()

print("number of face images is: ", nb_faces)


number of train samples:  25904
number of face images is:  4349
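
The count above iterates over the whole training loader, which loads every image from disk. A lighter alternative, sketched here under the assumption that the labels are available as a plain list (torchvision's `ImageFolder` exposes them as `train_data.targets`), is to count the labels of the selected indices directly. The helper name `class_fractions` and the toy labels are hypothetical.

```python
from collections import Counter


def class_fractions(targets, indices):
    """Return {label: fraction} for the examples selected by `indices`."""
    counts = Counter(targets[i] for i in indices)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


# Toy labels standing in for train_data.targets (0 = noface, 1 = face)
toy_targets = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
toy_indices = [0, 1, 2, 5, 8, 9]   # stand-in for train_new_idx

fractions = class_fractions(toy_targets, toy_indices)
print(fractions)
```

In the notebook, the call would be `class_fractions(train_data.targets, train_new_idx)`. With the counts reported above, the face fraction is 4349 / 25904 ≈ 0.168.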


We keep the same optimizer and criterion as in the previous parts, then train the model on this imbalanced training set.

In [31]:
net = Net()
net = net.to(device)
n_epochs = 8
optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

In [32]:
# Training
# loop over epochs: one epoch = one pass through the whole training dataset
for epoch in range(1, n_epochs+1):
    running_loss = 0.0  # sum of batch losses over the current epoch
    # loop over iterations: one iteration = 1 batch of examples
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad() # zero the gradient buffers
        output = net(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step() # does the update
        running_loss += loss.item()  # .item() gives a float detached from the graph
    print('epoch: %d, running_loss: %5.7f' % (epoch, running_loss))

epoch: 1, running_loss: 99.1066437
epoch: 2, running_loss: 17.0050240
epoch: 3, running_loss: 9.7368145
epoch: 4, running_loss: 6.8766294
epoch: 5, running_loss: 5.3132062
epoch: 6, running_loss: 3.8671057
epoch: 7, running_loss: 3.7035697
epoch: 8, running_loss: 2.6331968


Next, we build a classification map that counts the true positives, false positives, true negatives, and false negatives among the model's predictions on the test set.

In [34]:
classification_map = {"TP" : 0,
                      "FP" : 0,
                      "TN" : 0,
                      "FN" : 0}

correct = 0
total = 0
count = 0
with torch.no_grad():
    for data in test_loader:
        count+=1
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        
        for i in range(0,len(labels)):
            if predicted[i].item() == labels[i].item():
                if predicted[i].item() == 1:
                    classification_map["TP"] +=1
                else:
                    classification_map["TN"] +=1
            elif predicted[i].item() == 1:
                classification_map["FP"] +=1
            else: 
                classification_map["FN"] +=1

        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the %d test images: %5.6f %%' % (
    total, 100 * correct / total))

Accuracy of the network on the 7628 test images: 91.858941 %


Now, let's look at the performance metrics of the model. Specificity, precision, and accuracy are high, but recall and the F-score are not. Recall is the ratio of face images correctly predicted as faces to the total number of face images. It is very low (only 0.23 here) because the model produces many false negatives: 614 of the 797 face images in the test set are classified as non-face. This is consistent with the known issue of imbalanced training data, namely low prediction accuracy for the minority class (in our case, face images). Because the F-score is the harmonic mean of precision and recall, the low recall pulls it down as well (0.37).

In [36]:

print ("TP: ", classification_map["TP"])
print ("TN: ", classification_map["TN"])
print ("FP: ", classification_map["FP"])
print ("FN: ", classification_map["FN"])

print("\n")

stats_map = {
            "Specificity" : float(classification_map["TN"]) / float(classification_map["TN"] + classification_map["FP"]),
            "Recall" : float(classification_map["TP"]) / float(classification_map["TP"] + classification_map["FN"]),
            "Precision" : float(classification_map["TP"]) / float(classification_map["TP"] + classification_map["FP"]),
            "Accuracy" : float(classification_map["TP"] + classification_map["TN"]) / float(classification_map["TP"] + classification_map["TN"] + classification_map["FP"] + classification_map["FN"])
        }
stats_map["F-score"] = 2.0 / float((1.0 / float(stats_map["Precision"])) + (1.0 / float(stats_map["Recall"])))

for key, value in stats_map.items():
    print(key, ": ", value)

TP:  183
TN:  6824
FP:  7
FN:  614


Specificity :  0.998975259844825
Recall :  0.22961104140526975
Precision :  0.9631578947368421
Accuracy :  0.9185894074462506
F-score :  0.3708206686930091
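
As a cross-check of the figures above, the metrics can be recomputed from the four counts alone. The sketch below is a minimal, dependency-free version; the function name `binary_metrics` is hypothetical, and it assumes none of the denominators is zero. It also computes the accuracy of a trivial classifier that always answers "noface", as a reference point for the accuracy reported above.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + tn + fp + fn
    recall = tp / (tp + fn)      # share of actual faces that were found
    precision = tp / (tp + fp)   # share of predicted faces that are faces
    return {
        "Specificity": tn / (tn + fp),
        "Recall": recall,
        "Precision": precision,
        "Accuracy": (tp + tn) / total,
        # Harmonic mean of precision and recall
        "F-score": 2 * precision * recall / (precision + recall),
        # Accuracy of always predicting the negative (majority) class
        "Majority baseline": (tn + fp) / total,
    }


metrics = binary_metrics(tp=183, tn=6824, fp=7, fn=614)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On these counts, the always-"noface" baseline already reaches about 89.6% accuracy, so the 91.9% accuracy of the trained model says little about how well it handles the minority class.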


An imbalanced training dataset therefore degrades the model on the minority class, and the problem needs to be addressed. Several techniques can reduce the effect of imbalanced data, such as oversampling (adding more samples to the minority class), undersampling (removing samples from the majority class), or using class weights.
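
Two of these remedies can be illustrated without changing the dataset on disk. The example below is a minimal sketch under stated assumptions: it derives inverse-frequency class weights from the training counts reported earlier (25904 samples, 4349 of them faces), then reuses them as per-sample sampling weights to oversample the minority class. The helper name `inverse_frequency_weights` is hypothetical, and Python's `random.choices` stands in for a weighted sampler.

```python
import random
from collections import Counter


def inverse_frequency_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * n) for c, n in class_counts.items()}


# Training counts reported earlier: 25904 samples, 4349 of them faces
class_counts = {0: 25904 - 4349, 1: 4349}   # 0 = noface, 1 = face
class_weights = inverse_frequency_weights(class_counts)
print(class_weights)

# Oversampling: draw examples with probability proportional to the
# weight of their class, so both classes are drawn about equally often.
labels = [0] * class_counts[0] + [1] * class_counts[1]
sample_weights = [class_weights[y] for y in labels]
rng = random.Random(0)  # fixed seed for reproducibility
drawn = rng.choices(labels, weights=sample_weights, k=20000)
face_share = Counter(drawn)[1] / len(drawn)
print("share of faces after weighted sampling:", face_share)
```

In PyTorch, the class weights could be passed as a float tensor through the `weight` argument of `nn.CrossEntropyLoss`, and per-sample weights could be given to `torch.utils.data.WeightedRandomSampler` with `replacement=True`. Both are suggestions for a follow-up experiment, not something the notebook above does.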