<a href="https://colab.research.google.com/github/shayan-mk/Semi-supervisied_CNN_MNIST/blob/master/CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Section 0: Introduction

# Abstract:

The objective of this project is to demonstrate the effect of semi-supervised image classification over its fully supervised counterpart. Using the MNIST dataset of handwritten digits, the motivation being to show the benefits of having unlabeled data points in the training dataset. Using a convolutional neural network trained over a subset of the labeled MNIST dataset, we leverage its learned associations as an encoder to extract deep features from the remaining subset of the labeled dataset. In the CNN, we use a series of convolutional layers to expose low level features before flattening them for the linear classifier. Applying unsupervised K-means clustering over the unlabeled subset of training data, we introduce a clustering loss factor to these data points in relation to the ground truth label. Using the K-nearest neighbors algorithms, we propagate the set of assignable labels (0-9) over the 10 clusters. The CNN is then trained over the full set of ground-truth and generated-label datapoints. The goal then is to show the varying degrees of effectiveness between fully labeled and different ratios of unlabeled-to-labeled datasets. Illustrated and graphed by the change in loss and accuracy at each batch across all epochs for both the training and test MNIST datasets.

# Team members and contributions:

- Bowen Luo (b23luo@uwaterloo.ca)

# Code outline:

- Apply data augmentation techniques to expand the training dataset with rotation, cropping, blurring, and noise transformations to reduce overfitting.
- Train the initial fully supervised convolutional neural network on the labeled subset of MNIST data.
- Extract deep features from the last layer before the linear classifier for the unlabeled data points.
- Perform K-means clustering on the deep features to assign cluster labels to the unlabeled data points.
- Use K-Nearest Neighbors (KNN) on the deep features to further refine the labels assigned by K-means clustering based on the labels of their nearest neighbors in the feature space.
- Combine the labeled, K-means labeled, and KNN-refined labeled data points into a single dataset.
- Fine-tune the neural network using this dataset with the combined loss function.

Finally, Evaluate the model performance with varying a variying ratio of labeled data to demonstrate how the performance changes as the ratio gets progressively smaller. We will do this by repeating the above steps for different proportions of the training MNIST dataset.

# Section 1: Code libraries

Aside from essential libraries such as math and numpy, pytorch is used for its tensors, transforms, dataset objects, as well as general CNN related functions including loss functions, convolutional layers, and forward/backward passes over the CNN. Matplotlib is used to graph the loss and accuracies over the batches, sklearn is used for K-means clustering, and we use KeOps for its conversion of tensors into symbolic variables for low overhead matrix operations for K-nearest-neighbors.

Note: running on colab <strong>using GPU</strong>, the following code cell is included for installation of KeOps (pykeops). Locally, it requires the CUDA toolkit and compatible g++ compiler. Further installation requirements can be found here: https://www.kernel-operations.io/keops/python/installation.html

In [9]:
!pip install pykeops

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pykeops
  Downloading pykeops-2.1.2.tar.gz (88 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m88.9/88.9 kB[0m [31m3.5 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting pybind11
  Downloading pybind11-2.10.4-py3-none-any.whl (222 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m222.3/222.3 kB[0m [31m12.5 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting keopscore==2.1.2
  Downloading keopscore-2.1.2.tar.gz (84 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.5/84.5 kB[0m [31m8.8 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: pykeops, keopscore
  Building wheel for pykeops (setup.py) ... [?25l[?25hdone
  Created wheel for pykeops: filename=pykeops-2.1.2-py3-none-any.whl size=114095 sha2

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torchvision.datasets as datasets
import matplotlib.pyplot as plt
import math
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from pykeops.torch import LazyTensor

In [11]:
USE_GPU = True
EPOCH = 2
BATCH = 60
LABELED_RATIO = 0.2
DEVICE = "cpu"

In [12]:
def set_device():
    global DEVICE
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Section 2: Define the CNN architecture

In [13]:
class Net(nn.Module):
    # create convolutional and batch layers 1 through 8 in constructor
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, padding=1)
        self.conv1_bn = nn.BatchNorm2d(64)
        
        self.conv2 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv2_bn = nn.BatchNorm2d(128)
        
        self.conv3 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv3_bn = nn.BatchNorm2d(256)
        
        self.conv4 = nn.Conv2d(256, 256, 3, padding=1)
        self.conv4_bn = nn.BatchNorm2d(256)
        
        self.conv5 = nn.Conv2d(256, 512, 3, padding=1)
        self.conv5_bn = nn.BatchNorm2d(512)
        
        self.conv6 = nn.Conv2d(512, 512, 3, padding=1)
        self.conv6_bn = nn.BatchNorm2d(512)
        
        self.conv7 = nn.Conv2d(512, 512, 3, padding=1)
        self.conv7_bn = nn.BatchNorm2d(512)
        
        self.conv8 = nn.Conv2d(512, 512, 3, padding=1)
        self.conv8_bn = nn.BatchNorm2d(512)
        # create fully connected layers and dropout
        self.fc1 = nn.Linear(512, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 10)
        
        self.dropout = nn.Dropout(0.5)

    # define forward pass through network, conversely, could use nn.sequential
    def forward(self, x, extract_features=False):
        x = F.max_pool2d(F.relu(self.conv1_bn(self.conv1(x))), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2_bn(self.conv2(x))), (2, 2))
        x = F.relu(self.conv3_bn(self.conv3(x)))
        x = F.max_pool2d(F.relu(self.conv4_bn(self.conv4(x))), (2, 2))
        x = F.relu(self.conv5_bn(self.conv5(x)))
        x = F.max_pool2d(F.relu(self.conv6_bn(self.conv6(x))), (2, 2))
        x = F.relu(self.conv7_bn(self.conv7(x)))
        x = F.max_pool2d(F.relu(self.conv8_bn(self.conv8(x))), (2, 2))
        
        x = torch.flatten(x, 1)

        if extract_features:
            return x

        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x

# Section 3: Load, augment, preprocess, and split the dataset
- Augment data with techniques such as random rotations, translations, and flips.
- Preprocess and split the dataset into labeled, unlabeled, and test sets.

In [14]:
transform = transforms.Compose([
    transforms.Resize(32), 
    transforms.ToTensor()
])

crop_augmentation = transforms.Compose([
    transforms.Resize(32),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomResizedCrop(24),
    transforms.ToTensor()
])

gaussian_blur_augmentation = transforms.Compose([
    transforms.Resize(32),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.GaussianBlur(3, sigma=(0.1, 2.0)),
    transforms.ToTensor()
])

gaussian_noise_augmentation = transforms.Compose([
    transforms.Resize(32),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Lambda(lambda x : x + math.sqrt(0.1)*torch.randn_like(x))
])

In [15]:
mnist_train = datasets.MNIST(root='./data', train=True, download=True,
                                transform=transform)
mnist_train_crop = datasets.MNIST(root='./data', train=True, download=False, 
                                transform=gaussian_blur_augmentation)
mnist_train_blur = datasets.MNIST(root='./data', train=True, download=False, 
                                transform=gaussian_noise_augmentation)
mnist_train_noise = datasets.MNIST(root='./data', train=True, download=False, 
                                transform=transform)

augmented_trainset = torch.utils.data.ConcatDataset([
    mnist_train, 
    mnist_train_crop, 
    mnist_train_blur, 
    mnist_train_noise])

mnist_test = datasets.MNIST(root='./data', train=False, download=True,
                               transform=transform)
mnist_test_crop = datasets.MNIST(root='./data', train=False, download=False, 
                                transform=gaussian_blur_augmentation)
mnist_test_blur = datasets.MNIST(root='./data', train=False, download=False, 
                                transform=gaussian_noise_augmentation)
mnist_test_noise = datasets.MNIST(root='./data', train=False, download=False, 
                                transform=transform)

augmented_testset = torch.utils.data.ConcatDataset([
    mnist_test, 
    mnist_test_crop, 
    mnist_test_blur, 
    mnist_test_noise])

labeled_size = int(LABELED_RATIO * len(augmented_trainset))
unlabeled_size = len(augmented_trainset) - labeled_size
labeled_augmented_trainset, unlabeled_augmented_trainset = torch.utils.data.random_split(augmented_trainset, [labeled_size, unlabeled_size])

fully_labeled_loader = torch.utils.data.DataLoader(augmented_trainset, BATCH, shuffle=True)

labeled_loader = torch.utils.data.DataLoader(labeled_augmented_trainset, BATCH, shuffle=True)
unlabeled_loader = torch.utils.data.DataLoader(unlabeled_augmented_trainset, BATCH, shuffle=True)

test_loader = torch.utils.data.DataLoader(augmented_testset, 1, shuffle=True)

# Section 6: Define the training and evaluation functions

In [18]:
def train(model, dataloader, opt, loss_fn):
    model.train()
    correct = 0
    total_loss = 0
    train_loss = []
    train_acc = []

    for batch, (train, label) in enumerate(dataloader):
        train, label = train.to(DEVICE), label.to(DEVICE)
        opt.zero_grad()
        output = model(train)
        loss = loss_fn(output, label)
        loss.backward()
        opt.step()

        _, predicted = torch.max(output, dim=1)
        correct += (predicted == label).float().sum().item()
        total_loss += loss.item()

        train_loss.append(loss)
        train_acc.append(100 * correct / ((batch + 1) * BATCH))

    accuracy = 100 * correct / len(dataloader.dataset)
    average_loss = total_loss / len(dataloader)

    return average_loss, accuracy, train_loss, train_acc

def evaluate(model, dataloader, loss_fn):
    model.eval()
    correct = 0
    total_loss = 0
    test_loss = []
    test_acc = []

    with torch.no_grad():
        for i, (test, label) in enumerate(dataloader):
            test, label = test.to(DEVICE), label.to(DEVICE)
            output = model(test)
            loss = loss_fn(output, label)

            _, predicted = torch.max(output, dim=1)
            correct += (predicted == label).float().sum().item()
            total_loss += loss.item()

            test_loss.append(loss)
            test_acc.append(100 * correct / (i + 1))

    accuracy = 100 * correct / len(dataloader.dataset)
    average_loss = total_loss / len(dataloader)

    return average_loss, accuracy, test_loss, test_acc

# Extract deep features from the given model
def extract_deep_features(model, dataloader):
    model.eval()
    features_list = []
    label_list = []
    with torch.no_grad():
        for train, label in dataloader:
            train = train.to(DEVICE)
            features = model.forward(train, extract_features=True)
            features_list.append(features.cpu())
            label_list.append(label)

    features_tensor = torch.cat(features_list)
    labels_tensor = torch.cat(label_list)
    return features_tensor, labels_tensor

# Perform K-means clustering on the given data
def k_means_clustering(data, n_clusters):
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    kmeans.fit(data)
    return kmeans

# Assign pseudo-labels to the unlabeled data using KNN
def knn_only_labeling(labeled_features, labels, unlabeled_features, k):
    labeled_broadcast = LazyTensor(labeled_features.unsqueeze(0))
    neighbors = LazyTensor(unlabeled_features.unsqueeze(1))

    # [unlabeled_features x labeled_features] squared L2 distances
    L2_dist = ((labeled_broadcast - neighbors) ** 2).sum(-1)
    knn_indices = L2_dist.argKmin(k, dim=1)
    knn_labels = labels[knn_indices]

    pseudo_labels, _ = knn_labels.mode(dim=1)
    return pseudo_labels

def knn_clustering_labeling(labeled_features, labels, unlabeled_features, k=5, n_clusters=10):
    kmeans_model = k_means_clustering(unlabeled_features, n_clusters)
    pseudo_labels = torch.zeros(len(unlabeled_augmented_trainset))

    for cluster_index in range(n_clusters):
        cluster_indices = torch.nonzero(
            torch.from_numpy(kmeans_model.labels_ == cluster_index)).squeeze()
        cluster_features = unlabeled_features[cluster_indices]
        knn_labels = knn_only_labeling(labeled_features, labels, cluster_features, k=k)

        # Assign the majority label to the entire cluster
        cluster_label, _ = knn_labels.mode()
        pseudo_labels[cluster_indices] = cluster_label.item()

    return pseudo_labels

# Merge labeled and pseudo-labeled data
def merge_labeled_and_pseudo_labeled_data(labeled_loader, unlabeled_loader, pseudo_labels):
    # Extract the labeled dataset and replace the targets with pseudo-labels
    labeled_dataset = labeled_loader.dataset
    unlabeled_dataset = unlabeled_loader.dataset
    unlabeled_dataset.targets = pseudo_labels.tolist()

    # Combine the labeled and unlabeled datasets and create a new dataloader
    combined_dataset = torch.utils.data.ConcatDataset([labeled_dataset, unlabeled_dataset])
    combined_loader = torch.utils.data.DataLoader(combined_dataset, batch_size=BATCH, shuffle=True)

    return combined_loader

# Section 7: Train, evaluate the model

In [19]:
def train_and_test(train_loader, test_loader, model_name):
    set_device()
    net = Net()
    net.to(DEVICE)
    loss = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.01)
    for epoch in range(EPOCH):
        avg_train_loss, avg_train_acc, train_loss, train_acc = train(net, train_loader, optimizer, loss)
        avg_test_loss, avg_test_acc, test_loss, test_acc = evaluate(net, test_loader, loss)

        print(f"Epoch {epoch + 1}/{EPOCH}")
        print(f"Train Loss: {avg_train_loss:.4f}, Train Accuracy: {avg_train_acc:.2f}%")
        print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {avg_test_acc:.2f}%")
        print("-" * 30)

    torch.save(net.state_dict(), model_name)
    return net, train_loss, train_acc, test_loss, test_acc

In [20]:
net, _, _, _, _ = train_and_test(labeled_loader, test_loader, 'encoder.pth')

# Extract deep features from the supervised model
deep_features_labeled, labels = extract_deep_features(net, labeled_loader)
deep_features_unlabeled, _ = extract_deep_features(net, unlabeled_loader)

# Assign pseudo-labels to the clustered data using KNN 
pseudo_labels = knn_clustering_labeling(deep_features_labeled, labels, deep_features_unlabeled, k=5, n_clusters=10)

# Merge labeled and pseudo-labeled data
combined_loader = merge_labeled_and_pseudo_labeled_data(labeled_loader, unlabeled_loader, pseudo_labels)

print(f"Fully-supervised MNIST training")
_, fs_train_loss, fs_train_acc, fs_test_loss, fs_test_acc = train_and_test(fully_labeled_loader, test_loader, 'fully-supervised.pth')
print(f"Semi-supervised MNIST training")
_, ss_train_loss, ss_train_acc, ss_test_loss, ss_test_acc = train_and_test(combined_loader, test_loader, 'semi-supervised.pth')

Epoch 1/2
Train Loss: 0.5117, Train Accuracy: 83.38%
Test Loss: 0.1532, Test Accuracy: 95.04%
------------------------------
Epoch 2/2
Train Loss: 0.1476, Train Accuracy: 95.42%
Test Loss: 0.1392, Test Accuracy: 95.42%
------------------------------
[KeOps] Generating code for formula ArgKMin_Reduction(Sum((Var(0,512,1)-Var(1,512,0))**2),0) ... OK
Epoch 1/2
Train Loss: 0.1936, Train Accuracy: 93.81%
Test Loss: 0.0920, Test Accuracy: 97.10%
------------------------------
Epoch 2/2
Train Loss: 0.0651, Train Accuracy: 97.96%
Test Loss: 0.0631, Test Accuracy: 97.89%
------------------------------


Net(
  (conv1): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv1_bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2_bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv4_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv5): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv5_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv6): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv6_bn): BatchNorm2d(512, eps=1e-05, momentum=

# Section 8: Comparison of the methods

- Self training
- PCA + KNN
- ...

# Section 9: Visualize the results and save the model

In [21]:
plt.figure(1)
x_axis = np.arange(len(fs_train_loss))
y_axis = fs_train_loss
plt.xlabel('Batch number')
plt.ylabel('Loss')
plt.plot(x_axis, y_axis)

plt.figure(2)
x_axis = np.arange(len(fs_train_acc))
y_axis = fs_train_acc
plt.xlabel('Batch number')
plt.ylabel('Accuracy')
plt.plot(x_axis, y_axis)

plt.figure(3)
x_axis = np.arange(len(fs_test_loss))
y_axis = fs_test_loss
plt.xlabel('Test point')
plt.ylabel('Loss')
plt.plot(x_axis, y_axis)

plt.figure(4)
x_axis = np.arange(len(fs_test_acc))
y_axis = fs_test_acc
plt.xlabel('Test point')
plt.ylabel('Accuracy')
plt.plot(x_axis, y_axis)

plt.show()

# Section 10: Conclusion and final thoughts