<a href="https://colab.research.google.com/github/suhayb-h/Acute-Lymphoblastic-Leukemia-Classifier/blob/main/4_ARC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Attentive Recurrent Comparator Progress
Early papers on Attentive Recurrent Comparators (ARCs) reported promising, “superhuman” results on the Omniglot dataset. ARCs are nonetheless rarely used in machine learning applications, seemingly because of their higher computational demands and marginal performance gains. Despite these shortcomings, this project attempts to retrofit an ARC to differentiate between cancer cells and non-cancer cells. The PyTorch code in this notebook was adapted from an implementation by the data scientist Sanyam Agarwal, who ported to PyTorch the original Theano code accompanying the paper "Attentive Recurrent Comparators" by Shyam, Gupta and Dukkipati.

PyTorch makes retrofitting the model for the C-NMC dataset feasible. It is a modern machine learning library that can run on CUDA-capable GPUs, which could offset the computational demand. To the author's knowledge, no published paper to date details the use of an ARC in a binary classification problem.

So far, the model has been reconfigured to run on Google Colab and ran successfully on the original Omniglot dataset (a database of handwritten characters from different alphabets). The notebook is divided into 5 components:

1. Downloading the dataset to a numpy array

2. Augmenting images

3. Construction of the model

4. Batching the data

5. Training the model

<hr color=red>


In [None]:
# Import Relevant Libraries

# Download - Many of these libraries could potentially be removed
import os
import urllib.request
import numpy as np
import zipfile
from imageio import imread #changed from scipy.ndimage -> imageio
import matplotlib.image as img
import glob as glob
from numpy import asarray
from numpy import save

#Batcher
from numpy.random import choice
import torch
from torch.autograd import Variable
from PIL import Image

#Model
import torch.nn as nn
import torch.nn.functional as F
import math

#Train
import argparse
from datetime import datetime, timedelta

#Visualize
import matplotlib.pyplot as plt
import matplotlib as mpl

<hr color=red>

## Downloading the dataset to a NumPy array

Since the C-NMC dataset is stored locally, the download portion of the original code was replaced. The original code appended all images to a single ordered array that is later passed to the batcher. To match that layout, a single array was created here with the images appended in a fixed order: all normal cells (`hem`) first, followed by all cancer cells (`all`):
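Because the classes are distinguished only by their position in the array, it can help to record an explicit label for every image while the files are listed. The sketch below is an illustration rather than part of the original notebook: the helper name `list_images_with_labels` and the convention 0 = `hem`, 1 = `all` are assumptions. It also sorts the file names, since `glob.glob` does not guarantee any particular order.

```python
import glob
import os


def list_images_with_labels(training_dir, pattern="*.bmp"):
    """Return (paths, labels) with all 'hem' files first, then all 'all' files.

    Hypothetical helper: the label convention 0 = hem (normal) and
    1 = all (cancer) is an assumption made for this sketch.
    """
    paths, labels = [], []
    for label, folder in enumerate(("hem", "all")):
        # sorted() makes the order reproducible between runs.
        found = sorted(glob.glob(os.path.join(training_dir, folder, pattern)))
        paths.extend(found)
        labels.extend([label] * len(found))
    return paths, labels
```

The returned `paths` could then be read with `img.imread` in the same loop style as the cell that follows, and the number of zeros in `labels` gives the index at which the second class starts.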

<hr color=red>

In [None]:
train_ = []

for i in glob.glob(
    '/content/drive/Othercomputers/My MacBook Air/'
    'C-NMC_Leukemia/training_data/hem/*.bmp'):
    im=img.imread(i)
    train_.append(im)

for i in glob.glob(
    '/content/drive/Othercomputers/My MacBook Air/'
    'C-NMC_Leukemia/training_data/all/*.bmp'):
    im=img.imread(i)
    train_.append(im)

train_array = np.array(train_)

# save to npy file to save time in further training trials
save('/content/drive/Othercomputers/My MacBook Air/'
'C-NMC_Leukemia/training_data/train.npy', train_array)

<hr color=red>

## Augmenting images
This portion of the code was removed for the C-NMC dataset. Each C-NMC class contains thousands of images (3,389 and 7,272 in the training array), compared with 20 samples per Omniglot character, so creating new images with an augmenter should be unnecessary.

## Model Construction
The ARC is still a relatively experimental model and is not widely adopted. Its implementation relies on intricate mathematical code that has already been carefully translated between frameworks. This portion of the code was therefore left functionally unchanged.
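To make the Cauchy-kernel attention in `GlimpseWindow._get_filterbanks` easier to follow, the same arithmetic can be sketched for a single sample along one image dimension, without PyTorch or batching. The function name `cauchy_filterbank` and the pure-Python layout are assumptions made for illustration. The result is laid out as (glimpse_size, image_size), which is the transpose of what the notebook's method returns.

```python
import math


def cauchy_filterbank(center_cap, delta_cap, image_size, glimpse_size):
    """Single-sample, one-dimensional version of the Cauchy filter bank.

    center_cap and delta_cap are reals in [-1, 1], as produced by tanh in
    the ARC. Returns glimpse_size rows, each a list of image_size weights.
    """
    # Map the normalised parameters onto pixel coordinates.
    center = (image_size - 1) * (center_cap + 1) / 2.0
    delta = (image_size / glimpse_size) * (1.0 - abs(delta_cap))
    gamma = math.exp(1.0 - 2.0 * abs(delta_cap))

    bank = []
    for g in range(glimpse_size):
        # Image location that glimpse pixel g is centred on.
        mu = center + delta * (g - (glimpse_size - 1) / 2.0)
        # Cauchy density evaluated at every image pixel.
        row = [1.0 / (math.pi * gamma * (1.0 + ((x - mu) / gamma) ** 2))
               for x in range(image_size)]
        # Normalise; the small constant avoids division by zero.
        total = sum(row) + 1e-4
        bank.append([w / total for w in row])
    return bank
```

Each row is a soft window over the image: a `delta_cap` near 0 spreads the glimpse pixels across the whole image with wide kernels, while a `delta_cap` near ±1 packs them tightly around the center with narrow kernels.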

<hr color=red>

In [None]:
#Model
use_cuda = False

class GlimpseWindow:
    """
    Generates glimpses from images using Cauchy kernels.
    Args:
        glimpse_h (int): The height of the glimpses to be generated.
        glimpse_w (int): The width of the glimpses to be generated.
    """
    def __init__(self, glimpse_h: int, glimpse_w: int):
        self.glimpse_h = glimpse_h
        self.glimpse_w = glimpse_w

    @staticmethod
    def _get_filterbanks(delta_caps: Variable, 
                         center_caps: Variable, 
                         image_size: int, 
                         glimpse_size: int) -> Variable:
        """
        Generates Cauchy Filter Banks along a dimension.
        Args:
            delta_caps (B,):  A batch of deltas [-1, 1]
            center_caps (B,): A batch of [-1, 1] reals that dictate location of 
                              center of cauchy kernel glimpse
            image_size (int): size of images along that dimension
            glimpse_size (int): size of glimpses to be generated along that 
                                dimension
        Returns:
            (B, image_size, glimpse_size): A batch of filter banks
        """
        # convert dimension sizes to float. lots of math ahead.
        image_size = float(image_size)
        glimpse_size = float(glimpse_size)

        # scale centers and deltas to map to the actual size of given image.
        centers = (image_size - 1) * (center_caps + 1) / 2.0  # (B)
        deltas = \
        (float(image_size) / glimpse_size) * (1.0 - torch.abs(delta_caps))

        # calculate gamma for cauchy kernel
        gammas = torch.exp(1.0 - 2 * torch.abs(delta_caps))  # (B)

        # coordinate of pixels on the glimpse
        # glimpse_size
        glimpse_pixels = \
        Variable(torch.arange(0, glimpse_size) - (glimpse_size - 1.0) / 2.0)
        if use_cuda:
            glimpse_pixels = glimpse_pixels.cuda()

        # space out with delta
        # (B, glimpse_size)
        glimpse_pixels = deltas[:, None] * glimpse_pixels[None, :]
        # center around the centers
        glimpse_pixels = centers[:, None] + glimpse_pixels  # (B, glimpse_size)

        # coordinates of pixels on the image
        image_pixels = Variable(torch.arange(0, image_size))  # (image_size)
        if use_cuda:
            image_pixels = image_pixels.cuda()

        # (B, glimpse_size, image_size)
        fx = image_pixels - glimpse_pixels[:, :, None]
        fx = fx / gammas[:, None, None]
        fx = fx ** 2.0
        fx = 1.0 + fx
        fx = math.pi * gammas[:, None, None] * fx
        fx = 1.0 / fx
        fx = fx / (torch.sum(fx, dim=2) + 1e-4)[:, :, None]  
        # ^^^ add small constant in the denominator to avoid division by 0.

        return fx.transpose(1, 2)

    def get_attention_mask(
        self, glimpse_params: Variable, mask_h: int, mask_w: int) -> Variable:
        """
        Visualization: generate a mask of which pixels got most "attention".
        Args:
            glimpse_params (B, hx):  A batch of glimpse parameters.
            mask_h (int): Height of image for which the mask is being generated.
            mask_w (int): Width of image for which the mask is being generated.
        Returns:
            (B, mask_h, mask_w):  Batch of masks with attended 
                                  pixels weighted more.
        """

        batch_size, _ = glimpse_params.size()

        # (B, image_h, glimpse_h)
        F_h = self._get_filterbanks(
            delta_caps=glimpse_params[:, 2], center_caps=glimpse_params[:, 0],
            image_size=mask_h, glimpse_size=self.glimpse_h)

        # (B, image_w, glimpse_w)
        F_w = self._get_filterbanks(
            delta_caps=glimpse_params[:, 2], center_caps=glimpse_params[:, 1],
            image_size=mask_w, glimpse_size=self.glimpse_w)

        # (B, glimpse_h, glimpse_w)
        glimpse_proxy = Variable(
            torch.ones(batch_size, self.glimpse_h, self.glimpse_w))

        # find the attention mask that lead to the glimpse.
        mask = glimpse_proxy
        mask = torch.bmm(F_h, mask)
        mask = torch.bmm(mask, F_w.transpose(1, 2))

        # scale to between 0 and 1.0
        mask = mask - mask.min()
        mask = mask / mask.max()
        mask = mask.float()

        return mask

    def get_glimpse(
        self, images: Variable, glimpse_params: Variable) -> Variable:
        """
        Generate glimpses given images and glimpse parameters. This is the main 
        method of this class. The glimpse parameters are 
        (h_center, w_center, delta). (h_center, w_center) represents the 
        relative position of the center of the glimpse on the image. 
        delta determines the zoom factor of the glimpse.
        Args:
            images (B, h, w):  A batch of images
            glimpse_params (B, 3):  A batch of glimpse parameters 
                                    (h_center, w_center, delta)
        Returns:
            (B, glimpse_h, glimpse_w): A batch of glimpses.
        """
        batch_size, image_h, image_w = images.size()

        # (B, image_h, glimpse_h)
        F_h = self._get_filterbanks(delta_caps=glimpse_params[:, 2], 
                                    center_caps=glimpse_params[:, 0],
                                    image_size=image_h, 
                                    glimpse_size=self.glimpse_h)

        # (B, image_w, glimpse_w)
        F_w = self._get_filterbanks(delta_caps=glimpse_params[:, 2], 
                                    center_caps=glimpse_params[:, 1],
                                    image_size=image_w, 
                                    glimpse_size=self.glimpse_w)

        # F_h.T * images * F_w
        glimpses = images
        glimpses = torch.bmm(F_h.transpose(1, 2), glimpses)
        glimpses = torch.bmm(glimpses, F_w)

        return glimpses  # (B, glimpse_h, glimpse_w)

class ARC(nn.Module):
    """
    This class implements the Attentive Recurrent Comparators in two main parts.
    1.) controller: The RNN module that takes input glimpses from a pair of 
                    images and emits a hidden state.
    2.) glimpser: A Linear layer that takes the hidden state emitted by the 
                  controller and generates the glimpse parameters. These glimpse 
                  parameters are (h_center, w_center, delta). 
                  (h_center, w_center) represents the relative position of the 
                  center of the glimpse on the image. delta determines the zoom 
                  factor of the glimpse.
    Args:
        num_glimpses (int): How many glimpses must the ARC "see" before emitting 
                            the final hidden state.
        glimpse_h (int): The height of the glimpse in pixels.
        glimpse_w (int): The width of the glimpse in pixels.
        controller_out (int): The size of the hidden state emitted by the 
                              controller.
    """
    def __init__(self, 
                 num_glimpses: int=8, 
                 glimpse_h: int=8, 
                 glimpse_w: int=8, 
                 controller_out: int=128) -> None:
        super().__init__()
        self.num_glimpses = num_glimpses
        self.glimpse_h = glimpse_h
        self.glimpse_w = glimpse_w
        self.controller_out = controller_out

        # main modules of ARC
        self.controller = nn.LSTMCell(input_size=(glimpse_h * glimpse_w), 
                                      hidden_size=self.controller_out)
        self.glimpser = nn.Linear(in_features=self.controller_out, 
                                  out_features=3)

        # Generate glimpses from images using the glimpse parameters.
        self.glimpse_window = GlimpseWindow(glimpse_h=self.glimpse_h, 
                                            glimpse_w=self.glimpse_w)

    def forward(self, image_pairs: Variable) -> Variable:
        """
        Calls the internal _forward() method and returns only the hidden state 
        from the final time step.
        Args:
            image_pairs (B, 2, h, w):  A batch of pairs of images
        Returns:
            (B, controller_out):  A batch of final hidden states after each pair 
                                  of images has been shown for num_glimpses 
                                  glimpses.
        """
        # return only the last hidden state
        all_hidden = self._forward(image_pairs)
        # ^^^ (2*num_glimpses, B, controller_out)
        last_hidden = all_hidden[-1, :, :]  # (B, controller_out)

        return last_hidden

    def _forward(self, image_pairs: Variable) -> Variable:
        """
        The main forward method of ARC. But it returns hidden state from all 
        time steps (all glimpses) as opposed to just the last one. See the 
        exposed forward() method.
        Args:
            image_pairs: (B, 2, h, w) A batch of pairs of images
        Returns:
            (2*num_glimpses, B, controller_out) 
            Hidden states from ALL time steps.
        """
        # convert to images to float.
        image_pairs = image_pairs.float()

        # calculate the batch size
        batch_size = image_pairs.size()[0]

        # an array for collecting hidden states from each time step.
        all_hidden = []

        # initial hidden state of the LSTM.
        Hx = Variable(torch.zeros(batch_size, self.controller_out))  
        # (B, controller_out)
        Cx = Variable(torch.zeros(batch_size, self.controller_out))  
        # (B, controller_out)

        if use_cuda:
            Hx, Cx = Hx.cuda(), Cx.cuda()

        # take `num_glimpses` glimpses for both images, alternatingly.
        for turn in range(2*self.num_glimpses):
            # select image to show, 
            # alternate between the first and second image in pair
            images_to_observe = image_pairs[:,  turn % 2]  # (B, h, w)

            # choose a portion from image to glimpse using attention
            glimpse_params = torch.tanh(self.glimpser(Hx))  
            # ^^^ (B, 3)  a batch of glimpse params (x, y, delta)
            glimpses = self.glimpse_window.get_glimpse(
                images_to_observe, glimpse_params)  
            # ^^^ (B, glimpse_h, glimpse_w)
            flattened_glimpses = glimpses.view(batch_size, -1)  
            # ^^^ (B, glimpse_h * glimpse_w), one time-step

            # feed the glimpses and the previous hidden state to the LSTM.
            Hx, Cx = self.controller(flattened_glimpses, (Hx, Cx))  
            # (B, controller_out), (B, controller_out)

            # append this hidden state to all states
            all_hidden.append(Hx)

        all_hidden = torch.stack(all_hidden)  
        # (2*num_glimpses, B, controller_out)

        # return a batch of all hidden states.
        return all_hidden

class ArcBinaryClassifier(nn.Module):
    """
    A binary classifier that uses ARC.
    Given a pair of images, feeds them to the ARC and uses the final hidden 
    state of ARC to classify the images as belonging to the same class or not.
    Args:
        num_glimpses (int): How many glimpses must the ARC "see" before emitting 
                            the final hidden state.
        glimpse_h (int): The height of the glimpse in pixels.
        glimpse_w (int): The width of the glimpse in pixels.
        controller_out (int): The size of the hidden state emitted by the 
                              controller.
    """
    def __init__(self, num_glimpses: int=8, 
                 glimpse_h: int=8, 
                 glimpse_w: int=8, 
                 controller_out: int = 128):
        super().__init__()
        self.arc = ARC(
            num_glimpses=num_glimpses,
            glimpse_h=glimpse_h,
            glimpse_w=glimpse_w,
            controller_out=controller_out)

        # Two dense layers. Takes hidden state from controller of ARC and
        # classifies images as belonging to the same class or not.
        self.dense1 = nn.Linear(controller_out, 64)
        self.dense2 = nn.Linear(64, 1)

    def forward(self, image_pairs: Variable) -> Variable:
        arc_out = self.arc(image_pairs)

        d1 = F.elu(self.dense1(arc_out))
        decision = torch.sigmoid(self.dense2(d1))

        return decision

    def save_to_file(self, file_path: str) -> None:
        torch.save(self.state_dict(), file_path)

<hr color=red>

## Batching the data
Since the original code was written to sample character pairs across Omniglot's 50 alphabets, this portion of the code was significantly modified. The failure to produce meaningful C-NMC training seems to be related to the index-splitting portion of the batcher code. The next step might be to replace it with a batcher designed for a two-class array.
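One possible shape for such a two-class batcher is sketched below. It only samples index pairs and labels. The function name `sample_pair_indices` is hypothetical, and the class boundaries passed to it are assumed to follow the `a_start` / `a_size` layout used in this notebook. Labels follow the existing convention in `_fetch_batch`: 0 for a dissimilar pair, 1 for a similar pair.

```python
import random


def sample_pair_indices(class_starts, class_sizes, batch_size, rng=random):
    """Sample (index_a, index_b) pairs from a two-class image array.

    The first half of the batch holds dissimilar pairs (one image from each
    class), the second half holds similar pairs (two distinct images from
    the same class). Hypothetical sketch, not the original batcher.
    """
    half = batch_size // 2
    pairs, labels = [], []

    for _ in range(half):
        a = class_starts[0] + rng.randrange(class_sizes[0])
        b = class_starts[1] + rng.randrange(class_sizes[1])
        # Randomise which side of the pair each class appears on.
        if rng.random() < 0.5:
            a, b = b, a
        pairs.append((a, b))
        labels.append(0)

    for _ in range(batch_size - half):
        c = rng.randrange(len(class_starts))
        off_a, off_b = rng.sample(range(class_sizes[c]), 2)
        pairs.append((class_starts[c] + off_a, class_starts[c] + off_b))
        labels.append(1)

    return pairs, labels
```

With the values from this notebook, `sample_pair_indices([0, 3389], [3389, 7272], 128)` would return 64 dissimilar and 64 similar index pairs, which could then be used to fill `X[i]` and `X[i + batch_size]` from the image array.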

<hr color=red>

In [None]:
#Batcher: Original Source -> https://github.com/pranv/ARC
use_cuda = False

class Omniglot(object):
    def __init__(self, path=os.path.join(
        '/content/drive/Othercomputers/My MacBook Air/'
        'C-NMC_Leukemia/training_data/', 
        'train.npy'), batch_size=128, image_size=224):
        """
        batch_size: the output is (2 * batch size, 1, image_size, image_size)
                    X[i] & X[i + batch_size] are the pair
        image_size: size of the image
        data_split: in number of alphabets, e.g. [30, 10] means out of 50 
                    Omniglot alphabets, 30 are for training, 10 for validation 
                    and the remaining (10) for testing

        within_alphabet:  for verification task, when 2 characters are sampled to 
                          form a pair, this flag specifies if they should be 
                          from the same alphabet/language
        ---------------------
        Data Augmentation Parameters:
            flip: here flipping both the images in a pair
            scale: x would scale image by + or - x%
            rotation_deg
            shear_deg
            translation_px: in both x and y directions
        """
        chars = np.load(path)

        # resize the images
        resized_chars = np.zeros((10661, 20, image_size, image_size), 
                                 dtype='uint8')
        for i in range(10661):
            for j in range(20):
                resized_chars[i, j] = np.resize(
                    chars[i, j], (image_size, image_size)) 
                # ^^^ np added for compatibility
        chars = resized_chars

        self.mean_pixel = chars.mean() / 255.0  
        # used later for mean subtraction

        # starting index of each alphabet in a list of chars
        a_start = [0, 3389]

        # size of each alphabet (num of chars)
        a_size = [3389, 7272]

        # each alphabet/language has different number of characters.
        # in order to uniformly sample all characters, weigh the probability
        # of sampling an alphabet by its size. p is probability
        def size2p(size):
            s = np.array(size).astype('float64')
            return s / s.sum()

        self.size2p = size2p
        self.data = chars
        self.a_start = a_start
        self.a_size = a_size
        self.image_size = image_size
        self.batch_size = batch_size
        flip = True
        scale = 0.2
        rotation_deg = 20
        shear_deg = 10
        translation_px = 5
        #self.augmentor = ImageAugmenter(image_size, image_size,
        #                                hflip=flip, vflip=flip,
        #                                scale_to_percent=1.0 + scale, 
        #                                rotation_deg=rotation_deg, 
        #                                shear_deg=shear_deg,
        #                                translation_x_px=translation_px, 
        #                                translation_y_px=translation_px)

    def fetch_batch(self, part):
        """
            This outputs batch_size number of pairs
            Thus the actual number of images outputted is 2 * batch_size
            Say A & B form the half of a pair
            The Batch is divided into 4 parts:
                Dissimilar A 		Dissimilar B
                Similar A 			Similar B

            Corresponding images in Similar A and Similar B form similar pair
            similarly, Dissimilar A and Dissimilar B form the dissimilar pair

            When flattened, the batch has 4 parts with indices:
                Dissimilar A 		0 - batch_size / 2
                Similar A    		batch_size / 2  - batch_size
                Dissimilar B 		batch_size  - 3 * batch_size / 2
                Similar B 			3 * batch_size / 2 - 2 * batch_size
        """
        pass

class Batcher(Omniglot):
    def __init__(self, path=os.path.join(
        '/content/drive/Othercomputers/My MacBook Air/'
        'C-NMC_Leukemia/training_data', 'train.npy'), 
        batch_size=128, 
        image_size=32):
        Omniglot.__init__(self, path, batch_size, image_size)

        a_start = self.a_start
        a_size = self.a_size

        # slicing indices for splitting a_start & a_size
        i = 1
        j = 10662
        starts = {}
        starts['train'], starts['val'] = a_start[:i], a_start[i:j]
        #starts['train'], starts['val'], starts['test'] = \
        #a_start[:i], a_start[i:j], a_start[j:]
        sizes = {}
        sizes['train'], sizes['val'] = a_size[:i], a_size[i:j]
        #sizes['train'], sizes['val'], sizes['test'] = \
        #a_size[:i], a_size[i:j], a_size[j:]
        size2p = self.size2p
        p = {}
        p['train'], p['val'] = size2p(sizes['train']), size2p(sizes['val'])
        #p['train'], p['val'], p['test'] = \
        #size2p(sizes['train']), size2p(sizes['val']), size2p(sizes['test'])        
        self.starts = starts
        self.sizes = sizes
        self.p = p

    def fetch_batch(self, part, batch_size: int = None):

        if batch_size is None:
            batch_size = self.batch_size

        X, Y = self._fetch_batch(part, batch_size)
        X = Variable(torch.from_numpy(X)).view(2*batch_size, 
                                               self.image_size, 
                                               self.image_size)
        X1 = X[:batch_size]  # (B, h, w)
        X2 = X[batch_size:]  # (B, h, w)
        X = torch.stack([X1, X2], dim=1)  # (B, 2, h, w)
        Y = Variable(torch.from_numpy(Y))

        if use_cuda:
            X, Y = X.cuda(), Y.cuda()

        return X, Y

    def _fetch_batch(self, part, batch_size: int = None):
        if batch_size is None:
            batch_size = self.batch_size

        data = self.data
        starts = self.starts[part]
        sizes = self.sizes[part]
        p = self.p[part]
        image_size = self.image_size
        num_alphbts = len(starts)
        X = np.zeros((2 * batch_size, image_size, image_size), dtype='uint8')
        #for i in range(batch_size // 2):
            # choose similar chars
#            same_idx = choice(range(starts[0], starts[-1] + sizes[-1])) 

            # choose dissimilar chars within alphabet
#            alphbt_idx = choice(num_alphbts, p=p)
# #           char_offset = choice(sizes[alphbt_idx], 2, replace=False)
#  #          diff_idx = starts[alphbt_idx] + char_offset
#   #         X[i], X[i + batch_size] = data[diff_idx, choice(20, 2)]
#    #        X[i + batch_size // 2], X[i + 3 * batch_size // 2] = \
#     #       data[same_idx, choice(20, 2, replace=False)]

        y = np.zeros((batch_size, 1), dtype='int32')
        y[:batch_size // 2] = 0
        y[batch_size // 2:] = 1

        # ImageAugmenter was removed, so no augmentation is applied; scale
        # pixel values to [0, 1] for every split (not only 'train').
        X = X / 255.0

        X = X - self.mean_pixel
        X = X[:, np.newaxis]
        X = X.astype("float32")

        return X, y


<hr color=red>

## Training the model
The original training loop has no stopping condition, so the user has to interrupt training manually. Adding one should be relatively easy. GPU CUDA cores were successfully utilized in the training portion of the code.
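A stopping condition could be sketched as follows. This is an illustration under stated assumptions, not part of the original training code: the class name `EarlyStopper` and the default values for `patience`, `min_delta` and `max_iters` are arbitrary choices. It stops either after a fixed number of iterations or once the validation loss has failed to improve for several consecutive checks.

```python
class EarlyStopper:
    """Hypothetical helper that decides when a training loop should end."""

    def __init__(self, patience=20, min_delta=1e-4, max_iters=10000):
        self.patience = patience    # validation checks allowed without improvement
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.max_iters = max_iters  # hard cap on training iterations
        self.best = None
        self.bad_checks = 0

    def should_stop(self, iteration, val_loss):
        """Record one validation result and report whether to stop."""
        if iteration >= self.max_iters:
            return True
        if self.best is None or val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

In the loop, a `stopper = EarlyStopper()` created before `while True:` could be queried inside the `if i % 10 == 0:` validation branch, with `break` executed when `stopper.should_stop(i, validation_loss)` returns `True`.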

<hr color=red>

In [None]:
#Train
parser = argparse.ArgumentParser()
parser.add_argument('-f') # necessary null argument for Colab compatibility
parser.add_argument('--batchSize', type=int, default=128, 
                    help='input batch size')
parser.add_argument('--imageSize', type=int, default=32, 
                    help='the height / width of the input image to ARC')
parser.add_argument('--glimpseSize', type=int, default=8, 
                    help='the height / width of glimpse seen by ARC')
parser.add_argument('--numStates', type=int, default=128, 
                    help='number of hidden states in ARC controller')
parser.add_argument('--numGlimpses', type=int, default=6, 
                    help='number glimpses of each image in pair seen by ARC')
parser.add_argument('--lr', type=float, default=0.0002, 
                    help='learning rate, default=0.0002')
parser.add_argument('--cuda', action='store_true', 
                    help='enables cuda')
parser.add_argument('--name', default=None, 
                    help='Custom name for this configuration. Needed for saving'
                    ' model checkpoints in a separate folder.')
parser.add_argument('--load', default=None, 
                    help='model to load from. Start fresh if not specified.')

def get_pct_accuracy(pred: Variable, target) -> int:
    hard_pred = (pred > 0.5).int()
    correct = (hard_pred == target).sum().data#[0]
    accuracy = float(correct) / target.size()[0]
    accuracy = int(accuracy * 100)
    return accuracy

def train():
    opt = parser.parse_args()

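    if opt.cuda:
        # The batcher and model are defined in this notebook rather than in
        # separate `batcher` / `models` modules, so set the shared flag here.
        global use_cuda
        use_cuda = True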

    if opt.name is None:
        # if no name is given, we generate a name from the parameters.
        # only those parameters are taken, which if changed break 
        # torch.load compatibility.
        opt.name = "{}_{}_{}_{}".format(opt.numGlimpses, 
                                        opt.glimpseSize, 
                                        opt.numStates,
                                        "cuda" if opt.cuda else "cpu")
        
    # make directory for storing models.
    models_path = os.path.join("saved_models", opt.name)
    os.makedirs(models_path, exist_ok=True)

    # initialise the model
    discriminator = ArcBinaryClassifier(num_glimpses=opt.numGlimpses,
                                        glimpse_h=opt.glimpseSize,
                                        glimpse_w=opt.glimpseSize,
                                        controller_out=opt.numStates)

    if opt.cuda:
        discriminator.cuda()

    # load from a previous checkpoint, if specified.
    if opt.load is not None:
        discriminator.load_state_dict(
            torch.load(os.path.join(models_path, opt.load)))

    # set up the optimizer.
    bce = torch.nn.BCELoss()
    if opt.cuda:
        bce = bce.cuda()

    optimizer = torch.optim.SGD(params=discriminator.parameters(), lr=opt.lr) 
    # ^^^ Switched from Adam to SGD

    # load the dataset in memory.
    loader = Batcher(batch_size=opt.batchSize, image_size=opt.imageSize)

    # ready to train ...
    best_validation_loss = None
    saving_threshold = 1.02
    last_saved = datetime.utcnow()
    save_every = timedelta(minutes=10)

    i = -1
    while True:
        i += 1
        X, Y = loader.fetch_batch("train")
        pred = discriminator(X)
        loss = bce(pred, Y.float())

        if i % 10 == 0:
            # validate your model
            X_val, Y_val = loader.fetch_batch("val")
            pred_val = discriminator(X_val)
            loss_val = bce(pred_val, Y_val.float())

            training_loss = loss.item()
            validation_loss = loss_val.item()

            print(
                "Iter: {} \t Train: Acc={}%, "
                "Loss={} \t\t Val: Acc={}%, Loss={}".format(
                i, get_pct_accuracy(pred, Y), 
                training_loss, 
                get_pct_accuracy(pred_val, Y_val), 
                validation_loss
            ))

            if best_validation_loss is None:
                best_validation_loss = validation_loss

            if best_validation_loss > (saving_threshold * validation_loss):
                print("Improved val loss from {} --> {}. Saving...".format(
                    best_validation_loss, validation_loss
                ))
                discriminator.save_to_file(
                    os.path.join(models_path, str(validation_loss)))
                best_validation_loss = validation_loss
                last_saved = datetime.utcnow()

            if last_saved + save_every < datetime.utcnow():
                print("Too long since last saved model. Saving...")
                discriminator.save_to_file(
                    os.path.join(models_path, str(validation_loss)))
                last_saved = datetime.utcnow()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def main() -> None:
    train()

if __name__ == "__main__":
    main()

Iteration: 0 	 Train: Acc=50%, Loss=0.6938114762306213 		 Validation: Acc=50%, Loss=0.6938114762306213
Iteration: 10 	 Train: Acc=50%, Loss=0.6938105225563049 		 Validation: Acc=50%, Loss=0.6938105225563049
Iteration: 20 	 Train: Acc=50%, Loss=0.693809449672699 		 Validation: Acc=50%, Loss=0.693809449672699
Iteration: 30 	 Train: Acc=50%, Loss=0.6938083171844482 		 Validation: Acc=50%, Loss=0.6938083171844482
Iteration: 40 	 Train: Acc=50%, Loss=0.6938072443008423 		 Validation: Acc=50%, Loss=0.6938072443008423
Iteration: 50 	 Train: Acc=50%, Loss=0.6938061714172363 		 Validation: Acc=50%, Loss=0.6938061714172363
Iteration: 60 	 Train: Acc=50%, Loss=0.6938051581382751 		 Validation: Acc=50%, Loss=0.6938051581382751
Iteration: 70 	 Train: Acc=50%, Loss=0.6938040256500244 		 Validation: Acc=50%, Loss=0.6938040256500244
Iteration: 80 	 Train: Acc=50%, Loss=0.6938028931617737 		 Validation: Acc=50%, Loss=0.6938028931617737
Iteration: 90 	 Train: Acc=50%, Loss=0.6938018798828125 		 Validati

KeyboardInterrupt: ignored

<hr color=red>

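Progress currently stands at a model configured to run on the C-NMC dataset, but the accuracy for both the training and validation sets is stuck at exactly 50%. This is likely due to indexing incompatibilities within the batcher code carried over from the original model: with the pair-sampling loop in `_fetch_batch` commented out, the input array `X` is never filled with images, which would explain why the training and validation losses are identical at every logged iteration. The loss does decrease slightly between logged iterations, which is a promising sign, although the change is very small.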
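The logged loss value is itself informative. If the classifier emits the same probability for every pair and half of the targets are 1, the binary cross-entropy depends only on that probability and equals ln(2) when it is exactly 0.5. The short check below illustrates this; `balanced_bce` is a hypothetical helper, and 0.518 is an illustrative value chosen to match the log, not a measured model output.

```python
import math


def balanced_bce(p):
    """Binary cross-entropy for a constant prediction p when exactly half
    of the targets are 1 and half are 0."""
    return -0.5 * math.log(p) - 0.5 * math.log(1.0 - p)


print(balanced_bce(0.5))    # equals ln(2), about 0.6931
print(balanced_bce(0.518))  # about 0.6938, close to the logged values
```

A constant output just above 0.5 classifies every pair as "similar", which is correct for exactly half of a balanced batch. That matches the 50% accuracy seen here and suggests the network is not yet receiving input that distinguishes one pair from another.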