This notebook is set up to run with TensorBoard.

# Classifying Alzheimer's Using CyTOF Data and Deep Learning
### 31/05/2023 ####
This notebook is built from scratch. It follows the structure of DeepLearningCyTOF (Hu et al., 2020) but implements it in PyTorch rather than Keras. Optional k-fold cross-validation is also planned.
At the moment, training and plotting are performed in two separate code blocks.
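
The optional k-fold cross-validation mentioned above could start from an index split like the one sketched below. This is only an illustration under stated assumptions: `k_fold_indices` is a hypothetical helper that does not appear elsewhere in this notebook, and the sample count of 10 is made up for the demonstration.

```python
import random

def k_fold_indices(n_samples, k, seed=111):
    """Split range(n_samples) into k (train_idx, val_idx) pairs.

    Hypothetical helper, not part of the notebook's code.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    splits = []
    for i in range(k):
        val_idx = folds[i]
        # Every fold except fold i goes into the training indices
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, val_idx))
    return splits

splits = k_fold_indices(n_samples=10, k=5)
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

Each index list could then be wrapped in `torch.utils.data.Subset` to build one pair of DataLoaders per fold.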

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
#!pip install torch torchvision

In [None]:
import torch
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()

##### Step 1: Import functions #####

In [None]:
## From Hu et al. (2020)
import pickle
import pandas as pd
import numpy as np
from numpy.random import seed; seed(111)
import random
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind
from IPython.display import Image

## PyTorch and utilities
import os
import time
import torch
import torch.nn.functional as F
import torchvision
from torch import nn
from torch.utils.data import DataLoader, Dataset
from tqdm import tqdm

In [None]:
class AdDataset(Dataset):
    def __init__(self, data_dir):
        ## Build parallel lists of inputs and labels
        # self.x holds the .csv file names (i.e. the CyTOF data)
        # self.y holds the labels (0 for a control, 1 for an AD patient)
        
        self.data_dir = data_dir
        self.x = os.listdir(data_dir)
        self.y= []  # Initialize an empty list to store class labels

        for file_name in self.x:
            # Extract the letter preceding ".csv" in the file name
            y_label = file_name.split(".")[0][-1]
            # Check if the class label is "C" and assign 0, else assign 1
            if y_label == "C":
                self.y.append(0)
                #print(0)
            else:
                self.y.append(1)
                #print(1)

    def __len__(self):
        ## Size of whole data set
        return len(self.x)

    def __getitem__(self, idx):
        ## Load data on demand here rather than in __init__, to improve memory efficiency.

        file_path = os.path.join(self.data_dir, self.x[idx])
        data = pd.read_csv(file_path, sep="\t", header=None).values
        data = torch.from_numpy(data)
        # Class label for the corresponding file.
        # Watch out: the data tensor and the integer label have different dtypes, which can
        # cause errors downstream; both are cast to float in training_step/validation_step.
        label = self.y[idx]
        return data, label
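
The label rule in `__init__` can be checked in isolation. The sketch below reproduces the same expression on hypothetical file names; the real naming scheme of the `.csv` files is not shown in this notebook, so the names are assumptions made only for illustration.

```python
def label_from_filename(file_name):
    """Return 0 for a control ("C" before the first dot), else 1.

    Mirrors the expression used in AdDataset.__init__.
    """
    y_label = file_name.split(".")[0][-1]
    return 0 if y_label == "C" else 1

# Hypothetical file names, for illustration only
for name in ["sample01_C.csv", "sample02_A.csv", "run.3_C.csv"]:
    print(name, "->", label_from_filename(name))
```

The third name shows a pitfall: with an extra dot in the name, the label is taken from the text before the first dot, so a control file would be labelled 1. Using `os.path.splitext(file_name)[0][-1]` instead would read the character just before the final extension.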

##### Step 2: Load data #####


In [None]:
data_dir_train = "/content/drive/MyDrive/colabData/st1/train" # 290
train_dataset = AdDataset(data_dir_train)

data_dir_val = "/content/drive/MyDrive/colabData/st1/validate"
val_dataset = AdDataset(data_dir_val)

train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=4, shuffle=True)

##### Functions to evaluate model performance #####


In [None]:
def F_score(output, label, threshold=0.5, beta=1):
    # Binarise predictions and labels at the threshold
    prob = output > threshold
    label = label > threshold

    # Counts are taken along dim 1, i.e. per sample
    TP = (prob & label).sum(1).float()
    TN = ((~prob) & (~label)).sum(1).float()
    FP = (prob & (~label)).sum(1).float()
    FN = ((~prob) & label).sum(1).float()

    precision = torch.mean(TP / (TP + FP + 1e-12))
    recall = torch.mean(TP / (TP + FN + 1e-12))
    # F-beta score (F1 with the default beta=1)
    F_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall + 1e-12)
    return F_beta.mean(0)
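
Because `F_score` sums along dimension 1, a model output of shape `(N, 1)` yields counts per sample rather than per batch. Per-sample precision and recall are then each either 0 or 1, so the returned value is close to the fraction of true-positive samples in the batch. For comparison, here is a minimal sketch of an F-beta score computed from counts pooled over the whole batch. It uses plain Python lists, `f_beta_pooled` is a hypothetical name, and it is an alternative formulation, not the metric behind the results reported in this notebook.

```python
def f_beta_pooled(probs, labels, threshold=0.5, beta=1.0):
    """F-beta from TP, FP and FN counts pooled over a batch."""
    preds = [p > threshold for p in probs]
    truth = [y > threshold for y in labels]
    tp = sum(p and t for p, t in zip(preds, truth))
    fp = sum(p and not t for p, t in zip(preds, truth))
    fn = sum((not p) and t for p, t in zip(preds, truth))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Made-up batch: three predicted positives, two of them correct, no missed positives
probs = [0.9, 0.8, 0.2, 0.6]
labels = [1, 0, 0, 1]
print(f_beta_pooled(probs, labels))
```

In this made-up batch, precision is 2/3 and recall is 1, so the pooled F1 is approximately 0.8.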

In [None]:
class ClassificationBase(nn.Module):
    def training_step(self, batch):
        images, targets = batch
        images = images.float()                      # Cast in place on the current device
        targets = targets.float().reshape(-1, 1)     # Shape (N, 1) to match the model output
        out = self(images)                           # Generate predictions
        loss = F.binary_cross_entropy(out, targets)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, targets = batch
        images = images.float()                      # Cast in place on the current device
        targets = targets.float().reshape(-1, 1)     # Shape (N, 1) to match the model output
        out = self(images)                           # Generate predictions
        loss = F.binary_cross_entropy(out, targets)  # Calculate loss
        score = F_score(out, targets)
        return {'val_loss': loss.detach(), 'val_score': score.detach() }

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_scores = [x['val_score'] for x in outputs]
        epoch_score = torch.stack(batch_scores).mean()  # Combine scores
        return {'val_loss': epoch_loss.item(), 'val_score': epoch_score.item()}
    
    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.4f}, train_loss: {:.4f}, val_loss: {:.4f}, val_score: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_score']))
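
`F.binary_cross_entropy` averages the per-sample loss `-(y*log(p) + (1-y)*log(1-p))` over the batch. The sketch below evaluates that formula in plain Python for made-up probabilities, to show how a single confident but wrong prediction can dominate the mean. This may be what lies behind isolated spikes in `val_loss` during training, although nothing in this notebook confirms it. `bce` is a hypothetical helper; PyTorch additionally clamps the log terms, which this sketch does not do.

```python
import math

def bce(probs, targets):
    """Mean binary cross-entropy for probabilities strictly between 0 and 1."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)

# Made-up examples: all targets are 1 (AD)
good = bce([0.9, 0.8, 0.95, 0.85], [1, 1, 1, 1])
one_bad = bce([0.9, 0.8, 0.95, 1e-6], [1, 1, 1, 1])
print(f"confident and correct: {good:.3f}")
print(f"one confident mistake: {one_bad:.3f}")
```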

##### Step 3: Define the DL model #####
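DNN1 was presented on 26.05 and shows an improvement in accuracy over epochs (all near 0.87). 
DNN2 is my updated version with a CNN architecture. DNN4, the model trained below, is a variant of DNN2 with a 25x25 kernel in the first convolutional layer. 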


In [None]:
# class DNN1(ClassificationBase):
#     def __init__(self, input_size):
#         super().__init__()
#         self.linear = nn.Sequential(
#             nn.Linear(input_size, 2048),
#             nn.Dropout(p=0.2),
#             nn.ReLU(),
#             nn.Linear(2048, 1024),
#             nn.Dropout(p=0.15),
#             nn.ReLU(),
#             nn.Linear(1024, 512),
#             nn.Dropout(p=0.1),
#             nn.ReLU(),
#             nn.Linear(512, 1)
#         )
#     #HERE THE MODEL PERFORMS A FORWARD PASS --> OUTPUT/PREDICTION
#     def forward(self, xb):
#         xb = xb.reshape(-1,input_size)
#         xb = xb.to(torch.float32)  # Convert input to float32 data type
#         out = self.linear(xb)
#         #out = out.to(torch.float32) # Leave as comment for DNN 1
#         return torch.sigmoid(out)

In [None]:
class DNN4(ClassificationBase):
    def __init__(self, input_shape, flat_shape):
        super().__init__()
        channels, height, width = input_shape
        ## First convolutional layer
        # Hu et al. (2020): "Uses three filters to scan each row of the CyTOF data. This layer extracts
        # relevant information from the cell marker profile of each cell."
        # Here a single 25x25 filter is used instead (out_channels=1).
        # Open questions: is the input grid A x B x C (fix via input_shape)? Should the filter be 1 x B,
        # i.e. the number of nodes in the input vector, or just kernel_size=3? How many output channels?
        self.conv1 = nn.Conv2d(in_channels=channels, out_channels=1, kernel_size=(25,25))
        self.bn1 = torch.nn.BatchNorm2d(1)
        self.act1 = nn.ReLU()
        

        ## Second convolutional layer
        # As described for the original model, the second convolution layer uses three filters to scan
        # each row of the first layer's output, combining the first layer's information for each cell.
        # Here a single 2x2 filter is used (out_channels=1).
        self.conv2 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2,2))
        self.bn2 = torch.nn.BatchNorm2d(1)
        self.act2 = nn.ReLU()

        ## Pooling layer
        # "The pooling layer averages the outputs of the second convolution layer. The purpose is to
        # aggregate the cell level information into sample-level information."
        # Note: max pooling is used here rather than averaging.
        self.pool1 = nn.MaxPool2d(kernel_size=(2,2), stride=2)
        #self.pool1 = nn.AvgPool2d(kernel_size=3, stride=2) # A (1,1) kernel would not change anything, right?
        self.flat = nn.Flatten()

        ## Dense layers
        # "The dense layer further extracts information from the pooling layer."
        # in_features=7569 is 87 * 87, the flattened pooling output for a 200x200 input
        # (200 -> 176 after conv1 -> 175 after conv2 -> 87 after pooling); flat_shape is not used.
        self.fc1 = nn.Linear(in_features=7569, out_features=2048)
        #self.fc = nn.Linear(in_features=flat_shape, out_features=1)
        # Better to make biggest jump here or reduce slowly?
        self.bn3 = torch.nn.BatchNorm1d(2048)
        self.act3 = nn.ReLU()
        self.do1 = nn.Dropout(p=0.1)

        self.fc2 = nn.Linear(in_features=2048, out_features=1028)
        self.bn4 = torch.nn.BatchNorm1d(1028)
        self.act4 = nn.ReLU()
        self.do2 = nn.Dropout(p=0.1)

        self.fc3 = nn.Linear(in_features=1028, out_features=512)
        self.bn5 = torch.nn.BatchNorm1d(512)
        self.act5 = nn.ReLU()
        self.do3 = nn.Dropout(p=0.1)

        self.fc4 = nn.Linear(in_features=512, out_features=256)
        self.bn6 = torch.nn.BatchNorm1d(256)
        self.act6 = nn.ReLU()
        self.do4 = nn.Dropout(p=0.1)

        ## Output layer
        # "The output layer uses logistic regression to report the probability of CMV infection for each sample."
        # Here the reported probability is that of AD rather than CMV infection.
        self.fc5 = nn.Linear(in_features=256, out_features=1)
        #self.bn3 = nn.BatchNorm1d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Cast to float32 and add a channel dimension: (N, H, W) -> (N, 1, H, W)
        x = x.float()
        x = x.unsqueeze(1)

        # Convolutional feature extraction
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.act1(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.act2(out)

        # Pool, then flatten to (N, 7569) for a 200x200 input
        out = self.pool1(out)
        out = self.flat(out)

        # Dense layers: linear -> activation -> batch norm -> dropout
        out = self.fc1(out)
        out = self.act3(out)
        out = self.bn3(out)
        out = self.do1(out)

        out = self.fc2(out)
        out = self.act4(out)
        out = self.bn4(out)
        out = self.do2(out)

        out = self.fc3(out)
        out = self.act5(out)
        out = self.bn5(out)
        # Note: self.do3 is defined in __init__ but is not applied here.

        out = self.fc4(out)
        out = self.act6(out)
        out = self.bn6(out)
        out = self.do4(out)

        # Output probability
        out = self.fc5(out)
        out = self.sigmoid(out)
        return out
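
The hard-coded `in_features=7569` of `fc1` can be derived from the layer settings. The sketch below applies the usual output-size formula for convolution and pooling without padding or dilation, `floor((size - kernel) / stride) + 1`, to a 200x200 input. `out_size` and `flat_features` are hypothetical helpers written for this illustration.

```python
def out_size(size, kernel, stride=1):
    """Output length along one axis for a conv/pool layer with no padding or dilation."""
    return (size - kernel) // stride + 1

def flat_features(size, conv1_kernel, conv2_kernel=2, pool_kernel=2, pool_stride=2):
    """Flattened feature count after conv1 -> conv2 -> pooling with one output channel."""
    size = out_size(size, conv1_kernel)
    size = out_size(size, conv2_kernel)
    size = out_size(size, pool_kernel, pool_stride)
    return size * size

print(flat_features(200, conv1_kernel=25))  # DNN4: 25x25 first kernel
print(flat_features(200, conv1_kernel=2))   # 2x2 first kernel, as in the commented-out DNN2
```

The two values are 7569 and 9801. The first matches the `in_features` hard-coded in DNN4; the second matches the `flat_shape=9801` argument passed when the model is created, which DNN4 does not use.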



In [None]:

# class DNN2(ClassificationBase):
#     def __init__(self, input_shape, flat_shape):
#         super().__init__()
#         channels, height, width = input_shape
#         ## First convolutional layer
#         # "Uses three filters to scan each row of the CyTOF data. This layer extracts relevant information from the cell marker profile of each cell." Is this grid AxBxC? Fix in inputShape[X]. Filter size = 1 x B
#         # We want to measure C markers. 
#         # How many output markers
#         self.conv1 = nn.Conv2d(in_channels=channels, out_channels=1, kernel_size=(2,2)) #(1,A)? - THE NUMBER OF NODES IN THE INPUT VECTOR. OR JUST KERNEL SIZE = 3?
#         self.bn1 = torch.nn.BatchNorm2d(1)
#         self.act1 = nn.ReLU()
        
#         ## Second convolution layer
#         # The second convolution layer uses three filters to scan each row of the first layer's output. Each filter combines information from the first layer for each cell.
#         self.conv2 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2,2)) 
#         self.bn2 = torch.nn.BatchNorm2d(1)
#         self.act2 = nn.ReLU()

#         ## Pooling layer
#         # "The pooling layers averages the outputs of the second convolution layer. The purpose is to aggregate the cell level information into sample-level information.""

#         self.pool1 = nn.MaxPool2d(kernel_size=(2,2), stride=2) 
#         #self.pool1 = nn.AvgPool2d(kernel_size=3, stride=2) #1,1 would not change anything right? 
#         self.flat = nn.Flatten() 

#         ## Dense layer
#         # "The dense layer further extracts information from the pooling layer."
#         self.fc1 = nn.Linear(in_features=flat_shape, out_features=2048) #In features = , out features 
#         #self.fc = nn.Linear(in_features=flat_shape, out_features=1)
#         # Better to make biggest jump here or reduce slowly?
#         self.bn3 = torch.nn.BatchNorm1d(2048)
#         self.act3 = nn.ReLU()
#         self.do1 = nn.Dropout(p=0.1)

#         # "The dense layer further extracts information from the pooling layer."
#         self.fc2 = nn.Linear(in_features=2048, out_features=1028) #In features = , out features 
#         self.bn4 = torch.nn.BatchNorm1d(1028)
#         self.act4 = nn.ReLU()
#         self.do2 = nn.Dropout(p=0.1)

#         # "The dense layer further extracts information from the pooling layer."
#         self.fc3 = nn.Linear(in_features=1028, out_features=512) #In features = , out features 
#         self.bn5 = torch.nn.BatchNorm1d(512)
#         self.act5 = nn.ReLU()
#         self.do3 = nn.Dropout(p=0.1)

#         # "The dense layer further extracts information from the pooling layer."
#         self.fc4 = nn.Linear(in_features=512, out_features=256) #In features = , out features 
#         self.bn6 = torch.nn.BatchNorm1d(256)
#         self.act6 = nn.ReLU()
#         self.do4 = nn.Dropout(p=0.1)

#         ## Output layer
#         # "The output layer uses logistic regression to report the probability of CMV infection for each sample."
#         self.fc5 = nn.Linear(in_features=256, out_features=1)
#         #self.bn3 = nn.BatchNorm1d(1)
#         self.sigmoid = nn.Sigmoid()

#     def forward(self, x):
#         # Reshape input size
#         #print("Input dimensions", x.shape)
#         #x = x.to(torch.float32)
#         x = x.float()
#         x = x.unsqueeze(1)
#         #print("Input dimensions", x.shape)
#         out = self.conv1(x)
#         ##print("Input dimensions conv1", out.shape)
#         out = self.bn1(out)
#         ##print("Input dimensions bn1", out.shape)
#         out = self.act1(out)     
#         #print("Input dimensions act1", out.shape)

#         out = self.conv2(out)
#         #print("Input dimensions conv2", out.shape)
#         out = self.bn2(out)
#         #print("Input dimensions bn2", out.shape)
#         out = self.act2(out)   
#         #print("Input dimensions act2", out.shape)

#         out = self.pool1(out)
#         #print("Input dimensions pool1", out.shape)
#         out = self.flat(out)
#         #print("Input dimensions flat", out.shape)

#         #out = out.reshape(-1, input_size)
#         #print("Out dimensions", out.shape)
#         out = self.fc1(out)  
#         #out = self.fc(out)  
#         #print("Input dimensions flat", out.shape)    
#         #flat_shape = out.shape[1]
#         out = self.act3(out)  
#         out = self.bn3(out) 
#         out = self.do1(out)
#         #print("Input dimensions do1", out.shape)   

#         out = self.fc2(out)  
#         #print("Input dimensions fc2", out.shape)        
#         out = self.act4(out)   
#         out = self.bn4(out)
#         out = self.do2(out)

#         out = self.fc3(out)    
#         #print("Input dimensions fc3", out.shape)    
#         out = self.act5(out)   
#         #out = self.sigmoid(out)  
#         out = self.bn5(out) 
#         #print("Input dimensions bn3", out.shape)     

#         out = self.fc4(out)      
#         out = self.act6(out)  
#         #print("Input dimensions fn4", out.shape)
#         out = self.bn6(out)     
#         out = self.do4(out)
        

#         out = self.fc5(out)     
#         #print("Input dimensions fn5", out.shape)   
#         out = self.sigmoid(out)
           

#         return out


In [None]:
# class DNN3(ClassificationBase):
#     def __init__(self, input_shape, flat_shape):
#         super().__init__()
#         channels, height, width = input_shape
#         ## First convolutional layer
#         # "Uses three filters to scan each row of the CyTOF data. This layer extracts relevant information from the cell marker profile of each cell." Is this grid AxBxC? Fix in inputShape[X]. Filter size = 1 x B
#         # We want to measure C markers. 
#         # How many output markers
#         self.feature_extractor = nn.Sequential(
#             nn.Conv2d(in_channels=channels, out_channels=1, kernel_size=(2,2)), #(1,A)? - THE NUMBER OF NODES IN THE INPUT VECTOR. OR JUST KERNEL SIZE = 3?, 
#             torch.nn.BatchNorm2d(1),
#             nn.ReLU(),
#             nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2,2)),
#             torch.nn.BatchNorm2d(1),
#             nn.ReLU()
#         )

#         self.classifier = nn.Sequential(
#             nn.MaxPool2d(kernel_size=(2,2), stride=2),
#             nn.Flatten(),
#             nn.Linear(in_features=flat_shape, out_features=2048),
#             torch.nn.BatchNorm1d(2048),
#             nn.ReLU(),
#             nn.Dropout(p=0.1),
#             nn.Linear(in_features=2048, out_features=1024),
#             torch.nn.BatchNorm1d(1024),
#             nn.ReLU(),
#             nn.Dropout(p=0.1),
#             nn.Linear(in_features=1024, out_features=512),
#             torch.nn.BatchNorm1d(512),
#             nn.ReLU(),
#             nn.Dropout(p=0.1),
#             nn.Linear(in_features=512, out_features=256),
#             torch.nn.BatchNorm1d(256),
#             nn.ReLU(),
#             nn.Linear(in_features=256, out_features=1)   
#         )
#         self.sigmoid = nn.Sigmoid()

#     def forward(self, x):
#         x = x.float()
#         x = x.unsqueeze(1)
#         x = self.feature_extractor(x)
#         x = self.classifier(x)
#         out = self.sigmoid(x) ## If it is so close to .5... is that problematic?
#         return out
#         #return logits, probs

In [None]:
# #Defining the convolutional neural network
# # class LeNet5(ClassificationBase):
# #     def __init__(self, input_shape, flat_shape):
# #         super().__init__()
# #         self.layer1 = nn.Sequential(
# #             nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=0),
# #             nn.BatchNorm2d(6),
# #             nn.ReLU(),
# #             nn.MaxPool2d(kernel_size = 2, stride = 2))
# #         self.layer2 = nn.Sequential(
# #             nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0),
# #             nn.BatchNorm2d(16),
# #             nn.ReLU(),
# #             nn.MaxPool2d(kernel_size = 2, stride = 2))
# #         self.fc0 = nn.Linear(35344, 400)
# #         self.relu0 = nn.ReLU()
# #         self.fc = nn.Linear(400, 120)
# #         self.relu = nn.ReLU()
# #         self.fc1 = nn.Linear(120, 84)
# #         self.relu1 = nn.ReLU()
# #         self.fc2 = nn.Linear(84, 1)
# #         #self.bn = nn.BatchNorm1d(1)
# #         self.sigmoid = nn.Sigmoid()
# #     def forward(self, x):
# #         x = x.float()
# #         x = x.unsqueeze(1)
# #         out = self.layer1(x)
# #         out = self.layer2(out)
# #         out = out.reshape(out.size(0), -1)
# #         out = self.fc0(out)
# #         out = self.relu0(out)
# #         out = self.fc(out)
# #         out = self.relu(out)
# #         out = self.fc1(out)
# #         out = self.relu1(out)
# #         out = self.fc2(out)
# #         #out = self.bn(out)
# #         out = self.sigmoid(out)
# #         return out

# #Defining the convolutional neural network
# class LeNet5EfficientShallow(ClassificationBase):
#     def __init__(self, input_shape, flat_shape):
#         super().__init__()
#         self.layer1 = nn.Sequential(
#             nn.Conv2d(1, 3, kernel_size=5, stride=1, padding=0),
#             nn.BatchNorm2d(3),
#             nn.ReLU(),
#             nn.MaxPool2d(kernel_size = 2, stride = 2))
#         self.layer2 = nn.Sequential(
#             nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=0),
#             nn.BatchNorm2d(16),
#             nn.ReLU(),
#             nn.MaxPool2d(kernel_size = 2, stride = 2))
#         self.fc0 = nn.Linear(35344, 400)
#         self.relu0 = nn.ReLU()
#         self.fc = nn.Linear(400, 120)
#         self.relu = nn.ReLU()
#         self.fc1 = nn.Linear(120, 84)
#         self.relu1 = nn.ReLU()
#         self.fc2 = nn.Linear(84, 1)
#         #self.bn = nn.BatchNorm1d(1)
#         self.sigmoid = nn.Sigmoid()
#     def forward(self, x):
#         x = x.float()
#         x = x.unsqueeze(1)
#         out = self.layer1(x)
#         #print(out.shape)
#         out = self.layer2(out)
#         #print(out.shape)
#         out = out.reshape(out.size(0), -1)
#         out = self.fc0(out)
#         out = self.relu0(out)
#         out = self.fc(out)
#         out = self.relu(out)
#         out = self.fc1(out)
#         out = self.relu1(out)
#         out = self.fc2(out)
#         #out = self.bn(out)
#         out = self.sigmoid(out)
#         return out

##### Step 4. Set device and load data #####

In [None]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda') # Tensors must then stay on the GPU: cast with .float(), not .type(torch.FloatTensor), which yields a CPU tensor
    else:
        return torch.device('cpu')
    
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
        
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

device = get_default_device()
device

device(type='cpu')

In [None]:
train_dl = DeviceDataLoader(train_loader, device)
val_dl = DeviceDataLoader(val_loader, device)

In [None]:
from torch.utils.tensorboard import SummaryWriter
%load_ext tensorboard

@torch.no_grad()
def evaluate(model, val_loader):
    # Validation needs no gradients, so skip building the autograd graph
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader, 
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    
    # Set up custom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs, 
                                                steps_per_epoch=len(train_loader))
    
    # Uses the module-level `writer` (SummaryWriter) created at the top of the notebook
    step = 0  # Global batch counter, so TensorBoard steps stay unique across epochs
    
    for epoch in range(epochs):
        # Training Phase 
        model.train()
        train_losses = []
        lrs = []  # learning rate
        for batch in tqdm(train_loader):
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # Detach so the autograd graph is not kept alive
            loss.backward()
            
            # Write the training loss to TensorBoard with unique step for each batch
            writer.add_scalar('Training Batch Loss', loss.item(), step)
            step += 1  # Increment the step counter
            
            # Gradient clipping
            if grad_clip: 
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            
            optimizer.step()
            optimizer.zero_grad()
            
            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()
        
        # Write the training loss and learning rate to TensorBoard
        writer.add_scalar('Training Loss', torch.stack(train_losses).mean().item(), epoch)
        writer.add_scalar('Learning Rate', lrs[-1], epoch)
        
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)

    return history

def plot_scores(history):
    scores = [x['val_score'] for x in history]
    plt.plot(scores, '-x')
    plt.xlabel('epoch')
    plt.ylabel('score')
    plt.title('F1 score vs. No. of epochs')
    plt.show()
    #plt.savefig("DNN_scores_no_augmentation")

def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')
    plt.show()
    #plt.savefig("DNN_losses_no_augmentation")

def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('Batch no.')
    plt.ylabel('Learning rate')
    plt.title('Learning Rate vs. Batch no.')
    plt.show()
    #plt.savefig("DNN_lrs_no_augmentation")
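
The learning rates printed during training follow the one-cycle policy set up in `fit_one_cycle`. As a rough sketch, assuming the default settings of `OneCycleLR` as I understand them (cosine annealing, `pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`), the schedule can be approximated in plain Python. `one_cycle_lr` is a hypothetical helper written for illustration, not PyTorch's implementation; the 21 steps per epoch are taken from the progress bars in the training log.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Approximate cosine one-cycle learning rate at a given optimizer step."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase

    def cos_anneal(start, end, pct):
        # Moves from `start` (pct=0) to `end` (pct=1) along a half cosine
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)

epochs, steps_per_epoch, max_lr = 100, 21, 0.01
total_steps = epochs * steps_per_epoch
for epoch in (0, 1, 4, 9, 29):
    last_step = epoch * steps_per_epoch + steps_per_epoch - 1
    print(epoch, round(one_cycle_lr(last_step, total_steps, max_lr), 4))
```

With these settings the rounded values come out as 0.0004, 0.0005, 0.001, 0.0028 and 0.01, in line with the `last_lr` values logged for epochs 0, 1, 4, 9 and 29.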


In [None]:
input=[1, 200, 200]
input_size=200*200
# Note: DNN4 hard-codes fc1's in_features (7569) and ignores flat_shape. DON'T WANT THIS TO BE AN INPUT!
model = to_device(DNN4(input_shape=input, flat_shape=9801), device)
epochs = 100
max_lr = 0.01
opt_func = torch.optim.Adam


##### Step 5. Training #####


In [None]:
# Run classifier 
history = [evaluate(model, val_dl)]
history

[{'val_loss': 0.6820436120033264, 'val_score': 0.835106372833252}]

In [None]:
start_time = time.time()
history += fit_one_cycle(epochs, max_lr, model, train_dl, val_dl, opt_func=opt_func)
train_time = time.time() - start_time
total_train_time = train_time
print("Total training time =", total_train_time)
writer.flush()  # Call the method; a bare `writer.flush` does nothing
writer.close()

100%|██████████| 21/21 [01:40<00:00,  4.79s/it]


Epoch [0], last_lr: 0.0004, train_loss: 0.7454, val_loss: 0.6432, val_score: 0.8333


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [1], last_lr: 0.0005, train_loss: 0.6382, val_loss: 0.4475, val_score: 0.8351


100%|██████████| 21/21 [00:25<00:00,  1.24s/it]


Epoch [2], last_lr: 0.0006, train_loss: 0.6097, val_loss: 0.4889, val_score: 0.8351


100%|██████████| 21/21 [00:28<00:00,  1.36s/it]


Epoch [3], last_lr: 0.0008, train_loss: 0.5139, val_loss: 0.5011, val_score: 0.7589


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [4], last_lr: 0.0010, train_loss: 0.4470, val_loss: 0.3298, val_score: 0.7713


100%|██████████| 21/21 [00:26<00:00,  1.29s/it]


Epoch [5], last_lr: 0.0013, train_loss: 0.3981, val_loss: 0.3956, val_score: 0.7642


100%|██████████| 21/21 [00:26<00:00,  1.25s/it]


Epoch [6], last_lr: 0.0016, train_loss: 0.3165, val_loss: 0.3768, val_score: 0.7713


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [7], last_lr: 0.0020, train_loss: 0.3021, val_loss: 0.4794, val_score: 0.6738


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [8], last_lr: 0.0024, train_loss: 0.3501, val_loss: 0.2815, val_score: 0.8280


100%|██████████| 21/21 [00:25<00:00,  1.23s/it]


Epoch [9], last_lr: 0.0028, train_loss: 0.2604, val_loss: 0.2803, val_score: 0.7926


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [10], last_lr: 0.0032, train_loss: 0.1966, val_loss: 14.3405, val_score: 0.0053


100%|██████████| 21/21 [00:26<00:00,  1.28s/it]


Epoch [11], last_lr: 0.0037, train_loss: 0.2962, val_loss: 1.6265, val_score: 0.8351


100%|██████████| 21/21 [00:26<00:00,  1.28s/it]


Epoch [12], last_lr: 0.0042, train_loss: 0.2923, val_loss: 0.6477, val_score: 0.8351


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [13], last_lr: 0.0047, train_loss: 0.3542, val_loss: 0.8246, val_score: 0.5621


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [14], last_lr: 0.0052, train_loss: 0.2288, val_loss: 0.3037, val_score: 0.8067


100%|██████████| 21/21 [00:26<00:00,  1.26s/it]


Epoch [15], last_lr: 0.0057, train_loss: 0.2383, val_loss: 0.8831, val_score: 0.4610


100%|██████████| 21/21 [00:26<00:00,  1.29s/it]


Epoch [16], last_lr: 0.0062, train_loss: 0.2461, val_loss: 0.2450, val_score: 0.8280


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [17], last_lr: 0.0067, train_loss: 0.2237, val_loss: 0.2351, val_score: 0.7624


100%|██████████| 21/21 [00:28<00:00,  1.35s/it]


Epoch [18], last_lr: 0.0071, train_loss: 0.1685, val_loss: 0.9948, val_score: 0.8351


100%|██████████| 21/21 [00:27<00:00,  1.30s/it]


Epoch [19], last_lr: 0.0076, train_loss: 0.2803, val_loss: 0.5887, val_score: 0.8351


100%|██████████| 21/21 [00:27<00:00,  1.30s/it]


Epoch [20], last_lr: 0.0080, train_loss: 0.1899, val_loss: 0.1925, val_score: 0.8245


100%|██████████| 21/21 [00:26<00:00,  1.28s/it]


Epoch [21], last_lr: 0.0084, train_loss: 0.1884, val_loss: 0.3516, val_score: 0.8351


100%|██████████| 21/21 [00:27<00:00,  1.30s/it]


Epoch [22], last_lr: 0.0088, train_loss: 0.2217, val_loss: 0.1770, val_score: 0.8298


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [23], last_lr: 0.0091, train_loss: 0.1283, val_loss: 0.3480, val_score: 0.8351


100%|██████████| 21/21 [00:26<00:00,  1.27s/it]


Epoch [24], last_lr: 0.0094, train_loss: 0.2132, val_loss: 2.0666, val_score: 0.8333


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [25], last_lr: 0.0096, train_loss: 0.3323, val_loss: 0.1911, val_score: 0.8262


100%|██████████| 21/21 [00:26<00:00,  1.29s/it]


Epoch [26], last_lr: 0.0098, train_loss: 0.2689, val_loss: 0.1711, val_score: 0.8298


100%|██████████| 21/21 [00:26<00:00,  1.27s/it]


Epoch [27], last_lr: 0.0099, train_loss: 0.2714, val_loss: 0.3968, val_score: 0.8333


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [28], last_lr: 0.0100, train_loss: 0.3446, val_loss: 0.6837, val_score: 0.5869


100%|██████████| 21/21 [00:27<00:00,  1.29s/it]


Epoch [29], last_lr: 0.0100, train_loss: 0.2416, val_loss: 0.1777, val_score: 0.8298


100%|██████████| 21/21 [00:26<00:00,  1.27s/it]


Epoch [30], last_lr: 0.0100, train_loss: 0.1399, val_loss: 0.2822, val_score: 0.7642


100%|██████████| 21/21 [00:27<00:00,  1.33s/it]


Epoch [31], last_lr: 0.0100, train_loss: 0.1620, val_loss: 0.2900, val_score: 0.8316


100%|██████████| 21/21 [00:29<00:00,  1.39s/it]


Epoch [32], last_lr: 0.0100, train_loss: 0.1816, val_loss: 0.6469, val_score: 0.8333


100%|██████████| 21/21 [00:31<00:00,  1.52s/it]


Epoch [33], last_lr: 0.0099, train_loss: 0.1804, val_loss: 0.3334, val_score: 0.8316


100%|██████████| 21/21 [00:31<00:00,  1.48s/it]


Epoch [34], last_lr: 0.0099, train_loss: 0.1397, val_loss: 0.1475, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [35], last_lr: 0.0098, train_loss: 0.1080, val_loss: 0.4208, val_score: 0.8351


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [36], last_lr: 0.0098, train_loss: 0.0628, val_loss: 0.1202, val_score: 0.8121


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [37], last_lr: 0.0097, train_loss: 0.0871, val_loss: 0.1180, val_score: 0.8138


100%|██████████| 21/21 [00:29<00:00,  1.41s/it]


Epoch [38], last_lr: 0.0096, train_loss: 0.0479, val_loss: 0.1970, val_score: 0.8351


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [39], last_lr: 0.0095, train_loss: 0.0665, val_loss: 0.0949, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [40], last_lr: 0.0094, train_loss: 0.0720, val_loss: 14.6638, val_score: 0.5337


100%|██████████| 21/21 [00:29<00:00,  1.42s/it]


Epoch [41], last_lr: 0.0093, train_loss: 0.2055, val_loss: 0.7300, val_score: 0.8351


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [42], last_lr: 0.0092, train_loss: 0.2029, val_loss: 0.1493, val_score: 0.8227


100%|██████████| 21/21 [00:29<00:00,  1.43s/it]


Epoch [43], last_lr: 0.0090, train_loss: 0.0904, val_loss: 0.1233, val_score: 0.8280


100%|██████████| 21/21 [00:29<00:00,  1.43s/it]


Epoch [44], last_lr: 0.0089, train_loss: 0.0758, val_loss: 0.5663, val_score: 0.8351


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [45], last_lr: 0.0088, train_loss: 0.0788, val_loss: 0.5790, val_score: 0.8333


100%|██████████| 21/21 [00:31<00:00,  1.50s/it]


Epoch [46], last_lr: 0.0086, train_loss: 0.0846, val_loss: 3.7026, val_score: 0.6418


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [47], last_lr: 0.0085, train_loss: 0.0916, val_loss: 2.3855, val_score: 0.7482


100%|██████████| 21/21 [00:31<00:00,  1.48s/it]


Epoch [48], last_lr: 0.0083, train_loss: 0.0438, val_loss: 0.2919, val_score: 0.6950


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [49], last_lr: 0.0081, train_loss: 0.0988, val_loss: 0.4221, val_score: 0.6472


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [50], last_lr: 0.0079, train_loss: 0.0882, val_loss: 0.0922, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [51], last_lr: 0.0078, train_loss: 0.1009, val_loss: 0.1820, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [52], last_lr: 0.0076, train_loss: 0.2234, val_loss: 12.8810, val_score: 0.6188


100%|██████████| 21/21 [00:31<00:00,  1.52s/it]


Epoch [53], last_lr: 0.0074, train_loss: 0.2170, val_loss: 0.2177, val_score: 0.8333


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [54], last_lr: 0.0072, train_loss: 0.1662, val_loss: 63.5689, val_score: 0.0904


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [55], last_lr: 0.0070, train_loss: 0.0905, val_loss: 0.2539, val_score: 0.8121


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [56], last_lr: 0.0068, train_loss: 0.0767, val_loss: 0.2401, val_score: 0.8298


100%|██████████| 21/21 [00:29<00:00,  1.42s/it]


Epoch [57], last_lr: 0.0065, train_loss: 0.1563, val_loss: 0.3124, val_score: 0.6968


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [58], last_lr: 0.0063, train_loss: 0.0881, val_loss: 0.1125, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [59], last_lr: 0.0061, train_loss: 0.0390, val_loss: 0.0963, val_score: 0.8298


100%|██████████| 21/21 [00:31<00:00,  1.51s/it]


Epoch [60], last_lr: 0.0059, train_loss: 0.0435, val_loss: 0.1429, val_score: 0.8351


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [61], last_lr: 0.0057, train_loss: 0.1021, val_loss: 35.9800, val_score: 0.4539


100%|██████████| 21/21 [00:29<00:00,  1.42s/it]


Epoch [62], last_lr: 0.0054, train_loss: 0.4943, val_loss: 7.6117, val_score: 0.7447


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [63], last_lr: 0.0052, train_loss: 0.2935, val_loss: 0.3306, val_score: 0.8333


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [64], last_lr: 0.0050, train_loss: 0.1709, val_loss: 0.4182, val_score: 0.6365


100%|██████████| 21/21 [00:29<00:00,  1.43s/it]


Epoch [65], last_lr: 0.0048, train_loss: 0.1568, val_loss: 0.1874, val_score: 0.8351


100%|██████████| 21/21 [00:31<00:00,  1.52s/it]


Epoch [66], last_lr: 0.0046, train_loss: 0.1703, val_loss: 0.2138, val_score: 0.8174


100%|██████████| 21/21 [00:29<00:00,  1.42s/it]


Epoch [67], last_lr: 0.0043, train_loss: 0.1493, val_loss: 0.1511, val_score: 0.8316


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [68], last_lr: 0.0041, train_loss: 0.0743, val_loss: 0.1542, val_score: 0.8333


100%|██████████| 21/21 [00:29<00:00,  1.42s/it]


Epoch [69], last_lr: 0.0039, train_loss: 0.1216, val_loss: 0.6624, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [70], last_lr: 0.0037, train_loss: 0.0776, val_loss: 0.1298, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [71], last_lr: 0.0035, train_loss: 0.0503, val_loss: 0.1186, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [72], last_lr: 0.0032, train_loss: 0.1055, val_loss: 0.6265, val_score: 0.8280


100%|██████████| 21/21 [00:32<00:00,  1.54s/it]


Epoch [73], last_lr: 0.0030, train_loss: 0.0382, val_loss: 0.6541, val_score: 0.8280


100%|██████████| 21/21 [00:29<00:00,  1.42s/it]


Epoch [74], last_lr: 0.0028, train_loss: 0.0299, val_loss: 0.6347, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [75], last_lr: 0.0026, train_loss: 0.0247, val_loss: 0.6354, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [76], last_lr: 0.0024, train_loss: 0.0270, val_loss: 0.6415, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [77], last_lr: 0.0022, train_loss: 0.0501, val_loss: 0.6440, val_score: 0.8262


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [78], last_lr: 0.0021, train_loss: 0.0179, val_loss: 0.6397, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [79], last_lr: 0.0019, train_loss: 0.0236, val_loss: 0.6460, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [80], last_lr: 0.0017, train_loss: 0.0362, val_loss: 0.6506, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [81], last_lr: 0.0015, train_loss: 0.0217, val_loss: 1.1562, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.43s/it]


Epoch [82], last_lr: 0.0014, train_loss: 0.0405, val_loss: 1.1564, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [83], last_lr: 0.0012, train_loss: 0.0442, val_loss: 1.1694, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [84], last_lr: 0.0011, train_loss: 0.0210, val_loss: 0.6524, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [85], last_lr: 0.0010, train_loss: 0.0123, val_loss: 1.1876, val_score: 0.8262


100%|██████████| 21/21 [00:31<00:00,  1.52s/it]


Epoch [86], last_lr: 0.0008, train_loss: 0.0393, val_loss: 1.1704, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [87], last_lr: 0.0007, train_loss: 0.0238, val_loss: 0.6581, val_score: 0.8280


100%|██████████| 21/21 [00:31<00:00,  1.48s/it]


Epoch [88], last_lr: 0.0006, train_loss: 0.0100, val_loss: 0.6429, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [89], last_lr: 0.0005, train_loss: 0.0250, val_loss: 0.6480, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.44s/it]


Epoch [90], last_lr: 0.0004, train_loss: 0.0222, val_loss: 0.6433, val_score: 0.8245


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [91], last_lr: 0.0003, train_loss: 0.0065, val_loss: 0.6438, val_score: 0.8298


100%|██████████| 21/21 [00:31<00:00,  1.51s/it]


Epoch [92], last_lr: 0.0002, train_loss: 0.0164, val_loss: 0.6432, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.47s/it]


Epoch [93], last_lr: 0.0002, train_loss: 0.0133, val_loss: 1.1554, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [94], last_lr: 0.0001, train_loss: 0.0190, val_loss: 0.6325, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.45s/it]


Epoch [95], last_lr: 0.0001, train_loss: 0.0084, val_loss: 0.6510, val_score: 0.8280


100%|██████████| 21/21 [00:30<00:00,  1.48s/it]


Epoch [96], last_lr: 0.0000, train_loss: 0.0045, val_loss: 0.6411, val_score: 0.8298


100%|██████████| 21/21 [00:30<00:00,  1.46s/it]


Epoch [97], last_lr: 0.0000, train_loss: 0.0095, val_loss: 0.6200, val_score: 0.8262


100%|██████████| 21/21 [00:31<00:00,  1.49s/it]


Epoch [98], last_lr: 0.0000, train_loss: 0.0146, val_loss: 1.1602, val_score: 0.8298


100%|██████████| 21/21 [00:32<00:00,  1.54s/it]


Epoch [99], last_lr: 0.0000, train_loss: 0.0075, val_loss: 1.1580, val_score: 0.8280
Total training time = 3733.0631201267242
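
The validation loss in the log above swings widely between epochs (for example 0.0922 at epoch 50 against 63.5689 at epoch 54), so the final epoch is not necessarily the best one to keep. The block below is a minimal sketch, not part of the original training code, that parses log lines of this format, picks the epoch with the lowest val_loss and flags spikes. The names `parse_epoch_lines` and `SPIKE_THRESHOLD` are hypothetical, and the threshold of 1.0 is an arbitrary assumption.

```python
import re

# Matches lines such as:
# Epoch [50], last_lr: 0.0079, train_loss: 0.0882, val_loss: 0.0922, val_score: 0.8298
LOG_PATTERN = re.compile(
    r"Epoch \[(\d+)\], last_lr: ([\d.]+), train_loss: ([\d.]+), "
    r"val_loss: ([\d.]+), val_score: ([\d.]+)"
)

# Assumption: a validation loss above this value counts as a spike.
SPIKE_THRESHOLD = 1.0


def parse_epoch_lines(text):
    """Return one dict per epoch line found in the raw log text."""
    records = []
    for match in LOG_PATTERN.finditer(text):
        epoch, lr, train_loss, val_loss, val_score = match.groups()
        records.append({
            "epoch": int(epoch),
            "last_lr": float(lr),
            "train_loss": float(train_loss),
            "val_loss": float(val_loss),
            "val_score": float(val_score),
        })
    return records


# Four lines copied from the log above; progress bars are ignored by the regex.
sample_log = """
Epoch [39], last_lr: 0.0095, train_loss: 0.0665, val_loss: 0.0949, val_score: 0.8298
100%|##########| 21/21 [00:30<00:00,  1.46s/it]
Epoch [40], last_lr: 0.0094, train_loss: 0.0720, val_loss: 14.6638, val_score: 0.5337
Epoch [50], last_lr: 0.0079, train_loss: 0.0882, val_loss: 0.0922, val_score: 0.8298
Epoch [54], last_lr: 0.0072, train_loss: 0.1662, val_loss: 63.5689, val_score: 0.0904
"""

records = parse_epoch_lines(sample_log)
best = min(records, key=lambda r: r["val_loss"])
spikes = [r["epoch"] for r in records if r["val_loss"] > SPIKE_THRESHOLD]

print("Best epoch by val_loss:", best["epoch"], best["val_loss"])
print("Epochs with val_loss spikes:", spikes)
```

Pasting the full cell output into `sample_log` would give the same summary for the whole run. The same comparison could drive a "save the best checkpoint" step inside the training loop.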


In [None]:
!yes | tensorboard dev upload --logdir /content/runs/ --name "DNN2() with 25x25" --description "CNN with 30/70 stratified split, kernel conv1"




Upload started and will continue reading any new data as it's added to the logdir.

To stop uploading, press Ctrl-C.

New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/afvD6
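
The last_lr column in the training log above rises to 0.0100 around epochs 28 to 32 and then decays to 0.0000. This is consistent with a one-cycle learning-rate policy. The block below is a hedged sketch in plain Python that reproduces such a schedule using cosine annealing. It assumes a peak learning rate of 0.01, 100 epochs (0 to 99), 21 steps per epoch (from the progress bars), a 30% warm-up, and divisors of 25 and 1e4 for the initial and final learning rates. None of these settings is confirmed by this section, and `one_cycle_lr` and `last_lr_of_epoch` are hypothetical names.

```python
import math

# Assumptions (not confirmed by the notebook output):
STEPS_PER_EPOCH = 21   # read from the "21/21" progress bars
EPOCHS = 100           # the log runs from epoch 0 to epoch 99
MAX_LR = 0.01          # peak value seen in the last_lr column


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a given optimizer step under a two-phase
    one-cycle policy with cosine annealing in both phases."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1   # last step of the warm-up
    last_step = total_steps - 1

    if step <= warmup_end:
        # Phase 1: anneal upwards from initial_lr to max_lr.
        start, end = initial_lr, max_lr
        pct = step / warmup_end
    else:
        # Phase 2: anneal downwards from max_lr to min_lr.
        start, end = max_lr, min_lr
        pct = (step - warmup_end) / (last_step - warmup_end)

    # Cosine interpolation: pct=0 gives start, pct=1 gives end.
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def last_lr_of_epoch(epoch):
    """Learning rate used for the final batch of a zero-indexed epoch."""
    step = (epoch + 1) * STEPS_PER_EPOCH - 1
    return one_cycle_lr(step, EPOCHS * STEPS_PER_EPOCH, MAX_LR)


for epoch in (21, 50, 64, 96):
    print(f"Epoch [{epoch}], last_lr: {last_lr_of_epoch(epoch):.4f}")
```

Under these assumptions the values for epochs 21, 50, 64 and 96 round to the ones printed in the log, which suggests (but does not prove) that the training loop uses a schedule of this shape.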