## 2nd model attempt
We import the same libraries as before, adding the last one for the hyperparameter tuning.

In [1]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torch.utils import data
import torch.optim as optim
import torchvision  # needed for torchvision.utils.make_grid below

from torch.utils.tensorboard import SummaryWriter

The train_loader groups the data into batches.
Here, I am assuming the train_set contains only the 4-fold and complete MRI scans. We don't need the 8-fold yet. 

In [None]:
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32) #can also try 64 or 128
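
As a rough illustration of what the loader yields, here is a minimal sketch with a stand-in dataset. The `TensorDataset` of random tensors, the 2-channel layout and the reduced 32x32 size are all assumptions for the demo; the real `train_set` is not shown in this notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption): 10 pairs of (4-fold input, full scan),
# each with 2 channels and a reduced 32x32 size to keep the demo light.
fake_four_folds = torch.rand(10, 2, 32, 32)
fake_full_scans = torch.rand(10, 2, 32, 32)
fake_train_set = TensorDataset(fake_four_folds, fake_full_scans)

demo_loader = DataLoader(fake_train_set, batch_size=4)

# Each batch is a pair (inputs, targets); collect the input shapes.
batch_shapes = [tuple(x.shape) for x, y in demo_loader]
print(batch_shapes)  # the last batch is smaller because 10 is not a multiple of 4
```

If the smaller final batch is a problem, `DataLoader` also accepts `drop_last=True`.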

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.cuda.device_count() #check number of GPUs

Let's just take one batch from the train_loader and check if the model is working using it. 
I also made a grid to visualise the images which hopefully will work.

In [None]:
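tb = SummaryWriter()

batch = next(iter(train_loader))
four_folds, full_scans = batch[0].to(device), batch[1].to(device)
print(four_folds.shape)
print(full_scans.shape)
# show only the first channel of the first 7 images (assumes a (N, 2, 320, 320) batch);
# make_grid expands single-channel images to 3 channels, which add_image and imshow can display
grid = torchvision.utils.make_grid(four_folds[:7, :1].cpu(), nrow=2)

tb.add_image("images", grid)

plt.figure(figsize=(15,15))
plt.imshow(np.transpose(grid.numpy(), (1,2,0)))  # (C, H, W) -> (H, W, C)
plt.show()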
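
The transpose is there because PyTorch stores images as (channels, height, width) while `plt.imshow` expects (height, width, channels). A small NumPy-only sketch with a made-up array shows the axis shuffle:

```python
import numpy as np

# Made-up "image" in PyTorch layout: 3 channels, 4 rows, 5 columns.
chw = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Move the channel axis to the end, as imshow expects.
hwc = np.transpose(chw, (1, 2, 0))

print(chw.shape, "->", hwc.shape)

# The pixel at row 1, column 2 keeps the same per-channel values.
same_pixel = np.array_equal(hwc[1, 2], chw[:, 1, 2])
print(same_pixel)
```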

This is the neural net architecture. The layers are based on the original AlexNet. I have changed a lot of the hyperparameters inside the model because the original values were tuned for 256x images and ours are 320x.

This version has no fully connected layers: every convolution uses stride 1 with padding (kernel_size - 1) / 2, so the output of the forward propagation is still a 320x image.

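Finally, the plan is to apply a sigmoid activation function on the last layer to make sure all values are between 0 and 1, then multiply that by 255 and change the type to Int32, essentially transforming all output values into pixel values. These three steps are currently commented out in forward. Note that the integer cast is not differentiable, so it can only be applied when displaying results, not during training.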
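
A minimal sketch of why the integer cast has to stay out of the training path. The helper name `to_pixels` is made up for illustration, and the rounding step is an addition that the text above does not mention.

```python
import torch

def to_pixels(raw):
    """Map raw network outputs to 0..255 integer pixel values (display only)."""
    return (torch.sigmoid(raw) * 255).round().to(torch.int32)

raw = torch.tensor([-2.0, 0.0, 2.0], requires_grad=True)

with torch.enable_grad():
    scaled = torch.sigmoid(raw) * 255   # float tensor, still part of the autograd graph
    pixels = to_pixels(raw)             # integer tensor, cut off from autograd

print(scaled.requires_grad)  # True
print(pixels.requires_grad)  # False
print(pixels.dtype)          # torch.int32
```

One option is to train on the float output and call a helper like this only when plotting.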

The numbers in the comments next to each layer are the side lengths of the square feature maps at that layer.

In [2]:
class AlexNet(nn.Module):

    def __init__(self):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 64, kernel_size=9, stride=1, padding=4), #320/320
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 192, kernel_size=9, padding=4), #320/320
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 384, kernel_size=7, padding=3), #320/320
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 192, kernel_size=5, padding=2), #320/320
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 64, kernel_size=3, padding=1), #320/320
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, kernel_size=3, padding=1),  # 320/320
        )

    def forward(self, x):
        x = self.features(x)
        #x = nn.functional.sigmoid(x)
        #x = x * 255
        #x = x.type(torch.int32)
        return x
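
The 320/320 comments can be checked with the standard output-size formula for a convolution, out = floor((in + 2 * padding - kernel_size) / stride) + 1. The sketch below uses the (kernel_size, padding) pairs copied from the layers above; the helper name `conv_out_size` is made up.

```python
def conv_out_size(size, kernel_size, padding=0, stride=1):
    """Side length after a square convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# (kernel_size, padding) of the six layers above, in order; stride is 1 throughout.
layers = [(9, 4), (9, 4), (7, 3), (5, 2), (3, 1), (3, 1)]

size = 320
sizes = []
for kernel_size, padding in layers:
    size = conv_out_size(size, kernel_size, padding)
    sizes.append(size)

print(sizes)  # [320, 320, 320, 320, 320, 320]
```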

## Set this to true before starting the backprop!!!!

As long as this is false, the gradients cannot be computed. It does make the forward prop a little bit faster so I set it to false initially.  

In [None]:
torch.set_grad_enabled(False) #set to true before starting the training!!!!
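
A small sketch of what the switch does. Here `torch.set_grad_enabled` is used as a context manager, so the setting is restored automatically at the end of each block and cannot be left off by accident; the tiny linear layer is just a stand-in for the real network.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)   # stand-in for the real network
x = torch.rand(2, 3)

with torch.set_grad_enabled(False):   # gradients off only inside this block
    y_off = layer(x)

with torch.set_grad_enabled(True):    # gradients on: autograd records the op
    y_on = layer(x)

print(y_off.requires_grad)  # False
print(y_on.requires_grad)   # True
```

Calling `.backward()` on something computed from `y_off` would raise an error, which is why the forward pass has to be redone once gradients are enabled.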

Based on how long the next step takes to compute, we can get a rough idea of how long we'd have to wait for the training process. If it takes a few seconds, then the training will probably take a few hours, so we might want to look again at the architecture and decide if we wanna change something first. 

In [None]:
network = AlexNet()
network.to(device) #move the model on the GPU

tb.add_graph(network, four_folds)
tb.close()

output = network(four_folds)
output

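Function for mean absolute error. PyTorch does have this built in as nn.L1Loss (or torch.nn.functional.l1_loss); the hand-written version below is equivalent to it with the default mean reduction.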

In [None]:
def mae(output, target):
    loss = torch.mean(abs(output - target))
    return loss
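
For reference, mean absolute error is available in PyTorch as `nn.L1Loss`. A quick check on two made-up tensors that it matches a hand-written version:

```python
import torch
import torch.nn as nn

def mae(output, target):
    return torch.mean(torch.abs(output - target))

# Made-up values; absolute differences are 1, 0, 2, 3, so the mean is 1.5.
output = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[0.0, 2.0], [5.0, 1.0]])

hand_written = mae(output, target)
built_in = nn.L1Loss()(output, target)   # default reduction is 'mean'

print(hand_written.item(), built_in.item())  # 1.5 1.5
```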

This should return the mean loss over all images in the batch. The target is passed as it is: flattening full_scans would turn it into a (N, 2*320*320) tensor, which no longer matches the (N, 2, 320, 320) output of the network. This assumes full_scans has the same shape as the output, so check the shapes printed above.

In [None]:
loss = mae(output, full_scans)
loss.item()
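
To see what flattening does to the target, here is a sketch on small random tensors. The reduced 8x8 size is an assumption to keep the demo cheap.

```python
import torch

# Reduced sizes (assumption): 4 images, 2 channels, 8x8 pixels.
output = torch.rand(4, 2, 8, 8)
full_scans = torch.rand(4, 2, 8, 8)

flat = torch.flatten(full_scans, 1)   # keeps dim 0, merges all the others
print(flat.shape)                     # torch.Size([4, 128])

shapes_match_flat = output.shape == flat.shape
shapes_match_raw = output.shape == full_scans.shape
print(shapes_match_flat, shapes_match_raw)  # False True
```

An elementwise loss needs both tensors in the same shape, so a flattened target only makes sense when the network output is flattened the same way.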

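This runs backprop to compute the gradients and returns the shape of the gradient tensor for the first layer, which should be the same as the shape of the weight tensor for that layer. Gradients were switched off earlier, so they are re-enabled here and the forward pass is redone so that autograd can record it.

In [None]:
torch.set_grad_enabled(True)  # re-enable autograd (it was switched off above)
loss = mae(network(four_folds), full_scans)  # redo the forward pass with gradients on
loss.backward()
network.features[0].weight.grad.shape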

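This will update all the weights based on the previously computed gradients. The optimisation algorithm I used, Adam, keeps running averages of each parameter's gradient and squared gradient and uses them to scale the step per parameter, which usually converges towards a minimum faster than plain gradient descent.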

In [None]:
optimizer = optim.Adam(network.parameters(), lr=0.03)
optimizer.step()
loss.item()  # still the loss from before the update; rerun the forward pass to see the new value
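
The zero_grad / backward / step cycle can be tried on a toy problem first. This is a minimal sketch, not the real model: a one-weight linear layer fitted to y = 2x, with the weights zeroed so the starting loss is known.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)                  # toy stand-in for the network
nn.init.zeros_(model.weight)             # start from w = 0, b = 0
nn.init.zeros_(model.bias)

x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2 * x                                # target: y = 2x

optimizer = optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

losses = []
with torch.enable_grad():                # works even if gradients were disabled globally
    for _ in range(200):
        optimizer.zero_grad()            # clear gradients from the previous step
        loss = loss_fn(model(x), y)      # forward pass and loss
        loss.backward()                  # compute gradients
        optimizer.step()                 # update the weights
        losses.append(loss.item())

print(losses[0], losses[-1])             # the loss should end well below where it started
```

The loss is recomputed inside the loop on every iteration, which is the only way to see the effect of the previous `step()`.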

If everything went well so far, we're gonna try going through the whole training set once.
The total loss should hopefully decrease from one batch to the next.

In [None]:
torch.set_grad_enabled(True)  # autograd must be on for training
conv_indices = [0, 2, 4, 6, 8, 10]  # positions of the Conv2d layers in network.features

for step, batch in enumerate(train_loader):
    four_folds, full_scans = batch[0].to(device), batch[1].to(device)     #take the X and y out of the batch
    output = network(four_folds)       #feedforward
    loss = mae(output, full_scans)     #compute the loss
    optimizer.zero_grad()       #set current gradients to 0
    loss.backward()      #backpropagate
    optimizer.step()     #update the weights
    print('total loss: ', loss.item())
    tb.add_scalar('Loss', loss.item(), step)     #the third argument is the step number, not the batch itself
    for n, idx in enumerate(conv_indices, start=1):     #log weights, biases and gradients of every conv layer
        layer = network.features[idx]
        tb.add_histogram(f'C{n} Weights', layer.weight, step)
        tb.add_histogram(f'C{n} Bias', layer.bias, step)
        tb.add_histogram(f'C{n} Grad', layer.weight.grad, step)
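
Per-batch losses are noisy, so a downward trend is easier to spot in the mean loss over the epoch so far. A minimal sketch with a made-up helper class `RunningMean` and made-up loss values:

```python
class RunningMean:
    """Keeps the mean of all values seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        """Add one value and return the updated mean."""
        self.total += value
        self.count += 1
        return self.total / self.count

# Made-up per-batch losses: noisy, but trending down.
batch_losses = [1.0, 0.8, 0.9, 0.6, 0.7, 0.4]

tracker = RunningMean()
smoothed = [tracker.update(v) for v in batch_losses]
print([round(s, 3) for s in smoothed])
```

In the training loop, `tracker.update(loss.item())` could be logged next to the raw loss.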

If by some miracle we get all the way here in a reasonable amount of time, we can try running multiple epochs and seeing how low we can get the loss function.

In [1]:
batch_size_list = [32, 64, 128, 256]
lr_list = [0.003, 0.01, 0.03, 0.06, 0.1]
mse_loss = nn.MSELoss()
conv_indices = [0, 2, 4, 6, 8, 10]  # positions of the Conv2d layers in network.features

torch.set_grad_enabled(True)  # autograd must be on for training
for batch_size in batch_size_list:
    for lr in lr_list:
        network = AlexNet().to(device)     #fresh model for every run, on the same device as the data
        train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size)
        optimizer = optim.Adam(network.parameters(), lr=lr)
        comment = f'batch_size={batch_size} lr={lr}'
        tb = SummaryWriter(comment=comment)
        for step, batch in enumerate(train_loader):
            four_folds, full_scans = batch[0].to(device), batch[1].to(device)     #take the X and y out of the batch
            output = network(four_folds)       #feedforward
            loss = mse_loss(output, full_scans)
            optimizer.zero_grad()       #set current gradients to 0
            loss.backward()      #backpropagate
            optimizer.step()     #update the weights
            print(loss.item(), "  ")
            #no accuracy is logged: reconstruction is a regression task, so the loss is the metric
            tb.add_scalar('Loss', loss.item(), step)
            for n, idx in enumerate(conv_indices, start=1):     #log every conv layer
                layer = network.features[idx]
                tb.add_histogram(f'C{n} Weights', layer.weight, step)
                tb.add_histogram(f'C{n} Bias', layer.bias, step)
                tb.add_histogram(f'C{n} Grad', layer.weight.grad, step)
        tb.close()     #flush this run before starting the next one
    

NameError: name 'train_loader' is not defined
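
The two nested loops over batch sizes and learning rates can also be written as one loop with `itertools.product`, collecting one score per configuration so the best one can be picked afterwards. In this sketch `train_one_config` is a hypothetical placeholder that returns a made-up number; the real version would train the model and return its final loss.

```python
from itertools import product

batch_size_list = [32, 64, 128, 256]
lr_list = [0.003, 0.01, 0.03, 0.06, 0.1]

def train_one_config(batch_size, lr):
    """Hypothetical placeholder: returns a made-up score instead of training."""
    return abs(lr - 0.01) + batch_size / 10000

results = {}
for batch_size, lr in product(batch_size_list, lr_list):
    results[(batch_size, lr)] = train_one_config(batch_size, lr)

# Pick the configuration with the lowest score.
best_config = min(results, key=results.get)
print(len(results), best_config)  # 20 (32, 0.01)
```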