# Pretrained VGG16 for Testing

A key component of the model evaluation is comparing the accuracies of the discriminative VGGnet16 approach <br>
with the accuracies obtained by using a VGGnet16 as an encoder with a deep generative model in between. <br><br>
The objective of this Jupyter notebook is not only to retrieve the accuracies of the pretrained VGGnet16, but also to obtain <br>
the layer features before the last classification layer.

In [3]:
#Import necessary modules
import os
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import DataLoader
from torchvision import transforms
plt.rcParams['figure.figsize'] = [20, 12]

### Set the paths here

Make sure to set up the paths properly!

In [4]:
#Path to assign tests (copy path directly)
test_path = r"C:\Users\myrea\Desktop\Stanford_CS236\project\cs236_project\CS236-Final-Proj\test"

#Set the path to this working directory
os.chdir(test_path)
print(os.getcwd())

import sys
#Append the path to the src folder
sys.path.append(r'C:\Users\myrea\Desktop\Stanford_CS236\project\cs236_project\CS236-Final-Proj\src')

c:\Users\myrea\Desktop\Stanford_CS236\project\cs236_project\CS236-Final-Proj\test


### Import the necessary module for downloading

Note: EVERY TIME there is a change inside the imported module, <br>
the changes inside the file only take effect once the Jupyter kernel is restarted. <br>
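
As an alternative to restarting the kernel, a module can usually be re-imported in place with `importlib.reload`. The following is a minimal sketch, assuming a throwaway module (`reload_demo_module`, a hypothetical name written to a temporary folder) instead of the real `utils` file, so that it can run anywhere.

```python
import importlib
import os
import sys
import tempfile

# Avoid stale bytecode caches interfering with this small demonstration
sys.dont_write_bytecode = True

# Hypothetical throwaway module standing in for a source file such as utils.py
tmp_dir = tempfile.mkdtemp()
module_path = os.path.join(tmp_dir, "reload_demo_module.py")
with open(module_path, "w") as f:
    f.write("VERSION = 1\n")

sys.path.append(tmp_dir)
import reload_demo_module

first = reload_demo_module.VERSION

# Simulate editing the file on disk while the kernel keeps running
with open(module_path, "w") as f:
    f.write("VERSION = 2  # edited on disk\n")

# Re-execute the module source inside the running interpreter
importlib.invalidate_caches()
reload_demo_module = importlib.reload(reload_demo_module)
second = reload_demo_module.VERSION

print(first, second)
```

Names imported with `from module import name` keep pointing at the old objects after a reload, so they would need to be imported again. In IPython, the `%load_ext autoreload` and `%autoreload 2` magics automate the same idea.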


In [5]:
# Import the necessary modules
from utils import CXReader, DfReader

### Set the data path

In [6]:
# Create the data path
data_path = os.path.join(test_path, os.pardir, "data")

### Get the dataframes of the data
First, let's obtain the dataframes for the data and check that all metadata <br>
information has been set up properly. <br>

In [7]:
#Create a dataframe compiler
df_compiler = DfReader()

#set the path and retrieve the dataframes
df_compiler.set_folder_path(data_path)

#Get the dataframe holder and names
dfs_holder, dfs_names = df_compiler.get_dfs()

100%|██████████| 112124/112124 [00:00<00:00, 489679.32it/s]

The file: miccai2023_nih-cxr-lt_labels_test.csv has been retrieved
The file: miccai2023_nih-cxr-lt_labels_train.csv has been retrieved
The file: miccai2023_nih-cxr-lt_labels_val.csv has been retrieved





# Read the images and labels

Also, obtain DataLoaders for the test, train, and validation datasets using <br>
the `DataLoader` class from PyTorch.

In [8]:
print(torch.cuda.is_available())

True


In [9]:
# Get the device if cuda or not
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#Define the transformations for the VGGnet16 (requires 224x224 inputs)
transform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
])

#Create datasets and dataloaders
test_dataset = CXReader(data_path=data_path, dataframe=dfs_holder[0], transform=transform, device=device)
train_dataset = CXReader(data_path=data_path, dataframe=dfs_holder[1], transform=transform,device=device)
val_dataset = CXReader(data_path=data_path, dataframe=dfs_holder[2], transform=transform, device=device)

#Sample one image from the training set to check its shape
samp3_image, label3 = train_dataset[1]
print("Shape of a single image and its labels")
print(f"Image: {samp3_image.shape}, labels: {label3.shape}")

#With batch size of 32, shuffling enabled for training only, and num_workers = 2
batch_size = 32

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False,  num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False,  num_workers=2)

#Iterate over the first few batches
for idx, batch in enumerate(train_loader):
    print(f"batch number: {idx}")
    images, labels = batch
    print("Shape of batch of images and labels")
    print(f"Images: {images.shape}, labels: {labels.shape}")
    if idx == 5:
        print("It can iterate through all batches")
        break

Shape of a single image and its labels
Image: torch.Size([3, 224, 224]), labels: torch.Size([20])
batch number: 0
Shape of batch of images and labels
Images: torch.Size([32, 3, 224, 224]), labels: torch.Size([32, 20])
batch number: 1
Shape of batch of images and labels
Images: torch.Size([32, 3, 224, 224]), labels: torch.Size([32, 20])
batch number: 2
Shape of batch of images and labels
Images: torch.Size([32, 3, 224, 224]), labels: torch.Size([32, 20])
batch number: 3
Shape of batch of images and labels
Images: torch.Size([32, 3, 224, 224]), labels: torch.Size([32, 20])
batch number: 4
Shape of batch of images and labels
Images: torch.Size([32, 3, 224, 224]), labels: torch.Size([32, 20])
batch number: 5
Shape of batch of images and labels
Images: torch.Size([32, 3, 224, 224]), labels: torch.Size([32, 20])
It can iterate through all batches


### Load the vgg16 pretrained model

Check whether you have an NVIDIA GPU! Otherwise, use the CPU.

In [10]:
#Load the pretrained model
vgg16 = models.vgg16(pretrained = True)



# Modify the last layer
We know that VGGnet16 ends in a linear layer with 1000 output units...<br>
However, this does not match our problem...<br><br>
Let's replace the last layer with a linear layer that has the same <br> number of outputs as there are classes in our data (in our case, 20).


In [11]:
print(vgg16.features)
print(vgg16.avgpool)
print(vgg16.classifier)

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (17): Conv2d(256, 512, kernel_si

In [12]:
# NEW CODE CELL to conduct fine-tuning on Vggnet16 only on the last (Linear) layer

# First, freeze all the parameters
for param in vgg16.parameters():
    param.requires_grad = False
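
To check that freezing worked and that only the replaced layer will be trained, the trainable parameters can be listed by name. The following is a minimal sketch, assuming a tiny stand-in network (`toy_model`, hypothetical) rather than the real `vgg16`, so it runs in a fraction of a second.

```python
import torch.nn as nn

# Hypothetical tiny model standing in for vgg16
toy_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))

# Freeze every existing parameter
for param in toy_model.parameters():
    param.requires_grad = False

# A freshly created layer has requires_grad=True by default,
# so replacing the last layer makes it the only trainable part
toy_model[-1] = nn.Linear(4, 2)

trainable_names = [name for name, p in toy_model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)

print(trainable_names, n_trainable)
```

Running the same two list expressions on `vgg16` after the last layer has been replaced should report only the weight and bias of the new `nn.Linear`.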

In [13]:
# Modify the last layer to output our 20 classes
num_classes = 20  # Number of classes for your specific task
num_features = vgg16.classifier[-1].in_features

# Replace the last layer with a new fully connected layer
vgg16.classifier[-1] = nn.Linear(num_features, num_classes)
vgg16.classifier.add_module("sigmoid", nn.Sigmoid()) #Add a sigmoid so each of the 20 outputs is an independent probability (multi-label)
print(vgg16.classifier)

# # Set requires_grad to False for the modified layer
# for param in vgg16.classifier[-1].parameters():
#     param.requires_grad = False

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=20, bias=True)
  (sigmoid): Sigmoid()
)
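
The layer features before the last classification layer can be read out by slicing the classifier. The following is a minimal sketch, assuming a small stand-in head (`toy_classifier`, hypothetical, with shrunken sizes) that has the same layout as the classifier printed above. On the real model, the same slice would be applied to `vgg16.classifier`.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in with the same layout as the printed classifier.
# Sizes are shrunk for illustration (the real ones are 25088 -> 4096 -> 4096 -> 20).
toy_classifier = nn.Sequential(
    nn.Linear(32, 16), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
    nn.Linear(16, 16), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
    nn.Linear(16, 20),
)
toy_classifier.add_module("sigmoid", nn.Sigmoid())

# Slicing an nn.Sequential returns a new nn.Sequential that shares the same layers.
# Dropping the last two modules removes the final Linear and the Sigmoid.
feature_head = toy_classifier[:-2]
feature_head.eval()  # disable dropout so the features are deterministic

with torch.no_grad():
    flat_input = torch.randn(4, 32)  # stands in for the flattened avgpool output
    penultimate = feature_head(flat_input)

print(penultimate.shape)
```

Because the slice shares its layers with the original module, the features always reflect the current weights and no copy has to be kept in sync.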


In [14]:
# NEW CODE CELL

# Create state_dict path
state_dict_path = os.path.join(test_path, os.pardir, "state_dict")

In [29]:
# NEW CODE CELL to perform fine-tuning

import torch.optim as optim

vgg16 = vgg16.to(device)
vgg16.train()
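# classifier[-2] is the new Linear layer (classifier[-1] is now the Sigmoid)
params_to_update = [vgg16.classifier[-2].parameters()]
print(params_to_update)
# NOTE: nn.CrossEntropyLoss applies a log-softmax to its input, which here is already a sigmoid output
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(*params_to_update, lr=0.01)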

def finetune_model(model, data_loader, num_epochs, device:str):
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch + 1, num_epochs))
        print('-------------')
            
        for idx, batch in enumerate(data_loader):
            images_inputs, images_labels = batch
            images_inputs, images_labels = images_inputs.to(device), images_labels.to(device)

            # Convert labels to a floating-point type so the loss can use them as targets
            images_labels = images_labels.to(torch.float64)

            # reset the gradients accumulated in the previous step
            optimizer.zero_grad()            
            outputs = model(images_inputs)
            
            # compute loss
            loss = criterion(outputs, images_labels)
            
            # predict labels
            pred_labels = (outputs > 0.5).float()

            # Calculate TP, FP, TN, FN and accuracy
            TP = torch.sum((pred_labels == 1) & (images_labels == 1)).item()
            FP = torch.sum((pred_labels == 1) & (images_labels == 0)).item()
            TN = torch.sum((pred_labels == 0) & (images_labels == 0)).item()
            FN = torch.sum((pred_labels == 0) & (images_labels == 1)).item()
            accuracy = ((TP + TN) / (TP + FP + TN + FN)) * 100.0                

            loss.backward()
            print(f"iter {idx} ---  Loss: {loss.item()}    Accuracy: {accuracy}")
            optimizer.step()
        
        # Save parameters for each epoch
        torch.save(model.state_dict(), os.path.join(state_dict_path, "vgg16_finetune_params.pth"))

[<generator object Module.parameters at 0x000001A9C4A7AF10>]
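
For multi-label targets such as the 20 conditions here, binary cross-entropy is the commonly used loss, since it treats every output as an independent yes/no decision. The following is a minimal sketch on random data showing that `nn.BCELoss` on sigmoid outputs and `nn.BCEWithLogitsLoss` on raw logits give the same value. It is an alternative to consider, not the loss that produced the recorded run in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 20 independent binary labels
toy_logits = torch.randn(4, 20)
toy_targets = torch.randint(0, 2, (4, 20)).float()

# Option 1: the model ends in a Sigmoid and the loss works on probabilities
loss_from_probs = nn.BCELoss()(torch.sigmoid(toy_logits), toy_targets)

# Option 2: the model ends in the Linear layer and the loss applies the sigmoid internally
loss_from_logits = nn.BCEWithLogitsLoss()(toy_logits, toy_targets)

print(loss_from_probs.item(), loss_from_logits.item())
```

Option 2 is generally the more numerically stable of the two, at the cost of having to apply `torch.sigmoid` explicitly before thresholding the predictions.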


In [30]:
# Let's do fine-tuning
finetune_model(model=vgg16, data_loader=train_loader, num_epochs=5, device=device)

Epoch 1/4
-------------
iter 0 ---  Loss: 2.963076278567314    Accuracy: 88.125
iter 1 ---  Loss: 2.757241867482662    Accuracy: 86.71875
iter 2 ---  Loss: 3.0281482860445976    Accuracy: 87.96875
iter 3 ---  Loss: 2.7170959785580635    Accuracy: 88.75
iter 4 ---  Loss: 3.2998723685741425    Accuracy: 87.1875
iter 5 ---  Loss: 3.0873119458556175    Accuracy: 87.8125
iter 6 ---  Loss: 2.757817395031452    Accuracy: 86.09375
iter 7 ---  Loss: 2.9092340767383575    Accuracy: 89.21875
iter 8 ---  Loss: 3.1889036670327187    Accuracy: 87.03125
iter 9 ---  Loss: 2.7469331547617912    Accuracy: 90.0
iter 10 ---  Loss: 3.21889116615057    Accuracy: 86.5625
iter 11 ---  Loss: 3.335458554327488    Accuracy: 88.28125
iter 12 ---  Loss: 3.2115119323134422    Accuracy: 87.34375
iter 13 ---  Loss: 3.0066206976771355    Accuracy: 87.03125
iter 14 ---  Loss: 3.027671255171299    Accuracy: 86.40625
iter 15 ---  Loss: 2.944521762430668    Accuracy: 87.8125
iter 16 ---  Loss: 2.5643158182501793    Accura

### Create a function that would evaluate the model.

Make sure it outputs the accuracies for all 20 conditions. <br>

In [33]:
import torch.nn.functional as F

def evaluate_model(model, data_loader, limit:int, device:str):
    """
    Evaluate the VGGNET16 on the given data loader and return the
    per-batch accuracies, precisions, recalls and F1 scores (in %),
    stopping after the batch with index `limit`.
    """
    model.eval()
    threshold = 0.5
    accuracies = []
    precisions = []
    recalls = []
    f1_scores = []

    #Disable gradient tracking, since no backpropagation is needed at inference time
    with torch.no_grad():
        #Iterate through each of the images and labels
        for idx, batch in enumerate(data_loader):
    
            #Unpack the batch and move it to the device
            images_inputs, images_labels = batch
            images_inputs, images_labels = images_inputs.to(device), images_labels.to(device)

            #Print the shape of the inputs and labels of this batch
            print(f"Inputs shape: {images_inputs.shape}, Labels shape: {images_labels.shape}")

            #Run the model on the inputs (both are already on the device)
            outputs = model(images_inputs)

            #Binarize the output with threshold
            pred_labels = (outputs > threshold).float()

            # Calculate TP, FP, TN, FN
            TP = torch.sum((pred_labels == 1) & (images_labels == 1)).item()
            FP = torch.sum((pred_labels == 1) & (images_labels == 0)).item()
            TN = torch.sum((pred_labels == 0) & (images_labels == 0)).item()
            FN = torch.sum((pred_labels == 0) & (images_labels == 1)).item()

            #_, predicted = torch.max(outputs, 1)  # Get the index of the maximum log-probability
            accuracy = ((TP + TN) / (TP + FP + TN + FN)) * 100.0
            precision = (TP / (TP + FP)) * 100.0 if (TP + FP) > 0 else 0.0
            recall = (TP / (TP + FN)) * 100.0 if (TP + FN) > 0 else 0.0
            f1_score = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0

            print("Accuracy: {:.2f}%".format(accuracy))
            print("Precision: {:.2f}%".format(precision))
            print("Recall: {:.2f}%".format(recall))
            print("F1 Score: {:.2f}%".format(f1_score))

            accuracies.append(accuracy)
            precisions.append(precision)
            recalls.append(recall)
            f1_scores.append(f1_score)

            if idx == limit:
                print("Limit reached")
                break
    return accuracies, precisions, recalls, f1_scores
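
The function above pools all 20 label columns into a single accuracy per batch. To get one accuracy per condition, the comparison can be averaged over the batch dimension only. The following is a minimal sketch on made-up 0/1 tensors with three label columns; `per_class_accuracy` is a hypothetical helper, not part of `utils`.

```python
import torch

def per_class_accuracy(pred_labels, true_labels):
    """Return the accuracy (%) of each label column of two (N, C) 0/1 tensors."""
    correct = (pred_labels == true_labels).float()
    # Average over the batch dimension only, keeping one value per class
    return correct.mean(dim=0) * 100.0

# Made-up predictions and labels: 4 samples, 3 conditions
toy_preds = torch.tensor([[1., 0., 1.], [0., 0., 1.], [1., 1., 0.], [0., 0., 0.]])
toy_true = torch.tensor([[1., 0., 0.], [0., 1., 1.], [1., 1., 0.], [0., 0., 1.]])

class_acc = per_class_accuracy(toy_preds, toy_true)
print(class_acc)  # tensor([100.,  75.,  50.])
```

Inside `evaluate_model`, the same call could be made with `pred_labels` and `images_labels`, giving a tensor of 20 values per batch.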

In [40]:
vgg16.load_state_dict(torch.load(os.path.join(state_dict_path, "vgg16_finetune_params.pth")))

<All keys matched successfully>

In [41]:
# Evaluate on the validation set (first 6 batches only, since limit=5)
accuracies, precisions, recalls, f1_scores = evaluate_model(vgg16, val_loader, 5, device=device)

Inputs shape: torch.Size([32, 3, 224, 224]), Labels shape: torch.Size([32, 20])
Accuracy: 83.91%
Precision: 15.05%
Recall: 36.84%
F1 Score: 21.37%
Inputs shape: torch.Size([32, 3, 224, 224]), Labels shape: torch.Size([32, 20])
Accuracy: 85.31%
Precision: 28.95%
Recall: 71.74%
F1 Score: 41.25%
Inputs shape: torch.Size([32, 3, 224, 224]), Labels shape: torch.Size([32, 20])
Accuracy: 85.31%
Precision: 20.20%
Recall: 57.14%
F1 Score: 29.85%
Inputs shape: torch.Size([32, 3, 224, 224]), Labels shape: torch.Size([32, 20])
Accuracy: 85.47%
Precision: 24.76%
Recall: 65.00%
F1 Score: 35.86%
Inputs shape: torch.Size([32, 3, 224, 224]), Labels shape: torch.Size([32, 20])
Accuracy: 78.59%
Precision: 16.15%
Recall: 42.86%
F1 Score: 23.46%
Inputs shape: torch.Size([32, 3, 224, 224]), Labels shape: torch.Size([32, 20])
Accuracy: 83.28%
Precision: 22.69%
Recall: 64.29%
F1 Score: 33.54%
Limit reached
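
The metrics above are reported per batch. A single summary is often obtained by summing the TP, FP and FN counts over all batches first and computing precision, recall and F1 once at the end. The following is a minimal sketch with made-up counts; `pooled_metrics` is a hypothetical helper and the numbers are not taken from the run above.

```python
def pooled_metrics(batch_counts):
    """Compute precision, recall and F1 (%) from counts summed over all batches.

    batch_counts: list of (TP, FP, FN) tuples, one per batch.
    """
    tp = sum(c[0] for c in batch_counts)
    fp = sum(c[1] for c in batch_counts)
    fn = sum(c[2] for c in batch_counts)
    precision = 100.0 * tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1

# Made-up counts for two batches, only to show the call
toy_counts = [(3, 1, 2), (1, 3, 0)]
pooled_p, pooled_r, pooled_f1 = pooled_metrics(toy_counts)
print(pooled_p, pooled_r, pooled_f1)
```

Pooling the counts avoids giving a batch with very few positive labels the same weight as a batch with many, which can happen when per-batch percentages are simply averaged.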
