# 50.039 Theory and Practice of Deep Learning Project 2024

Group 10
- Issac Jose Ignatius (1004999)
- Mahima Sharma (1006106)
- Dian Maisara (1006377)


### Import all relevant libraries

In [None]:
# Matplotlib
# import matplotlib.pyplot as plt
# from matplotlib.lines import Line2D
# Numpy
import numpy as np
# Pandas
import pandas as pd
# Torch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torchvision
from torchvision.transforms import ToTensor
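from torchvision.io import read_image
# IPython helpers used later to display sample images inline
from IPython.display import Image, display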

!pip install torchmetrics
from torchmetrics.classification import BinaryAccuracy

Collecting torchmetrics
  Downloading torchmetrics-1.3.2-py3-none-any.whl (841 kB)
Collecting lightning-utilities>=0.8.0
  Downloading lightning_utilities-0.11.1-py3-none-any.whl (26 kB)
Installing collected packages: lightning-utilities, torchmetrics
Successfully installed lightning-utilities-0.11.1 torchmetrics-1.3.2


## Motivation

Chest radiography is an essential diagnostic tool in medical imaging, used to visualise structures and organs within the chest cavity. It is crucial for diagnosing various respiratory and heart-related conditions. However, as demand grows for radiological reports within shorter timeframes, there are not enough radiologists to perform this work at scale. Automated chest radiograph interpretation could therefore provide substantial benefits in supporting large-scale screening and population health initiatives. Deep-learning algorithms can help bridge this gap: they have already been used for image classification, anomaly detection, organ segmentation, and disease progression prediction.
<br><br>

*In this project, we aim to train a deep neural network to perform multi-label image classification on a wide array of chest radiograph images that exhibit various pathologies.*<br><br>



---




## Data Exploration

The training and validation datasets are from the **CheXphoto dataset** (Phillips et al., 2020). <br><br> CheXphoto comprises two parts. The training set contains natural photos and synthetic transformations of 10,507 X-ray images from 3,000 unique patients (32,521 data points), sampled at random from the CheXpert training dataset. The accompanying validation set contains natural and synthetic transformations applied to all 234 X-ray images from 200 patients, plus an additional 200 cell phone photos of X-ray films from another 200 unique patients (952 data points).

### DON'T DELETE: Retrieving dataset from Google Cloud Storage (GCS)




In [None]:
#OLD CODE : not in use as we are bringing in training datasets from notebook 1
# Connect to GCS to access data
#from google.colab import auth
#auth.authenticate_user() # TODO: everyone to send me gmail so I can have you authed for bucket access

#project_id = 'tpdl-414711'
#bucket_name = 'chexphoto-v1'
#!gcloud config set project {project_id}

# Install Cloud Storage FUSE.
#!echo "deb https://packages.cloud.google.com/apt gcsfuse-`lsb_release -c -s` main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
#!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
#!apt -qq update && apt -qq install gcsfuse

# Mount a Cloud Storage bucket or location, without the gs:// prefix.
#mount_path = "chexphoto-v1"  # or a location like "my-bucket/path/to/mount"
#local_path = f"/mnt/gs/{mount_path}"

#!mkdir -p {local_path}
#!gcsfuse --implicit-dirs {mount_path} {local_path}

In [None]:
#!ls /datasets/chexphoto-v1/

#local_path = "/datasets/chexphoto-v1/"

### Loading dataset (image and labels)

In [None]:
# Testing image loader (should work as of 21/2/2024 at 3:35am)

#hardcode_image = local_path + "/validation/valid/film/VBSF00001/study1/view1_frontal.jpg"

#x = read_image(hardcode_image)

#print(f"Tensor image: {x}")


In [None]:
# OLD CODE: not in use
# Testing CSV loader (should work as of 21/2/2024 at 3:35am)
# Assumption: local_path takes the value from the commented-out cell above
local_path = "/datasets/chexphoto-v1/"
hardcode_excel = local_path + "/validation/valid.csv"

class CheXDataset(torch.utils.data.Dataset):
    def __init__(self):
        self.dataframe = pd.read_csv(hardcode_excel)

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        # Column positions are hardcoded; this only checks that the data can be loaded
        image_path = self.dataframe.iloc[idx, 0]
        sex = self.dataframe.iloc[idx, 1]
        age = self.dataframe.iloc[idx, 2]
        FoL = self.dataframe.iloc[idx, 3]
        AoP = self.dataframe.iloc[idx, 4]
        # A row taken from a mixed-dtype DataFrame is an object Series, so cast to float first
        labels = self.dataframe.iloc[idx, 5:].to_numpy(dtype=np.float64)
        y = torch.tensor(labels, dtype=torch.float64)
        return [image_path, sex, age, FoL, AoP], y


cheX_data = CheXDataset()
[ path, sex, age, FoL, AoP ], y = cheX_data[787] # should correspond with the image loaded above (VBSF00001)
print(f"Non-tensor values: Image Path: {path} Sex: {sex} Age: {age} FoL ? {FoL} AoP? {AoP}")
print(f"Tensor labels: {y}")

## Data Preprocessing

For our task, we would need to transform the inputs and labels into a more appropriate form using X and one-hot encoding. This is to ensure \<insert justification here\>

### Image preprocessing, if needed (greyscale conversion etc.; remove if N/A)

In [None]:
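# Implementation of custom Dataset
class CheXDataset(torch.utils.data.Dataset):
    def __init__(self, df): # previously csv_path, but after data preprocessing we can accept the DataFrame directly
        self.dataframe = df # pd.read_csv(csv_path)

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        # Rebuild the image path relative to LOCAL_PATH (expected to be defined before this cell runs)
        x_path = LOCAL_PATH + "/" + self.dataframe.iloc[idx, 0].split("CheXphoto-v1.0", 1)[-1]
        # read_image returns a uint8 tensor; dividing by 255 scales the pixels to [0, 1]
        x_tensor = read_image(x_path) / 255
        # Label columns start at index 5; cast to float before building the tensor
        labels = self.dataframe.iloc[idx, 5:].to_numpy(dtype=np.float64)
        y = torch.tensor(labels, dtype=torch.float64)
        return [x_path, x_tensor], y # sex, age, FoL, AoP are removed for now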
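
As a rough sketch of how a Dataset with this return shape could be batched, the example below wraps a small stand-in dataset in a `DataLoader`. `ToyCheXDataset`, the 3×64×64 image size, the placeholder paths, and the random data are all assumptions made purely for illustration; they are not part of the project code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyCheXDataset(Dataset):
    """Hypothetical stand-in that mimics the ([path, image], labels) output shape."""
    def __init__(self, n_samples=8, n_labels=14):
        gen = torch.Generator().manual_seed(0)
        # Random "images" in [0, 1] and random binary labels, for illustration only
        self.images = torch.rand(n_samples, 3, 64, 64, generator=gen)
        self.labels = torch.randint(0, 2, (n_samples, n_labels), generator=gen).double()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        path = f"sample_{idx}.jpg"  # placeholder string, not a real file
        return [path, self.images[idx]], self.labels[idx]

loader = DataLoader(ToyCheXDataset(), batch_size=4, shuffle=False)

# The default collation keeps the nested structure: paths stay as strings,
# while images and labels are stacked along a new batch dimension.
[paths, images], labels = next(iter(loader))
print(len(paths), images.shape, labels.shape)
```

Note that the default collation can only stack images that share one shape, so photos of varying resolution would probably need to be resized to a common size first.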

In [None]:
# Create train, test and valid datasets using CheXDataset
# labels = [LOCAL_PATH +"/train.csv", LOCAL_PATH +"/test.csv", LOCAL_PATH +"/valid.csv"]
cheX_train_data = CheXDataset(train_df3) #CheXDataset(labels[0])
cheX_valid_data = CheXDataset(valid_df3) #CheXDataset(labels[2])

NameError: name 'train_df3' is not defined

In [None]:
# Retrieve train sample using custom Dataset above
[ image, x ], y = cheX_train_data[10]

# Print out values and display image
print(f"Non-tensor values: Image Path: {image}\n")
print(f" Tensor image: {x} Tensor labels: {y}\n")
print(f" Tensor image: {x*255} Tensor labels: {y}\n")
pil_img = Image(image)
display(pil_img)

# Retrieve valid sample using custom Dataset above
[ image, x ], y = cheX_valid_data[34]

# Print out values and display image
print(f"Non-tensor values: Image Path: {image}\n")
print(f" Tensor image: {x} Tensor labels: {y}\n")
pil_img = Image(image)
display(pil_img)

NameError: name 'cheX_train_data' is not defined

### One-hot encoding of labels
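
A minimal sketch of how the label columns could be turned into binary target vectors. It assumes the CheXpert convention in which each pathology column holds 1.0 (positive), 0.0 (negative), -1.0 (uncertain), or is blank (not mentioned). Mapping both uncertain and blank entries to 0 is only one possible policy and is not necessarily the one used in this project. The `encode_labels` helper and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def encode_labels(df, label_cols, uncertain_value=0.0):
    """Return a copy of df whose label columns contain only 0.0 or 1.0."""
    out = df.copy()
    labels = out[label_cols].astype("float64")
    labels = labels.fillna(0.0)                     # blank (not mentioned) -> negative
    labels = labels.replace(-1.0, uncertain_value)  # uncertain -> chosen policy
    out[label_cols] = labels
    return out

# Toy rows for illustration only; the column names follow CheXpert pathologies
toy = pd.DataFrame({
    "Path": ["a.jpg", "b.jpg", "c.jpg"],
    "Cardiomegaly": [1.0, np.nan, -1.0],
    "Edema": [0.0, 1.0, np.nan],
})

encoded = encode_labels(toy, ["Cardiomegaly", "Edema"])
print(encoded)
```

Passing `uncertain_value=1.0` instead would treat uncertain findings as positive, which lets the two policies be compared without changing the rest of the pipeline.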

## Model Tuning

Our initial model is a simple feedforward neural network with 14 output heads, one per pathology, so that each observation can be classified for every pathology. We use the cross-entropy loss function to optimise the model during training.

**This is a TODO since it can change**


### First iteration - Simple feedforward neural network

Maybe add a description here of how the multi-head model was implemented (with sources)
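
One way the multi-head idea could be implemented is sketched below: a shared feedforward trunk feeds 14 small linear heads, each producing one logit. This is an illustrative assumption rather than the project's actual model. `MultiHeadMLP`, the hidden size of 128, and the 3×32×32 input are hypothetical, and the sketch uses `nn.BCEWithLogitsLoss`, the binary form of cross-entropy applied to each head independently.

```python
import torch
import torch.nn as nn

class MultiHeadMLP(nn.Module):
    """Hypothetical baseline: shared trunk with one binary head per pathology."""
    def __init__(self, in_features, hidden=128, n_heads=14):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Flatten(),                    # (batch, C, H, W) -> (batch, C*H*W)
            nn.Linear(in_features, hidden),
            nn.ReLU(),
        )
        # One single-logit linear layer per pathology
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_heads)])

    def forward(self, x):
        shared = self.trunk(x)
        # Concatenate the per-head logits into shape (batch, n_heads)
        return torch.cat([head(shared) for head in self.heads], dim=1)

torch.manual_seed(0)
model = MultiHeadMLP(in_features=3 * 32 * 32)
images = torch.rand(4, 3, 32, 32)                # random stand-in images
targets = torch.randint(0, 2, (4, 14)).float()   # random stand-in labels
logits = model(images)
loss = nn.BCEWithLogitsLoss()(logits, targets)   # averaged over all heads and samples
print(logits.shape, loss.item())
```

Keeping the heads in an `nn.ModuleList` makes each one a registered submodule, so their parameters are picked up by the optimiser.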

#### Model



In [None]:
# Write out our base model here

#### Training

#### Evaluation

### Second iteration - Convolutional neural network (CNN)

Next, we moved to a traditional CNN-based architecture to see whether it could surpass the performance of the feedforward model above. Briefly discuss what we needed to add to the model (filtering, convolution, etc.)
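
For orientation, a very small CNN of the kind this iteration refers to might look like the sketch below. The layer sizes, the name `SmallCNN`, and the use of global average pooling are assumptions chosen for brevity, not the architecture actually trained in this project.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical CNN: two convolution blocks, global pooling, 14 logits."""
    def __init__(self, in_channels=3, n_labels=14):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # halves height and width
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),         # (batch, 32, H, W) -> (batch, 32, 1, 1)
        )
        self.classifier = nn.Linear(32, n_labels)

    def forward(self, x):
        feats = self.features(x).flatten(1)  # (batch, 32)
        return self.classifier(feats)        # one logit per pathology

cnn = SmallCNN()
out_small = cnn(torch.rand(2, 3, 64, 64))
out_large = cnn(torch.rand(2, 3, 96, 128))
print(out_small.shape, out_large.shape)
```

The adaptive pooling layer means the classifier sees a fixed-length feature vector regardless of input resolution, which is why both input sizes yield the same output shape.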

#### Model

#### Training

#### Evaluation

## Observations

**TODO** Discuss whether it is better to pull all our training and evaluation results together and discuss them here, or to leave the code split up without any descriptions

