# Homework 2 - Convolutional Neural Networks

### Deep Learning in Medicine - Spring 2025



**Note:** If you need to write mathematical terms, you can type your answers in a Markdown cell using LaTeX.

**See:** <a href="https://stackoverflow.com/questions/13208286/how-to-write-latex-in-ipython-notebook">here</a> if you have issues. For basic LaTeX notation, see <a href="https://en.wikibooks.org/wiki/LaTeX/Mathematics">here</a>.

**Submission instruction:** Upload and submit a zipped folder named netid_hw2 containing your final Jupyter notebook and any necessary files to <a href='https://brightspace.nyu.edu/d2l/home/427921'>Brightspace</a>. If you use code or scripts from the web, please link to the source in your answers. Not referencing the code you used will reduce your points!!

**Submission deadline: Saturday March 20th, 2025**

### Topics & weightage


1.   Convolutions (30)
2.   Network design (15)
3.   Literature review (19)
4.   Deep CNN design for disease classification (36)
5.   Analysis of Results (5)
6.   Bonus Questions (12) - optional!



## Question 1 Convolutions (Total 30 points)

### 1.1 Convolutions from **scratch** for image processing (11 points)

In [2]:
import numpy as np
from PIL import Image, ImageOps
import matplotlib.pyplot as plt

In [3]:
# functions to plot images
def plot_image(img: np.array):
    plt.figure(figsize=(6, 6))
    plt.imshow(img, cmap='gray');
    
def plot_two_images(img1: np.array, img2: np.array):
    _, ax = plt.subplots(1, 2, figsize=(12, 6))
    ax[0].imshow(img1, cmap='gray')
    ax[1].imshow(img2, cmap='gray')

#### 1.1.a (1 point)

In [4]:
# TODO: load any image of your choice, resize it to 224x224, convert it to grayscale, and display it using the plot_image function
# (you can also use the provided sample image, cat.png)
# (none of these transformations are mandatory, but they make our job a bit easier:
# a smaller image is faster to convolve, and grayscale leaves only one color channel to apply the convolution to)

In [5]:
# defining filters 
sharpen = np.array([
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0]
])

blur = np.array([
    [0.0625, 0.125, 0.0625],
    [0.125,  0.25,  0.125],
    [0.0625, 0.125, 0.0625]
])

outline = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])

#### 1.1.b (1.5 points)

In [6]:
def calculate_target_size(img_size: int, kernel_size: int) -> tuple:
  '''
  Helper function to calculate the image size after applying the convolution.
  It calculates how many filter-sized windows fit along one dimension of the image (assuming a square image).
  Applying a convolution to an image makes it smaller (assuming no padding);
  the filter size determines how much smaller the image becomes after convolving.

  Args:
    img_size: size of one dimension of the input image (assuming it's a square image)
    kernel_size: size of one dimension of the kernel (a square kernel)

  Returns:
    size: dimensions of the output image

  '''
  # TODO: write a generic function that inputs an image size & kernel size to calculate the final size of the output
  return size

#### 1.1.c (3 points)

In [7]:
def convolve(img: np.array, kernel: np.array) -> np.array:
  '''
  The convolve() function calculates the target size and creates 
  a matrix of zeros with that shape, iterates over all rows and 
  columns of the image matrix, subsets it, and applies the convolution.

  Args:
    img: the input image as a numpy array
    kernel: kernel as a numpy array

  Returns:
    convolved_img: output image after sliding the kernel over the input image
  '''
  # TODO: implement the convolve function 
  # iterate over all rows and columns of the input image matrix
  # subset the image based on the kernel size at each position and apply the convolution operation
  return convolved_img

#### 1.1.d (0.5 point)

In [8]:
# TODO: use the convolve function & the sharpen filter to obtain a sharpened image of your original input
# TODO: print the sharpened image array named img_sharpened
# TODO: use the plot_two_images function to plot the original image and sharpened image side by side

#### 1.1.e (0.5 point)

In [9]:
def negative_to_zero(img: np.array) -> np.array:
  '''
  Args:
    img: numpy array of image
  
  Returns:
    img: copy of the image in which all values less than zero are set to zero
  '''
  # TODO: the sharpened image is a little dull; that's because some values in the sharpened image
  # are less than zero
  # write a function that uses 0 as a threshold and converts all pixel values less than zero to zero
  return img
  
# TODO: use the plot_two_images function to plot the original image and negative_to_zero sharpened image side by side

#### 1.1.f (1 point)

In [10]:
# TODO: use the convolve function & the blur filter to obtain a blurred image of your original input
# TODO: print the blurred image array named img_blurred
# TODO: use the plot_two_images function to plot the original image and blurred image side by side

In [11]:
# TODO: use the convolve function & the outline filter to obtain an outlined image of your original input
# TODO: print the outlined image array named img_outlined
# TODO: use the plot_two_images function to plot the outlined image and original image side by side

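**Reminder:** Padding is essentially a “black” border around the image. It’s black because the values are zeros, and zeros represent the color black. Each padded pixel is multiplied by a kernel weight and contributes zero to the weighted sum, so padding adds no new information; it lets the kernel be centered on the edge pixels, so the output can keep the input's size (the edge outputs are simply computed from fewer real pixels).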

#### 1.1.g (0.5 point)

In [12]:
def get_padding_width_per_side(kernel_size: int) -> int:
    '''
    Function that returns the number of pixels we need to 
    pad the image with on a single side, depending on the kernel size

    Args:
    kernel_size: filter size 

    Returns:
    padding_width 
    '''
    # TODO: simple integer division by 2
    return padding_width

In [13]:
pad_3x3 = get_padding_width_per_side(3)
pad_5x5 = get_padding_width_per_side(5)
print("padding for kernel size 3 is", pad_3x3, "and padding for kernel size 5 is", pad_5x5)

#### 1.1.h (1.5 points)

In [14]:
def add_padding_to_image(img: np.array, padding_width: int) -> np.array:
    '''
    Function that adds padding to the image. 
    First, the function declares a matrix of zeros with shape image.shape + padding_width * 2.
    The function then indexes the matrix so the padding is skipped and replaces the zeros with the actual image values.

    Args:
      img: Original image numpy array
      padding_width: obtained from get_padding_width_per_side earlier

    Returns:
      img_with_padding: padded image
    '''
    # TODO: take your image and a padding width as input and return the image with the padding added
    return img_with_padding

#### 1.1.i (1 point)

In the function add_padding_to_image above, explain why padding_width is multiplied by 2 in the first step.

#### 1.1.j (0.5 point)

In [15]:
# TODO: use the add_padding_to_image function to obtain the padded image (kernel size of 3)
img_with_padding_3x3 =

print(img_with_padding_3x3.shape)
plot_image(img_with_padding_3x3)

In [16]:
# TODO: use the add_padding_to_image function to obtain the padded image (kernel size of 5)
img_with_padding_5x5 = 

print(img_with_padding_5x5.shape)
plot_image(img_with_padding_5x5)

#### 1.1.k (1 point)

In [17]:
# TODO: use the convolve function, the sharpen filter, and negative_to_zero to obtain a sharpened image of your
# padded image (kernel size of 5) obtained from the add_padding_to_image function
# TODO: print the shape of the obtained sharpened image (obtained after padding)
# TODO: plot the original image and the sharpened image (obtained after padding) side by side using the
# plot_two_images function

### 1.2 Convolutional Layers (4 points)

We have a 3x5x5 image (3 channels) and three 3x3x3 convolution kernels, as pictured. The bias term for each feature map is also provided. For the questions below, please provide the requested feature/activation maps along with the Python code you used to calculate them.

**Hint:** An image tensor should be [batch size, channels, height, width], and the kernel/filter tensor should be [number of filters (output channels), filter_size_1 (input channels), filter_size_2, filter_size_3].

<img src="https://github.com/nyumc-dl/BMSC-GA-4493-Spring2022/blob/main/Homework2/HW2_picture1.png?raw=1">
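As a sketch of the layout convention in the hint (deliberately not the values or sizes from the figure), the snippet below uses random tensors of an assumed size, a 1x2x7x7 image and four 2x3x3 filters, and checks how the `stride`, `padding`, and `dilation` arguments of `torch.nn.functional.conv2d` change the output shape. Substituting the image, filters, and biases from the figure gives the actual maps.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes chosen for illustration only (not the figure's values):
# image tensor  -> [batch, in_channels, height, width]
# filter tensor -> [out_channels, in_channels, kernel_h, kernel_w]
image = torch.randn(1, 2, 7, 7)
filters = torch.randn(4, 2, 3, 3)
bias = torch.randn(4)  # one bias per output feature map

for stride, padding, dilation in [(1, 0, 1), (2, 1, 1), (1, 0, 2)]:
    out = F.conv2d(image, filters, bias=bias, stride=stride,
                   padding=padding, dilation=dilation)
    print(f"stride={stride}, padding={padding}, dilation={dilation} -> {tuple(out.shape)}")
```

A dilation rate of 2 spreads a 3x3 kernel over a 5x5 area, which is why it shrinks the output more than an undilated kernel at the same stride.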

What will be the dimensions of the feature maps after we forward propagate the image through the given convolution kernels in each of the following cases (a)-(d)?

#### 1.2.a stride=1, padding = 0 (1 point)

#### 1.2.b stride=2, padding = 1 (1 point) 

#### 1.2.c stride=3, padding = 2 (1 point)

#### 1.2.d stride=1, dilation rate=2, and padding=0 (1 point) 

### 1.3 Feature Dimensions of Convolutional Neural Network (4*0.5 points)

In this problem, we compute the output feature shapes of convolutional layers and pooling layers, which are the building blocks of CNNs. Let’s assume that the input feature shape is C × W × H, where C is the number of channels, W is the width, and H is the height of the input feature.




#### 1.3.a (0.5 points)

A convolutional layer has 4 hyperparameters: the filter size (K), the padding size (P), the stride step size (S) and the number of filters (F). How many weights and biases are in this convolutional layer? And what is the shape of the output feature that this convolutional layer produces?


#### 1.3.b (0.5 points)

A pooling layer has 2 hyperparameters: the stride step size (S) and the filter size (K). What is the output feature shape that this pooling layer produces?


#### 1.3.c (0.5 points)

Let’s assume that we have a CNN model consisting of L successive convolutional layers, each with filter size K and stride step size 1. What is the receptive field size?


#### 1.3.d (0.5 points)

Consider a downsampling layer (e.g. a pooling layer or a strided convolution layer). In this problem, we investigate the pros and cons of downsampling layers. Such a layer reduces the output feature resolution, which implies that the output features lose a certain amount of spatial information. Therefore, when we design a CNN, we usually increase the channel length to compensate for this loss. For example, if we apply a max pooling layer with kernel size 2 and stride 2, we increase the number of output channels by a factor of 2. If we apply this max pooling layer, how much does the receptive field increase? Explain the advantage of decreasing the output feature resolution from the perspective of reducing the amount of computation.

### 1.4 (6 points)
Use the pytorch package to calculate feature/activation maps. Write code that takes the 3x5x5 image and performs a 2D convolution operation (with stride = 1 and zero padding) using the 3x3x3 filters provided in the picture. After the convolution layer, use the leaky ReLU activation function (with negative slope 0.01) and a Max-Pooling operation with the required parameters to finally obtain an output of dimension 3x1x1. Provide the code, the feature maps obtained from the convolution operation, the activation maps, and the feature maps after the Max-Pooling operation.

**Hint:** You can refer to [AdaptiveMaxPool2d](https://pytorch.org/docs/stable/nn.html#adaptivemaxpool2d) to get desired dimension output from Pooling layer.
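A minimal sketch of the hinted `nn.AdaptiveMaxPool2d`: you specify the output size rather than the kernel and stride, and PyTorch chooses the pooling windows. With an output size of 1, it keeps the maximum of each channel regardless of the spatial size. The input below is random, and its 4x6 spatial size is an arbitrary assumption.

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveMaxPool2d(output_size=1)  # target spatial size 1x1

feature_maps = torch.randn(1, 3, 4, 6)  # [batch, channels, H, W], arbitrary H and W
pooled = pool(feature_maps)

print(pooled.shape)  # torch.Size([1, 3, 1, 1])
# each output value equals the maximum over that channel's H x W map
print(torch.equal(pooled.flatten(), feature_maps.amax(dim=(2, 3)).flatten()))
```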

In [30]:
# starter code to load image:x, kernel weights:w and bias:b
# if you hit errors related to the long data type convert the values in your numpy arrays to floats
import numpy as np
import torch.nn.functional as f
import torch
x = np.load('q1_input.npy')
w = np.load('q1_Filters.npy')
b = np.load('q1_biases.npy')

### 1.5 (7 points)
Use the pytorch package to calculate feature/activation maps of a residual unit. Examples of residual units can be seen in Figure 2 of https://arxiv.org/pdf/1512.03385.pdf as well as in the figure below.


<img src="https://github.com/nyumc-dl/BMSC-GA-4493-Spring2022/blob/main/Homework2/HW2_picture2.png?raw=1" width="150">

Write code that takes the 3x5x5 input image and performs two 2D convolution operations, using the three 3x3x3 filters provided in question 1.2 for both convolution layers. You need to set a suitable padding size for the convolution operations. After the convolution layers, perform the residual addition and apply the ReLU activation function. Provide the code, the feature maps obtained from each convolution operation, the activation maps, and the last activation map obtained from the residual unit.

### 1.6 (2 points)
Describe the key design parameters of Inception v3 (https://arxiv.org/pdf/1512.00567.pdf) and explain how it avoids overfitting.

## Question 2 Network design parameters for disease classification (Total 15 points)

Disease classification is a common problem in medicine, and there are many ways to solve it. The goal of this question is to make sure that you have a clear picture of the possible techniques you can use in such a classification task.

Assume that we have a dataset of 10K computed tomography (CT) images. Each image has dimensions 16x256x256, and we have a label for each image. The label defines which class the image belongs to (let's assume we have 4 different disease classes in total). Describe your approach to classifying the disease with each of the techniques below. Make sure you do not forget the bias term. Please provide the pytorch code that designs the network for questions 2.1.a, 2.2.a, and 2.3.a.

**Hint:** See lab 4 for an example of how to make a class for a network (Implementing LeNet).


In [21]:
# starter code
# you can generate a random image tensor with batch size 8: [batch, channel, depth, height, width]
x = torch.randn(8, 1, 16, 256, 256)

#### 2.1.a (2 points)
Design a multi-layer perceptron (MLP) with two hidden layers that takes an image as input (by reshaping it to a vector; let's call this a vectorized image). The network first maps the vectorized image to a vector of 512, then to 256 in a hidden layer, then to 128 in a hidden layer, and finally feeds this vector to a fully connected layer to get the probabilities of 5 tissue classes.

#### 2.1.b (2 points)

Clearly state the sizes of the input and output at each layer, from an input image of size 16x256x256 to the final output vector over the 5 tissue classes.

#### 2.1.c (1 points)
How many parameters do you need to fit for your design? How would adding another hidden layer (mapping to 64 after 128) affect the number of parameters?

#### 2.2.a (2 points)
Design a one-layer convolutional neural network that first maps the images to a vector of 256 and then 128 (both with the help of convolution and pooling operations), then feeds this vector to a fully connected layer to get the probabilities of 5 disease classes.

#### 2.2.b (2 points)
Clearly state the sizes of the input, kernel, pooling, and output at each step until you get the final output vector with 5 probabilities.

#### 2.2.c (1 point) 
How many parameters do you need to fit for your design?

#### 2.2.d (2 points)
Now increase your selected convolution kernel size by 4 in each direction. Describe the effect of using small vs. large filter sizes during convolution.

### 2.3 (3 points)
Explain your findings regarding different types of neural networks and building blocks based on your observations from 2.1 and 2.2.

## Question 3 Literature Review: ChestX-ray8 (Total 19 points)
Read this paper:

Pranav Rajpurkar, Jeremy Irvin, et al. 
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning https://arxiv.org/abs/1711.05225


We are interested in understanding the goal of the task performed, the methods proposed, technical aspects of the implementation, and possible future work. After you read the full article answer the following questions. Describe your answers in your own words.  

### 3.1 (2 points) 

What was the underlying goal of this paper? What were the challenges in detection of pneumonia that this paper aimed at solving? What was the key motivation?


### 3.2  (3 points)
Describe the machine learning task (segmentation, classification, regression, etc.) that was attempted in this paper. Further, describe the learning algorithm used (supervised, unsupervised, etc.) and the reason for using this algorithm.

### 3.3 (2.5 points)
How does the proposed architecture in this paper compare with the previous state of the art? Give details on the modifications and improvements, and explain why you think these worked.

### 3.4 (2 points)
Describe the CNN architecture used along with the training details (a flow that explains the entire training process, with details on the batch size, optimizer, loss function, model weights, learning rate, etc.). Also try to infer why these parameters and hyperparameters were chosen for this specific task.


### 3.5 (2.5 points)

How was the model evaluated? Which metrics were used? List the reasons for using these metrics over the alternatives.




### 3.6 (2.5 points)

Explain model interpretation through class activation mapping. Discuss the role of Class Activation Maps (CAMs) in CheXNet.

### 3.7 (2 points)
What kind of preprocessing did the dataset go through? Explain the reason for each data transformation/preprocessing step.

### 3.8 (2.5 points)

In the paper, CAMs (class activation mappings) are used for visualisation. Can this method be used with any CNN? Describe the architectural requirements for obtaining CAM visualisations.

## Question 4 Deep CNN design for disease classification (Total 36 points)

In this part of the homework, we will focus on classifying lung diseases using the chest X-ray dataset provided by NIH (https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community). You should be familiar with the dataset after answering question 3.

You need to use HPC for the training part of this question, as your computer's CPU will not be fast enough to compute the learning iterations. Please read the HPC instructions first. If you use HPC, please upload your code/scripts under the relevant questions and provide the required plots and tables there as well. If you run the HW2 Jupyter notebook with the Squash File System and Singularity on GCP, you can find the data under the /images folder. We are interested in classifying pneumothorax, cardiomegaly, and infiltration cases; that is, we have 3 classes that we want to identify by modelling a deep CNN.

First, you need to work on the Data_Entry_2017_v2020.csv file to identify cases/images that have infiltration, pneumothorax, or cardiomegaly. This file can be downloaded from https://nihcc.app.box.com/v/ChestXray-NIHCC

### 4.1 Train, Test, and Validation Sets (0.5 point)
Write a script to read data from Data_Entry_2017.csv and process it to obtain 3 sets (train, validation and test). Using the 'Finding Labels' column, define the class each image belongs to; in total, you can define 3 classes:
- 0 cardiomegaly
- 1 pneumothorax
- 2 infiltration

Generate a train, validation and test set by splitting the whole dataset containing the specific classes (0, 1, and 2) into 70%, 10% and 20%, respectively. The test set will not be used during modelling, but it will be used to test your model's accuracy. Make sure each subset has similar percentages of the different cases. Provide statistics of the number of cases per class in each subset (you do not need to think about splitting the sets based on subjects for this homework; in general, we do not want images from the same subject to appear in both the train and test sets).
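One way to keep similar class percentages across subsets is a two-stage stratified split with scikit-learn's `train_test_split(..., stratify=...)`. The sketch below uses synthetic rows and assumes a DataFrame with hypothetical `Image Index` and `Class` columns, where `Class` already holds the labels 0-2; adapt the column names to your processed data entry file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the filtered data entry table (hypothetical columns and proportions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Image Index": [f"img_{i:05d}.png" for i in range(1000)],
    "Class": rng.choice([0, 1, 2], size=1000, p=[0.2, 0.2, 0.6]),
})

# Stage 1: hold out 20% of all rows for test, stratified on the class label.
train_val, test = train_test_split(df, test_size=0.20, stratify=df["Class"], random_state=0)
# Stage 2: 10% of the whole dataset is 0.10 / 0.80 = 12.5% of the remaining rows.
train, val = train_test_split(train_val, test_size=0.125, stratify=train_val["Class"], random_state=0)

for name, part in [("train", train), ("validation", val), ("test", test)]:
    print(name, len(part), part["Class"].value_counts(normalize=True).round(3).to_dict())
```

Each subset can then be written out with `DataFrame.to_csv(..., index=False)`.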

Write .csv files defining the samples in your train, validation and test sets, with the names train.csv, validation.csv, and test.csv. Submit these files with your homework.

### 4.2 Data preparation before training (2 points)
From here on, you will use HW2_trainSet.csv, HW2_testSet.csv and HW2_validationSet.csv, provided in the GitHub repo, to define the train, test and validation set samples instead of the csv files you generated in question 4.1.


There are multiple ways of using images as input during training or validation. Here, you will use the torch Dataset class (http://pytorch.org/tutorials/beginner/data_loading_tutorial.html). We provide incomplete dataset code below; please add your code to complete it.

In [None]:
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import os
from skimage import io
import torch
from skimage import color

class ChestXrayDataset(Dataset):
    """Chest X-ray dataset from https://nihcc.app.box.com/v/ChestXray-NIHCC."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with filename information.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.data_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.data_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.data_frame.iloc[idx, 0])

        # TODO: read in image using io
        # TODO: normalize the image 
        # TODO: return dictionary of image and corresponding class
        # sample = {'x': , 'y': }
    
        if self.transform:
            sample = self.transform(sample)

        return sample
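Once `__getitem__` returns a dictionary, the default collate function of `DataLoader` stacks each key across the batch. The sketch below uses a hypothetical in-memory dataset of random "images" (not the chest X-ray files) to show the resulting batch shapes; scaling to [0, 1] by dividing by 255 is one assumed normalization choice, not a requirement.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ToyImageDataset(Dataset):
    """Hypothetical stand-in that mimics the {'x': image, 'y': class} sample format."""

    def __init__(self, n_samples=10, size=32):
        rng = np.random.default_rng(0)
        # uint8 grayscale "images" and integer class labels in {0, 1, 2}
        self.images = rng.integers(0, 256, size=(n_samples, size, size)).astype(np.uint8)
        self.labels = rng.integers(0, 3, size=n_samples)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # scale to [0, 1] and add a channel dimension -> [1, H, W]
        image = torch.from_numpy(self.images[idx].astype(np.float32) / 255.0).unsqueeze(0)
        return {'x': image, 'y': int(self.labels[idx])}

loader = DataLoader(ToyImageDataset(), batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch['x'].shape, batch['y'].shape)  # torch.Size([4, 1, 32, 32]) torch.Size([4])
```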

### 4.3 CNN model definition (5 points)
Now that we can import images for model training, the next step is to define a CNN model to train for the disease classification task. Any model requires us to select model parameters such as the number of layers, the kernel size, the number of feature maps, and so on. The number of possible models is infinite, but we need to make some design choices to start. Let's design a CNN model with 4 convolutional layers, 4 residual units (similar to question 1.5), and a fully connected (FC) layer followed by a classification layer. Let's use:

-  5x5 convolution kernels (stride 1 in resnet units and stride 2 in convolutional layers)
-  ReLU for an activation function
-  max pooling with kernel 2x2 and stride 2 only after the convolutional layers.

Define the number of feature maps in hidden layers as: 8, 16, 32, 64, 64, 64, 128 (1st layer, ..., 7th layer). 

<img src="https://github.com/nyumc-dl/BMSC-GA-4493-Spring2022/blob/main/Homework2/HW2_picture3.png?raw=1" height="300">

Write a class that specifies this network's details.

### 4.4 (2 points)
How many learnable parameters does this model have? How many learnable parameters would we have if we replaced the fully connected layer with a global average pooling layer (take a look at Section 3.2 of https://arxiv.org/pdf/1312.4400.pdf)?
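Parameter counts can be sanity-checked with `sum(p.numel() for p in model.parameters() if p.requires_grad)`. The sketch below compares two hypothetical classifier heads on top of a 128-channel feature map; the 8x8 spatial size and 3 output classes are illustrative assumptions, not the sizes of the network in 4.3. Head B uses the common variant "global average pooling + one linear layer"; in the NIN paper, the last convolution instead produces one feature map per class, so the GAP layer itself has no parameters at all.

```python
import torch
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    # number of learnable scalars (weights and biases)
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

channels, height, width, n_classes = 128, 8, 8, 3  # hypothetical sizes

# Head A: flatten the feature map and apply a fully connected layer
fc_head = nn.Sequential(nn.Flatten(), nn.Linear(channels * height * width, n_classes))

# Head B: global average pooling (no parameters) followed by a small linear layer
gap_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, n_classes))

print(count_params(fc_head))   # 128*8*8*3 + 3 = 24579
print(count_params(gap_head))  # 128*3 + 3 = 387

x = torch.randn(2, channels, height, width)
print(fc_head(x).shape, gap_head(x).shape)  # both torch.Size([2, 3])
```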

### 4.5 Loss function and optimizer (2 points)
Define an appropriate loss criterion and an optimizer using pytorch. What type of loss function is applicable to our classification problem? Explain your choice of loss function. For the optimizer, let's use Adam with default hyperparameters for now.

**Some background:** In network architecture design, we want an architecture that has enough capacity to learn. We can achieve this by using a large number of feature maps and/or many more connections and activation nodes. However, having a large number of learnable parameters can easily result in overfitting. To mitigate overfitting, we can keep the number of learnable parameters small, using either shallow networks or few feature maps. Taken too far, this approach results in underfitting: the model can neither fit the training data nor generalize to new data. Ideally, we want to select a model at the sweet spot between underfitting and overfitting, but it is hard to find the exact sweet spot.

We first need to make sure we have enough capacity to learn; without enough capacity, we will underfit. Here, you will need to check whether the model designed in 4.3 can learn. Since we do not need to check its generalization ability yet (overfitting is OK for now, since it shows that learning is possible), it is a great strategy to use a subset of the training samples. Using a subset of samples is also helpful for debugging!!!

### 4.6 Train the network on a subset (5 points)
Use a script to take random samples from the train set (HW2_trainSet.csv); let's name this set HW2_randomTrainSet. Similarly, choose random samples from the validation set (HW2_validationSet.csv); let's name this set HW2_randomValidationSet. In Lab 4, you downsampled images from 1024x1024 to 64x64. This was fine for learning purposes, but it significantly reduces the information content of the images, which is especially important in medicine. In this homework, you MUST use the original 1024x1024 images as the network input.

In [None]:
# get samples from HW2_trainSet.csv
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('HW2_trainSet.csv')
_ , X_random, _, _ = train_test_split(df, df.Class, test_size=0.1, random_state=0)
print('Selected subset class frequencies\n',X_random['Class'].value_counts())
X_random.to_csv('HW2_randomTrainSet.csv',index=False)

df = pd.read_csv('HW2_validationSet.csv')
_ , X_random, _, _ = train_test_split(df, df.Class, test_size=0.1, random_state=0)
print('Selected subset class frequencies\n',X_random['Class'].value_counts())
X_random.to_csv('HW2_randomValidationSet.csv',index=False)

Use the random samples generated above and write a script to train your network, using your choice of weight initialization strategy. If you need to define other hyperparameters, such as the batch size, choose them empirically. Plot the average loss on your random sample set per epoch. (Stop the training after at most ~50 epochs.)
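One possible weight initialization strategy is He (Kaiming) initialization for convolution and linear layers, applied recursively with `Module.apply`. The small `nn.Sequential` model below is a hypothetical placeholder for your 4.3 network, and zero-initialized biases are an assumed choice rather than a requirement.

```python
import torch
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # He initialization suits ReLU activations; biases start at zero
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Placeholder network (not the architecture from 4.3)
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 3),
)
model.apply(init_weights)  # visits every submodule, including nested ones

print(model[0].bias.abs().sum().item())  # 0.0 after zero-initialization
```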

### 4.7 Analysis of training using a CNN model (2 points)
Describe your findings. Can your network learn from a small subset of random samples? Does the CNN model have enough capacity to learn with your choice of empirical hyperparameters?
-  If yes, how would the average loss plot change if you multiplied the learning rate by 15?
-  If no, how can you increase the model capacity? Increase your model capacity and train again until you find a model with enough capacity. If increasing the capacity is not sufficient for learning, revisit the empirical parameters you chose in designing your network and change your selection. Describe the changes you made to your original network and how you got this model to learn.

### 4.8 Hyperparameters (2.5 points)
Now, we will revisit our selection of the CNN model architecture, training parameters and so on, i.e. the hyperparameters. In your investigations, describe how you change each hyperparameter in light of the model's performance with the previous hyperparameters, and provide your rationale for choosing the next one. Provide training loss and accuracy curves, and the model performance on HW2_randomValidationSet. You will use macro AUC as the performance metric for comparing CNN models on the disease classification task. Report the macro AUC for each CNN model with different hyperparameters (check http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings).

Investigate the effect of learning rate and batch size on model performance (try at least 5 learning rates and 3 batch sizes) and select optimal values for both. You only need to include your best result here.
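For the macro AUC, scikit-learn's `roc_auc_score` accepts integer labels together with per-class predicted probabilities when `multi_class='ovr'`; `average='macro'` then takes the unweighted mean of the one-vs-rest AUCs. The labels and probabilities below are made up for illustration; in practice, the probabilities would come from a softmax over your model's outputs (each row must sum to 1).

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Made-up validation labels (0, 1, 2) and predicted class probabilities (rows sum to 1)
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
    [0.4, 0.4, 0.2],
    [0.6, 0.1, 0.3],
    [0.2, 0.1, 0.7],
    [0.1, 0.6, 0.3],
])

macro_auc = roc_auc_score(y_true, y_prob, multi_class='ovr', average='macro')

# Equivalent by hand: mean of the three one-vs-rest AUCs
y_bin = label_binarize(y_true, classes=[0, 1, 2])
per_class = [roc_auc_score(y_bin[:, k], y_prob[:, k]) for k in range(3)]
print(round(macro_auc, 4), [round(a, 4) for a in per_class])
```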

### 4.9 Train the network on the whole dataset (4 points)
After question 4.7, you should have a network that has enough capacity to learn, and you should have debugged your training code so that it is ready to be trained on the whole dataset. Use the best batch size and learning rate from 4.8. Train your network on the whole train set (HW2_trainSet_new.csv) and check the validation loss on the whole validation set (HW2_validationSet_new.csv) in each epoch. Plot the average loss and accuracy on the train and validation sets. Describe your findings. Do you see overfitting or underfitting to the train set? What else can you do to mitigate it?

### 4.10 Experiments with Resnet18

Let's use ResNet18 on our dataset and see how it performs. We can import standard architectures directly using PyTorch's torchvision.models module. Refer to https://pytorch.org/docs/stable/torchvision/models.html to see all available models in PyTorch. Later in this course, you'll learn about a convenient and useful concept known as transfer learning. For now, we will use ResNet18 and train the architecture from scratch without any pre-training. Here is the link to the ResNet paper: https://arxiv.org/pdf/1512.03385.pdf.

#### 4.10.a (2 points)

What is the reason for using 1x1 convolutions before 3x3 convolutions in the ResNet architecture?

#### 4.10.b Train the ResNet18 on the whole dataset

We provide a new dataset class and a few additional data transformations for this new architecture. We need a new dataset class because ResNet18 expects 3 channels in its input, and for other reasons that you'll come to know after the lecture on transfer learning. In our case, we also use these transformations to reduce the required GPU usage, as ResNet18 is a significantly more complex and GPU memory-intensive architecture than the CNN implemented above.

In [None]:
from torchvision import transforms
from sklearn.preprocessing import LabelEncoder

# torchvision models expect input images scaled to the [0, 1] range; ToTensor() performs this scaling for uint8 images
# additional per-channel normalization (ImageNet mean/std) is required, see: http://pytorch.org/docs/master/torchvision/models.html

train_transform = transforms.Compose([
        transforms.ToPILImage(),
        transforms.RandomResizedCrop(896),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

validation_transform = transforms.Compose([
        transforms.ToPILImage(),
        transforms.CenterCrop(896),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

class ChestXrayDataset_ResNet(Dataset):
    """Chest X-ray dataset from https://nihcc.app.box.com/v/ChestXray-NIHCC."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with filename information.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.data_frame = load_data_and_get_class(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.data_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.data_frame.iloc[idx, 0])
        
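        image = io.imread(img_name)
        if image.ndim > 2:
            # some files are stored with extra (e.g. RGBA) channels; keep a single channel
            image = image[:, :, 0]

        # stack the grayscale image into H x W x 3, the numpy layout ToPILImage() expects
        image = np.repeat(image[..., None], 3, axis=2)

        image_class = self.data_frame['Class'].iloc[idx]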

        if self.transform:
            image = self.transform(image)
            
        sample = {'x': image, 'y': image_class}

        return sample

def load_data_and_get_class(path_to_data):
    data = pd.read_csv(path_to_data)
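    encoder = LabelEncoder()
    # LabelEncoder assigns codes in sorted (alphabetical) order of 'Finding Labels', i.e.
    # Cardiomegaly -> 0, Infiltration -> 1, Pneumothorax -> 2 (assuming exactly these names),
    # which differs from the 0/1/2 mapping listed in question 4.1
    data['Class'] = encoder.fit_transform(data['Finding Labels'])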
    return data

#### 4.10.c Architecture modification (4.5 points) 
In this question, you need to develop a CNN model based on the ResNet18 architecture. Please import the original ResNet18 model from PyTorch models (you can also implement this model on your own using the ResNet paper). Modify the architecture so that the model works with full-size 1024x1024 image inputs and the 3 classes of interest:
- 0 cardiomegaly
- 1 pneumothorax
- 2 infiltration

Make sure the model you developed uses random weights!

#### 4.10.d Train the network on the whole dataset (4.5 points)
Similar to question 4.7, train the model you developed in question 4.10.c on the whole training set (HW2_trainSet_new.csv) and compute the validation loss on the whole validation set (HW2_validationSet_new.csv) after each epoch. Plot the average loss and accuracy on the training and validation sets. Describe your findings. Do you see overfitting or underfitting to the training set? What else can you do to mitigate it?
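A minimal plotting helper, assuming you record the per-epoch averages in four Python lists (the names `train_loss`, `val_loss`, `train_acc` and `val_acc` are hypothetical). A validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt


def plot_history(train_loss, val_loss, train_acc, val_acc):
    """Plot per-epoch average loss and accuracy for the train and validation sets."""
    epochs = range(1, len(train_loss) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
    ax_loss.plot(epochs, train_loss, label="train")
    ax_loss.plot(epochs, val_loss, label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("average loss")
    ax_loss.legend()
    ax_acc.plot(epochs, train_acc, label="train")
    ax_acc.plot(epochs, val_acc, label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    fig.tight_layout()
    return fig


# Illustrative numbers only, not results
fig = plot_history([1.0, 0.8, 0.6], [1.0, 0.9, 0.95], [0.4, 0.6, 0.7], [0.4, 0.5, 0.5])
```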

## Question 5 Analysis of the results from two networks trained on the full dataset (Total 5 points)
Use the validation loss to select the best model from question 4.9 (model1) and from question 4.10 (model2). These models were trained on the full dataset, learned from the training data, and generalized well to the validation set.
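One simple way to apply this, assuming a checkpoint is saved after every epoch, is to keep the epoch with the lowest validation loss. The helper below is a hypothetical illustration:

```python
def best_epoch(val_losses):
    """Return the 1-based epoch index with the lowest validation loss (first one on ties)."""
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_idx + 1


print(best_epoch([0.92, 0.71, 0.65, 0.69, 0.74]))  # 3
```

In PyTorch this usually means calling `torch.save(model.state_dict(), path)` whenever the validation loss improves, then reloading that file for the test-set analysis.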

### 5.1 Model selection by performance on test set (5 Points)
Using these models, plot the confusion matrix and the ROC curve of the disease classifier on the test set (HW2_TestSet_new.csv). Report the AUC of each CNN model as the performance metric. You will have two confusion matrices and two ROC curves to compare model1 and model2.
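A sketch of these metrics with scikit-learn, assuming `y_true` holds integer labels (0, 1, 2) and `y_prob` is an N x 3 array of softmax probabilities from one model. With three classes, AUC is computed one-vs-rest (per class, plus the macro average). The toy arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Toy labels and softmax outputs for 6 images and 3 classes (illustrative only)
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.1, 0.7],
])

# Confusion matrix of the arg-max predictions (rows = true class, columns = predicted)
y_pred = y_prob.argmax(axis=1)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# One-vs-rest ROC curve and AUC for each class
curves, per_class_auc = {}, {}
for c in range(3):
    is_c = (y_true == c).astype(int)
    fpr, tpr, _ = roc_curve(is_c, y_prob[:, c])
    curves[c] = (fpr, tpr)  # plot with plt.plot(fpr, tpr, label=...)
    per_class_auc[c] = roc_auc_score(is_c, y_prob[:, c])

# Macro average of the three one-vs-rest AUCs
macro_auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
print(cm)  # [[2 0 0] [0 1 1] [0 0 2]]
```

Running the same code on the outputs of model1 and model2 gives the two confusion matrices and two sets of ROC curves to compare.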

In [None]:
# Predict the disease class with a trained model. The function's outputs are the
# target labels and the predicted probability of each class for every image.

# example of how to plot ROC curves
# https://stackoverflow.com/questions/25009284/how-to-plot-roc-curve-in-python

# example of how to calculate confusion matrix
# https://www.kaggle.com/grfiv4/plot-a-confusion-matrix



## Question 6 Bonus Questions (Maximum 12 points)

**Note:** this section is optional.

### 6.1 Understanding the network (Bonus Question maximum 5 points)

Even if you do both 6.1.a and 6.1.b, the maximum for this question is 5 points.

#### 6.1.a Occlusion (5 points)
Using the best-performing model (chosen with the analysis you performed in question 5.1), we will find out where the network gathers information to decide the class of an image. One way of doing this is to occlude parts of the image and run the occluded image through the network. By changing the location of the occluded region, we can visualize the probability of the image belonging to a class as a 2-dimensional heat map. Using the best-performing model, provide heat maps for the images listed in HW2_visualize.csv. Do the heat maps and the bounding boxes for the pathologies provide similar information? Describe your findings.
Reference: https://arxiv.org/pdf/1311.2901.pdf
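The sliding-occlusion idea from the referenced paper can be sketched in NumPy, assuming a hypothetical `predict_proba(image)` callable that wraps your trained model and returns the class probabilities for one H x W (or H x W x C) image. The patch size, stride and fill value are free choices.

```python
import numpy as np


def occlusion_map(image, predict_proba, target_class, patch=64, stride=32, fill=0.0):
    """Slide a constant-valued square over `image` and record the target-class
    probability at each position; low values mark regions the model relies on."""
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = predict_proba(occluded)[target_class]
    return heat


# Toy "model": probability of class 0 equals the mean of the top-left 32x32 corner
def toy_predict(img):
    p = img[:32, :32].mean()
    return np.array([p, 1.0 - p])


img = np.ones((64, 64))
heat = occlusion_map(img, toy_predict, target_class=0, patch=32, stride=32)
# heat == [[0, 1], [1, 1]]: only occluding the top-left corner changes the output
```

Each heat-map cell costs one forward pass, so the run time grows with `rows * cols`; the heat map can be upsampled to the image size to compare it with the bounding boxes.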

In [None]:
# you can use the code from: https://github.com/thesemicolonguy/convisualize_nb/blob/master/cnn-visualize.ipynb 


#### 6.1.b GradCAM (5 points)
An alternative approach to model interpretation is Grad-CAM. Read https://arxiv.org/pdf/1610.02391.pdf and create heat maps of the images in HW2_visualize.csv using this method. Repeat the analysis in 6.1.a and also compare the time taken to generate the occlusion maps and the Grad-CAM maps.
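Once the activations A^k of the last convolutional layer and the gradients of the class score y^c with respect to them have been captured (for example with `register_forward_hook` and `register_full_backward_hook` on that layer), the core Grad-CAM step from the referenced paper reduces to a few NumPy lines: average each gradient map to a weight alpha_k, take the weighted sum of the activation maps, and apply a ReLU. Capturing the tensors is model-specific and not shown; the arrays below are made up.

```python
import numpy as np


def grad_cam(activations, gradients):
    """activations, gradients: arrays of shape (K, H, W) for one image and one class.
    Returns an H x W map scaled to [0, 1]."""
    # alpha_k: global-average-pooled gradient of each feature map k
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over the K maps, then ReLU to keep only positive evidence
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0)
    # Normalize for display, guarding against an all-zero map
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam


A = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, 2.0]]])
G = np.array([[[1.0, 1.0], [1.0, 1.0]],
              [[-1.0, -1.0], [-1.0, -1.0]]])
cam = grad_cam(A, G)  # [[1, 0], [0, 0]]
```

The coarse map is then upsampled to the input size for overlaying. Grad-CAM typically needs one forward and one backward pass per image and class, whereas occlusion needs one forward pass per occluder position, which is relevant to the timing comparison.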

### 6.2 Tiling and CNNs (Bonus Question 7 points)

When using CNNs it may be helpful to first tile the image, especially for segmentation and object detection tasks. Focus on the "Invasive Ductal Carcinoma Segmentation Use Case" section of this [paper](https://www.sciencedirect.com/science/article/pii/S2153353922005478?via%3Dihub#tbl1). The data is available [here](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images/data).

#### 6.2.a (0.5 points)

Why is it helpful to tile an image and use the tiles as input for a CNN for segmentation?

#### 6.2.b (0.5 points)

Describe the hyperparameters that are introduced when you tile an image.
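Typical tiling hyperparameters include the tile size, the stride (equivalently, the overlap between tiles) and how the image border is handled. A NumPy sketch that exposes them, assuming the image is an H x W (x C) array and that edge remainders which do not fill a whole tile are simply dropped (padding is an alternative):

```python
import numpy as np


def tile_image(image, tile=50, stride=50):
    """Cut `image` into tile x tile patches taken every `stride` pixels.
    stride < tile gives overlapping tiles; leftover rows/columns at the
    bottom/right edges are discarded in this sketch."""
    h, w = image.shape[:2]
    tiles, coords = [], []
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            tiles.append(image[y:y + tile, x:x + tile])
            coords.append((y, x))  # top-left corner, needed to stitch predictions back
    return np.stack(tiles), coords


img = np.arange(120 * 100).reshape(120, 100)
tiles, coords = tile_image(img, tile=50, stride=50)
print(tiles.shape)  # (4, 50, 50)
```

Halving the stride to 25 on the same image yields 9 overlapping tiles instead of 4, which illustrates the trade-off between context overlap and compute.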

#### 6.2.c (0.5 points)

What are some metrics that can be used to evaluate the segmentation of the full image (when the tiles are recombined)?
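Two common overlap metrics for a recombined binary mask are the Dice coefficient and intersection-over-union (Jaccard index). A NumPy sketch; returning 1.0 when both masks are empty is a convention chosen here, not a standard:

```python
import numpy as np


def dice_and_iou(pred, target):
    """Dice coefficient and intersection-over-union for two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    dice = 2 * inter / total if total > 0 else 1.0
    iou = inter / union if union > 0 else 1.0
    return float(dice), float(iou)


pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [1, 0]])
print(dice_and_iou(pred, target))  # (0.5, 0.3333333333333333)
```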

#### 6.2.d (4 points)

Load the data, train a CNN, and evaluate the performance on the dataset.

**Note:** due to the size of this dataset, feel free to sample only part of the dataset to use to train and evaluate your model. Just please make sure all classes are represented, and that you do not train and test on the same patients.
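A patient-level split keeps all patches of one patient on the same side. The sketch assumes the Kaggle file names follow the pattern `<patientID>_idx5_x<X>_y<Y>_class<C>.png` (verify this against your download); it parses the patient ID, patch position and label, then assigns whole patients to train or test. The helper names are hypothetical.

```python
import random
import re

# Assumed naming pattern, e.g. 10253_idx5_x1001_y1001_class0.png
PATTERN = re.compile(r"(?P<pid>\d+)_idx5_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<label>[01])\.png$")


def parse_name(filename):
    """Return (patient_id, x, y, label) parsed from a patch file name."""
    m = PATTERN.search(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return m["pid"], int(m["x"]), int(m["y"]), int(m["label"])


def split_by_patient(filenames, test_fraction=0.2, seed=0):
    """Return (train_files, test_files) with no patient appearing in both."""
    patients = sorted({parse_name(f)[0] for f in filenames})
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [f for f in filenames if parse_name(f)[0] not in test_patients]
    test = [f for f in filenames if parse_name(f)[0] in test_patients]
    return train, test


# Made-up file names for 5 patients, each with one class-0 and one class-1 patch
files = [f"{pid}_idx5_x{x}_y1_class{c}.png"
         for pid in ("101", "102", "103", "104", "105") for x, c in ((1, 0), (51, 1))]
train, test = split_by_patient(files)
```

After splitting, counting the labels on each side confirms that both classes are represented.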

#### 6.2.e (1.5 points)

Select a 7x7 grid of adjacent patches and predict their classes. Then display them together as one image, marking the patches that are predicted as IDC. Display a second image that marks the patches that are actually IDC (ground truth).
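One way to build both displays, assuming you have 49 RGB `uint8` patches of equal size (50x50 in this sketch) ordered row by row by their x/y coordinates, plus a length-49 array of 0/1 labels (predicted for the first image, ground truth for the second). Drawing a red frame around IDC patches is an arbitrary choice.

```python
import numpy as np


def mosaic_with_marks(patches, labels, grid=7, border=3):
    """Tile grid*grid patches (each P x P x 3, uint8) into one image and draw a
    red frame around every patch whose label is 1 (IDC)."""
    p = patches[0].shape[0]
    canvas = np.zeros((grid * p, grid * p, 3), dtype=np.uint8)
    red = np.array([255, 0, 0], dtype=np.uint8)
    for k, (patch, label) in enumerate(zip(patches, labels)):
        r, c = divmod(k, grid)  # row-major placement
        tile = patch.copy()
        if label == 1:
            tile[:border, :] = red
            tile[-border:, :] = red
            tile[:, :border] = red
            tile[:, -border:] = red
        canvas[r * p:(r + 1) * p, c * p:(c + 1) * p] = tile
    return canvas


# Made-up inputs: 49 flat gray patches, patches 0 and 10 "predicted" as IDC
patches = [np.full((50, 50, 3), 200, dtype=np.uint8) for _ in range(49)]
pred = np.zeros(49, dtype=int)
pred[[0, 10]] = 1
pred_img = mosaic_with_marks(patches, pred)
```

Calling the same function with the ground-truth labels gives the second image, and the notebook's `plot_two_images` helper can show the two side by side.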