# Deep Learning in Medicine
## BMSC-GA 4493, BMIN-GA 3007
## Homework 3: RNNs


Note: If you need to write mathematical terms, you can type your answers in a Markdown cell using LaTeX.
See <a href="https://stackoverflow.com/questions/13208286/how-to-write-latex-in-ipython-notebook">here</a> if you have issues. For basic LaTeX notation, see <a href="https://en.wikibooks.org/wiki/LaTeX/Mathematics">here</a>.


Submission instructions: Upload your final Jupyter notebook file, along with any figures you produce, in a zipped file named **netid_hw3** on Brightspace.

**Submission deadline: April 17th 2024 11:59pm.**



# Question 1: Literature Review: A combined deep CNN-LSTM network for the detection of COVID-19 (Total points 20 + 10 bonus points)

Read this paper:

#### Islam, Md Zabirul, Md Milon Islam, and Amanullah Asraf. "A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images." Informatics in Medicine Unlocked 20 (2020): 100412.

https://www.sciencedirect.com/science/article/pii/S2352914820305621

We are interested in the methods proposed in this publication and the technical aspects of their implementation.

**1.1) (10 points)** Explore the "Development of combined network" section to outline the CNN-LSTM model's architecture proposed in the study. Detail the sequence and types of layers included, the dimensions of the input data, and the model's output characteristics, without focusing on the individual parameters of each layer.
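As a reference point for the data flow in a CNN-LSTM hybrid, here is a minimal, generic PyTorch sketch. It assumes square RGB inputs whose side length is divisible by 4; the layer counts, channel widths, `hidden_size`, the default `image_size=224`, and the row-by-row reshaping of the feature map are illustrative assumptions, not the configuration reported in the paper. The idea it shows: convolutions extract a feature map, the map is reshaped into a sequence, and the LSTM's final hidden state feeds a linear classifier whose logits become class probabilities after a softmax.

```python
import torch
import torch.nn as nn


class CNNLSTM(nn.Module):
    """Generic CNN-LSTM image classifier (illustrative layer sizes, not the paper's)."""

    def __init__(self, num_classes: int, image_size: int = 224, hidden_size: int = 64):
        super().__init__()
        # Two conv blocks; each MaxPool2d(2) halves the height and width.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_width = image_size // 4
        # Each row of the feature map becomes one LSTM time step whose input
        # vector holds all channel values along that row (32 * feat_width).
        self.lstm = nn.LSTM(input_size=32 * feat_width, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x)                              # (B, 32, H/4, W/4)
        b, c, h, w = f.shape
        seq = f.permute(0, 2, 1, 3).reshape(b, h, c * w)  # (B, H/4, 32 * W/4)
        _, (h_n, _) = self.lstm(seq)                      # h_n: (1, B, hidden_size)
        return self.classifier(h_n[-1])                   # class logits (B, num_classes)
```

When answering 1.1, compare this generic flow against the actual layer sequence, input dimensions, and output classes described in the paper.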

**1.2) (5 points)** In the "Experimental results analysis" section, the paper discusses the accuracy, specificity, sensitivity, and F1-score of the model. How do the metrics of accuracy, specificity, sensitivity, and F1-score individually assess the effectiveness of the CNN-LSTM model in accurately detecting COVID-19 cases from X-ray images? Discuss the significance of these metrics.
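For reference, all four metrics can be computed from confusion-matrix counts. The sketch below treats one class as positive (for a multi-class model, apply it one-vs-rest per class); the function name `binary_metrics` is hypothetical. Sensitivity (recall) measures how many true positive cases are caught, specificity how many negatives are correctly cleared, and F1 balances precision against sensitivity, which matters when classes are imbalanced and accuracy alone can look deceptively high.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true-positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```

For example, with `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 0, 0, 0, 1, 0]` there are 2 true positives, 4 true negatives, 1 false positive and 1 false negative, giving accuracy 0.75, sensitivity 2/3, specificity 0.8 and F1 2/3.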

**1.3) (5 points)** What advantages does LSTM offer over traditional RNNs, and which specific architectural features of LSTM address the limitations commonly associated with RNNs? Citing one example is sufficient for this question.
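To make the gating concrete, here is one LSTM time step in NumPy. It is a sketch using the gate ordering (input, forget, candidate, output) that PyTorch's `nn.LSTM` also uses for its stacked weights; `lstm_step` is a hypothetical name. The key line is the additive cell-state update `c = f * c_prev + i * g`: along the cell path the gradient is scaled elementwise by the forget gate instead of being repeatedly multiplied by the same recurrent weight matrix, which is the standard explanation for why LSTMs mitigate the vanishing-gradient problem of plain RNNs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: (d_in,), h_prev and c_prev: (d_h,)
    W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,), stacked as [input, forget, candidate, output].
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:d_h])              # input gate: how much new content to write
    f = sigmoid(z[d_h:2 * d_h])        # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * d_h:3 * d_h])    # candidate cell content
    o = sigmoid(z[3 * d_h:4 * d_h])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g             # additive cell-state update
    h = o * np.tanh(c)                 # new hidden state
    return h, c
```

Setting the forget-gate bias high and the input-gate bias very low makes `c` pass through almost unchanged, which is how an LSTM can carry information across many time steps.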

**1.4) (Bonus, maximum 10 points)** Within the field of NLP, transformers have outperformed LSTM networks in various tasks. Discuss at least two advantages that transformers offer over LSTM models. Additionally, explore at least two limitations of current transformer architectures.
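A minimal NumPy sketch of single-head scaled dot-product self-attention (no masking, no multi-head split; the projection matrices are placeholders you would normally learn) illustrates both sides of this question. Every position is processed in one matrix product rather than step by step as in an LSTM, and every token can attend directly to every other token; at the same time, the `(n, n)` score matrix makes time and memory grow quadratically with sequence length `n`.

```python
import numpy as np


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n): all token pairs at once
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights
```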

# Question 2: Literature Review: Vision transformer for generalized medical image classification (20 points)

Read this paper: 


#### Omid Nejati Manzari, Hamid Ahmadabadi, Hossein Kashiani, Shahriar B. Shokouhi, Ahmad Ayatollahi (2023). MedViT: A robust vision transformer for generalized medical image classification.

https://www.sciencedirect.com/science/article/pii/S0010482523002561

In this study, the authors propose a highly robust yet efficient CNN-Transformer hybrid model which is equipped with the locality of CNNs as well as the global connectivity of vision Transformers. 


**2.1) (10 points)** Describe the architecture of MedViT. How does it integrate the strengths of both CNNs and Transformers? Discuss the role of Efficient Convolution Block, Local Transformer Block, and Transformer Augmentation Block in achieving robust and efficient medical image classification.

**2.2) (5 points)** Discuss the strategies employed by MedViT to enhance adversarial robustness and learn smoother decision boundaries. How does augmenting the shape information of an image in the high-level feature space contribute to this goal?

**2.3) (5 points)** Based on the conclusions of the study, what future directions do the authors suggest for improving MedViT? Discuss the potential for applying MedViT beyond the datasets evaluated in this study, and briefly explain why MedViT would work well on those datasets.

# Question 3 - Programming: Build Classifiers on Medical Transcriptions - Recurrent Neural Networks and Self-Attention (60 points + 10 bonus points)

Let's build some models now. In this homework, we will focus on a dataset which has around 5000 medical transcriptions and the corresponding medical specialty. The data is available <a href="https://www.kaggle.com/tboyle10/medicaltranscriptions">here</a>.

Here, we will focus on predicting the top few medical specialty classes from the transcription text. <a href="https://github.com/nyumc-dl/BMSC-GA-4493-Spring2024/blob/main/lab6/Lab6_RNN.ipynb">Lab 6</a> will be very useful here.


**3.1) (5 points)** Read the CSV using pandas. Select the 6 most frequent classes ('medical_specialty') in the data, and keep only the rows that belong to one of these classes. Which classes are these, and how many rows remain after this filtering?
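A minimal sketch of the filtering step, assuming the Kaggle CSV has already been read into a DataFrame with a `medical_specialty` column. `keep_top_classes` is a hypothetical helper name, and the file name below (`mtsamples.csv`) is an assumption; use whatever path you downloaded the data to.

```python
import pandas as pd


def keep_top_classes(df: pd.DataFrame, label_col: str = "medical_specialty", k: int = 6):
    """Keep only rows whose label is among the k most frequent labels."""
    counts = df[label_col].value_counts()          # sorted by frequency, descending
    top = counts.index[:k].tolist()
    filtered = df[df[label_col].isin(top)].reset_index(drop=True)
    return filtered, top
```

Usage: `filtered, top_classes = keep_top_classes(pd.read_csv("mtsamples.csv"))`, then print `top_classes` and `len(filtered)` to answer the question.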

**3.2) (5 points)** Now split your data into training, validation, and test sets. Shuffle the rows, and split them with ratios of (train: 60%, valid: 20%, test: 20%). Set the random seed to 2024. Follow the steps at https://pytorch.org/docs/stable/notes/randomness.html to set all the seeds so that the results are reproducible.
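One way to seed and split, assuming a pandas DataFrame. `set_seed` follows the main recommendations on the linked PyTorch reproducibility page (seed Python, NumPy, and PyTorch; disable cuDNN benchmarking). The stricter `torch.use_deterministic_algorithms(True)` described there is left out here because it raises errors for some CUDA operations unless extra environment variables are set. `set_seed` and `split_frame` are hypothetical names.

```python
import random

import numpy as np
import pandas as pd
import torch


def set_seed(seed: int = 2024) -> None:
    """Seed every RNG the training code touches."""
    random.seed(seed)                            # Python's built-in RNG
    np.random.seed(seed)                         # NumPy's global RNG
    torch.manual_seed(seed)                      # PyTorch RNG on CPU and all CUDA devices
    torch.backends.cudnn.benchmark = False       # avoid nondeterministic algorithm selection
    torch.backends.cudnn.deterministic = True    # prefer deterministic cuDNN convolutions


def split_frame(df: pd.DataFrame, seed: int = 2024,
                train_frac: float = 0.6, valid_frac: float = 0.2):
    """Shuffle the rows, then split into train / valid / test (test gets the remainder)."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(train_frac * len(shuffled))
    n_valid = int(valid_frac * len(shuffled))
    train = shuffled.iloc[:n_train].reset_index(drop=True)
    valid = shuffled.iloc[n_train:n_train + n_valid].reset_index(drop=True)
    test = shuffled.iloc[n_train + n_valid:].reset_index(drop=True)
    return train, valid, test
```

For shuffled `DataLoader`s, the same PyTorch page also suggests passing a seeded `torch.Generator` through the `generator` argument.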

**3.3) (5 points)** Write a function that builds a vocabulary from the training data, using only the transcription column. Use the tokenization scheme of your choice.
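A minimal sketch of a vocabulary builder. The tokenizer here (lowercase, keep runs of letters and digits) is just one simple choice among many. Index 0 is reserved for padding and 1 for unknown tokens, and tokens are ordered by descending frequency with ties broken alphabetically so the mapping is deterministic. Drop rows with a missing transcription first, since `str()` would turn a NaN into the token `nan`. All names are hypothetical.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def tokenize(text):
    """Lowercase and split into alphanumeric word pieces (a simple, assumed scheme)."""
    return re.findall(r"[a-z0-9]+", str(text).lower())


def build_vocab(texts, min_freq=1):
    """Return (stoi, itos) built from an iterable of training transcriptions."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    kept = [tok for tok, c in counts.items() if c >= min_freq]
    kept.sort(key=lambda tok: (-counts[tok], tok))   # frequent first, ties alphabetical
    itos = [PAD, UNK] + kept
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


def encode(text, stoi):
    """Map a transcription to vocabulary indices, using <unk> for unseen tokens."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokenize(text)]
```

Raising `min_freq` shrinks the vocabulary and maps rare tokens (often typos or patient-specific numbers) to `<unk>`.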

**3.4) (10 points)** Write a dataloader and collate function so that we can begin to train our networks! You can choose to use either the complete transcription text or fix a maximum length of transcription text as input for your model.
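A sketch of a `Dataset` and collate function for the GRU, assuming each transcription has already been converted to a list of vocabulary indices (for example with the vocabulary from step 3.3), with padding index 0 and unknown index 1. The collate function also returns the true sequence lengths so the model can pack the padded batch. The class and function names are hypothetical.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class TranscriptionDataset(Dataset):
    """Pairs of (vocabulary-index sequence, class label), optionally truncated."""

    def __init__(self, token_ids, labels, max_len=None, unk_idx=1):
        self.token_ids = token_ids   # list of lists of ints
        self.labels = labels         # list of ints in [0, num_classes)
        self.max_len = max_len       # None keeps the full transcription
        self.unk_idx = unk_idx

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        ids = self.token_ids[idx]
        if self.max_len is not None:
            ids = ids[: self.max_len]        # keep only the first max_len tokens
        if len(ids) == 0:
            ids = [self.unk_idx]             # packed RNN inputs need length >= 1
        return torch.tensor(ids, dtype=torch.long), self.labels[idx]


def collate_batch(batch, pad_idx=0):
    """Pad a list of (ids, label) pairs into (padded_ids, lengths, labels) tensors."""
    seqs, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in seqs], dtype=torch.long)
    padded = pad_sequence(list(seqs), batch_first=True, padding_value=pad_idx)
    return padded, lengths, torch.tensor(labels, dtype=torch.long)
```

Usage: `DataLoader(train_ds, batch_size=32, shuffle=True, collate_fn=collate_batch)`. Padding per batch (rather than to a global length) keeps short batches cheap.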

**3.5) (10 points)** Now you are ready to build your sequence classification model!

First, build a simple GRU model that takes as input the text indices from the vocabulary and ends with a softmax over the total number of classes. Use the embedding and hidden dimensions of your choice.

**Please train your model to reach at least 50% accuracy on the test set.**

At each epoch, compute and print **Average Cross Entropy loss** and **Accuracy** on both **train and validation set** 

Plot your validation and train loss over different epochs. 

Plot your validation and train accuracies over different epochs. 

Finally print accuracy on the test set.
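A minimal sketch of a GRU classifier and a one-epoch loop that returns the average cross-entropy loss and accuracy. It assumes batches of `(padded_ids, lengths, labels)` with padding index 0; the class/function names and the default sizes are hypothetical, not tuned values. The model returns logits because `nn.CrossEntropyLoss` applies log-softmax internally, so the softmax over classes is implicit during training; apply `torch.softmax(logits, dim=1)` explicitly if you need probabilities.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class GRUClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, embed_dim=128, hidden_dim=128, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, lengths):
        emb = self.embedding(x)                                    # (B, T, embed_dim)
        # Packing makes the GRU stop at each sequence's true length, ignoring padding.
        packed = pack_padded_sequence(emb, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, h_n = self.gru(packed)                                  # (1, B, hidden_dim)
        return self.fc(h_n[-1])                                    # logits (B, num_classes)


def run_epoch(model, loader, loss_fn, optimizer=None, device="cpu"):
    """Train if an optimizer is given, else evaluate; return (avg loss, accuracy)."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, count = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for x, lengths, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x, lengths)
            loss = loss_fn(logits, y)                  # mean cross-entropy over the batch
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * y.size(0)      # re-weight by batch size
            correct += (logits.argmax(dim=1) == y).sum().item()
            count += y.size(0)
    return total_loss / count, correct / count
```

Call `run_epoch` on the training loader with an optimizer and on the validation loader without one, append the returned pairs to lists each epoch, and plot those lists with matplotlib. The training loss reported this way is averaged while the weights are still changing; re-running the training loader without an optimizer after each epoch gives a cleaner number.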

**3.6) (25 points)** Now, let's fine-tune a sequence classification model based on BERT. Install Hugging Face's Transformers library for this. Use the pretrained 'bert-base-uncased' model, together with the BERT tokenizer pretrained for 'bert-base-uncased'. Use the AdamW optimizer from the transformers library for optimization (note that recent releases deprecate `transformers.AdamW` in favor of `torch.optim.AdamW`). Remember that BERT takes attention masks as input, so you need to create a separate dataloader for BERT. Keep in mind that BERT can handle a maximum of 512 tokens.

**Please finetune the model so that it reaches at least 60% accuracy on the test set.**

The rest of your experimental setting should be the same as 3.5:

At each epoch, compute and print **Average Cross Entropy loss** and **Accuracy** on both **train and validation set** 

Plot your validation and train loss over different epochs. 

Plot your validation and train accuracies over different epochs. 

Finally print accuracy on the test set.
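Because BERT needs attention masks, its dataloader differs from the GRU one. Below is a minimal sketch of a collate factory that tokenizes raw `(text, label)` pairs per batch. It assumes a Hugging Face tokenizer such as `BertTokenizerFast.from_pretrained("bert-base-uncased")`, which returns `input_ids` and `attention_mask` when called with `return_tensors="pt"`; `make_bert_collate` is a hypothetical name. Tokenizing per batch pads only to the longest sequence in that batch, and `truncation=True` with `max_length=512` enforces BERT's limit.

```python
import torch


def make_bert_collate(tokenizer, max_length=512):
    """Return a collate_fn that turns (text, label) pairs into BERT-ready tensors."""

    def collate(batch):
        texts, labels = zip(*batch)
        enc = tokenizer(
            list(texts),
            padding=True,          # pad to the longest sequence in this batch
            truncation=True,       # drop tokens beyond max_length
            max_length=max_length,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],   # 1 = real token, 0 = padding
            "labels": torch.tensor(labels, dtype=torch.long),
        }

    return collate
```

With `model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)`, each batch can be passed as `outputs = model(**batch)`; because labels are included, `outputs.loss` is the cross-entropy loss and `outputs.logits` gives the class scores for computing accuracy.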

**3.7) (Bonus, maximum 10 points)** List 5 examples from the test set that BERT misclassified, and describe the likely reasons for each misclassification.
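A small helper for collecting misclassified test examples. It assumes you have the test texts, the integer true labels, BERT's predicted labels (e.g. `logits.argmax(dim=1)` gathered over the test loader), and a list mapping label indices to class names; `find_misclassified` is a hypothetical name.

```python
def find_misclassified(texts, y_true, y_pred, class_names, k=5):
    """Return up to k misclassified examples with their true and predicted class names."""
    rows = []
    for text, t, p in zip(texts, y_true, y_pred):
        if t != p:
            rows.append({"text": text, "true": class_names[t], "predicted": class_names[p]})
            if len(rows) == k:
                break
    return rows
```

Reading the returned texts side by side with their true and predicted classes is usually the quickest way to spot patterns such as overlapping specialties or transcriptions truncated at 512 tokens.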