# Day 3 - Morning - Assignment

In this session, we will discuss your assignments. 

We will look at each assignment in detail. You will then have the chance to start your assignment and ask questions, and we can work through it together during this morning session.

You can choose **one** of the following two assignments:

1. Medical image classification
2. Streamlit web application

## Deadline

**Deadline is the 25th of March!!**

**Note**: Send your assignment as a single `.zip` file, as the Med Uni Graz spam filter blocks some attachments, for example files ending in `.py`.

## 1. Medical Image Classification

In this task, you are expected to classify histopathology images. 

For this, we will use MedMNIST, specifically **PathMNIST**.

You can find information about MedMNIST, and specifically PathMNIST, here: <https://medmnist.com>

Here is the information provided by the package:

> The PathMNIST is based on a prior study for predicting survival from colorectal cancer histology slides, providing a dataset of 100,000 non-overlapping image patches from hematoxylin & eosin stained histological images, and a test dataset of 7,180 image patches from a different clinical center. The dataset is comprised of 9 types of tissues, resulting in a multi-class classification task. We split the 100,000 images into training and validation set with a ratio of 9:1.

PathMNIST therefore consists of 107,180 images in total: 100,000 for training and validation, plus 7,180 for testing.

It is a 9-class classification problem, where you should classify the type of tissue into one of the following classes:

- '0': 'adipose' 
- '1': 'background'
- '2': 'debris'
- '3': 'lymphocytes'
- '4': 'mucus'
- '5': 'smooth muscle'
- '6': 'normal colon mucosa'
- '7': 'cancer-associated stroma'
- '8': 'colorectal adenocarcinoma epithelium'
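
For reports and plots it can help to keep the label table above as a dictionary. The following is a minimal sketch; the names `CLASS_NAMES`, `label_to_name`, and `TARGET_NAMES` are illustrative and not part of the MedMNIST package.

```python
# Label table copied from the list above; the names used here are illustrative.
CLASS_NAMES = {
    0: "adipose",
    1: "background",
    2: "debris",
    3: "lymphocytes",
    4: "mucus",
    5: "smooth muscle",
    6: "normal colon mucosa",
    7: "cancer-associated stroma",
    8: "colorectal adenocarcinoma epithelium",
}


def label_to_name(label):
    """Return the tissue name for an integer class label (0-8)."""
    return CLASS_NAMES[int(label)]


# Names ordered by label, e.g. for the axes of a confusion matrix
TARGET_NAMES = [CLASS_NAMES[i] for i in range(len(CLASS_NAMES))]
```

A list like `TARGET_NAMES` can then be used wherever readable class names are needed instead of the raw indices.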

Here are some options you can try to improve your results:

1. Use large images (as large as possible, but at least 128)
2. Resampling to re-balance the dataset during training
3. Use a pre-trained network (PyTorch Hub), for example ResNet. ResNet is available in different depths, with ResNet18 being the smallest. The deeper the network, the more memory it needs and the longer it takes to train, but deeper networks can potentially perform better.
4. Augmentation (Use the PyTorch `transforms.Compose` code from the seminar notebooks)
5. Change from 9 classes to 2 classes (malignant/benign): use this as a last resort. This reduces the task to a binary classification problem, which is a much easier task than the 9-class problem.
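
As an illustration of option 2, one common way to re-balance is to sample each image with a probability inversely proportional to the size of its class. The sketch below only computes such weights; the function name `inverse_frequency_weights` and the toy labels are assumptions made for this example, not PathMNIST data.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per sample, proportional to 1 / class count."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy labels (not PathMNIST data): class 0 is three times as common as class 1
toy_labels = [0, 0, 0, 1]
toy_weights = inverse_frequency_weights(toy_labels)

# Each class now carries the same total weight, so both are drawn equally often
print(toy_weights)
```

In PyTorch, weights like these can be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`, and the sampler is then given to the `DataLoader` through its `sampler` argument (without also setting `shuffle=True`). The function expects plain integer labels, so array-valued labels would need to be flattened to integers first.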

The best and easiest solution might be to train a network using the full-sized images. Here are the available image sizes for PathMNIST:

```python
from medmnist import PathMNIST

PathMNIST.available_sizes
```

returns the following: `[28, 64, 128, 224]`.

The size 224x224 is common among pre-trained networks, as it was the input size often used in image classification competitions over the years.

For example, ResNet18 has been pre-trained on ImageNet using 224x224 inputs. Hence, loading a pre-trained ResNet18 (here via `torchvision.models`) is a quick way to get going:

```python
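import torchvision.models as models

# `weights=` replaces the deprecated `pretrained=True` argument (torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)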
```

It is likely, though, that a deeper ResNet will be needed to get good results. You can see the available ResNet models here: <https://pytorch.org/hub/pytorch_vision_resnet/>

Note that ImageNet is a 1,000-class classification dataset, so ResNet18 has 1,000 output neurons, while PathMNIST has 9 classes. You will need to replace the output layer with one that has 9 output neurons instead of 1,000.

This can be done using the following syntax:

```python
import torch.nn as nn

num_classes = 9

# Replace the 1,000-class ImageNet head with a 9-class head for PathMNIST
model.fc = nn.Linear(model.fc.in_features, num_classes)
```

### Colab

If you want to work on the larger images, you might need to use a GPU, especially if you decide to use some of the very deep ResNet models. You can get a free GPU from Google Colab:

<https://colab.research.google.com/>

Change the runtime to GPU: Click Runtime -> Change runtime type and select T4 GPU.

**Google Colab users**: If you encounter issues with PyTorch complaining about a missing GPU device with the example code provided in the 'Day 2 - Morning.ipynb' notebook, you can run the following after you have imported Torch:

```python
import torch 

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
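
Once `device` is defined, both the model and every batch must be moved to it. The following is a small sketch of that pattern; the `nn.Linear` model and the random batch are stand-ins used only for illustration, not the seminar's network or data.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and batch, used only to illustrate the pattern
model = nn.Linear(4, 2).to(device)     # move the model's parameters to the device
inputs = torch.randn(8, 4).to(device)  # move each batch to the same device

with torch.no_grad():
    outputs = model(inputs)

print(outputs.shape, outputs.device)
```

If the model and the inputs end up on different devices, PyTorch raises a runtime error, so it is worth applying `.to(device)` consistently in both the training and the evaluation loop.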

### ResNet Sizes

ResNet is available in several depths:

```python
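import torchvision.models as models

# Pick one of these; each loads ImageNet-pretrained weights (torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
model = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)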
```

The larger the network, the longer it will take to train and the more memory it will require. However, larger networks may improve performance.

### Fine Tuning

If you wish to **fine-tune** a pre-trained network, the following code can be used:

```python
for param in model.parameters():
    param.requires_grad = False
```

This code 'freezes' all of the pre-trained network's parameters so that they are not updated during training. Fine-tuning, as used here, means training a pre-trained network while updating only the final layer or the final few layers. The earlier layers of the network stay frozen.

The code above should be placed before you adjust the final output layer to 9 neurons for the 9 output classes of PathMNIST. A newly created layer has `requires_grad=True` by default, so the new output layer remains trainable.

Fine-tuning is normally quite a bit faster than training the entire network, as only the final layer's weights are updated during training.

### Submission

Send a Jupyter notebook that reports the accuracy, classification report, confusion matrix, etc. Please also provide a short summary of the results. Send everything as a `.zip` file, as some code files are blocked by the spam filter.
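
One possible way to produce these metrics is with scikit-learn, assuming it is installed in your environment. The labels below are toy values for illustration only; in your notebook, `y_true` and `y_pred` would be the test-set labels and your model's predictions.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Toy labels for illustration only; replace with your test labels and predictions
y_true = [0, 1, 2, 2, 8, 8, 8]
y_pred = [0, 1, 2, 8, 8, 8, 7]

labels = list(range(9))  # PathMNIST class indices 0-8

accuracy = accuracy_score(y_true, y_pred)
# Rows are the true classes, columns are the predicted classes
matrix = confusion_matrix(y_true, y_pred, labels=labels)
# zero_division=0 avoids warnings for classes that never occur in the toy data
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)

print(f"Accuracy: {accuracy:.3f}")
print(matrix)
print(report)
```

Passing `labels` explicitly keeps the confusion matrix at 9x9 even when some classes are missing from the predictions.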

## 2. Streamlit Web Application

For Streamlit installation instructions, see the end of this section.

### Overview

For this assignment, you must create a Streamlit application. The web app should let a user enter some details about a house, and it should then predict the house price.

A model is provided here: [house_price_model.pickle](./house_price_model.pickle). Right-click the file in the file browser on the left in Jupyter and click Download.

See the [House-Price-Model.ipynb](./House-Price-Model.ipynb) notebook for details on how the data and the model were generated.

### Training Your Own Model

**If you wish to train your own model**:

- You must train a model on some data about house prices.
- Here is a house price dataset: <https://www.kaggle.com/datasets/yasserh/housing-prices-dataset>
- Use only 3 or 4 of the features, e.g. area, number of bedrooms, and number of bathrooms. 

### Web Application Interface

The web app should contain 3 fields (area, number of bedrooms, number of bathrooms), and send the values to the model if all 3 fields are completed.

Use something like 

```python
text1 = st.text_input(label="Area Square Meters")
```

to create text fields.

You can validate user input in the Streamlit application. For example, you can check that all the fields (area, etc.) are filled in **and** that the number of bedrooms and the number of bathrooms are both at least 1 before sending the data to the model.
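
Such checks can be kept in a plain Python function, separate from the Streamlit code. The following is a sketch only: the function name `parse_house_inputs`, the error messages, and the exact thresholds are assumptions that you should adapt to the rules you want to enforce.

```python
def parse_house_inputs(area_text, bedrooms_text, bathrooms_text):
    """Validate the raw text-field values.

    Returns (values, None) on success or (None, error_message) on failure.
    """
    raw = {"area": area_text, "bedrooms": bedrooms_text, "bathrooms": bathrooms_text}

    # All three fields must be filled in
    missing = [name for name, text in raw.items() if not text.strip()]
    if missing:
        return None, "Please fill in: " + ", ".join(missing)

    # Text fields return strings, so convert them to numbers
    try:
        area = float(raw["area"])
        bedrooms = int(raw["bedrooms"])
        bathrooms = int(raw["bathrooms"])
    except ValueError:
        return None, "Please enter numbers only."

    # Thresholds are assumptions; adjust them as needed
    if area <= 0 or bedrooms < 1 or bathrooms < 1:
        return None, "Area must be positive, with at least 1 bedroom and 1 bathroom."

    return {"area": area, "bedrooms": bedrooms, "bathrooms": bathrooms}, None
```

In the app, the three arguments would be the strings returned by `st.text_input`. An error message could be shown with `st.error`, and the prediction only made when the returned error is `None`.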

### Loading a Model

You can load a *pickled* model using:

```python
import pickle

import pandas as pd

with open('house_price_model.pickle', 'rb') as f:
    loaded_model = pickle.load(f)

area = 200
bedrooms = 3
bathrooms = 2

# Because we trained with Pandas DataFrames, it is best to pass a DataFrame to the model
new_house = pd.DataFrame({
    'area_sqm': [area],
    'bedrooms': [bedrooms],
    'bathrooms': [bathrooms]
})

predicted_price = loaded_model.predict(new_house)
```

This will return the predicted house price for an area of 200 square metres, 3 bedrooms, and 2 bathrooms.

### Installing Streamlit

Streamlit needs to be run locally for you to do this assignment. 

1. First create a virtual environment in your project's directory. From the terminal/command line run:

```bash
$ python -m venv atsp
```

This will create a virtual environment called `atsp` in the current directory. 

2. Activate the virtual environment:

```bash
$ source atsp/bin/activate
```

3. Install Streamlit, Torch, torchvision, and Watchdog:

```bash
$ pip install streamlit torch torchvision watchdog
```

This will install the packages into your virtual environment.

Note that you need to reactivate the virtual environment every time you work on the project. Do this with the `source atsp/bin/activate` command (on Windows, use `atsp\Scripts\activate` instead).

To start a Streamlit application, just run `streamlit run` followed by the Python filename from the command line, such as:

```bash
$ streamlit run my_app.py
```

The virtual environment must be active for this to work! 


### Submission

Send a Python file with the Streamlit application code. Please send it as a `.zip` file, as `.py` files are blocked by the spam filter.