# Assignment 

As part of this course, you should complete an assignment. 

Choose **one** of the following three assignments:

1. Medical Image Classification
2. Streamlit web application
3. Tabular Data Classification or Regression

## Deadline

**Deadline is the 25th of March!!**

**Note**: Send your file(s) in a `.zip` archive, as the Med Uni Graz spam filter blocks some attachments, for example files ending in `.py`.

## 1. Medical Image Classification

In this task, you are expected to classify dermatology medical images. 

For this, we will use MedMNIST, specifically DermaMNIST. The best approach is likely to train on larger images, 128x128 or larger. In the seminar example we used the reduced-size dataset, which resulted in poor performance. In general, the larger the images you choose, the better the results are likely to be.

Options:

1. Use larger images (as large as possible, but at least 128)
2. Resampling to re-balance the dataset during training
3. Use a pre-trained network (PyTorch Hub), for example ResNet. ResNet is available in different depths, with ResNet18 being the smallest. The deeper the network, the more memory it needs and the longer it takes to train, but deeper networks can potentially perform better.
4. Augmentation (Use the PyTorch `transforms.Compose` code from the seminar notebooks)
5. Change from 7 classes to 2 classes (malignant/benign): probably best kept as a last resort!
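
Option 2 (resampling) can be sketched as follows. This is a minimal illustration, not the seminar's code: the helper name `balanced_sample_weights` and the toy label list are hypothetical. Each sample receives a weight inversely proportional to the frequency of its class, so that every class is drawn equally often in expectation.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample, inversely proportional to its class frequency."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy, imbalanced labels (hypothetical): four samples of class 0, two of class 1, one of class 2
labels = [0, 0, 0, 0, 1, 1, 2]
weights = balanced_sample_weights(labels)
print(weights)  # [0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 1.0]
```

In PyTorch, such weights can be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`, and the sampler is then given to the `DataLoader` through its `sampler` argument in place of `shuffle=True`. The dataset's labels would first need to be converted to a flat list of integers.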

The simplest and most effective option may be to train a network on the full-sized images. Here are the available image sizes for DermaMNIST:

```python
from medmnist import DermaMNIST

DermaMNIST.available_sizes
```

returns the following: `[28, 64, 128, 224]`.

The size 224x224 is common among pre-trained networks as this was a size often used by competitions over the years.

For example, ResNet18 has been pre-trained on ImageNet using 224x224 inputs. Hence, loading a pre-trained ResNet18 is a quick way to get going:

```python
import torchvision.models as models

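# torchvision >= 0.13 uses `weights=`; older versions use `pretrained=True` instead
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)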
```

However, ImageNet is a 1,000 class classification dataset, and therefore ResNet18 will have 1,000 output neurons, while DermaMNIST has 7 classes. You will need to adjust the output layer to have 7 output neurons instead of 1,000.

This can be done using the following syntax:

```python
import torch.nn as nn

num_classes = 7  # DermaMNIST has 7 classes

model.fc = nn.Linear(model.fc.in_features, num_classes)
```

### Colab

If you want to work on the larger images, you might need to use a GPU, especially if you decide to use some of the very deep ResNet models. You can get a free GPU from Google Colab:

<https://colab.research.google.com/>

Change the runtime to GPU: Click Runtime -> Change runtime type and select T4 GPU.

**Google Colab users**: If you encounter issues with PyTorch complaining about a missing GPU device with the example code provided in the 'Day 2 - Morning.ipynb' notebook, you can run the following after you have imported Torch:

```python
import torch 

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

### ResNet Sizes

ResNet is available in several depths:

```python
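# Pick ONE of these lines; each call loads a different depth.
# torchvision >= 0.13 uses `weights=`; older versions use `pretrained=True` instead.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
model = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)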
```

The larger the network, the longer it takes to train and the more memory it requires. However, larger networks may improve performance.

### Fine Tuning

If you wish to **fine-tune** a pre-trained network, the following code can be used:

```python
for param in model.parameters():
    param.requires_grad = False
```

Fine-tuning 'freezes' most of the pre-trained network's layers so that they are not updated during training. In other words, fine-tuning means training a pre-trained network while updating only the final layer, or the final few layers. The early layers of the network stay frozen.

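The code above should be placed **before** you adjust the final output layer to 7 neurons for the 7 output classes of DermaMNIST. A newly created layer is trainable by default, so replacing the output layer after freezing leaves it as the only layer that is updated.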

Fine tuning normally is quite a bit faster than training the entire network, as only the final layer's weights are updated during training.
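
The order of the two steps (freeze first, then replace the output layer) can be checked on a small stand-in model. This is only a sketch: `TinyNet` is a hypothetical toy network used in place of a real pre-trained ResNet so that nothing needs to be downloaded, and the optimizer settings are placeholders.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained network with a 1,000-class output."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(16, 8)
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


model = TinyNet()

# Step 1: freeze every existing parameter
for param in model.parameters():
    param.requires_grad = False

# Step 2: replace the output layer; a new layer is trainable by default
model.fc = nn.Linear(model.fc.in_features, 7)

# Only the new output layer should be left trainable
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']

# Give the optimizer only the parameters that are still trainable
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```

The same check on `requires_grad` can be run on a real ResNet after the two steps to confirm that only `fc` will be trained.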

### Submission

Send a Jupyter notebook with the accuracy, classification report, confusion matrix, etc. (please send a `.zip` file, as some code files are blocked by the spam filter).

## 2. Streamlit Web Application

For this assignment, you must create a Streamlit application. The web app should let a user enter some details about a house, and it should then predict the house price.

A model is provided here as [price_model.pickle](https://github.com/imigraz/ATSP-2024/blob/master/Python-Content/price_model.pickle) in this directory. 

### Training Your Own Model

**If you wish to train your own model**:

- You must train a model on some data about house prices.
- Here is a house price dataset: <https://www.kaggle.com/datasets/yasserh/housing-prices-dataset>
- Use only 3 or 4 of the features, e.g. area, number of bedrooms, and number of bathrooms. 

### Web Application Interface

The web app should contain 3 fields (area, number of bedrooms, number of bathrooms), and send the values to the model if all 3 fields are completed.

Use something like 

```python
text1 = st.text_input(label="Area Square Meters")
```

to create text fields.

**Note**: the provided model requires that the **minimum number of bedrooms is 1**, the **minimum number of bathrooms is 1**, and that the **area is not 0** or some other unrealistically low number. The model will return negative or nonsensical numbers if you enter 0 bedrooms or 0 bathrooms.

You can validate user input in the Streamlit application. For example, you can check that all the fields (area, etc.) are filled in **and** that the number of bedrooms and the number of bathrooms are both at least 1 before sending the data to the model.
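
One possible way to structure these checks is a small helper that converts the raw text from the three fields into numbers and rejects invalid input. This is a sketch only: the function name `parse_house_inputs` and the `MIN_AREA` threshold are assumptions for illustration, not part of the provided model.

```python
MIN_AREA = 10.0  # assumed lower bound for a realistic area; adjust as needed


def parse_house_inputs(area_text, bedrooms_text, bathrooms_text):
    """Return [[area, bedrooms, bathrooms]] ready for model.predict, or None if invalid."""
    # Empty fields: the user has not filled everything in yet
    if not (area_text and bedrooms_text and bathrooms_text):
        return None
    try:
        area = float(area_text)
        bedrooms = int(bedrooms_text)
        bathrooms = int(bathrooms_text)
    except ValueError:
        return None  # non-numeric input
    # Reject unrealistic areas and fewer than 1 bedroom or bathroom
    if not (area >= MIN_AREA and bedrooms >= 1 and bathrooms >= 1):
        return None
    return [[area, bedrooms, bathrooms]]


print(parse_house_inputs("250", "2", "3"))  # [[250.0, 2, 3]]
print(parse_house_inputs("250", "0", "3"))  # None
print(parse_house_inputs("", "2", "3"))     # None
```

In the Streamlit script, the three `st.text_input` values would be passed to this helper, and the model called only when the result is not `None`. Otherwise a message can be shown to the user, for example with `st.warning`.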

### Loading a Model

You can load a *pickled* model using:

```python
import pickle 

# Open the file in binary mode; the `with` block closes it again afterwards
with open('price_model.pickle', 'rb') as f:
    model_from_file = pickle.load(f)

predicted_price = model_from_file.predict([[250, 2, 3]])  # Notice the double [] brackets!
```

This will return the predicted house price given 250 for the area, 2 for the number of bedrooms, and 3 for the number of bathrooms.

### Submission

Send a Python file with the Streamlit application code (please send it as a `.zip` file, as `.py` files are blocked by the spam filter).

## 3. Tabular Data (Classification or Regression)

In this assignment, you are free to choose any tabular dataset and train any machine learning algorithm on it. The task can be classification or regression. Do not use any of the datasets that come with scikit-learn; use a dataset that you had to download, either from OpenML or from a website such as Kaggle: <https://www.kaggle.com/datasets>

1. Choose an appropriate algorithm for the task
2. Split your data properly using a train/test split
3. Analyse the results
4. Discuss the metrics
5. Plot a confusion matrix (classification tasks only)
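
For a classification task, the steps above might look roughly like the sketch below. It assumes scikit-learn and NumPy are installed. The synthetic data is only a stand-in so that the example runs on its own; for the assignment, `X` and `y` would come from the dataset you downloaded. The choice of a random forest is likewise just an example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT a real dataset): 200 rows, 4 numeric features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% for testing; stratify keeps class proportions similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit on the training split only, then predict on the unseen test split
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall and F1, plus the raw confusion matrix
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```

To plot the matrix rather than print it, `ConfusionMatrixDisplay.from_predictions(y_test, y_pred)` from `sklearn.metrics` is one option.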

Submission: Jupyter notebook (please send a `.zip` file as some code files are blocked by the spam filter).