<a href="https://colab.research.google.com/github/antfolk/BMEN35/blob/main/Session5/BMEN35_Ex16_deep_neural_networks_assignment5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 5
## Fill in your name below
Alexander Andersson, BME4

## Your mission is now the following:

You will use a dataset named HAM10000 ("Human Against Machine with 10000 training images") (https://doi.org/10.7910/DVN/DBW86T). The dataset contains dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images, which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc).

More than 50% of lesions are confirmed through histopathology (histo); the ground truth for the rest of the cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal).

For this exercise we have downsampled the images to 64x64x3 and randomly taken 1000 samples from the original 10015 to make training times more reasonable. Make sure you have downloaded the `HAM1000_64_64.zip` file from GitHub to your computer.


We will start you off by loading the data.

In [None]:
from google.colab import files
file = files.upload() # Choose HAM1000_64_64.zip

In [None]:
!unzip -q /content/HAM1000_64_64.zip -d /content/

Now we have the data in our workspace. Let's do some imports.


In [None]:
import os
from glob import glob
import cv2
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from PIL import Image
from matplotlib import pyplot as plt

from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

Now we will read the data into Python.

In [None]:
# First read the metadata into a dataframe
base_dir = '/content/HAM1000_64_64/'
skin_df = pd.read_csv(base_dir + 'dataframe.csv')
# Number of rows, i.e. the number of images
length = skin_df.shape[0]
# Map each diagnosis code to an integer label
label_dict = {'nv': 0, 'mel': 1, 'bkl': 2, 'bcc': 3, 'akiec': 4, 'vasc': 5, 'df': 6}
# Recode the labels in the dataframe as numbers
skin_df['labels'] = skin_df['dx'].map(label_dict)
# Allocate space for the image data X
X = np.zeros((length, 64, 64, 3))
# Load each image using the filename stored in the dataframe
for k, image_id in enumerate(skin_df['image_id']):
    X[k, :, :, :] = np.asarray(Image.open(base_dir + image_id + '.jpg'))

# Integer labels y, one per image
y = np.asarray(skin_df['labels'])


We will do the usual conversion as before.

In [None]:
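# Convert data to float32 and scale to the range [0, 1]
X = X.astype('float32')
X /= 255.0
# Keep the integer labels for plotting, then one-hot encode y
yidx = y
y = to_categorical(y)
# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)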
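
To make the conversion above concrete, here is a minimal sketch on toy data. It uses plain NumPy rather than `to_categorical`, and the names `X_demo`, `labels_demo` and `num_classes` are made up for this illustration. It assumes all seven classes are present, since `to_categorical` infers the number of columns from the largest label unless told otherwise.

```python
import numpy as np

# Toy stand-ins for the real data: four 2x2 RGB "images" with 8-bit values
# and four integer labels drawn from the seven classes (0..6).
X_demo = np.full((4, 2, 2, 3), 255.0)
X_demo[0] = 0.0
labels_demo = np.array([0, 6, 2, 2])
num_classes = 7  # assumption: every class occurs in the data

# Scale pixel values from [0, 255] to [0, 1]
X_scaled = X_demo.astype("float32") / 255.0

# One-hot encode: row i has a single 1 in column labels_demo[i]
y_onehot = np.eye(num_classes, dtype="float32")[labels_demo]

# argmax along the class axis recovers the integer labels
recovered = np.argmax(y_onehot, axis=1)

print(X_scaled.min(), X_scaled.max())
print(y_onehot[1])
print(recovered)
```

The last step is the one you will need later: a model's softmax output has the same layout as the one-hot labels, so `np.argmax(..., axis=1)` turns it back into class indices.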

Let's plot the data and visualise what kind of data we are dealing with.

In [None]:
# Subplot titles, listed in the same order as the integer labels 0..6
titles = [
    'Example of melanocytic nevi (nv)',
    'Example of melanoma (mel)',
    'Example of benign keratosis-like lesions (bkl)',
    'Example of basal cell carcinoma (bcc)',
    'Actinic keratoses and intraepithelial carcinoma (akiec)',
    'Example of vascular lesions (vasc)',
    'Example of dermatofibroma (df)',
]
plt.figure(figsize=(16, 14))
for label, title in enumerate(titles):
    plt.subplot(3, 3, label + 1)
    plt.title(title)
    # Show the first image whose integer label matches this class
    plt.imshow(X[np.where(yidx == label)[0][0], :, :])
plt.show()


Now you have the data X_train, X_test and the labels y_train, y_test. Create a deep learning model (or use transfer learning on a pretrained model, e.g. VGG16), train it properly, and use it to make predictions. You also need to extract important metrics.

To be clear, you need to define the model and the metrics.
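
Which metrics count as important is up to you. As one possible starting point, the sketch below computes a confusion matrix, accuracy, and per-class precision and recall from one-hot labels and predicted probabilities. It uses plain NumPy on made-up three-class data; the helpers `confusion_matrix` and `per_class_metrics` and the `*_demo` arrays are hypothetical names for this illustration, not part of the assignment code.

```python
import numpy as np

def confusion_matrix(y_true_idx, y_pred_idx, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true_idx, y_pred_idx):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Return (precision, recall) arrays; empty classes get 0."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

# Hypothetical one-hot test labels and softmax-like outputs for 3 classes
y_true_demo = np.eye(3)[[0, 0, 1, 2, 2, 2]]
y_prob_demo = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.6, 0.2, 0.2],
])

# Convert one-hot rows and probability rows to class indices
true_idx = np.argmax(y_true_demo, axis=1)
pred_idx = np.argmax(y_prob_demo, axis=1)

cm = confusion_matrix(true_idx, pred_idx, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
precision, recall = per_class_metrics(cm)

print(cm)
print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

With a trained Keras model, the probabilities would come from something like `model.predict(X_test)` and the true labels from `y_test`. Per-class recall is worth inspecting alongside accuracy, because accuracy alone can look good when one class dominates the data.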

Remember that this dataset is in color. That means your images have a "depth". Make sure you define this appropriately when you define your model. Keep in mind the number of classes/targets you have here.
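
One way to avoid hard-coding these two numbers is to read them off the arrays themselves. The sketch below uses stand-in arrays with the shapes assumed for `X_train` and `y_train` (64x64 RGB images, seven one-hot columns); `X_demo` and `y_demo` are hypothetical names for this illustration.

```python
import numpy as np

# Stand-ins with the same layout as X_train and y_train (assumed shapes)
X_demo = np.zeros((5, 64, 64, 3), dtype="float32")
y_demo = np.eye(7, dtype="float32")[[0, 1, 2, 5, 6]]

# Everything after the sample axis is the shape of a single image
input_shape = X_demo.shape[1:]   # (height, width, channels)

# One-hot labels have one column per class
num_classes = y_demo.shape[1]

print(input_shape, num_classes)
```

In a Keras model, `input_shape` would describe the input to the first layer and `num_classes` the number of units in the final softmax layer. If you go the transfer-learning route, the Keras VGG16 model without its top layers expects exactly three input channels and a width and height of at least 32, so 64x64x3 images should fit.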
