# mlcrashcourse - Deep Learning Practical

In this practical, we build a neural network, with optional convolution layers, to classify images.

## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from random import randint
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import accuracy_score
from tensorflow.keras.datasets import fashion_mnist

import tensorflow.keras.backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (InputLayer, Dense, BatchNormalization, Dropout,
                                     Conv2D, Flatten, MaxPool2D)
from tensorflow.keras.metrics import categorical_accuracy, categorical_crossentropy
from tensorflow.keras.optimizers import Adam

%load_ext autoreload
%autoreload 2
%matplotlib inline

## Defining the problem

Link to dataset: https://www.kaggle.com/zalando-research/fashionmnist

In this practical we will be using the **Fashion MNIST** dataset. The task is to classify each image into one of the following ten fashion classes (labels).

> Basically, given an image of a boot, the model should be able to tell us that it is a boot.

| Numeric Label | Text Label |
| --- | --- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |


This is a supervised learning problem because every image comes with the expected output (its label) that we want the model to produce.

## Load the Data
Now we proceed with loading the data.
Thankfully, Keras provides us with a simple way to access the data. Most of the work is already done for us by Keras:

1. Reading the images into numpy arrays
2. Converting the labels into integers
3. Splitting the dataset into train and test sets for model evaluation later

In [None]:
label_map = {
    0: "T-shirt/top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot"
}

# load the data using keras datasets
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
print(f"{len(train_images)} training examples, {len(test_images)} test examples")

Take a look at a random image and its label

In [None]:
# TODO: pick a random image and show both the image and the label (=5 lines)


## Prepare the data
Neural networks are fussy about the data that they take in, so we need to do some preprocessing:
1. Apply Feature Scaling to the images
2. One hot encode the labels

### Feature Scaling
Feature scaling ensures that each feature contributes equally to the model, so that something like this won't happen:
![image.png](attachment:image.png)
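
As a rough illustration of what standardization does, the sketch below reproduces the computation in plain NumPy on a tiny made-up array. The names `standardize` and `toy_pixels` are hypothetical and not part of the practical. Each column has its mean subtracted and is divided by its standard deviation, which is the formula `StandardScaler` applies; constant columns are mapped to zero instead of being divided by zero.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Avoid dividing by zero for constant columns (e.g. always-black pixels)
    safe_std = np.where(std == 0, 1.0, std)
    return (features - mean) / safe_std

# Toy data: 4 "images" with 3 "pixels" each, on very different scales
toy_pixels = np.array([
    [0.0, 100.0, 7.0],
    [10.0, 200.0, 7.0],
    [20.0, 300.0, 7.0],
    [30.0, 400.0, 7.0],
])

scaled = standardize(toy_pixels)
print(scaled.mean(axis=0))  # every column is centred on zero
print(scaled.std(axis=0))   # non-constant columns have unit spread
```

After scaling, the first two columns are identical even though their raw values differed by a factor of ten, so neither dominates the other.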

In [None]:
# TODO: create the feature scaler (StandardScaler) (=1 line)
scaler = 

# Perform feature scaling with standard scaler
# np.reshape() is needed to ensure that the np arrays have the correct shape
n_images = len(train_images)

flat_train_images = np.reshape(train_images, (n_images, -1))

# we need to tell the scaler about what data it will be dealing with
scaler.fit(flat_train_images)

# Define a function to scale features
def scale_features(images):
    flat_images = np.reshape(images, (len(images), -1))
    flat_features = scaler.transform(flat_images)
    features = np.reshape(flat_features, (len(images), 28, 28, 1))
    return features 

# Scale both train and test images
train_features = scale_features(train_images)
test_features = scale_features(test_images)
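
To make the reshaping in the cell above more concrete, here is a small sketch on made-up data (`toy_images` is a hypothetical stand-in for the real images). It flattens each 28x28 image into one row of 784 values, which is the 2D layout the scaler works on, and then restores the image shape with an extra channel axis.

```python
import numpy as np

# 2 made-up "images" of 28x28 pixels
toy_images = np.arange(2 * 28 * 28).reshape(2, 28, 28)

# One row per image, one column per pixel
flat = np.reshape(toy_images, (len(toy_images), -1))

# Back to image shape, with a trailing channel axis for the convolution layers
restored = np.reshape(flat, (len(toy_images), 28, 28, 1))

print(flat.shape, restored.shape)  # (2, 784) (2, 28, 28, 1)
```

The round trip only changes the shape; the pixel values and their order are untouched.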

Before:

In [None]:
plt.imshow(train_images[idx], cmap='gray')

After scaling:

In [None]:
plt.imshow(train_features[idx].reshape(28, 28), cmap='gray')

### One Hot Encoding
One hot encoding converts a number into a vector:

`9 -> [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]`

This is required because the model should not infer that label 4 is "greater than" label 0.

The labels are different, but none is "more than" or "less than" another. One hot encoding removes the magnitude.
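
The mapping can be sketched in plain NumPy as follows. This is only an illustration on made-up labels (`toy_labels` is a hypothetical name); the practical itself uses scikit-learn's `OneHotEncoder`.

```python
import numpy as np

n_classes = 10
toy_labels = np.array([9, 0, 4])  # made-up labels for illustration

# Row i of the identity matrix is the one hot vector for class i
one_hot = np.eye(n_classes)[toy_labels]
print(one_hot[0])  # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]

# argmax recovers the original integer label
recovered = one_hot.argmax(axis=1)
```

Every vector has the same length and the same magnitude, so no class looks "bigger" than another.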

In [None]:
# TODO: create the encoder (OneHotEncoder) (=1 line)
encoder = 

# we need to tell the encoder about what data it will be dealing with
encoder.fit(np.reshape(train_labels, (len(train_labels), 1)))

# convert labels into one hot encoding
def encode_labels(labels):
    labels = np.reshape(labels, (len(labels), 1))
    features = encoder.transform(labels)
    return features
                        
train_one_hot_labels = encode_labels(train_labels)
test_one_hot_labels = encode_labels(test_labels)

Before:

In [None]:
train_labels[idx]

After encoding:

In [None]:
train_one_hot_labels[idx].toarray()

## Building the Model
Now we build the Neural Network model that we will train to classify fashion images.

A function that adds a convolution block to your model is provided so you don't have to worry about that:

In [None]:
# Add convolution layers to the given sequential model, then flatten the output
# layers - how many groups of three convolution layers to add
# scale - multiplier applied to the number of filters in each group
# dropout - dropout rate applied at the end of each group
def add_conv_block(model, layers, scale, dropout=0.2):
    for n_base_filters in range(2, layers * 2 + 1, 2):
        model.add(Conv2D(int(n_base_filters * scale), (3, 3),
                         padding="same", activation="relu"))
        model.add(Conv2D(int(n_base_filters * scale), (3, 3),
                         padding="same", activation="relu"))
        # the strided convolution halves the image width and height
        model.add(Conv2D(int(n_base_filters * scale), (3, 3),
                         strides=(2, 2), padding="same", activation="relu"))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))

    model.add(Flatten())
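
To get a feel for what `add_conv_block` produces, the sketch below works out the filter count and the image size after each group of layers. It is pure arithmetic, does not build a model, and assumes a 28x28 input; `conv_block_plan` is a hypothetical helper name. It relies on the rule that a convolution with `"same"` padding and stride 2 outputs `ceil(size / 2)` pixels per side.

```python
import math

def conv_block_plan(layers, scale, input_size=28):
    """Return (filters, output_size) for each group add_conv_block creates."""
    plan = []
    size = input_size
    for n_base_filters in range(2, layers * 2 + 1, 2):
        filters = int(n_base_filters * scale)
        # The two stride-1 "same" convolutions keep the size;
        # the stride-2 convolution shrinks it to ceil(size / 2)
        size = math.ceil(size / 2)
        plan.append((filters, size))
    return plan

print(conv_block_plan(layers=1, scale=12))  # [(24, 14)]
print(conv_block_plan(layers=2, scale=12))  # [(24, 14), (48, 7)]
```

With `layers=2` and `scale=12`, the final `Flatten()` would therefore output 7 * 7 * 48 = 2352 values per image.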


Instead of building the model directly, we create a class that exposes its hyperparameters. This will make hyperparameter tuning easier later.

In [None]:
# Represents a neural network model
class NNModel:
    # Create a model with the given hyperparameters
    def __init__(self, 
                 n_layers,
                 learning_rate,
                 n_units=64,
                 conv_blocks=0,
                 conv_scale=0,
                 display_summary=False):
        
        self.input_shape = (28, 28, 1)
        self.n_units = n_units
        self.n_layers = n_layers
        self.learning_rate = learning_rate
        self.conv_scale = conv_scale
        self.conv_blocks = conv_blocks
        
        self.backend_model = self.build()
        if display_summary: self.backend_model.summary()
        
    # Fit the model to the given data
    # x - features, y - one hot encoded labels
    def fit(self, train_x, train_y, valid_data, n_epochs, batch_size=64):
        # Return the Keras History object so callers can inspect the training curves
        return self.backend_model.fit(train_x, train_y,
                                      validation_data=valid_data,
                                      epochs=n_epochs,
                                      batch_size=batch_size)

    # Generate predictions using the given features
    def predict(self, input_x):
        predict_probs = self.backend_model.predict(input_x, 
                                                 batch_size=64)
        predictions = np.argmax(predict_probs, axis=-1)
        return predictions
    

    # Build the model
    def build(self):
        K.clear_session()
        
        # Build model architecture (=12 lines)
   
        # Compile the model
        model.compile(optimizer=Adam(learning_rate=self.learning_rate),
                      loss=categorical_crossentropy,
                      metrics=[categorical_accuracy])
        
        return model
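
The class compiles the model with a categorical cross-entropy loss, and `predict` picks the class with the highest predicted probability. The NumPy sketch below illustrates both steps on made-up numbers. It assumes the model's last layer produces softmax probabilities, which the practical does not state because the architecture is left as an exercise. The function names and `toy_` arrays are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy_np(one_hot_labels, probs, eps=1e-7):
    """Per-example loss: -log(probability assigned to the true class)."""
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(one_hot_labels * np.log(probs), axis=-1)

# Two made-up examples with 3 classes each
toy_logits = np.array([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
toy_one_hot = np.array([[1.0, 0.0, 0.0],   # true class 0
                        [0.0, 0.0, 1.0]])  # true class 2

probs = softmax(toy_logits)
losses = categorical_crossentropy_np(toy_one_hot, probs)
predictions = np.argmax(probs, axis=-1)
print(predictions)  # [0 1]
```

The first example is classified correctly and gets a small loss; the second is misclassified and gets a much larger one, which is what drives the weight updates during training.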
    
    

## Training the model
Let's build a model with our model class and train it.
As with most machine learning algorithms, there are hyperparameters to tune:

| Hyperparameter | Description |
| --- | --- |
| n_layers | The number of dense/fully connected layers to use in the model |
| n_units | The number of neurons in each dense/fully connected layer |
| conv_blocks | The number of convolution blocks used in the model |
| conv_scale | A multiplier for the number of filters in each convolution block |
| learning_rate | The learning rate during training |

We train for 3 epochs, which means that the model sees the entire training set 3 times.
> In the real world, we typically train for many more epochs (30-200).
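
As a back-of-the-envelope check, the sketch below counts how many weight updates 3 epochs amount to. It assumes the standard 60,000-image Fashion MNIST training set and the default batch size of 64 from `NNModel.fit`, with the last, smaller batch still counted as one step.

```python
import math

n_train = 60_000   # size of the Fashion MNIST training set
batch_size = 64    # default batch size in NNModel.fit
n_epochs = 3

# One update per batch; the final partial batch counts as a step too
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * n_epochs
print(steps_per_epoch, total_updates)  # 938 2814
```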


In [None]:
model = NNModel(n_layers=1,
                n_units=16,
                conv_blocks=0,
                conv_scale=12,
                learning_rate=3e-3, 
                display_summary=True)

Train the model on the training data:

In [None]:
hist = model.fit(train_features, train_one_hot_labels, 
                 valid_data=(test_features, test_one_hot_labels),
                 n_epochs=3)

## Evaluating the model
Once we have trained the model, we need to evaluate how it is doing. We use accuracy, the same quantity that the `categorical_accuracy` metric reports during training, as the yardstick.


In [None]:
predictions = model.predict(test_features)
accuracy = accuracy_score(test_labels, predictions)  # (y_true, y_pred)
print(f"The model is {accuracy * 100.0}% accurate")
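
Accuracy here is simply the fraction of predictions that match the true labels. A minimal NumPy sketch on made-up values (the `toy_` arrays are hypothetical) shows the same computation:

```python
import numpy as np

toy_predictions = np.array([9, 2, 1, 1, 6])
toy_labels = np.array([9, 2, 1, 1, 4])

# Comparing element-wise gives booleans; their mean is the fraction correct
toy_accuracy = np.mean(toy_predictions == toy_labels)
print(f"{toy_accuracy * 100.0:.1f}% accurate")  # 80.0% accurate
```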

## Hyperparameter Tuning
Now we change the model's hyperparameters and check whether its accuracy improves. This process is called hyperparameter tuning.

Let's try using more layers:

In [None]:
model = NNModel(n_layers=2,
                n_units=16, 
                conv_blocks=0,
                conv_scale=12,
                learning_rate=1e-3,
                display_summary=True)
                
hist = model.fit(train_features, train_one_hot_labels, 
                 valid_data=(test_features, test_one_hot_labels),
                 batch_size=64,
                 n_epochs=3)

predictions = model.predict(test_features)
accuracy = accuracy_score(test_labels, predictions)  # (y_true, y_pred)
print(f"The model is {accuracy * 100.0}% accurate")

Hmm, that doesn't really help. How about some more units?

In [None]:
model = NNModel(n_layers=1,
                n_units=64, 
                conv_blocks=0,
                conv_scale=12,
                learning_rate=1e-3,
                display_summary=True)
                
hist = model.fit(train_features, train_one_hot_labels, 
                 valid_data=(test_features, test_one_hot_labels),
                 batch_size=64,
                 n_epochs=3)

predictions = model.predict(test_features)
accuracy = accuracy_score(test_labels, predictions)  # (y_true, y_pred)
print(f"The model is {accuracy * 100.0}% accurate")

How about we add some convolution layers?

In [None]:
model = NNModel(n_layers=1,
                n_units=64, 
                conv_blocks=1,
                conv_scale=12,
                learning_rate=1e-3,
                display_summary=True)
                
hist = model.fit(train_features, train_one_hot_labels, 
                 valid_data=(test_features, test_one_hot_labels),
                 batch_size=64,
                 n_epochs=3)

predictions = model.predict(test_features)
accuracy = accuracy_score(test_labels, predictions)  # (y_true, y_pred)
print(f"The model is {accuracy * 100.0}% accurate")

Convolution layers gave us a ~4% increase in accuracy!

Since we are on a time limit today, that's all we will try. But I suggest that you try other values.
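
One way to try other values systematically is to enumerate every combination of a few candidate settings. The sketch below only generates the combinations and does not train anything. The candidate values and the `iter_configs` helper are illustrative assumptions, not recommendations from this practical.

```python
from itertools import product

# Candidate values per hyperparameter (illustrative choices only)
search_space = {
    "n_layers": [1, 2],
    "n_units": [16, 64],
    "conv_blocks": [0, 1],
}

def iter_configs(space):
    """Yield one dict per combination of the candidate values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
print(len(configs))  # 8

# Each config could then be passed to the model class, for example:
#   model = NNModel(conv_scale=12, learning_rate=1e-3, **config)
```

Training every combination takes time, so keep the lists short, or sample a handful of configurations at random.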

> Neural networks can scale to large amounts of data, but they require large amounts of computing resources to use effectively.