# Classifiers
Compare and contrast classification on the MNIST dataset with artificial neural networks (ANNs) and support vector machines (SVMs).

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.optimizers import SGD

## Preparing the data

Download the MNIST dataset, getting features X and labels y:

In [3]:
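# as_frame=False returns NumPy arrays rather than a pandas DataFrame/Series
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)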
n_classes = 10

Scale the pixel intensities from [0, 255] to [0, 1], convert the labels to integers and inspect the shapes of the data:

In [4]:
X = np.array(X / 255)
y = np.array(y, dtype='int')

print(f"X shape: {X.shape}, y shape: {y.shape}")
print(f"Image shape: {X[0].shape}")

X shape: (70000, 784), y shape: (70000,)
Image shape: (784,)


The shapes confirm that we have the full MNIST dataset of 70,000 handwritten digits.

The images have not yet been split into training and test sets, and each one has been flattened into a 784-element vector of shape `(784,)` rather than kept in its standard `(28, 28)` form.
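Because each 784-element row stores the 28 × 28 pixels row by row, a single `reshape` recovers the image, for example to display it with `plt.imshow`. A minimal sketch; the helper `to_image` is a hypothetical name, and a random vector stands in for a real MNIST row such as `X[0]`:

```python
import numpy as np

def to_image(x, side=28):
    """Reshape a flattened (side * side,) pixel vector into a (side, side) image."""
    return np.asarray(x).reshape(side, side)

# Random stand-in for one MNIST row such as X[0].
rng = np.random.default_rng(0)
x_demo = rng.random(784)
img_demo = to_image(x_demo)
print(img_demo.shape)  # (28, 28)

# Row-major order: flat index k lands at row k // 28, column k % 28.
k = 100
print(img_demo[k // 28, k % 28] == x_demo[k])  # True

# To display a real digit (matplotlib is imported at the top of the notebook):
# plt.imshow(to_image(X[0]), cmap="binary"); plt.axis("off"); plt.show()
```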

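Before going further we need training, validation and test sets, which we create with scikit-learn's `train_test_split`: 20% of the data is held out as the test set, and a further 20% of the remaining data is held out for validation.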

In [5]:
# train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# remove a validation set from the training data
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)

In [6]:
print(X.shape[0])
print(X_train.shape[0])
print(X_val.shape[0])
print(X_test.shape[0])

70000
44800
11200
14000


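We can now train models on the training set and compare them on the validation set, keeping the test set aside for a final, unbiased evaluation. Because the validation data is never used for fitting, it reveals overfitting and lets us choose hyperparameters that generalise well.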

Later on we will use cross-validation on the entire training set to get a more reliable estimate of how well each model generalises; for now, a single validation set is enough.
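As a preview of that step, the sketch below estimates accuracy with stratified 5-fold cross-validation using scikit-learn's `StratifiedKFold` and `cross_val_score`, with an RBF-kernel `SVC` since SVMs are one of the two model families being compared. It rests on assumptions: to stay quick and download-free it runs on scikit-learn's small bundled 8 × 8 `load_digits` dataset rather than MNIST, and the helper name `cv_accuracy` is hypothetical. Passing `X_train` and `y_train` instead should work the same way, although an RBF SVC on tens of thousands of 784-pixel images may take a long time.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def cv_accuracy(model, X, y, n_splits=5, seed=42):
    """Mean and standard deviation of accuracy over stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()

# Stand-in data: scikit-learn's bundled 8x8 digits (no download needed).
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = X_digits / 16.0  # pixel values in this dataset range from 0 to 16

mean_acc, std_acc = cv_accuracy(SVC(), X_digits, y_digits)
print(f"5-fold accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```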

## Artificial Neural Networks (ANNs)

### The Perceptron

One of the earliest and simplest neural network architectures is the perceptron...

In [6]:
# perceptron
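A minimal NumPy sketch of the classic perceptron learning rule for a binary problem: a Heaviside step applied to a weighted sum, with the weights nudged towards each misclassified sample. The function names and the synthetic two-cluster data are illustrative assumptions. On MNIST one could train it on a binary target such as `y_train == 0` ("zero vs. not zero"), or use scikit-learn's `sklearn.linear_model.Perceptron`, which handles the 10 classes one-vs-rest.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Perceptron learning rule for binary labels y in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        n_errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0  # Heaviside step activation
            update = lr * (target - pred)      # 0 if correct, +lr or -lr if wrong
            w += update * xi
            b += update
            n_errors += int(update != 0)
        if n_errors == 0:  # a full pass without mistakes: stop early
            break
    return w, b

def predict_perceptron(X, w, b):
    return (X @ w + b > 0).astype(int)

# Synthetic, linearly separable stand-in data: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-3.0, 1.0, size=(50, 2)),
                    rng.normal(3.0, 1.0, size=(50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)
order = rng.permutation(len(y_demo))
X_demo, y_demo = X_demo[order], y_demo[order]

w, b = train_perceptron(X_demo, y_demo)
accuracy = (predict_perceptron(X_demo, w, b) == y_demo).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The perceptron is only guaranteed to converge when the classes are linearly separable, which is why the stand-in data uses two well-separated clusters.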

### The Multi-Layer Perceptron (MLP)

Stacking multiple layers of perceptrons gives a multi-layer perceptron (MLP):
- DNN
- chain rule
- activation functions
- layers
- backpropagation...
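The bullet points above can be made concrete with a small NumPy sketch: the forward pass of a one-hidden-layer MLP (ReLU hidden layer and softmax output, the same activations as the Keras model below) and a backward pass that applies the chain rule layer by layer to obtain the gradients used by backpropagation. A finite-difference check confirms the hand-derived gradients. Sizes and names are illustrative assumptions; Keras computes these gradients automatically.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_backward(params, X, y):
    """Loss and gradients of a one-hidden-layer MLP (ReLU -> softmax)."""
    W1, b1, W2, b2 = params
    n = X.shape[0]
    # Forward pass
    z1 = X @ W1 + b1
    a1 = relu(z1)
    p = softmax(a1 @ W2 + b2)
    loss = -np.log(p[np.arange(n), y]).mean()  # sparse categorical cross-entropy
    # Backward pass: the chain rule, applied from the output layer back to the input
    dz2 = p.copy()
    dz2[np.arange(n), y] -= 1.0  # dLoss/dz2 for softmax followed by cross-entropy
    dz2 /= n
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)  # ReLU passes gradient only where z1 > 0
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    return loss, [dW1, db1, dW2, db2]

def numerical_grad(params, X, y, i, eps=1e-6):
    """Central finite differences for params[i], used to check the analytic gradients."""
    g = np.zeros_like(params[i])
    for idx in np.ndindex(params[i].shape):
        original = params[i][idx]
        params[i][idx] = original + eps
        loss_plus, _ = forward_backward(params, X, y)
        params[i][idx] = original - eps
        loss_minus, _ = forward_backward(params, X, y)
        params[i][idx] = original  # restore
        g[idx] = (loss_plus - loss_minus) / (2 * eps)
    return g

# Tiny random problem: 5 samples, 4 features, 6 hidden units, 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 4))
y_demo = rng.integers(0, 3, size=5)
params = [rng.normal(size=(4, 6)), np.zeros(6), rng.normal(size=(6, 3)), np.zeros(3)]

loss, grads = forward_backward(params, X_demo, y_demo)
max_err = max(np.abs(numerical_grad(params, X_demo, y_demo, i) - grads[i]).max()
              for i in range(len(params)))
print(f"loss = {loss:.4f}, max |analytic - numerical| = {max_err:.1e}")
```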

In [7]:
model = Sequential()
model.add(Dense(300, activation="relu", input_shape=[784]))
model.add(Dense(100, activation="relu"))
model.add(Dense(10, activation="softmax"))

model.compile(loss="sparse_categorical_crossentropy", optimizer='sgd', metrics=["accuracy"])

In [8]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
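The `Param #` column follows directly from the layer sizes: a `Dense` layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases, so the first layer has 784 × 300 + 300 = 235,500 parameters. A small helper (hypothetical name) reproduces the totals; the second call applies it to the layer sizes of the deeper model defined further down.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of fully connected layers:
    each layer has n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

print(dense_param_count([784, 300, 100, 10]))                         # 266610
print(dense_param_count([784, 500, 400, 300, 200, 100, 50, 25, 10]))  # 800085
```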


In [9]:
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### Deep Neural Networks (DNN)
Tuning:
- activation functions
- optimiser
- learning rate
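When comparing activation functions it helps to look at their derivatives, since each layer multiplies the gradient flowing backwards by them. A minimal NumPy sketch of common choices; the leaky ReLU slope of 0.01 and ELU α = 1 are typical values assumed for illustration, not settings taken from this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at 0.25 when z = 0

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2  # peaks at 1 when z = 0

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # 1 for positive inputs, 0 otherwise

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def d_leaky_relu(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)  # never exactly 0, so the gradient never vanishes completely

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def d_elu(z, alpha=1.0):
    return np.where(z > 0, 1.0, alpha * np.exp(z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for name, d in [("sigmoid", d_sigmoid), ("tanh", d_tanh), ("relu", d_relu),
                ("leaky_relu", d_leaky_relu), ("elu", d_elu)]:
    print(f"{name:>10}: {np.round(d(z), 4)}")
```

Sigmoid and tanh derivatives approach 0 for large |z| (saturation), whereas ReLU-style functions keep a derivative of 1 for positive inputs.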

In [40]:
model = Sequential()
model.add(Dense(500, activation="relu", input_shape=[784]))
model.add(Dense(400, activation="relu"))
model.add(Dense(300, activation="relu"))
model.add(Dense(200, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(25, activation="relu"))
model.add(Dense(10, activation="softmax"))

model.compile(loss="sparse_categorical_crossentropy", optimizer='sgd', metrics=["accuracy"])

In [41]:
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


- vanishing gradient problem
- autodiff?
- batch normalisation
- momentum
- compare optimisers
- learning rate scheduling
- L1, L2 regularisation
- dropout, Monte Carlo dropout
- max-norm regularisation
- TensorFlow graphs / TensorBoard
- one-hot encoding MNIST?
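The vanishing gradient problem can be observed directly by backpropagating through a deep stack of randomly initialised layers and measuring the gradient norm after each one. A minimal NumPy sketch under stated assumptions: square layers of width 100 with no biases, standard-normal inputs, an all-ones gradient at the output, Glorot/LeCun initialisation for sigmoid (the two coincide for square layers) and He initialisation for ReLU. Because the sigmoid derivative is at most 0.25, the gradient should shrink geometrically with depth, while ReLU with He initialisation keeps it roughly constant.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_norms(activation, n_layers=8, width=100, batch=64, seed=0):
    """Run a batch through a deep random MLP, backpropagate an all-ones gradient from
    the output, and return the gradient norm after each layer
    (index 0 = layer nearest the output, index -1 = the input)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(batch, width))
    # Weight variance: 1/width for sigmoid (Glorot/LeCun), 2/width for ReLU (He)
    scale = np.sqrt((1.0 if activation == "sigmoid" else 2.0) / width)
    weights, pre_acts = [], []
    for _ in range(n_layers):  # forward pass, storing what backprop needs
        W = rng.normal(scale=scale, size=(width, width))
        z = a @ W
        weights.append(W)
        pre_acts.append(z)
        a = sigmoid(z) if activation == "sigmoid" else np.maximum(0.0, z)
    grad = np.ones_like(a)
    norms = []
    for W, z in zip(reversed(weights), reversed(pre_acts)):  # backward pass
        if activation == "sigmoid":
            s = sigmoid(z)
            grad = grad * s * (1.0 - s)  # sigmoid'(z) <= 0.25 shrinks the gradient
        else:
            grad = grad * (z > 0)        # ReLU'(z) is 1 or 0
        grad = grad @ W.T                # chain rule through z = a @ W
        norms.append(np.linalg.norm(grad))
    return norms

for act in ("sigmoid", "relu"):
    norms = gradient_norms(act)
    print(f"{act:>7}: gradient norm at input / after output layer = {norms[-1] / norms[0]:.1e}")
```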

In [None]:
# various examples from above
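As one example from the list, momentum and learning-rate scheduling can be compared without any framework on an ill-conditioned quadratic loss (a long, narrow valley), where plain gradient descent crawls along the shallow direction. A minimal NumPy sketch; the test function, step count and schedule constants are illustrative assumptions. The update `velocity = momentum * velocity - lr * grad` is the classical momentum rule, and in Keras the same ideas are exposed through `SGD(learning_rate=..., momentum=0.9)` and schedules such as `tf.keras.optimizers.schedules.ExponentialDecay`.

```python
import numpy as np

# Ill-conditioned quadratic loss f(w) = 0.5 * (1 * w0^2 + 50 * w1^2).
CURVATURE = np.array([1.0, 50.0])

def loss(w):
    return 0.5 * np.sum(CURVATURE * w ** 2)

def grad(w):
    return CURVATURE * w

def exponential_decay(step, lr0=0.03, decay_rate=0.5, decay_steps=100):
    """Learning-rate schedule: lr0 * decay_rate ** (step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)

def sgd(w0, steps=200, momentum=0.0, schedule=exponential_decay):
    """Gradient descent with optional classical momentum and a learning-rate schedule."""
    w = np.array(w0, dtype=float)
    velocity = np.zeros_like(w)
    for step in range(steps):
        lr = schedule(step)
        velocity = momentum * velocity - lr * grad(w)  # accumulates past gradients
        w = w + velocity
    return w

w0 = [10.0, 1.0]
for m in (0.0, 0.9):
    print(f"momentum={m}: final loss = {loss(sgd(w0, momentum=m)):.2e}")
```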

## Convolutional Neural Network (CNN)
- problems with DNNs for images
- convolutions, filters, convolutional layers
- stacking feature maps
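A convolutional layer slides a small filter (kernel) over the image and records a weighted sum at each position, producing a feature map; several filters give a stack of feature maps. A minimal single-channel NumPy sketch, assuming stride 1 and no padding ("valid"). Like Keras' `Conv2D`, it does not flip the kernel, so it is technically a cross-correlation. The toy image and edge filters are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (no kernel flip), stride 1,
    single input channel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to left-to-right increases

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # every row is [0, 3, 3, 0]: strong response around the edge

# Stacking filters gives one feature map per filter: shape (n_filters, 4, 4).
filters = np.stack([vertical_edge, vertical_edge.T])
feature_maps = np.stack([conv2d(image, f) for f in filters])
print(feature_maps.shape)
```

The horizontal-edge filter (`vertical_edge.T`) produces an all-zero map here, because the toy image has no horizontal edges.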

In [None]:
# Basic CNN

- memory issues
- pooling layers
- dropout?
- pretrained models?
- fully convolutional network
- mAP?
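Pooling layers address the memory issue by shrinking each feature map before it reaches the next layer; 2 × 2 max pooling keeps the largest value in each block and cuts the number of activations by a factor of 4. A minimal NumPy sketch assuming non-overlapping windows (stride equal to the pool size) and "valid" padding, which matches the defaults of Keras' `MaxPooling2D`; the helper name is hypothetical.

```python
import numpy as np

def max_pool2d(feature_map, pool=2):
    """Non-overlapping max pooling (stride = pool size). Trailing rows/columns that
    do not fill a whole window are dropped, as with 'valid' padding."""
    h, w = feature_map.shape
    h, w = h - h % pool, w - w % pool
    blocks = feature_map[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(fm)
print(pooled)  # each entry is the max of one 2x2 block: 5, 7, 13, 15
print(fm.nbytes, "->", pooled.nbytes, "bytes")
```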
