# Part 3. Multilayer Perceptron from Scratch

In the previous section, we trained a simple 2-layer MLP with Keras. As a high-level framework, Keras hides the details of the model and simplifies the process. We will now build our own 2-layer MLP purely in NumPy, which will expose the hidden components of neural network training. As in past from-scratch attempts, we will start by creating a class.

## 1. Create a class `MLPTwoLayers`

- One of the first things to take care of when building your network is initialising the weight matrices correctly. Consider appropriate sizes for your input, hidden and output layers: your `__init__` method should take in the parameters `input_size`, `hidden_size` and `output_size`. Then, using these variables, initialise the weights `w1` and `w2` and the biases `b1` and `b2` for the hidden and output layers.
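
As a rough illustration of the shapes involved, here is a minimal sketch of one possible initialisation. It assumes small Gaussian weights and zero biases; the function name `init_params`, the `scale` of 0.01 and the `seed` are illustrative choices, not something the exercise prescribes.

```python
import numpy as np

def init_params(input_size, hidden_size, output_size, scale=0.01, seed=0):
    """Return small random weights and zero biases for a 2-layer MLP."""
    rng = np.random.default_rng(seed)  # assumed seed, for reproducibility only
    return {
        "w1": rng.normal(0.0, scale, size=(input_size, hidden_size)),
        "b1": np.zeros(hidden_size),
        "w2": rng.normal(0.0, scale, size=(hidden_size, output_size)),
        "b2": np.zeros(output_size),
    }

# 3072 = 32 * 32 * 3 flattened CIFAR-10 pixels, 10 output classes.
params = init_params(3072, 100, 10)
for name, value in params.items():
    print(name, value.shape)
```

Whatever scheme you pick, `w1` must map `input_size` to `hidden_size` and `w2` must map `hidden_size` to `output_size`, so that the matrix products in the forward pass line up.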

In [1]:
# %load_ext autoreload
# %autoreload 2
# from src.mlp import MLPTwoLayers as MLP

In [2]:
# mlp = MLP(3072, 100, 10)

## 2. Create a `forward` method, which takes in a set of features
- Create the `forward` method to calculate the predicted class probabilities of an image. This is known as a forward pass. You should wrap the hidden layer with a sigmoid function (or another activation if you prefer), and the output layer with a softmax function.
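
The forward pass described above could look roughly like the sketch below. It is written as standalone functions rather than a class method, and the names `sigmoid`, `softmax` and `forward`, as well as the random stand-in data, are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    exp = np.exp(z - np.max(z))
    return exp / np.sum(exp)

def forward(x, w1, b1, w2, b2):
    """Forward pass for one flattened image x of shape (input_size,)."""
    hidden = sigmoid(x @ w1 + b1)      # hidden-layer activations
    probs = softmax(hidden @ w2 + b2)  # class probabilities
    return hidden, probs

# Random stand-ins for one image and for the network parameters.
rng = np.random.default_rng(0)
x = rng.random(3072)
w1, b1 = rng.normal(0, 0.01, (3072, 100)), np.zeros(100)
w2, b2 = rng.normal(0, 0.01, (100, 10)), np.zeros(10)

hidden, probs = forward(x, w1, b1, w2, b2)
print(probs.shape, probs.sum())
```

Returning the hidden activations alongside the probabilities is a design choice: the backward pass needs them, so caching them here avoids recomputing the forward pass.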

In [3]:
# # import your data preparation methods here, ensure your data is randomized
# preds = mlp.forward(X[0])
# preds

## 3. Create a `loss` method, which takes in the predicted probability and actual label
- Compute the loss function: this is a function of the actual label y and the predicted probability ŷ. It captures how far off our predictions are from the actual target. The objective is to minimize this loss function.
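
The exercise does not name a specific loss. A common choice for a softmax output is cross-entropy, sketched below under the assumption that the label is an integer class index; the function name `cross_entropy` and the `eps` guard are illustrative.

```python
import numpy as np

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability assigned to the true class index `label`."""
    # eps guards against log(0) when the predicted probability is exactly zero.
    return -np.log(probs[label] + eps)

probs = np.array([0.7, 0.2, 0.1])
print(cross_entropy(probs, 0))  # true class received high probability: small loss
print(cross_entropy(probs, 2))  # true class received low probability: larger loss
```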

In [4]:
# train_loss = mlp.loss(preds, y[0])
# train_loss

## 4. Create a `backward` method, which takes in the loss
- Using the backpropagation algorithm, execute the backward pass and adjust the weights and biases accordingly.
- You can use a default learning rate of 1e-3 for this exercise. If you would like to do otherwise, you can try to implement it as a parameter.
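
To make the backward pass concrete, here is a minimal sketch of one gradient-descent update for a sigmoid hidden layer, a softmax output and a cross-entropy loss. Unlike the `backward(loss)` signature in the exercise, this sketch takes the cached activations and the label, because the gradients depend on them rather than on the scalar loss alone. The names `sgd_step` and `forward`, the tiny layer sizes and the learning rate of 0.1 in the demo are all assumptions for illustration.

```python
import numpy as np

def forward(x, params):
    hidden = 1.0 / (1.0 + np.exp(-(x @ params["w1"] + params["b1"])))
    logits = hidden @ params["w2"] + params["b2"]
    exp = np.exp(logits - logits.max())
    return hidden, exp / exp.sum()

def sgd_step(x, label, hidden, probs, params, lr=1e-3):
    """One in-place gradient-descent update for a single training example."""
    # Gradient of cross-entropy w.r.t. the output pre-activations: probs - one_hot.
    dz2 = probs.copy()
    dz2[label] -= 1.0
    dw2 = np.outer(hidden, dz2)
    dhidden = params["w2"] @ dz2
    # Sigmoid derivative: sigma'(z) = sigma(z) * (1 - sigma(z)).
    dz1 = dhidden * hidden * (1.0 - hidden)
    dw1 = np.outer(x, dz1)
    # All gradients are computed before any parameter is changed.
    params["w1"] -= lr * dw1
    params["b1"] -= lr * dz1
    params["w2"] -= lr * dw2
    params["b2"] -= lr * dz2

# Tiny hypothetical network and a single random example.
rng = np.random.default_rng(0)
params = {"w1": rng.normal(0, 0.1, (4, 5)), "b1": np.zeros(5),
          "w2": rng.normal(0, 0.1, (5, 3)), "b2": np.zeros(3)}
x, label = rng.random(4), 1

_, probs = forward(x, params)
loss_before = -np.log(probs[label])
for _ in range(100):
    hidden, probs = forward(x, params)
    sgd_step(x, label, hidden, probs, params, lr=0.1)
_, probs = forward(x, params)
loss_after = -np.log(probs[label])
print(loss_before, loss_after)
```

Repeating the update on the same example should drive its loss down, which is a quick sanity check for the gradient formulas before training on the full dataset.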

In [5]:
# mlp.backward(train_loss)

Now, we can try training the model.

In [6]:
# # test loss before training
# test_loss = 0
# for i in range(3000, 3500):
#     test_loss += mlp.loss(mlp.forward(X[i]), y[i])
# print(test_loss / 500)

In [7]:
# for i in range(3000):
#     if i % 100 == 0:
#         print('Item {}'.format(i))
#     mlp.backward(mlp.loss(mlp.forward(X[i]), y[i]))

Finally, re-test your model.

In [8]:
# test_loss = 0
# for i in range(3000, 3500):
#     test_loss += mlp.loss(mlp.forward(X[i]), y[i])
# print(test_loss / 500)

Hopefully, you see that your test loss has decreased after training!

# Part 4. Convolutional Neural Network (CNN)
Please attempt this section only after you have completed the rest!

In the previous part, you implemented a multilayer perceptron network on CIFAR-10. The implementation was simple but not very modular since the loss and gradient were computed in a single monolithic function. This is manageable for a simple two-layer network, but would become impractical as you move to bigger models. Ideally, you want to build networks using a more modular design so that you can implement different layer types in isolation and then snap them together into models with different architectures.
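
One way to picture the modular design is to give every layer its own `forward` and `backward` method and let a container chain them. The sketch below is only an illustration of that idea in NumPy; the class names `Dense`, `ReLU` and `Network` are hypothetical and are not the Keras classes of the same name.

```python
import numpy as np

class Dense:
    """Fully connected layer that caches its input for the backward pass."""
    def __init__(self, n_in, n_out, rng):
        self.w = rng.normal(0.0, 0.01, (n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x
        return x @ self.w + self.b

    def backward(self, grad_out):
        self.dw = self.x.T @ grad_out      # gradient w.r.t. weights
        self.db = grad_out.sum(axis=0)     # gradient w.r.t. biases
        return grad_out @ self.w.T         # gradient w.r.t. the layer input

class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask

class Network:
    """Chains layers: forward in order, backward in reverse order."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
        return grad

rng = np.random.default_rng(0)
net = Network([Dense(3072, 100, rng), ReLU(), Dense(100, 10, rng)])
x = rng.random((4, 3072))                       # a batch of 4 flattened images
scores = net.forward(x)
grad_in = net.backward(np.ones_like(scores))    # dummy upstream gradient
print(scores.shape, grad_in.shape)  # (4, 10) (4, 3072)
```

Because each layer only knows about its own inputs and outputs, adding a new layer type means writing one new class, without touching the rest of the network.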

In this part of the exercise, you will implement a close to state-of-the-art deep learning model for CIFAR-10 with the Keras deep learning library. In addition to implementing convolutional networks of various depths, you will need to explore different update rules for optimization, and introduce **Dropout** as a regularizer, and **Batch Normalization** and **Data Augmentation** as tools to optimize deep networks more efficiently.

We have seen models reach >98% accuracy on `CIFAR-10`, while most state-of-the-art models cross the 97% mark. In general, models beyond **95%** are fairly decent.

## Reading resources

[Dropout](http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf?utm_content=buffer79b43&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) is a regularization technique that reduces overfitting in neural networks by preventing complex co-adaptations on training data. It is a very efficient way of performing model averaging with neural networks.
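
To illustrate the mechanism, here is a minimal NumPy sketch of the "inverted" dropout variant, which rescales the surviving units at training time so that nothing needs to change at test time. The function name `dropout` and its parameters are assumptions for illustration, not a library API.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero each unit with probability `rate` during training (inverted dropout)."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Rescale the surviving units so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((1000, 100))
dropped = dropout(a, rate=0.2, rng=rng)
print((dropped == 0).mean())  # fraction of zeroed units, close to 0.2
print(dropped.mean())         # close to 1.0 because of the rescaling
```

A different random mask is drawn on every call, so each training step effectively trains a different thinned sub-network, which is where the model-averaging interpretation comes from.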

[Batch Normalization](https://pdfs.semanticscholar.org/c1ba/ed41e4bc9401b1b2ec8ef55ba45543f7a1a3.pdf) is a technique that provides any layer in a neural network with inputs that have zero mean and unit variance.
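
The training-time computation can be sketched in a few lines of NumPy. This is a simplified illustration only: the function name `batch_norm_forward` is hypothetical, and the running averages that are used at inference time are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 8))  # badly scaled activations
out = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3))
print(out.std(axis=0).round(3))
```

With `gamma` set to ones and `beta` to zeros the output is purely normalised; during training both are learned, so the network can undo the normalisation where that helps.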

[Data Augmentation](https://medium.com/nanonets/how-to-use-deep-learning-when-you-have-limited-data-part-2-data-augmentation-c26971dc8ced) means increasing the number of data points. For images, this means increasing the number of images in the dataset, typically by applying label-preserving transformations such as flips, crops or shifts to the existing images.
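
As a small illustration, the sketch below applies two common augmentations, a random horizontal flip and a random crop after zero-padding, to a single image in NumPy. The function name `augment`, the padding of 4 pixels and the random stand-in image are assumptions made for this example.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Randomly flip an (H, W, C) image horizontally and take a random crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip
    h, w, _ = image.shape
    # Zero-pad the borders, then crop back to the original size at a random offset.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for one CIFAR-10 image
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape)  # (8, 32, 32, 3)
```

Each call produces a slightly different view of the same image with the same label, so the model sees more variety without any new data being collected.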

- Enhance the performance of your existing model from Part 2 with convolutional neural networks
- Implement the model using Keras (or PyTorch)
- Train your designed model
- Improve performance with algorithm tuning: Dropout, Batch Normalization, Data Augmentation and other optimizers

In [22]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import seaborn as sns
from pathlib import Path
import tarfile
import pickle
import requests
from sklearn.metrics import accuracy_score
from keras.callbacks import EarlyStopping
from keras.optimizers import SGD
from keras.layers import Dense, Conv2D, Flatten, Activation, BatchNormalization, Dropout
from keras.models import Sequential
from keras.utils import to_categorical
import pandas as pd

#### Initial Neural Net

In [10]:
data_folder = Path("data/raw/")
file_to_open = data_folder / "cifar-10-python.tar.gz"

def unpickle(file):
    # Load one pickled CIFAR-10 file; keys are bytes because of encoding='bytes'.
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return batch

def getRawDictionary(fileName):
    batch = data_folder / "cifar-10-batches-py" / fileName
    data = unpickle(batch)
    return data

# Load the five training batches and stack them into single arrays.
train_batches = [getRawDictionary("data_batch_" + str(i)) for i in range(1, 6)]
train_imgs = np.concatenate([batch[b'data'] for batch in train_batches], axis=0)
train_labels = np.concatenate([np.asarray(batch[b'labels']) for batch in train_batches], axis=0)

# The test set is stored in a single batch.
test_batch = getRawDictionary("test_batch")
test_imgs = test_batch[b'data']
test_labels = np.asarray(test_batch[b'labels'])

label_dict = getRawDictionary("batches.meta")
label_names = label_dict[b'label_names']

In [11]:
train_labels_encoded = to_categorical(train_labels)
test_labels_encoded = to_categorical(test_labels)

In [12]:
train_imgs = train_imgs / 255.0
test_imgs = test_imgs / 255.0

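def toGrayScale(imgs):
    # Each row stores 1024 red, then 1024 green, then 1024 blue pixel values.
    R = imgs[:,0:1024]
    G = imgs[:,1024:2048]
    B = imgs[:,2048:]

    # Unweighted mean of the three channels.
    imgs_grey = (R + G + B)/3
    return imgs_grey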

train_imgs_grey = toGrayScale(train_imgs)
test_imgs_grey = toGrayScale(test_imgs)

In [13]:
n_cols = train_imgs_grey.shape[1]

model = Sequential()
model.add(Dense(100, activation='relu',input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

early_stopping_monitor = EarlyStopping(patience=3)

In [14]:
model.fit(train_imgs_grey, train_labels_encoded, epochs=40, validation_split=0.3, callbacks=[early_stopping_monitor])

Train on 35000 samples, validate on 15000 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40


<keras.callbacks.History at 0x25a9786cc18>

In [15]:
model.evaluate(test_imgs_grey, test_labels_encoded, batch_size=32)



[1.7296526000976562, 0.3879]

#### Baseline CNN

In [16]:
def tensor_reshape(imgs):
    # Reshape flat greyscale rows of 1024 pixels into (n_images, 32, 32, 1) tensors.
    return imgs.reshape(-1, 32, 32, 1)

train_imgs_grey_reshape = tensor_reshape(train_imgs_grey)
test_imgs_grey_reshape = tensor_reshape(test_imgs_grey)

In [17]:
cnn_model = Sequential()
cnn_model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(32,32,1)))
cnn_model.add(Conv2D(32, kernel_size=3, activation='relu'))
cnn_model.add(Flatten())
cnn_model.add(Dense(10, activation='softmax'))

cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

cnn_model.fit(train_imgs_grey_reshape, train_labels_encoded, epochs=40, validation_split=0.3, callbacks=[early_stopping_monitor])

Train on 35000 samples, validate on 15000 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40


<keras.callbacks.History at 0x25bf38ea940>

In [18]:
cnn_model.evaluate(test_imgs_grey_reshape, test_labels_encoded, batch_size=32)



[1.3840419342041015, 0.5593]

#### CNN with Batch Normalisation

In [19]:
early_stopping_monitor_norm = EarlyStopping(patience=5)

cnn_model_norm = Sequential()
cnn_model_norm.add(Conv2D(64, kernel_size=3, input_shape=(32,32,1)))
cnn_model_norm.add(BatchNormalization())
cnn_model_norm.add(Activation("relu"))
cnn_model_norm.add(Conv2D(32, kernel_size=3))
cnn_model_norm.add(BatchNormalization())
cnn_model_norm.add(Activation("relu"))
cnn_model_norm.add(Flatten())
cnn_model_norm.add(Dense(10, activation='softmax'))

cnn_model_norm.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

cnn_model_norm.fit(train_imgs_grey_reshape, train_labels_encoded, epochs=40, validation_split=0.3, callbacks=[early_stopping_monitor_norm])

Train on 35000 samples, validate on 15000 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40


<keras.callbacks.History at 0x25bfa13ae80>

In [20]:
cnn_model_norm.evaluate(test_imgs_grey_reshape, test_labels_encoded, batch_size=32)



[1.9523200534820557, 0.5113]

#### CNN with Dropout

In [23]:
early_stopping_monitor_dropout = EarlyStopping(patience=5)

cnn_model_dropout = Sequential()
cnn_model_dropout.add(Conv2D(64, kernel_size=3, input_shape=(32,32,1)))
cnn_model_dropout.add(Dropout(0.2))
cnn_model_dropout.add(Activation("relu"))
cnn_model_dropout.add(Conv2D(32, kernel_size=3))
cnn_model_dropout.add(Dropout(0.2))
cnn_model_dropout.add(Activation("relu"))
cnn_model_dropout.add(Flatten())
cnn_model_dropout.add(Dense(10, activation='softmax'))

cnn_model_dropout.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

cnn_model_dropout.fit(train_imgs_grey_reshape, train_labels_encoded, epochs=40, validation_split=0.3, callbacks=[early_stopping_monitor_dropout])

Train on 35000 samples, validate on 15000 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40


<keras.callbacks.History at 0x25c15b5a7f0>

In [24]:
cnn_model_dropout.evaluate(test_imgs_grey_reshape, test_labels_encoded, batch_size=32)



[1.3496148273468018, 0.5766]

#### CNN with Batch Normalisation and Dropout

In [25]:
early_stopping_monitor_combi = EarlyStopping(patience=5)

cnn_model_combi = Sequential()
cnn_model_combi.add(Conv2D(64, kernel_size=3, input_shape=(32,32,1)))
cnn_model_combi.add(Dropout(0.2))
cnn_model_combi.add(BatchNormalization())
cnn_model_combi.add(Activation("relu"))
cnn_model_combi.add(Conv2D(32, kernel_size=3))
cnn_model_combi.add(Dropout(0.2))
cnn_model_combi.add(BatchNormalization())
cnn_model_combi.add(Activation("relu"))
cnn_model_combi.add(Flatten())
cnn_model_combi.add(Dense(10, activation='softmax'))

cnn_model_combi.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

cnn_model_combi.fit(train_imgs_grey_reshape, train_labels_encoded, epochs=40, validation_split=0.3, callbacks=[early_stopping_monitor_combi])

Train on 35000 samples, validate on 15000 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40


<keras.callbacks.History at 0x25c270e1eb8>

In [26]:
cnn_model_combi.evaluate(test_imgs_grey_reshape, test_labels_encoded, batch_size=32)



[1.8176016143798828, 0.5277]

Ref:
- https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/
- https://towardsdatascience.com/dont-use-dropout-in-convolutional-networks-81486c823c16
- https://towardsdatascience.com/building-a-convolutional-neural-network-cnn-in-keras-329fbbadc5f5