# ITMAL Exercise

REVISIONS| |
---------| |
2018-0318| CEF, initial.
2018-0321| CEF, synced with MLP moon exercise.
2018-0323| CEF, minor updates and spell checked.
2019-0930| CEF, updated for ITMAL E19.


## Keras Multi-Layer Perceptrons (MLPs) on MNIST-data


### Qa Using a Keras MLP on the MNIST-data

Now, make a Keras `Sequential` model and fit it to the MNIST data, re-using as much of the code from the `mlp_moon.ipynb` as you can.

Then try to change the number of hidden layers and the neurons in each layer, looking for increases in test accuracy via ``score``. 

Publish your best score for your model in Blackboard, see link under L06. We use categorical accuracy as the score, even though an $F_1$ score could say more. Publish your result like
```
   ITMALGrpXY: score=0.76, a 10-20-30-20-10 MLP, takes looong to train
```
or similar
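
To illustrate why an $F_1$ score can say more than accuracy, the sketch below computes both with plain NumPy on a small made-up label set. The arrays and the helper name `macro_f1` are illustrative assumptions, not part of the exercise code.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # Define F1 as 0 when the class is neither predicted nor present
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))

# Made-up, imbalanced toy labels: class 0 dominates
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 2])
y_pred = np.zeros_like(y_true)   # a model that always answers 0

accuracy = float(np.mean(y_true == y_pred))
print(f"accuracy: {accuracy:.2f}, macro F1: {macro_f1(y_true, y_pred, 3):.2f}")
```

On this toy data the accuracy is 0.8 while the macro $F_1$ is roughly 0.3, because two of the three classes are never predicted. MNIST is fairly balanced, so the gap should be much smaller there.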


NOTE: you probably need to scale/normalize the MNIST data before a fit, and no 2D-decision boundaries can be drawn from the 784-dimension MNIST data.
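
A minimal sketch of such a scaling step, assuming the pixel values lie in the range 0 to 255. The random array `X_raw` is only a stand-in for real MNIST rows.

```python
import numpy as np

# Stand-in for a few MNIST rows: 784 raw pixel intensities in 0..255
rng = np.random.default_rng(42)
X_raw = rng.integers(0, 256, size=(5, 784)).astype("float32")

# Dividing by the known maximum maps every feature into [0, 1]
X_scaled = X_raw / 255.0

print(X_scaled.min(), X_scaled.max())
```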

### Import and normalize data from scikit-learn

In [4]:
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.datasets import fetch_openml
from keras.utils.np_utils import to_categorical
from keras.optimizers import Adam

np.random.seed(42)

# Get MNIST data:
X, y = fetch_openml('mnist_784', cache=True, return_X_y=True)
X = X.astype('float32') / 255.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

y_train_binary = to_categorical(y_train)
y_test_binary  = to_categorical(y_test)

assert y.ndim==1
assert y_train_binary.ndim==2
assert y_test_binary.ndim ==2

optimizer = Adam(lr=0.1)
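
The `to_categorical` calls above turn each class label into a one-hot row, which is why the asserts expect two dimensions. The NumPy sketch below shows the idea; `one_hot` is a hypothetical helper written for illustration, not the Keras implementation, and the labels are made up.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return an (n_samples, n_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels).astype(int)   # accepts '5' as well as 5
    encoded = np.zeros((labels.size, n_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y_demo = np.array(["5", "0", "4"])   # made-up labels, given as strings
y_demo_binary = one_hot(y_demo, n_classes=10)
print(y_demo_binary.shape)   # (3, 10)
```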

### Build original model from moon notebook

In [5]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Build Keras model
model = Sequential()
model.add(Dense(input_dim=784, units=8, activation="tanh", kernel_initializer="normal"))
model.add(Dense(units=10, activation="softmax"))

model.compile(loss='categorical_crossentropy', 
              optimizer=optimizer, 
              metrics=['categorical_accuracy', 'mean_squared_error', 'mean_absolute_error'])

# Train
VERBOSE     = 0
EPOCHS      = 35

history = model.fit(X_train, y_train_binary, validation_data=(X_test, y_test_binary), epochs=EPOCHS, verbose=VERBOSE)









### Evaluate original model

In [6]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

#print(history.history)
score = model.evaluate(X_test, y_test_binary, verbose=0)

print(f"Test loss:     {score[0]}") # evaluate() returns the loss first, then the metrics in compile() order
print(f"Test accuracy: {score[1]}")
print(f"All scores in history: {score}")

Test loss:     0.5222885984239124
Test accuracy: 0.8703809523809524
All scores in history: [0.5222885984239124, 0.8703809523809524, 0.020940466293621632, 0.038392488046771006]
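
The second value is the categorical accuracy: the fraction of samples where the most probable predicted class equals the true class. A small NumPy sketch of that computation, using made-up probabilities and targets rather than model output:

```python
import numpy as np

# Made-up softmax outputs for 4 samples and 3 classes
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
# Matching one-hot targets
y_onehot = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1],
                     [0, 1, 0]])

# A sample counts as correct when the most probable class is the true class
correct = np.argmax(y_prob, axis=1) == np.argmax(y_onehot, axis=1)
categorical_accuracy = correct.mean()
print(categorical_accuracy)   # 0.75
```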


Using the original model designed for the moon data, changed to take 784 inputs and produce 10 outputs, the model reaches a test loss of about 0.52 and a categorical accuracy of about 0.87 in the run shown above. To improve on these results we tried another model.
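
For a sense of how small this baseline is, the sketch below counts the trainable parameters of a fully connected network from its layer sizes. The helper `dense_param_count` is a hypothetical name, and the second configuration simply reuses the 10-20-30-20-10 example from the exercise text with 784 inputs and 10 outputs assumed.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected net, sizes given input to output."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

baseline = dense_param_count([784, 8, 10])                 # the moon-style model
wider = dense_param_count([784, 10, 20, 30, 20, 10, 10])   # 10-20-30-20-10 MLP
print(baseline, wider)   # 6370 9640
```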

In [None]:
#run this cell to see the score plots
N=4
FX=60
FY=4
A=0.4
S=4

# Plot loss
plt.figure(figsize=(FX, FY))
ax = plt.subplot(1, N, 2)
plt.plot(history.history["loss"]    , "b--x", markerfacecolor=(0, 0, 1, A), markersize=S)
plt.plot(history.history["val_loss"], "g-s" , markerfacecolor=(0, 1, 0, A), markersize=S)
plt.legend(loc="best", labels=("loss(train)","loss(val)"))
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Loss-vs-epoch plot")
plt.show()

# Plot all metrics + loss
plt.figure(figsize=(FX, FY))
ax = plt.subplot(1, N, 3)
plt.plot(history.history["mean_squared_error"],      "r:x", markerfacecolor=(1, 0, 0, A), markersize=S)
plt.plot(history.history["val_mean_squared_error"],  "r-x", markerfacecolor=(1, 0, 0, A), markersize=S)
plt.plot(history.history["mean_absolute_error"],     "b:o", markerfacecolor=(0, 0, 1, A), markersize=S)
plt.plot(history.history["val_mean_absolute_error"], "b-o", markerfacecolor=(0, 0, 1, A), markersize=S)
plt.xlabel("epoch")
plt.ylabel("error")
plt.xlim((0, EPOCHS))
plt.legend(loc="best", labels=("mean_squared_error(train)",  "mean_squared_error(val)", 
                               "mean_absolute_error(train)", "mean_absolute_error(val)"))
plt.title("Error-vs-epoch plot")
plt.show()

# Plot accuracy
plt.figure(figsize=(FX, FY))
ax = plt.subplot(1, N, 4)
plt.plot(history.history["categorical_accuracy"],     "m-x", markerfacecolor=(1, 0, 1, A), markersize=S)
plt.plot(history.history["val_categorical_accuracy"], "m:x", markerfacecolor=(1, 0, 1, A), markersize=S)
ax.set_ylim([0,1])
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.xlim((0, EPOCHS))
plt.legend(loc="lower right", labels=("categorical_accuracy(train)", "categorical_accuracy(val)"))
plt.title("Accuracy-vs-epoch plot")
plt.show()

### Building alternative model
To improve the score we tried a convolutional neural network found at https://medium.com/@mjbhobe/mnist-digits-classification-with-keras-ed6c2374bd0e

This model requires a different input shape, (28, 28, 1), so we reshape the data first.
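
A minimal sketch of that reshape on a stand-in array (the values in `X_flat` are made up), showing how a flat index maps to a row and column of the image:

```python
import numpy as np

# Stand-in for flattened MNIST rows: 3 images of 784 pixels each
X_flat = np.arange(3 * 784, dtype="float32").reshape(3, 784)

# Add height, width and a single grayscale channel
X_img = X_flat.reshape(X_flat.shape[0], 28, 28, 1)

print(X_img.shape)   # (3, 28, 28, 1)
# Row-major reshape: pixel (row r, column c) is flat index r * 28 + c
assert X_img[0, 2, 5, 0] == X_flat[0, 2 * 28 + 5]
```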

In [14]:
X, y = fetch_openml('mnist_784', cache=True, return_X_y=True)
X = X.astype('float32') / 255.

X = X.reshape(X.shape[0], 28, 28, 1)


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

y_train_binary = to_categorical(y_train)
y_test_binary  = to_categorical(y_test)

In [15]:
model = Sequential()
# add Convolutional layers
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding='same',
                 input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))    
model.add(Flatten())
# Densely connected layers
model.add(Dense(128, activation='relu'))
# output layer
model.add(Dense(10, activation='softmax'))
# compile with adam optimizer & categorical_crossentropy loss function
model.compile(loss='categorical_crossentropy', optimizer=optimizer,
              metrics=['categorical_accuracy', 'mean_squared_error', 'mean_absolute_error'])
# Train
VERBOSE     = 0
EPOCHS      = 15

history = model.fit(X_train, y_train_binary, validation_data=(X_test, y_test_binary),
                    epochs=EPOCHS, batch_size=64, verbose=VERBOSE)
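
To follow how the tensor shape changes through the layers above, the sketch below traces shapes and parameter counts by hand. It assumes `'same'`-padded 3x3 convolutions with stride 1 and 2x2 max pooling with the default stride, as in the cell above. The helpers `conv_same` and `max_pool` are hypothetical names, not Keras functions.

```python
def conv_same(shape, filters, kernel=3):
    """'same'-padded convolution: spatial size kept, channels become `filters`."""
    h, w, c = shape
    params = kernel * kernel * c * filters + filters   # weights + biases
    return (h, w, filters), params

def max_pool(shape, pool=2):
    """Non-overlapping pooling floors each spatial dimension."""
    h, w, c = shape
    return (h // pool, w // pool, c)

shape, total = (28, 28, 1), 0
for filters in (32, 64, 64):
    shape, params = conv_same(shape, filters)
    total += params
    shape = max_pool(shape)

flat = shape[0] * shape[1] * shape[2]   # Flatten
total += flat * 128 + 128               # Dense(128)
total += 128 * 10 + 10                  # Dense(10)
print(shape, flat, total)               # (3, 3, 64) 576 130890
```

Under these assumptions the network has about 130,000 trainable parameters, roughly twenty times the 784-8-10 MLP.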

### Evaluate alternative model

In [16]:
#print(history.history)
score = model.evaluate(X_test, y_test_binary, verbose=0)

print(f"Test loss:     {score[0]}") # evaluate() returns the loss first, then the metrics in compile() order
print(f"Test accuracy: {score[1]}")
print(f"All scores in history: {score}")

Test loss:     14.515496044340587
Test accuracy: 0.09942857142857142
All scores in history: [14.515496044340587, 0.09942857142857142, 0.18011429912135715, 0.18011429912135715]
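
An accuracy of about 0.10 on ten classes is chance level. One reading of these numbers (an interpretation, not something the notebook verifies) is that the network has collapsed into predicting a single class with probability 1 for every sample. The sketch below checks that the reported loss and errors are consistent with that reading, assuming Keras clips probabilities at its default backend epsilon of 1e-7.

```python
import numpy as np

acc = 0.09942857142857142   # reported test accuracy
eps = 1e-7                  # assumed clipping epsilon (Keras backend default)
n_classes = 10

# A wrong sample gets the clipped probability eps on its true class
expected_loss = (1 - acc) * -np.log(eps)
# A wrong sample differs from its one-hot target in exactly two positions
expected_mse = (1 - acc) * 2 / n_classes
expected_mae = (1 - acc) * 2 / n_classes

print(expected_loss, expected_mse, expected_mae)
```

The computed values agree with the reported 14.5155 and 0.1801 to several decimals. A plausible cause is the Adam learning rate of 0.1, which is high for this optimizer, combined with reusing the optimizer instance created for the first model. Retraining with a fresh optimizer and a smaller learning rate would be the first thing to try.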
