#Introduction
![Keras](https://keras.io/img/logo.png)
[Documentation](https://keras.io/)


In [None]:
import keras
print('Keras version:', keras.__version__)

Keras version: 2.15.0


In [None]:
import tensorflow as tf

# keras.backend has no _get_available_gpus() in this Keras version,
# so we ask TensorFlow for the list of GPUs instead
print('List of available GPUs:', tf.config.list_physical_devices('GPU'))

#A "toy example"
**Problem**: We want to recognize handwritten digits.

**Data**: MNIST (Modified National Institute of Standards and Technology database) is one of the most popular databases used for training image processing systems. The images look like this:

![MnistExamples](https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png)

Each image is 28 x 28 pixels and comes with a label from 0 to 9. For example, the images in the first row are labelled 0, those in the second row 1, and so on.

Connect the Notebook to a Google Drive account

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Now go to the appropriate folder on your Google Drive. Note: you may need to change the folder name, depending on where on your drive you keep the data files.

In [None]:
import os
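# Absolute path (the drive is mounted at /content/drive), so the cell can be re-run safely
os.chdir('/content/drive/MyDrive/ML/Exercise02_Keras')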

##Dataset loading
Now we load the MNIST dataset directly from Keras:


In [None]:
from keras.datasets import mnist

# input image dimensions
img_rows, img_cols = 28, 28
num_classes = 10

# read the training and the test data sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print('Shapes:', X_train.shape, X_test.shape, y_train.shape, y_test.shape)
print('Values:', X_train.min(), X_train.max(), y_train.min(), y_train.max())


###Sample visualization
Now let's use the Matplotlib library for viewing a feature vector (digit).

In [None]:
from matplotlib import pyplot as plt

def plot_digit(img):
    plt.imshow(img, cmap=plt.cm.gray_r)

def plot_digit_vector(vector, size=(28, 28)):
    # Reshape the vector as a 2D matrix
    img = vector.reshape(size)
    plot_digit(img)

In [None]:
index = 0
plot_digit(X_train[index])
print(y_train[index])

##Dataset preparation

###Data reshape

In [None]:
print('X_train shape before:', X_train.shape)
X_train = X_train.reshape((-1, img_rows * img_cols))
X_test = X_test.reshape((-1, img_rows * img_cols))
print('X_train shape after:', X_train.shape)
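
To see what this reshape does, here is a minimal sketch on a made-up stack of three 2 x 2 "images" (the array `toy` is hypothetical and not part of MNIST). The `-1` lets NumPy infer the number of samples, and the operation can be undone:

```python
import numpy as np

# Hypothetical toy data: 3 "images" of 2 x 2 pixels
toy = np.arange(12).reshape((3, 2, 2))

# Flatten each image into a vector; -1 lets NumPy infer the sample count
flat = toy.reshape((-1, 2 * 2))
print(flat.shape)

# The reshape loses nothing: going back recovers the original images
restored = flat.reshape((-1, 2, 2))
print(np.array_equal(toy, restored))
```

Each row of `flat` is one image read row by row, which is the layout a dense input layer expects.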

###Data normalization

In [None]:
# preprocessing
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
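
The same scaling can be checked on a few made-up pixel values (the array `pixels` is illustrative). The sketch assumes, as above, that the raw data are 8-bit integers, so they are cast to floating point before dividing:

```python
import numpy as np

# Hypothetical pixel values covering the uint8 range
pixels = np.array([0, 51, 255], dtype=np.uint8)

# Cast first, then scale into the interval [0, 1]
scaled = pixels.astype('float32') / 255
print(scaled.dtype, scaled.min(), scaled.max())
```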

###Labels one-hot encoding
We convert the y arrays to one-hot encoding: each label becomes a vector with a 1 at the position of its class and 0 elsewhere.

In [None]:
import numpy as np
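def one_hot(a, n):
  # Row k of the n x n identity matrix is the one-hot vector of class k
  e = np.eye(n)
  # Use the integer labels as row indices into the identity matrix
  result = e[a.astype(np.uint8)]
  return result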

print(y_train[:10])
y_train = one_hot(y_train, num_classes)
y_test = one_hot(y_test, num_classes)
print('\nShape after one-hot encoding', y_train.shape, y_test.shape)
print(y_train[:10])
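
The indexing trick is easier to follow on a tiny example. This sketch uses three made-up labels and three classes (both hypothetical):

```python
import numpy as np

labels = np.array([2, 0, 1])   # hypothetical labels, 3 classes
encoded = np.eye(3)[labels]    # row k of the identity matrix encodes class k
print(encoded)

# argmax along each row recovers the original labels
decoded = np.argmax(encoded, axis=1)
print(decoded)
```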

#Network
Now let us build a Multilayer Perceptron.
To do this, we:

1.   instantiate an empty feed-forward ([sequential](https://keras.io/api/models/sequential/#sequential-class)) network
2.   add two fully-connected ([dense](https://keras.io/api/layers/core_layers/dense/)) layers to the network (i.e. one hidden layer and the output layer)

Other [Keras layers](https://keras.io/layers/convolutional/)

In [None]:
from keras.models import Sequential
from keras.layers import Dense

model=Sequential()

# A hidden layer with sigmoid activation
model.add(Dense(72, activation='sigmoid', input_shape=(img_rows * img_cols,)))

# An output layer with 10 nodes and softmax activation
model.add(Dense(num_classes, activation='softmax'))

# summary() prints the table itself and returns None, so no print() is needed
model.summary()
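
What the two dense layers compute can be sketched in plain NumPy. This is only an illustration: the weights `W1`, `b1`, `W2`, `b2` are random stand-ins (not the Keras initialisation) and the input is fake data. The last lines count the trainable parameters of this architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 28 * 28, 72, 10

# Random stand-in parameters (illustrative only, not the Keras initialiser)
W1 = rng.normal(size=(n_in, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_out)) * 0.01
b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)         # hidden layer
    return softmax(hidden @ W2 + b2)      # output layer: one probability per class

x = rng.random((5, n_in))                 # 5 fake flattened images
probs = forward(x)
print(probs.shape, probs.sum(axis=1))

# Trainable parameters: weights plus biases of each layer
n_params = (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)
print(n_params)
```

The parameter count should agree with the total reported by the model summary.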

#Training procedure preparation: loss, optimizer and metrics
Now we will [compile](https://keras.io/getting-started/sequential-model-guide/#compilation) the model, using the categorical cross-entropy loss function:

In [None]:
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
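
For intuition, the categorical cross-entropy can be written in a few lines of NumPy. This is a sketch of the formula, not the Keras implementation; the helper name and the example probabilities are made up:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average -sum(y_true * log(y_pred)) over the samples
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0]])          # one-hot target: class 1
confident = np.array([[0.05, 0.90, 0.05]])    # puts most probability on the right class
unsure = np.array([[0.40, 0.30, 0.30]])       # puts little probability on the right class

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

With a one-hot target, only the probability assigned to the true class enters the loss, so a confident correct prediction gives a smaller value than a hesitant one.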

#Training procedure

##One batch at a time



In [None]:
model.trainable_weights[-1]

In [None]:
batch_size = 32
result = model.train_on_batch(X_train[:batch_size], y_train[:batch_size])
print(result)

In [None]:
model.trainable_weights[-1]

In [None]:
for i in range(batch_size, len(X_train), batch_size):
  result = model.train_on_batch(X_train[i:i+batch_size], y_train[i:i+batch_size])
  print(i, result)
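
The loop above starts at `batch_size` because the first batch was already used in the earlier cell. A quick sketch of how the slicing covers the data, assuming the standard 60000-sample MNIST training set:

```python
import math

n_samples, batch_size = 60000, 32   # MNIST training set size and the batch size used above

# Start index of every batch, including the very first one
starts = list(range(0, n_samples, batch_size))
n_batches = len(starts)
print(n_batches, math.ceil(n_samples / batch_size))

# Size of the final batch (slicing past the end simply returns fewer samples)
last = starts[-1]
last_size = min(last + batch_size, n_samples) - last
print(last, last_size)
```

One pass over all the batches corresponds to one epoch.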

##On the whole dataset

###Training monitoring
To monitor training with [TensorBoard](https://www.tensorflow.org/tensorboard), we create a [TensorBoard callback](https://keras.io/api/callbacks/tensorboard/) that logs the training run:

Launch TensorBoard

In [None]:
LOG_DIR = './logs'
from keras.callbacks import TensorBoard
tensorboard_cb = TensorBoard(log_dir=LOG_DIR)

# run tensorboard in background
! killall tensorboard
%load_ext tensorboard
%tensorboard --logdir ./logs

from tensorboard import notebook
notebook.list() # View open TensorBoard instances

Install and run localtunnel

In [None]:
! curl http://localhost:6006  # Make sure we're able to connect to the TensorBoard service.
! npm install -g localtunnel  # Install localtunnel
! rm -f url.txt               # clear the url file (-f: no error if it does not exist yet)
get_ipython().system_raw('lt --port 6006 >> url.txt 2>&1 &') # Tunnel port 6006 (TensorBoard assumed running)
! sleep 3

Get the localtunnel URL

In [None]:
! cat url.txt # Get url

###Fit
Now we train ([fit](https://keras.io/getting-started/sequential-model-guide/#training)) and save the network:

In [None]:
# First we remove the './logs' directory, to start from scratch
!rm -rf ./logs

model.fit(X_train, y_train, epochs=100, validation_split=0.15,
          callbacks=[tensorboard_cb])
model.save('mlp.model')
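
As far as the Keras documentation describes it, `validation_split` takes the held-out samples from the end of the arrays, before any shuffling. A minimal sketch of the resulting split sizes, assuming the 60000-sample MNIST training set (the variable names are illustrative, not Keras internals):

```python
import math

n_samples = 60000          # size of the MNIST training set
validation_split = 0.15    # value passed to fit() above

# Index where the validation part starts
split_at = int(math.floor(n_samples * (1.0 - validation_split)))

n_train = split_at
n_val = n_samples - split_at
print(n_train, n_val)
```

Because the split is positional, data sorted by class should be shuffled before calling `fit` with `validation_split`.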

### Load a previously trained model

In [None]:
from keras.models import load_model

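# 'mlp_100.model' is assumed to be a model saved earlier under that name;
# use 'mlp.model' instead to load the one saved in the previous cell
model = load_model('mlp_100.model')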
print('Loaded')

## On the whole dataset, with automatic control
Other [Keras callbacks](https://keras.io/api/callbacks/)

###[Early stopping](https://keras.io/api/callbacks/early_stopping/)

In [None]:
from keras.callbacks import EarlyStopping

earlystopping_cb = EarlyStopping(monitor="val_loss",
                                 min_delta=0,
                                 patience=20)
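
The idea behind `patience` can be sketched in plain Python. This is a simplified model of the stopping rule, not the Keras source; the function name and the list of validation losses are made up:

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the counter
            best, wait = loss, 0
        else:
            # No improvement: stop once this has happened `patience` times in a row
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.35, 0.36, 0.38]
stop_3 = early_stopping_epoch(losses, patience=3)
stop_2 = early_stopping_epoch(losses, patience=2)
print(stop_3, stop_2)
```

A small patience stops at the first short plateau, while a larger one tolerates temporary increases of the validation loss.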

###[Model checkpointing](https://keras.io/api/callbacks/model_checkpoint/)

In [None]:
from keras.callbacks import ModelCheckpoint

modelcheckpoint_cb = ModelCheckpoint("./checkpoints/",
                                     monitor="val_loss",
                                     verbose=1,
                                     save_best_only=True)

###Fit

In [None]:
model.fit(X_train, y_train, epochs=100, validation_split=0.15,
          callbacks=[earlystopping_cb, modelcheckpoint_cb])

###Load the "best" model

In [None]:
from keras.models import load_model

model = load_model("checkpoints")
print('Loaded')

#Inference

We [predict](https://keras.io/api/models/model_training_apis/#predict-method) the labels of some test set samples

In [None]:
X_test[index].shape

In [None]:
X_test[index:index+1].shape

In [None]:
index = 15
prediction = model.predict(X_test[index:index+1])
print('prediction', np.round(prediction,3))
print('target', y_test[index:index+1])
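
The network outputs one probability per class, so the predicted digit is the position of the largest value. A small sketch with a made-up probability vector (not actual model output):

```python
import numpy as np

# Hypothetical softmax output for a single sample (the values sum to 1)
prediction = np.array([[0.01, 0.02, 0.05, 0.02, 0.10,
                        0.70, 0.03, 0.02, 0.03, 0.02]])

predicted_class = int(np.argmax(prediction, axis=1)[0])  # index of the largest probability
confidence = float(prediction[0, predicted_class])       # probability given to that class
print(predicted_class, confidence)
```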

##Evaluation
Now let us check performance on the test set:

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)

##Per-class evaluation
Compute the confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix

# This function inverts the one-hot encoding
def de_one_hot(y):
  return np.argmax(y, axis=1)

y_pred=model.predict(X_test)

dey_test = de_one_hot(y_test)
dey_pred = de_one_hot(y_pred)

matrix=confusion_matrix(dey_test, dey_pred)
print(matrix)
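
Per-class scores can be read off a confusion matrix directly. The sketch below uses a made-up 3-class matrix (not the MNIST result) and follows the scikit-learn convention that rows are true classes and columns are predicted classes:

```python
import numpy as np

# Made-up 3-class confusion matrix: rows are true classes, columns are predictions
cm = np.array([[8, 1, 1],
               [0, 9, 1],
               [2, 0, 8]])

recall = np.diag(cm) / cm.sum(axis=1)      # share of each true class that was recognised
precision = np.diag(cm) / cm.sum(axis=0)   # share of each predicted class that was correct
accuracy = np.trace(cm) / cm.sum()         # overall share of correct predictions
print(recall, precision, accuracy)
```

The same three lines can be applied to `matrix` to see which digits the network confuses most often.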

In [None]:
dey_test[:10]

##Hard mining
Show some of the errors:

In [None]:
# Indices of the errors:
err=np.argwhere(dey_test!=dey_pred).reshape((-1,))
err[:5]

In [None]:
i=40
print('True:', dey_test[err[i]], 'Predicted:', dey_pred[err[i]])
plot_digit_vector(X_test[err[i]])
y_test[err[i]]

#Exercise
*   Use a fixed 20% validation set
*   Network architecture hyperparameters
    * hidden layers
    * number of neurons
    * activation function
*   Optimizer:
    * type
    * learning rate
    * momentum
*   Interpret results!
*   Compare results

## More challenging problems
* The [Fashion MNIST](https://keras.io/api/datasets/fashion_mnist/) dataset
* The [CIFAR10](https://keras.io/api/datasets/cifar10/) dataset
