## Create the necessary imports
- TensorFlow, a deep learning framework (it uses Keras for creating neural networks).
- TensorBoard helps us visualize key values of the model, such as accuracy or loss. It helps when debugging the neural network because we can detect and prevent the model from memorizing the training data (overfitting).
- NumPy for high-performance matrix manipulation.
- random for picking a random sample index.
- Matplotlib for plotting data and doing a visual exploration.
- mlxtend.plotting for the confusion matrix.

In [0]:
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras import datasets, layers, models
import numpy as np
import random
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_confusion_matrix

## Install the tunneling tool.
- This will allow us to follow the statistics of our model just by clicking the link printed by the cell below.
- The statistics are provided by TensorBoard.


In [0]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

LOG_DIR = './log'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)

get_ipython().system_raw('./ngrok http 6006 &')

!curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

## Download the data from MNIST
- Note that in this case Keras can import the data with a single statement, as this dataset is considered the "hello world" of deep learning.

In [0]:
mnist = tf.keras.datasets.mnist

## Split the dataset into training and test
- Note that in this case the data is already split; we only need to call the load_data() method.

In [0]:
(X_train, y_train),(X_test, y_test) = mnist.load_data()

## Preprocess the data:
- As we are working with grayscale images, each pixel is stored as an 8-bit intensity with a value from 0 to 255. We rescale these values to the range 0 to 1 in order to improve the neural network performance.
- Check the frequency of the classes.
- Explore the data we are using. We observe that, at least to the human eye, it is possible to distinguish the digit in a 28x28 pixel image despite the low resolution.

In [0]:
X_train, X_test = X_train / 255.0, X_test / 255.0

# Test the size of the training and test set
print('Train size: {} , Test size: {}'.format(len(X_train), len(X_test)))

# Check the class frequencies of the training set to see whether stratifying is necessary.
plt.hist(np.array(y_train), bins=10)
plt.ylabel('Frequency')
plt.show()

# Show a random training image to see what we are working with:
random_sample = random.randint(0, len(X_train)-1)
label, pixels = y_train[random_sample], X_train[random_sample]
img = np.array(pixels).reshape((28,28))
print('Label: {}'.format(label))
plt.imshow(img)
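
The histogram gives a visual impression of the class balance. If exact numbers are wanted, one option is `np.bincount`. The sketch below uses a small made-up `labels` array as a stand-in for `y_train`; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Hypothetical stand-in for y_train: integer digit labels in the range 0..9.
labels = np.array([0, 1, 1, 2, 2, 2, 9])

# bincount returns how many times each integer value occurs.
counts = np.bincount(labels, minlength=10)
# Relative frequency of each class; a balanced 10-class set is near 0.1 each.
frequencies = counts / counts.sum()

for digit, (count, freq) in enumerate(zip(counts, frequencies)):
    print('Digit {}: {} samples ({:.1%})'.format(digit, count, freq))
```

With the real data, passing `y_train` instead of `labels` should show whether any digit is noticeably under-represented.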

## Choose the model we are using to train our dataset
- We create a Sequential model, meaning that we build our network as a stack of layers. For that we pass a list with the layers to execute in order.
- A model needs an optimizer, which controls parameters such as the learning rate. A learning rate that is too high can hinder convergence and limit the accuracy of our model. Adam is an optimization algorithm widely used in neural networks, but we could use another one or even our own configuration.
- Cross entropy is a loss that measures how well the predicted probabilities match the true classes. In this case we use the sparse variant because our labels are integer class indices (0 to 9) rather than one-hot vectors.
- The activation function defines the output of a neuron given its weighted input. For example, ReLU outputs zero for negative inputs and the input itself (a linear function) for positive inputs.
- The first layer is Flatten. It works like numpy.reshape, giving a new shape to the image without changing its data: each 28x28 image becomes a vector of 784 values. We use 28x28 images based on the intuition that we can still identify the digit in the plotted image.
- Next is a Dense layer, which is a regular fully connected layer. The number of nodes is chosen by experiment: we can define variables to iterate over in order to find the best combination of number of hidden layers and number of nodes per layer. The number of nodes being a power of two is just a convention; it could just as well be 511 instead of 512.
- Dropout is a technique to prevent overfitting by setting to zero a randomly chosen fraction of the inputs during training (20% here). It is not a very intuitive technique, but it gives good results in practice.
- The last layer must have exactly as many neurons as classes we want to predict (10 here). We are free to add more hidden layers before it, each with whatever number of neurons gives the best result, as long as the last layer keeps exactly one neuron per class.
- Five epochs have been enough to obtain up to 96% accuracy.
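
To make the activation and loss descriptions above more concrete, here is a minimal NumPy sketch of ReLU, softmax, and a sparse categorical cross entropy. It is only an illustration: the function names and the example values are assumptions, and this is not how Keras implements them internally.

```python
import numpy as np

def relu(x):
    # Zero for negative inputs, the input itself for positive inputs.
    return np.maximum(0.0, x)

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs):
    # labels are integer class indices, so no one-hot encoding is needed:
    # pick the predicted probability of the true class for each sample.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.log(true_class_probs).mean()

print(relu(np.array([-2.0, 0.0, 3.0])))

# Made-up raw outputs for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])  # integer labels, as in y_train

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```

The "sparse" loss gives the same value as the one-hot version; it only saves the step of converting the integer labels into one-hot vectors.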


In [0]:
tensorboard = TensorBoard('./log')

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=5, callbacks=[tensorboard])
model.evaluate(X_test, y_test)
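
The effect of the `Dropout(0.2)` layer can be pictured with a small NumPy sketch. It assumes the usual "inverted dropout" behaviour: during training a random fraction of the inputs is zeroed and the rest are scaled by 1 / (1 - rate), while at inference the inputs pass through unchanged. The `dropout` function and its arguments are hypothetical names for illustration only.

```python
import numpy as np

def dropout(inputs, rate, training, rng):
    # At inference time dropout does nothing.
    if not training:
        return inputs
    # Keep each input with probability (1 - rate).
    keep_mask = rng.random(inputs.shape) >= rate
    # Scale the kept inputs so the expected sum stays the same.
    return inputs * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((4, 5))

dropped = dropout(activations, rate=0.2, training=True, rng=rng)
unchanged = dropout(activations, rate=0.2, training=False, rng=rng)
print(dropped)
print(unchanged)
```

Every entry of `dropped` is either 0 or 1.25, which is why the layer only affects `model.fit` and not `model.evaluate` or `model.predict`.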

## Test the results with a random image from the test dataset

In [0]:
seed = random.randint(0, len(X_test)-1)
size_i = len(X_test[seed])
print(size_i)
test_img = np.array([X_test[seed].reshape((28,28))])

predictions = model.predict(test_img)
# The highest softmax output is the model's confidence in this prediction, not an accuracy.
confidence = np.amax(predictions) * 100

prediction = int(np.argmax(predictions[0]))
print('Prediction: {} - Confidence {}% - Ground Truth {}'.format(prediction, round(float(confidence), 2), y_test[seed]))

img = np.array(test_img).reshape((28,28))
plt.imshow(img)
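
The percentage printed above is the largest of the ten softmax outputs for that image. When a prediction is wrong, it can help to look at the runner-up classes too. The sketch below uses a made-up `probs` vector as a stand-in for `predictions[0]`; the values are purely illustrative.

```python
import numpy as np

# Hypothetical softmax output for one image (10 classes, sums to 1).
probs = np.array([0.01, 0.02, 0.05, 0.70, 0.02, 0.10, 0.03, 0.04, 0.02, 0.01])

# argsort is ascending, so reverse it and keep the three largest entries.
top3 = np.argsort(probs)[::-1][:3]
for digit in top3:
    print('Digit {}: {:.1f}%'.format(digit, probs[digit] * 100))
```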

## Let's plot the results in a confusion matrix
Now if we are interested in knowing the prediction versus the real value of all the test examples, not just a single one as we saw, we can build a confusion matrix to plot them all.
- First we obtain the predictions over the X_test examples.
- Then, we pre-process the results obtained in order to plot the confusion matrix.

In [0]:
predictions_test = model.predict(X_test)

# The predicted class of each example is the index of its highest probability.
y_pred = predictions_test.argmax(axis=1)

In [0]:
# Build the 10x10 confusion matrix: rows are true labels, columns are predictions.
num_classes = 10
conf_mat = np.zeros((num_classes, num_classes), dtype=int)

for true_label, pred_label in zip(y_test, y_pred):
  conf_mat[true_label, pred_label] += 1

plot_confusion_matrix(conf_mat=conf_mat,
                      show_absolute=True,
                      show_normed=False,
                      colorbar=True)
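
Besides plotting it, the confusion matrix can also be summarized numerically. The sketch below assumes the same layout as above (rows are true labels, columns are predictions) and uses a small made-up 3-class matrix in place of the real 10x10 one.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true labels, columns are predictions.
conf_mat = np.array([[50, 2, 1],
                     [3, 45, 2],
                     [0, 4, 46]])

# The diagonal holds the correctly classified examples of each class.
correct = np.diag(conf_mat)
# Fraction of each true class that was predicted correctly (per-class recall).
per_class_recall = correct / conf_mat.sum(axis=1)
# Fraction of all examples that were predicted correctly.
overall_accuracy = correct.sum() / conf_mat.sum()

for label, recall in enumerate(per_class_recall):
    print('Class {}: {:.1%} correct'.format(label, recall))
print('Overall accuracy: {:.1%}'.format(overall_accuracy))
```

Classes with a low value here are the ones whose off-diagonal cells in the plot deserve a closer look.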