## Hello World in Deep Learning (MNIST Classification)

In [None]:
# Imports
import datetime, os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
from tensorflow.keras.models import Sequential, Model, clone_model
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

### MNIST classification in a nutshell

In the next cell we train a simple fully connected neural network to classify digits (0-9) from the MNIST dataset. We use 20'000 images as our training set and 10'000 images as our test set. Finally, we plot the learning curves and look at some predictions and the accuracy on the test set. You will see that we already reach an accuracy of around 96%.

In [None]:
(x_digits_train, y_digits_train), (x_digits_test, y_digits_test) = mnist.load_data()

# Make train data smaller
np.random.seed(72)
train_data_idx=np.random.choice(range(0,len(x_digits_train)),20000,replace=False)
x_digits_train=x_digits_train[train_data_idx]
y_digits_train=y_digits_train[train_data_idx]

# Preprocess data 
x_digits_train = x_digits_train.astype('float32') 
x_digits_test = x_digits_test.astype('float32')
x_digits_train = x_digits_train/ 255 
x_digits_test = x_digits_test/ 255
y_digits_train = to_categorical(y_digits_train, 10) 
y_digits_test = to_categorical(y_digits_test, 10)
x_digits_train=x_digits_train.reshape((len(x_digits_train),28,28,1))
x_digits_test=x_digits_test.reshape((len(x_digits_test),28,28,1))

# Define model 
model_digits = Sequential()
model_digits.add(Flatten(input_shape=(28,28,1))) 
model_digits.add(Dense(500, activation='relu')) 
model_digits.add(Dense(50, activation='relu')) 
model_digits.add(Dense(10, activation='softmax')) 

# Compile model
model_digits.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# train model
history=model_digits.fit(x_digits_train, y_digits_train,
                         validation_data=(x_digits_test, y_digits_test),
                         batch_size=128, epochs=10, verbose=1)

In [None]:
# summarize history for accuracy
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
plt.plot(history.history['accuracy']) 
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')
plt.subplot(1,2,2)
plt.plot(history.history['loss']) 
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')
plt.show()

In [None]:
# prediction of an image of the test set
i=np.random.choice(range(0,len(x_digits_test))) 
plt.imshow(x_digits_test[i,:,:,0],cmap="gray")
pred=model_digits.predict(x_digits_test[i:i+1]) 
print("predicted probabilities",pred)
print("max probability",np.max(pred))
print("predicted label",np.argmax(pred))
print("true label",np.argmax(y_digits_test[i]))


In [None]:
# evaluation on the test set
model_digits.evaluate(x_digits_test,y_digits_test)

### MNIST classification in more detail

Now let's look at the code above in more detail.
First we load the MNIST dataset and look at the size of the training and test sets. We have 60'000 training images and 10'000 test images. The images are grayscale and 28x28 pixels in size.

In [None]:
#Load pre-shuffled MNIST data into train and test sets
(x_digits_train, y_digits_train), (x_digits_test, y_digits_test) = mnist.load_data()

In [None]:
print(x_digits_train.shape)
print(x_digits_test.shape)

print(y_digits_train.shape)
print(y_digits_test.shape)


In the next few cells we make the training set smaller by sampling 10'000 random images from the 60'000. We then look at the distribution of the labels in both datasets; you can see that both are more or less balanced.
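
As a minimal sketch of the sampling and balance check described above, the following block uses synthetic labels instead of MNIST, so it runs without downloading anything. The names `labels`, `idx`, `subset` and `class_share` are illustrative and not part of the notebook, and it uses NumPy's `default_rng` rather than `np.random.seed`.

```python
import numpy as np

# Synthetic stand-in for the MNIST training labels
# (assumption: 60,000 integer labels in the range 0..9).
rng = np.random.default_rng(72)
labels = rng.integers(0, 10, size=60000)

# Draw 10,000 distinct indices from the full index range of the training set.
idx = rng.choice(len(labels), size=10000, replace=False)
subset = labels[idx]

# Relative frequency of each class in the subset.
class_share = np.bincount(subset, minlength=10) / len(subset)
print(class_share.round(3))
```

If the subset is balanced, every entry of `class_share` is close to 0.1.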

In [None]:
np.random.seed(72)
# sample the indices from the full training set (not from the test set)
train_data_idx=np.random.choice(range(0,len(x_digits_train)),10000,replace=False)
x_digits_train=x_digits_train[train_data_idx]
y_digits_train=y_digits_train[train_data_idx]
print(x_digits_train.shape)
print(y_digits_train.shape)


In [None]:
np.unique(y_digits_train,return_counts=True)

In [None]:
np.unique(y_digits_test,return_counts=True)

Let's look at the pixel values of a training image. You can see that the values are between 0 and 255. We normalize them to the range from 0 to 1 by dividing by 255. If you look at the labels, you see that they are integers from 0 to 9. To train a neural network with the categorical cross-entropy loss, we need to transform them into the so-called one-hot encoding.
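
To make the two preprocessing steps concrete, here is a small NumPy-only sketch on toy data. It is an illustration of the idea, not the notebook's code: the arrays `pixels` and `labels` are made up, and the identity-matrix trick is just one way to build a one-hot encoding with the same layout (one row per sample, one column per class).

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)  # toy "image" row
labels = np.array([2, 0, 9])                         # toy integer labels

# Scale pixel values from [0, 255] to [0, 1].
scaled = pixels.astype("float32") / 255

# One-hot encode: row i of the 10x10 identity matrix is the code for class i.
one_hot = np.eye(10, dtype="float32")[labels]

print(scaled)
print(one_hot[0])  # the code for label 2: a 1 at position 2, zeros elsewhere
```

Taking `np.argmax` along the last axis of `one_hot` recovers the original integer labels.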

In [None]:
#print the pixel values of the first "image"
print(x_digits_train[0])


In [None]:
#print the label of the first "image"
print(y_digits_train[0])


In [None]:
# Preprocess data (normalize to be in the range [0,1])
x_digits_train = x_digits_train.astype('float32')
x_digits_test = x_digits_test.astype('float32')
x_digits_train = x_digits_train/ 255
x_digits_test = x_digits_test/ 255

In [None]:
# Preprocess class labels -- one hot encoding
y_digits_train = to_categorical(y_digits_train, 10)
y_digits_test = to_categorical(y_digits_test, 10)

In [None]:
#print the pixel values of the first "image"
# now the values are from 0 to 1
print(x_digits_train[0])


In [None]:
#print the label of the first "image"
#the label from above, now one-hot encoded
print(y_digits_train[0])


In [None]:
print(x_digits_train.shape)
print(x_digits_test.shape)

print(y_digits_train.shape)
print(y_digits_test.shape)


Let's plot a few images to get a feeling for the dataset and see how hard the task is.
We plot the first 9 images of the training dataset.

In [None]:
plt.figure(figsize=(10,10))
for i in range(0,9):
    sample_img = x_digits_train[i]
    # plot the image
    plt.subplot(3,3,i+1)
    plt.imshow(sample_img,cmap="gray")
    # the labels are one-hot encoded, so argmax recovers the digit
    plt.title("Label: %s"%np.argmax(y_digits_train[i]))

In the next few cells we reshape the training and test sets into 4-dimensional arrays. The images are grayscale and have only one channel, so we add a channel axis of size 1 as the last dimension. We then define a neural network with Keras: it has two fully connected hidden layers with 500 and 50 nodes, each with the ReLU activation function. The output layer has 10 nodes and the softmax activation function, so we can interpret its output as predicted probabilities for the ten labels.
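
Why the softmax output can be read as probabilities is easiest to see in a few lines of NumPy. The following is a hand-written sketch of the softmax function, not the Keras implementation, and the `logits` values are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Made-up raw scores of the last layer for one image (10 classes).
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0, -2.0, 0.3, 1.5]])
probs = softmax(logits)

print(probs.sum())       # the ten values sum to 1 (up to rounding)
print(np.argmax(probs))  # index of the largest score, here class 6
```

All outputs are positive and sum to 1, and the class with the largest raw score also gets the largest probability.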

In [None]:
x_digits_train=x_digits_train.reshape((len(x_digits_train),28,28,1))
x_digits_test=x_digits_test.reshape((len(x_digits_test),28,28,1))

print(x_digits_train.shape)
print(x_digits_test.shape)


In [None]:
# Define model architecture
model_digits = Sequential()

model_digits.add(Flatten(input_shape=(28,28,1)))

model_digits.add(Dense(500, activation='relu'))
model_digits.add(Dense(50, activation='relu'))
model_digits.add(Dense(10, activation='softmax'))
 
# Compile model
model_digits.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])



In [None]:
model_digits.summary()
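
The parameter counts in the summary can be checked by hand. A fully connected layer with `n_in` inputs and `n_out` nodes has `n_in * n_out` weights plus `n_out` biases. The short sketch below applies this to the layer sizes used above; the variable names are illustrative.

```python
# Layer sizes: flattened 28x28x1 input, two hidden layers, output layer.
layer_sizes = [28 * 28 * 1, 500, 50, 10]

# Weights plus biases for each pair of consecutive layers.
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(params)       # [392500, 25050, 510]
print(sum(params))  # 418060
```

The `Flatten` layer has no parameters, so the total comes only from the three `Dense` layers.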

In the next two cells we evaluate the untrained model. As you can see, the predictions of the untrained model are essentially random: with ten classes we get an accuracy of around 10%. If you look at the predictions for single images, you see that they are wrong most of the time. This will change when we train the model with our training dataset. To visualize the training process, the computational graph and the development of the weights, you can use TensorBoard.
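
The 10% figure, and a rough expectation for the initial loss, can be reproduced without any model. The sketch below uses synthetic labels and a blind guesser. Note the assumption: an untrained Keras model does not output exactly uniform probabilities, so its loss will only be in the neighbourhood of the value computed here.

```python
import numpy as np

rng = np.random.default_rng(0)
true_labels = rng.integers(0, 10, size=10000)   # synthetic labels, not MNIST
random_guess = rng.integers(0, 10, size=10000)  # a classifier that guesses blindly

# Fraction of guesses that happen to be correct.
accuracy = np.mean(true_labels == random_guess)

# Cross-entropy when every class gets probability 1/10: -log(1/10) = log(10).
uniform_loss = -np.log(1 / 10)

print(accuracy)      # close to 0.1
print(uniform_loss)  # about 2.30
```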

In [None]:
# evaluation of the untrained model
model_digits.evaluate(x_digits_test,y_digits_test)
# you get the loss "categorical_crossentropy" and the accuracy 

In [None]:
# prediction of an image with the untrained model
i=np.random.choice(range(0,len(x_digits_test)))
plt.imshow(x_digits_test[i,:,:,0],cmap="gray")
pred=model_digits.predict(x_digits_test[i:i+1])
print("predicted probabilities",pred)
print("max probability",np.max(pred))
print("predicted label",np.argmax(pred))
print("true label",np.argmax(y_digits_test[i]))


In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [None]:
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

# train the model
history=model_digits.fit(x_digits_train, y_digits_train,
                         validation_data=(x_digits_test,y_digits_test),
                         batch_size=128, epochs=10, verbose=1,
                         callbacks=[tensorboard_callback])

In [None]:
# Open TensorBoard

# 1) Local installation 
#  - in your anaconda env, in your project dir, type "tensorboard --logdir logs" 
#  - open a browser and go to http://localhost:6006

# 2) Google Colab 
#  - show tensorboard inline
%tensorboard --bind_all --logdir logs

In [None]:
# summarize history for accuracy
plt.figure(figsize=(14,6))
plt.subplot(1,2,1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')
plt.subplot(1,2,2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')
plt.show()

In [None]:
# evaluation of the trained model
model_digits.evaluate(x_digits_test,y_digits_test)
# you get the loss "categorical_crossentropy" and the accuracy 

In [None]:
# prediction of an image with the trained model
i=np.random.choice(range(0,len(x_digits_test)))
plt.imshow(x_digits_test[i,:,:,0],cmap="gray")
pred=model_digits.predict(x_digits_test[i:i+1])
print("predicted probabilities",pred)
print("max probability",np.max(pred))
print("predicted label",np.argmax(pred))
print("true label",np.argmax(y_digits_test[i]))


In the next cells we look at the confusion matrix and calculate the accuracy on the test dataset. We reach an accuracy of around 95%, which is already very good.

In [None]:
from sklearn.metrics import confusion_matrix

predict=model_digits.predict(x_digits_test) 
predict_classes=np.argmax(predict,axis=1)
true_classes=np.argmax(y_digits_test,axis=1)
confusion_matrix(true_classes,predict_classes)

In [None]:
print(np.average(true_classes==predict_classes)) #this should again be accuracy
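
To see how accuracy relates to the confusion matrix, here is a small NumPy sketch on toy labels. The helper `confusion` is a hypothetical stand-in written for this illustration (rows are true classes, columns are predicted classes, the same orientation as `sklearn.metrics.confusion_matrix`), and the toy arrays are made up.

```python
import numpy as np

def confusion(true_classes, predicted_classes, n_classes):
    # Rows are true classes, columns are predicted classes.
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (true_classes, predicted_classes), 1)
    return matrix

true_toy = np.array([0, 0, 1, 1, 2, 2, 2])
pred_toy = np.array([0, 1, 1, 1, 2, 2, 0])

cm = confusion(true_toy, pred_toy, n_classes=3)

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()
# Share of each true class that was predicted correctly.
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(cm)
print(accuracy)
print(recall_per_class)
```

The per-class values show which digits the model confuses most, which the overall accuracy alone hides.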

### Now it's your turn

Train the same neural network with less and with more training data: use 100, 1000 and the full set of training images. Look at the learning curves of each model and evaluate the performance on the test dataset. What do you observe? Play around with the number of hidden layers and the number of nodes per layer. What do you observe?  
*Hint: You might need to train for more than just 10 epochs*

In [None]:
# your code here