# Image classification
In this notebook we solve an image classification problem using a densely connected neural network.

The data we are going to work with is the MNIST dataset, which contains 70,000 handwritten digits (0-9), split into 60,000 training and 10,000 test images. The classes are roughly, but not exactly, evenly distributed.

The aim is to show the concept of neural networks and give the user some basic knowledge of how to work with artificial neural networks in TensorFlow.

If you are working in Colab, you can try running on a GPU by clicking Edit -> Notebook settings -> Hardware accelerator.



In [None]:
# First we load all libraries needed for the notebook.
# Some helper functions are also defined to make life easier later on.
# If warnings appear, try executing the cell once more.
import tensorflow as tf
import numpy as np

from matplotlib import pyplot as plt
from matplotlib.colors import LogNorm
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.utils.multiclass import unique_labels

# Helper functions used throughout the lab

def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Blues,
                          fig_size=10):
    """
    Plot the confusion matrix (rows: true labels, columns: predictions).
    With `normalize=True`, each row is shown as percentages of its true class.
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    # Only use the labels that appear in the data
    classes = classes[unique_labels(y_true, y_pred)]
    if normalize:
        # Convert each row to percentages of that true class
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] * 100

    fig, ax = plt.subplots(figsize=(fig_size,fig_size))
    im = ax.imshow(cm,norm=LogNorm(), cmap=cmap,
                interpolation='nearest')
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ylim=ax.get_ylim()
    ax.set(
        ylim=ylim,
        xticks=np.arange(cm.shape[1]),
        yticks=np.arange(cm.shape[0]),
        # ... and label them with the respective list entries
        xticklabels=classes, 
        yticklabels=classes,
        title=title,
        ylabel='True label',
        xlabel='Predicted label')

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
              rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    fmt = '.1f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax

def plot_errors(x_test, y_test, output, n_max=600):
    """Plot every misclassified test image in a square grid.

    Each title reads '<true label> as <predicted label>'. Plotting is
    skipped when there are more than `n_max` errors, since the figure
    would take too long to draw.
    """
    n_not_corr = np.sum(output != y_test)
    n = int(np.ceil(np.sqrt(n_not_corr)))
    j = 0
    if n_not_corr > n_max:
        print('more than ' + str(n_max), n**2)
        return
    if n_not_corr == 0:
        print('no misclassified images')
        return
    # squeeze=False keeps `ax` a 2D array even when the grid is 1x1
    f, ax = plt.subplots(n, n, figsize=(25, 25), squeeze=False)
    ax = ax.flatten()

    for i in range(np.shape(output)[0]):
        if output[i] != y_test[i]:
            ax[j].set_title(str(y_test[i]) + ' as ' + str(output[i]))
            ax[j].imshow(x_test[i, :, :, 0], cmap='gray')
            j += 1
    # Hide the axes of every panel, including unused ones
    for x in ax.ravel():
        x.axis("off")
    plt.subplots_adjust(bottom=-0.09, wspace=0.03)
    plt.show()

### Load the dataset
Next, take a first look at the dataset by checking its shape, which gives us an idea of its structure and size. We also plot the distribution of the labels.



In [None]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print('Shape before reshape:',np.shape(x_train))
x_train = x_train.reshape(-1, 28, 28, 1)   # Add a channel axis (the layout a CNN would expect)
x_test = x_test.reshape(-1, 28, 28, 1)     # 1 channel = grayscale; RGB images would have 3

##
print('Train data:',np.shape(x_train))
print('Test data:', np.shape(x_test))

y_count = np.bincount(y_train)
ii = np.nonzero(y_count)[0]
plt.bar(ii, y_count)
plt.xticks(ii)
plt.title('Distribution of numbers')
plt.show()

### Exercise

In [None]:
# As an exercise, look at the distribution of the test dataset
# Write the code here...

# Look at the dataset
It is always good to get an overview of the dataset and explore it a little. When the task is image classification, a good way to do this is to look at the images. If you don't trust the dataset, go through the images and check that the labels are correct.

Remember that labeling all these images was a huge job. If you are going to do a classification task yourself, you first need to collect or create the images and then label them.

Note that some of the digits are quite hard to classify, even for the human eye.

In [None]:
f,ax=plt.subplots(10, 10, figsize=(10, 10))
ax=ax.flatten()
for i in range(100):
    ax[i].imshow(x_test[i, :, :, 0], cmap='gray')
    ax[i].set_title(y_test[i])
for axi in ax:
    axi.set_axis_off()
plt.subplots_adjust(bottom=-0.09, wspace=0.03)
plt.show()

### Exercise
Look at the digits above and think about which ones could be mixed up by a computer model.

In [None]:
# Scale the dataset to have values between 0 and 1
if np.max(x_train)>1:
    x_train = x_train / 255.0
    x_test = x_test / 255.0
else:
    print('Already scaled once')

### Create a first artificial neural network
The network below is a simple one. Its first layer flattens the 28x28 input image into a vector with one neuron per pixel (784). It is followed by two densely connected layers with 64 and 128 neurons, each followed by a dropout layer. At the end there is a classification layer with one output per digit.
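
To get a feeling for the size of this network, the number of trainable parameters can be counted by hand. The sketch below is only an illustration; it assumes the layer widths used in the model cell of this section (784 inputs, hidden layers of 64 and 128 neurons, 10 outputs), and the helper name `dense_param_count` is made up for this example.

```python
# Layer widths taken from the model in this section: 28*28 inputs,
# two hidden Dense layers (64 and 128 units) and 10 output classes.
layer_sizes = [28 * 28, 64, 128, 10]

def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

per_layer = [dense_param_count(a, b)
             for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)      # [50240, 8320, 1290]
print(total_params)   # 59850
```

The Flatten and Dropout layers add no trainable parameters, so calling `model.summary()` after building the model should report the same totals.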


In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
  tf.keras.layers.Dense(64, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.3),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.3),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])

history=model.fit(x_train, y_train, validation_data=(x_test,y_test),epochs=20,batch_size=512)
model.evaluate(x_test, y_test)

In [None]:
# summarize history for accuracy
f,ax=plt.subplots(1,2,figsize=(12,4))
ax[0].plot(history.history['acc'], label='train')
ax[0].plot(history.history['val_acc'], label='test')
ax[0].set_title('Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend()

# summarize history for loss
ax[1].plot(history.history['loss'], label='train')
ax[1].plot(history.history['val_loss'], label='test')
ax[1].set_title('Model loss / cost')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend()
plt.show()


### Exercise
Run the code above without dropout layers. How does it change the model accuracy? Why?
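
As background for this exercise, here is a minimal NumPy sketch of what a dropout layer does during training. It assumes the common "inverted dropout" formulation, where the surviving activations are rescaled by 1 / (1 - rate); the function name `dropout` and the toy input are made up for illustration and are not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of the values and
    rescale the survivors so the expected value is unchanged."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
layer_output = np.ones((1000, 64))        # toy activations, all equal to 1
dropped = dropout(layer_output, rate=0.3, rng=rng)

zero_fraction = np.mean(dropped == 0)     # close to the dropout rate
mean_after = dropped.mean()               # close to the original mean of 1
print(zero_fraction, mean_after)
```

Dropout is only active while training. During evaluation all neurons are used, which is one reason the test curve can sit above the training curve in the plots.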

In [None]:
# Define a neural network without dropout layers and run the fitting. Then plot the result in this cell
# 



# Evaluate the result
One way to evaluate the result is through the confusion matrix.
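
The idea can be sketched on a tiny example. The softmax layer outputs one probability per class, the predicted label is the most probable class, and the confusion matrix counts how often each true label ends up as each predicted label. The labels and probabilities below are made up for illustration.

```python
import numpy as np

# Toy data (made up for illustration): 6 samples, 3 classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],   # a class-0 sample mistaken for class 1
    [0.1, 0.8, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
])

# The predicted class is the index of the largest probability.
y_pred = np.argmax(probs, axis=-1)

# Rows are true labels, columns are predicted labels.
n_classes = probs.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

# Row-normalize to percentages of each true class.
cm_percent = cm / cm.sum(axis=1, keepdims=True) * 100

print(cm)
print(cm_percent)
```

Each row of `cm_percent` sums to 100, so the diagonal shows the share of each true class that was classified correctly.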

In [None]:
class_names= np.unique(y_test)
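# `predict_classes` was removed in recent TensorFlow versions; take the argmax of the softmax output instead
predicted = np.argmax(model.predict(x_test, verbose=1, batch_size=512), axis=-1)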
plot_confusion_matrix(y_test, predicted, classes=class_names, normalize=True,
                      title='Normalized confusion matrix',fig_size=10)
plt.show()
# Per-class precision, recall and F1 score
print(classification_report(y_test, predicted, digits = 3))

### Exercise 
Looking at the confusion matrix above, which digits are most commonly mixed up?
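
If reading the plot by eye is hard, the largest off-diagonal cells can also be ranked with NumPy. This is a sketch on a made-up 3-class matrix; to apply it here, one could pass in the output of `confusion_matrix(y_test, predicted)` instead.

```python
import numpy as np

# Made-up confusion matrix (rows: true label, columns: predicted label).
cm = np.array([
    [50,  2,  1],
    [ 4, 45,  0],
    [ 0,  7, 40],
])

# Ignore the diagonal (correct predictions) and rank the remaining cells.
errors = cm.copy()
np.fill_diagonal(errors, 0)
order = np.argsort(errors, axis=None)[::-1]      # flat indices, largest first
rows, cols = np.unravel_index(order[:3], errors.shape)
top_confusions = [(int(r), int(c), int(errors[r, c]))
                  for r, c in zip(rows, cols)]

# Each tuple is (true label, predicted label, count).
print(top_confusions)  # [(2, 1, 7), (1, 0, 4), (0, 1, 2)]
```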



In [None]:
plot_errors(x_test, y_test, predicted, 600)

# Using a deeper net
We can experiment with different configurations of the artificial neural network to get some improvements. The parameters that can be modified within each layer are:
* Activation function (relu, sigmoid, etc.)
* Number of neurons in each layer (32, 64, 128, 256)
* Fraction of dropout in each layer (0.1, 0.2, 0.3)

Counting only the values listed (two activation functions), this results in $2*4*3=24$ combinations per layer

Then we might want to evaluate the number of layers
* Number of layers (1, 2, 3, 5)

Note how quickly the number of configurations to run grows. Parameter optimization is an important question to keep in mind!
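
The growth can be made concrete by enumerating the options. The sketch below counts only the values listed in the bullet points (two activation functions), and it assumes that every layer in a network may be configured independently of the others; with a different assumption the totals change.

```python
from itertools import product

# Values listed in the bullet points above.
activations = ['relu', 'sigmoid']
neurons = [32, 64, 128, 256]
dropouts = [0.1, 0.2, 0.3]
layer_counts = [1, 2, 3, 5]

# Every possible configuration of a single layer.
layer_configs = list(product(activations, neurons, dropouts))
n_per_layer = len(layer_configs)

# Assumption: each layer in a network is configured independently,
# so a network with `depth` layers has n_per_layer ** depth variants.
n_networks = sum(n_per_layer ** depth for depth in layer_counts)

print(n_per_layer)   # 24
print(n_networks)    # 7977048
```

If all layers in a network share the same configuration instead, the count drops to `n_per_layer * len(layer_counts)`, which is 96. Either way, training every variant for 20 epochs quickly becomes expensive.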

### Exercise
Edit this network, run it as you like, and compare the output with the previous model.

In [None]:
model1 = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
  # Define some layers and check the difference
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model1.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])

history1=model1.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=20, batch_size=256)


In [None]:
# summarize history for accuracy
f,ax=plt.subplots(1,2,figsize=(12,4))
ax[0].plot(history1.history['acc'], label='train')
ax[0].plot(history1.history['val_acc'], label='test')
ax[0].set_title('Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend()

# summarize history for loss
ax[1].plot(history1.history['loss'], label='train')
ax[1].plot(history1.history['val_loss'], label='test')
ax[1].set_title('Model loss / cost')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend()
plt.show()

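# `predict_classes` was removed in recent TensorFlow versions; take the argmax of the softmax output instead
predicted1 = np.argmax(model1.predict(x_test, verbose=1, batch_size=512), axis=-1)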
###
plot_confusion_matrix(y_test, predicted1, classes=class_names, normalize=True,
                      title='Normalized confusion matrix',fig_size=10)
plt.show()
###
conf_mat1 = confusion_matrix(y_test.flatten(), predicted1)

print(classification_report(y_test, predicted1, digits=3))

In [None]:
plot_errors(x_test, y_test, predicted1, 600)