# Multi-Class Image Classification
This exercise uses the MNIST dataset, a multi-class classification problem with 10 output classes, one for each digit.

## Learning Objectives:

  * Understand the MNIST dataset.
  * Create and tune a deep neural network for multi-class classification.


## MNIST Dataset
  
* The MNIST training set contains 60,000 examples.
* The MNIST test set contains 10,000 examples.

Each example contains a pixel map showing how a person wrote a digit. For example, the following image shows how a person wrote the digit `1` and how that digit might be represented in a 14x14 pixel map (after the input data is normalized).

![image](https://www.tensorflow.org/images/MNIST-Matrix.png)

Each example in the MNIST dataset consists of:

* A label specified by a human.  Each label must be an integer from 0 to 9.  For example, in the preceding image, the rater would almost certainly assign the label `1` to the example.
* A 28x28 pixel map, where each pixel is an integer between 0 and 255. The pixel values are on a gray scale in which 0 represents white, 255 represents black, and values between 0 and 255 represent various shades of gray.  


## Use the right version of TensorFlow

The following hidden code cell ensures that the Colab will run on TensorFlow 2.X.

In [None]:
#@title Run on TensorFlow 2.x
%tensorflow_version 2.x
from __future__ import absolute_import, division, print_function, unicode_literals

In [None]:
#Import relevant libraries
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
from matplotlib import pyplot as plt

# The following lines adjust the granularity of reporting. 
pd.options.display.max_rows = 10
pd.options.display.float_format = "{:.1f}".format

# The following line improves formatting when outputting NumPy arrays.
np.set_printoptions(linewidth = 200)

## Load the dataset

Just like `sklearn`, `tf.keras` provides convenience functions for loading well-known datasets. Each of these functions:

* Loads both the training set and the test set.
* Separates each set into features and labels.

The relevant convenience function for MNIST is called `mnist.load_data()`:

In [None]:
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()


`mnist.load_data()` returns four separate values:

* `x_train` contains the training set's features.
* `y_train` contains the training set's labels.
* `x_test` contains the test set's features.
* `y_test` contains the test set's labels.

The MNIST training set is already shuffled.

## View the dataset

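MNIST does not contain column names. It is best to think of `x_train` and `x_test` as three-dimensional NumPy arrays indexed by example, then pixel row, then pixel column. The following cells look at a single example: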



In [None]:
# Output example #1 of the training set. This is probably the digit 0.
x_train[1]
# Alternatively, render the example as an image:
# plt.imshow(x_train[1])


In [None]:
# Check the digit label for example #1.
y_train[1]
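
The three-dimensional indexing described above can be sketched without downloading anything. The array `fake_x` below is a synthetic stand-in for `x_train` (random pixels, only 3 examples), so the name and its contents are assumptions for illustration; only the indexing pattern carries over to the real data.

```python
import numpy as np

# Synthetic stand-in for x_train: 3 "images" of 28x28 unsigned 8-bit pixels.
# (Assumption for illustration; the real x_train holds 60,000 examples.)
rng = np.random.default_rng(seed=0)
fake_x = rng.integers(low=0, high=256, size=(3, 28, 28), dtype=np.uint8)

example = fake_x[1]        # one whole 28x28 pixel map
row = fake_x[1][10]        # row 10 of that pixel map: 28 pixels
pixel = fake_x[1][10][16]  # the single pixel at row 10, column 16

print(example.shape)  # (28, 28)
print(row.shape)      # (28,)
print(int(pixel))     # an integer between 0 and 255
```

The same three index forms apply to `x_train` and `x_test`: first the example, then the row, then the column.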

## Normalize feature values

Normalize `x_train` and `x_test` so that every pixel value lies between 0 and 1.0. Store the results in `x_train_normalized` and `x_test_normalized`.

In [None]:
x_train_normalized = x_train / 255.0
x_test_normalized = x_test / 255.0
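
To see what the division does to the data type and value range, here is a minimal sketch on a tiny hand-made array. The array `raw` is an assumption standing in for raw MNIST pixels; it is not taken from the dataset.

```python
import numpy as np

# Hand-made stand-in for raw pixels: unsigned 8-bit integers in [0, 255].
raw = np.array([[0, 51, 102],
                [153, 204, 255]], dtype=np.uint8)

# Dividing an integer array by a float produces a floating-point array,
# so the integer pixels become fractions between 0.0 and 1.0.
normalized = raw / 255.0

print(raw.dtype, raw.min(), raw.max())
print(normalized.dtype, normalized.min(), normalized.max())
print(normalized)
```

Keeping every feature in the same small range generally makes gradient-based training better behaved than feeding raw values up to 255.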

## Define a plotting function

The following function plots one or more metrics, such as accuracy, against the epoch number:

In [None]:
#@title Define the plotting function
def plot_curve(epochs, hist, list_of_metrics):
  """Plot a curve of one or more classification metrics vs. epoch."""  
  # list_of_metrics should be a list of metric names, such as those shown in:
  # https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#define_the_model_and_metrics

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Value")

  for m in list_of_metrics:
    x = hist[m]
    plt.plot(epochs[1:], x[1:], label=m)

  plt.legend()

print("Loaded the plot_curve function.")

## Create a deep neural net model

Specify:
* The number of `layers` in the deep neural net.
* The number of `nodes` in each layer.
* Any `regularization` layers.

The activation function of the output layer is `softmax`, which will yield 10 different outputs for each example. Each of the 10 outputs provides the probability that the input example is a certain digit.
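
As a rough illustration of how `softmax` turns 10 raw scores into 10 probabilities, here is a NumPy sketch. The `logits` values are made up for the example and do not come from a trained model; the `softmax` helper is a plain reimplementation, not the Keras internals.

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D array of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the digits 0 through 9 (made-up values).
logits = np.array([1.0, 0.5, 0.1, 3.0, 0.2, 0.0, -1.0, 0.3, 2.0, 0.4])
probs = softmax(logits)

print(probs.round(3))
print("sum:", probs.sum())
print("predicted digit:", int(np.argmax(probs)))  # 3, the largest score
```

The predicted digit is the index of the largest probability, which is also the index of the largest raw score.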


In [None]:
#instantiate the model
model = None
model = tf.keras.models.Sequential()

# The features are stored in a two-dimensional 28x28 array. Flatten that array into a one-dimensional 784-element array.
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

#Define the first hidden layer.   
model.add(tf.keras.layers.Dense(units=32, activation='relu',kernel_regularizer=tf.keras.regularizers.l1(0.04)))

# Optional dropout regularization layer (uncomment to enable).
# model.add(tf.keras.layers.Dropout(rate=0.2))

# Define the output layer. The units parameter is set to 10 because the model must choose among 10 possible output values (representing the digits from 0 to 9, inclusive).
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))

# The loss function for multi-class classification differs from the loss function for binary classification.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.003),
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])
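
The `sparse_categorical_crossentropy` loss takes integer labels (0 to 9) rather than one-hot vectors. The NumPy function below is a simplified sketch of the idea, assuming the model already outputs probabilities; it is not the Keras implementation, which also clips probabilities to avoid `log(0)`. The probability values are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability the model assigned to the true class)."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    # For each example, pick out the probability of its true class.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

# Two hypothetical examples with 3 classes, for brevity.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))  # 0.2899

# A model that guesses uniformly over 10 digits scores ln(10).
uniform = np.full((1, 10), 0.1)
uniform_loss = sparse_categorical_crossentropy([7], uniform)
print(round(uniform_loss, 4))  # 2.3026
```

The uniform case gives a useful baseline: a 10-class loss near 2.3 means the model is doing no better than guessing.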

In [None]:
#Train the model
history = model.fit(x=x_train_normalized, y=y_train, batch_size=4000,
                      epochs=50, shuffle=True, 
                      validation_split=0.2, verbose = 0)
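
The arithmetic behind `validation_split=0.2` and `batch_size=4000` can be worked out by hand. The sketch below assumes the 60,000-example training set described earlier; it mirrors the documented Keras behavior of holding out the last fraction of the arrays (before shuffling) for validation, but does not reproduce Keras's exact rounding code.

```python
import math

num_examples = 60_000     # size of the MNIST training set
validation_split = 0.2
batch_size = 4000
epochs = 50

# The held-out validation examples come from the end of the arrays.
num_val = round(num_examples * validation_split)
num_train = num_examples - num_val

# Each epoch walks through the training portion one batch at a time.
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * epochs

print("training examples:  ", num_train)        # 48000
print("validation examples:", num_val)          # 12000
print("steps per epoch:    ", steps_per_epoch)  # 12
print("total weight updates:", total_updates)   # 600
```

With such a large batch size, each epoch performs only 12 weight updates, which is one reason the model is trained for 50 epochs.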

In [None]:
# Plot the training accuracy curve
plt.figure()
plt.xlabel("Epoch")
plt.ylabel("Accuracy")

plt.plot(history.history['accuracy'] , label='Accuracy')
print("Accuracy:", history.history['accuracy'][-1])
plt.legend()

## Optimize the model

Experiment with the following:

* number of hidden layers 
* number of nodes in each layer
* dropout regularization rate
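
To get a feel for how the number of layers and nodes changes model size, the helper below counts the trainable weights and biases in a stack of fully connected layers. `dense_param_count` is a hypothetical helper written for this illustration, not part of `tf.keras`; the layer sizes other than `[32, 10]` are example choices.

```python
def dense_param_count(input_size, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = input_size
    for units in layer_sizes:
        total += fan_in * units + units  # weight matrix plus one bias per unit
        fan_in = units
    return total

# 784 inputs come from flattening each 28x28 pixel map.
print(dense_param_count(784, [32, 10]))        # 25450  (the model defined above)
print(dense_param_count(784, [256, 10]))       # 203530 (wider hidden layer)
print(dense_param_count(784, [256, 128, 10]))  # 235146 (extra hidden layer)
```

Most of the parameters sit in the first hidden layer, because it connects to all 784 input pixels.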


Adding more nodes to the first hidden layer (up to at least 256) can improve accuracy. Adding a second hidden layer also improves accuracy. When the model contains many nodes, it overfits unless the dropout rate is at least 0.5.
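
For intuition about what a dropout rate of 0.5 means, here is a NumPy sketch of inverted dropout, the general technique behind dropout layers during training. The `dropout` function and the all-ones `activations` array are assumptions for illustration; this is not the `tf.keras.layers.Dropout` source code.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a unit is kept
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(seed=42)
activations = np.ones(10_000)  # made-up activations, all equal to 1.0
dropped = dropout(activations, rate=0.5, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0.0))  # close to 0.5
print("mean activation:", dropped.mean())           # close to 1.0
```

Because a different random half of the units is silenced on every training step, the network cannot rely on any single node, which is why higher rates counteract overfitting in larger models. Dropout is only applied during training, not at prediction time.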
