# A tutorial introduction to deep learning with Keras and TensorFlow. We will use the MNIST dataset, the 'Hello world' problem of deep learning.

I always like to start my Jupyter notebooks with this code because it widens the notebook to fill my screen.

In [None]:
from IPython.display import display, HTML  # IPython.display is the public import location
display(HTML("<style>.container { width:95% !important; }</style>"))

### This tutorial was adapted from Chollet, F. (2021). *Deep Learning with Python* (2nd ed.). Greenwich, CT, USA: Manning Publications Co.

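Let's start with some definitions.
Containers of numerical data, such as multidimensional arrays, are called [tensors](https://en.wikipedia.org/wiki/Tensor). A tensor generalizes scalars, vectors, and matrices to any number of dimensions (also called axes).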

Scalars are 0-dimensional tensors (a single number).

In [None]:
import numpy as np
x = np.array(12)
print('The value of x is', x)
print('The dimension of this tensor is', x.ndim) # 0 dimensions

A 1-dimensional tensor is also called a vector.

In [None]:
x = np.array([12, 1, 2, 3])  # create a vector
print('The value of x is', x)
print('The dimension of this tensor is', x.ndim)  # 1 dimension

A 2-dimensional tensor is also called a matrix.

In [None]:
x = np.array([[12, 1, 2, 3],
              [5, 6, 7, 8],
              [10, 11, 12, 12]])
print('The value of x is', x) # Print the 3 x 4 matrix
print('The dimension of this tensor is', x.ndim) # 2 dimensions

We can create n-dimensional tensors just as easily, although they become difficult to visualize.
This 3D tensor can be pictured as a stack of three 3 x 4 matrices, like a box of data.

In [None]:
x = np.array([[[12, 1, 2, 3],
               [5, 6, 7, 8],
               [10, 11, 12, 12]],
              [[2, 2, 2, 2],
               [3, 3, 3, 3],
               [4, 4, 4, 4]],
              [[5, 5, 5, 5],
               [6, 6, 6, 6],
               [7, 7, 7, 7]]])  # shape (3, 3, 4)
print('The value of x is', x)
print('The dimension of this tensor is', x.ndim) # 3 dimensional array

#### Reshaping tensors is an important concept to understand. We can reshape a tensor into any shape that has the same total number of elements as the original tensor.

In [None]:
x = x.reshape(3 * 3 * 4, 1)  # 36 rows, 1 column
print(x)
x = x.reshape(4, 3 * 3)      # 4 rows, 9 columns
print(x)
x = x.reshape(2, 18)         # 2 rows, 18 columns
print(x)
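Two related reshaping rules, sketched with plain NumPy: passing `-1` for one axis lets NumPy infer that axis from the total element count, and asking for a shape with a different element count raises a `ValueError`. The array `t` below is a fresh example with the same 36 elements, not the `x` from the cell above.

```python
import numpy as np

# 36 elements, the same count as the 3 x 3 x 4 tensor above
t = np.arange(36).reshape(3, 3, 4)

# -1 tells NumPy to infer that axis: 36 / 9 = 4
print(t.reshape(9, -1).shape)   # (9, 4)

# Flatten to a 1D vector of length 36
flat = t.reshape(-1)
print(flat.shape)               # (36,)

# A shape with a different element count (5 * 7 = 35) is rejected
try:
    t.reshape(5, 7)
except ValueError as err:
    print('ValueError:', err)
```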

##### Tensors have three key attributes:
- Number of axes (dimensions)
- Shape (length of each axis)
- Data type (typically we will use `float32`)
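Each of these attributes can be read directly from a NumPy array. A short sketch using the 3 x 4 matrix from earlier; the exact default integer type depends on the platform, so treat the dtype comment as an example.

```python
import numpy as np

x = np.array([[12, 1, 2, 3],
              [5, 6, 7, 8],
              [10, 11, 12, 12]])

print('Number of axes:', x.ndim)    # 2
print('Shape:', x.shape)            # (3, 4)
print('Data type:', x.dtype)        # a platform integer type, e.g. int64

# Convert to float32, the type typically used for neural network inputs
x32 = x.astype('float32')
print('New data type:', x32.dtype)  # float32
```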

We can also create and manipulate tensors with TensorFlow.

In [None]:
# Import TensorFlow as tf
import tensorflow as tf

In [None]:
x = tf.ones(shape=(2,1))
print(x)

In [None]:
# Create a Tensorflow variable
v = tf.Variable(initial_value=tf.random.normal(shape=(3,1)))
print(v)

In [None]:
# Once a variable is created, it can be modified in place using assign.
# v itself now holds the new values, so v and v2 contain the same values.
v2 = v.assign(tf.random.normal(shape=(3,1)))
print(v2)

In [None]:
# Now let's try a math operation on the tensors.
# This cell intentionally raises an error; the next cell explains why.
np.dot(v, v2)

In [None]:
# We get the error above because the shapes of v and v2 do not align for a dot product.
print(v.shape)
print(v2.shape)

# For a matrix dot product, the number of columns of X must match the number of rows of Y (see Figure 2.5 in the book).
# v and v2 are both (3, 1): v has 1 column but v2 has 3 rows, so we must transpose v2.

In [None]:
v3 = np.transpose(v2)
print(v.shape)
print(v3.shape)

In [None]:
# Now the number of columns of v (1) matches the number of rows of v3 (1), so we can take the dot product.
# The result is a 3 x 3 matrix.
np.dot(v, v3)
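The same shape rule can be checked with small NumPy arrays and the `@` (matrix multiplication) operator. In this sketch with made-up values, a (3, 1) column times a (1, 3) row gives a 3 x 3 result, as in the cell above, while reversing the order gives a (1, 1) result holding the sum of squares. TensorFlow's `tf.matmul` and `tf.transpose` follow the same rules.

```python
import numpy as np

col = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1)
row = col.T                            # shape (1, 3)

outer = col @ row   # (3, 1) @ (1, 3) -> (3, 3)
inner = row @ col   # (1, 3) @ (3, 1) -> (1, 1)

print(outer.shape)  # (3, 3)
print(inner)        # [[14.]]  since 1*1 + 2*2 + 3*3 = 14

# Mismatched inner dimensions: (3, 1) @ (3, 1) fails, like np.dot(v, v2)
try:
    col @ col
except ValueError as err:
    print('ValueError:', err)
```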

# Let's build our first neural net

Load the MNIST dataset, which ships with [Keras](https://keras.io/datasets/). MNIST stands for [Modified National Institute of Standards and Technology](https://en.wikipedia.org/wiki/MNIST_database). It is a collection of 60,000 training and 10,000 test images of handwritten digits 0-9. We will build a deep learning neural network model to classify the 10 digits. This is the 'Hello World' problem of deep learning.

In [None]:
from keras.datasets import mnist

In [None]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
train_images.shape  # 60,000 images that are 28 pixels by 28 pixels

In [None]:
train_images.ndim #3D tensor

In [None]:
print('The maximum value in the array is', train_images.max()) # The maximum value in the array is 255
print('The minimum value in the array is', train_images.min()) # The minimum value in the array is 0

In [None]:
# Get the shape, dimensions, max and min value of the test images
print('test image shape:', test_images.shape)
print('number of dimensions:', test_images.ndim)
print('maximum value:', test_images.max())
print('minimum value:', test_images.min())

In general, for image data the first axis of a tensor holds the samples, the second axis the height, the third axis the width, and the fourth axis the color channels (3 for RGB images, 1 for grayscale). So image data will typically be a 4D tensor, `[samples, height, width, channels]`, while the MNIST data is 3D because the images are grayscale, so the single color channel can be omitted.
Video data will be a 5D tensor, `[samples, frames, height, width, channels]`. By convention, when data has a time axis (such as video frames or time series), it is placed on the second axis.
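If a model expects an explicit channel axis, a 3D batch of grayscale images can be turned into the 4D `[samples, height, width, channels]` layout with `np.expand_dims`. This is a sketch with arrays of zeros standing in for real images and video (the sizes are arbitrary), so it runs without downloading MNIST.

```python
import numpy as np

# Stand-in for MNIST: 5 grayscale images of 28 x 28 pixels (all zeros)
images = np.zeros((5, 28, 28), dtype='uint8')

# Add an explicit channel axis at the end: (samples, height, width, channels)
images_4d = np.expand_dims(images, axis=-1)
print(images_4d.shape)  # (5, 28, 28, 1)

# Stand-in for RGB video: 2 clips of 10 frames, each 28 x 28 with 3 channels
video = np.zeros((2, 10, 28, 28, 3), dtype='uint8')
print(video.ndim)       # 5
```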

Let's view one of the images. We need to import matplotlib to display the digits.

In [None]:
import matplotlib.pyplot as plt

In [None]:
digit = train_images[4]  # Select the sample at index 4 (the fifth image, since indexing starts at 0).
plt.imshow(digit, cmap=plt.cm.binary) # Show the sample.  cmap is the color map.  We will keep it black and white (binary)
plt.show()

The training image at index 4 looks like the digit 9. Let's make sure the label matches.

In [None]:
train_labels[4]

In [None]:
# Import models and layers from the keras library
from keras import models
from keras import layers

We will be working with a *sequential* model built from *dense layers*. More on what those mean later. Another name for a dense layer is a *fully connected layer*. Our first dense layer expects each sample to be a one-dimensional vector, so each 28 x 28 image matrix must be reshaped into a vector. There are 60,000 training images with a shape of 28 x 28; we will reshape each image into a vector of length 28 * 28 = 784.

We pick the `relu` activation function for our first (hidden) layer. The output layer has 10 units, one per digit, and uses the `softmax` activation because we have a multiclass classification problem: softmax turns the 10 outputs into probabilities that sum to 1.

In [None]:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(10, activation='softmax'))
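A dense layer computes `activation(dot(input, W) + b)`. The NumPy sketch below mimics the forward pass of the model above, with the same weight shapes. The helpers `relu`, `softmax`, and `dense` are hypothetical illustrations (not Keras functions), and the random weights and inputs stand in for trained weights and real images. The last lines count the trainable parameters, which should agree with the total that `model.summary()` reports for this model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Replace negative values with 0
    return np.maximum(z, 0)

def softmax(z):
    # Subtract the row max for numerical stability, then normalize each row
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dense(inputs, W, b, activation):
    # Hypothetical helper: one fully connected layer
    return activation(inputs @ W + b)

# Illustrative random weights with the same shapes as the Keras model
W1, b1 = rng.normal(0, 0.05, (784, 512)), np.zeros(512)
W2, b2 = rng.normal(0, 0.05, (512, 10)), np.zeros(10)

batch = rng.random((3, 784))             # 3 fake flattened images
hidden = dense(batch, W1, b1, relu)      # shape (3, 512)
probs = dense(hidden, W2, b2, softmax)   # shape (3, 10)

print(hidden.shape, probs.shape)
print(probs.sum(axis=1))                 # each row sums to 1

# Trainable parameters: weights plus one bias per unit
n_params = (784 * 512 + 512) + (512 * 10 + 10)
print(n_params)                          # 407050
```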

Now we compile the model. We use the `adam` optimizer and choose `categorical_crossentropy` as the loss function because this is a multiclass classification problem with one-hot encoded labels (we convert the labels below). We will track accuracy as our metric.

In [None]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
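For one sample, categorical cross-entropy is `-sum(y_true * log(y_pred))`; with a one-hot label this reduces to minus the log of the probability assigned to the correct class. A NumPy sketch with made-up predictions over 3 classes; the clipping value `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average -sum(y * log(p)) over the batch
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# One sample whose true class is 2 (one-hot over 3 classes for brevity)
y_true = np.array([[0.0, 0.0, 1.0]])
confident = np.array([[0.05, 0.05, 0.90]])
unsure = np.array([[0.30, 0.40, 0.30]])

print(categorical_crossentropy(y_true, confident))  # -log(0.9), about 0.105
print(categorical_crossentropy(y_true, unsure))     # -log(0.3), about 1.204
```

A confident, correct prediction gives a small loss, and a hesitant one gives a larger loss, which is what training pushes down.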

Now that the model is built and compiled, we need to preprocess the images for the model. Each image must be reshaped into a vector whose length matches the input shape above (28 * 28 = 784). We also normalize the pixel values to lie between 0 and 1 by dividing by the maximum pixel value, 255.

In [None]:
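scale_factor = train_images.max()  # 255, the maximum pixel value
# Flatten each 28 x 28 image into a vector of 784 values
train_images = train_images.reshape((60000, 28 * 28))
# Convert to float32 and scale pixel values to the range [0, 1]
train_images = train_images.astype('float32') / scale_factor

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / scale_factor

print(train_images.ndim)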

In [None]:
print('train image shape:', train_images.shape)
print('number of dimensions:', train_images.ndim)
print('maximum value:', train_images.max())
print('minimum value:', train_images.min())

In [None]:
# We can always get our images back by reshaping the vectors into 28 x 28 matrices.
plt.imshow(train_images.reshape((60000, 28, 28))[4], cmap=plt.cm.binary)
plt.show()

In [None]:
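from keras.utils import to_categorical
# Convert the integer labels (0-9) into one-hot encoded vectors, as categorical_crossentropy expects
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)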

In [None]:
# The labels are now one-hot encoded (categorical) vectors.
# Check that the train label at index 4 is the one-hot vector for 9
print(train_labels[4])
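One-hot encoding places a 1 at the index of the class and 0 everywhere else. For integer labels 0-9, a sketch of the same idea in NumPy picks rows of a 10 x 10 identity matrix, and `argmax` undoes the encoding. The labels below are example values.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])   # example integer labels

# Row k of the 10 x 10 identity matrix is the one-hot vector for class k
one_hot = np.eye(10, dtype='float32')[labels]
print(one_hot[4])                    # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]

# argmax recovers the original integer labels
print(np.argmax(one_hot, axis=1))    # [5 0 4 1 9]
```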

In [None]:
# Batch size is how many images the model processes before each weight update.
# An epoch is one full pass over all of the training data.
# Each epoch performs 500 gradient updates (60,000 / 120 = 500)
model.fit(train_images, train_labels, epochs=5, batch_size=120)
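When the number of samples is not divisible by the batch size, the last batch of each epoch is simply smaller, so the number of updates per epoch is the division rounded up. A small sketch for a few batch sizes; `updates_per_epoch` is a hypothetical helper.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # The last batch may be smaller, so round up
    return math.ceil(n_samples / batch_size)

for batch_size in (120, 128, 150):
    steps = updates_per_epoch(60000, batch_size)
    # batch size, updates per epoch, updates over 5 epochs
    print(batch_size, steps, steps * 5)
```

With 60,000 images this gives 500, 469, and 400 updates per epoch respectively.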

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('test_acc:', test_acc)
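With one-hot labels, the accuracy metric compares the index of the largest predicted probability with the index of the 1 in each label. A NumPy sketch of that comparison, using made-up softmax outputs for 4 images over 3 classes:

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images (3 classes for brevity)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
# One-hot true labels: classes 0, 1, 2, 1
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

predicted = np.argmax(probs, axis=1)   # [0 1 2 0]
actual = np.argmax(y_true, axis=1)     # [0 1 2 1]
accuracy = np.mean(predicted == actual)
print(accuracy)                        # 0.75
```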

# Your Turn.
# Below is your assignment.
#### Build 3 different models, each with a dense hidden layer using `relu` activation. The output layer activation must be `softmax` since we have a multiclass problem. You will then compile the three models with three different optimizers.

In [None]:
model1 = models.Sequential()
model1.add(layers.Dense(512, activation='', input_shape=(28 * 28,)))
model1.add(layers.Dense(10, activation=''))

model2 = models.Sequential()
model2.add(layers.Dense(512, activation='', input_shape=(28 * 28,)))
model2.add(layers.Dense(10, activation=''))


model3 = models.Sequential()
model3.add(layers.Dense(512, activation='', input_shape=(28 * 28,)))
model3.add(layers.Dense(10, activation=''))

#### Compile your three models with three different optimizers. Pages 89-90 of the textbook list several optimizers. You can also find more optimizers and documentation here: https://keras.io/api/optimizers/
#### Use `categorical_crossentropy` for loss since this problem is a multiclass classification problem. The metric will be `accuracy`.
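To get a feel for what an optimizer does, here is a minimal sketch of two update rules on the toy function f(w) = w², whose gradient is 2w: plain gradient descent (`w = w - lr * grad`) and gradient descent with momentum. The learning rate, momentum value, starting point, and step count are arbitrary choices for illustration. Keras's `SGD` optimizer (with its `momentum` argument) applies this kind of rule to every weight in a model, while optimizers such as `RMSprop` and `Adam` additionally adapt the step size for each weight.

```python
def grad(w):
    # Gradient of f(w) = w**2
    return 2 * w

def sgd(w, lr=0.1, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def sgd_momentum(w, lr=0.1, momentum=0.9, steps=200):
    velocity = 0.0
    for _ in range(steps):
        # Accumulate a running direction, then step along it
        velocity = momentum * velocity - lr * grad(w)
        w = w + velocity
    return w

print(sgd(5.0))           # very close to the minimum at w = 0
print(sgd_momentum(5.0))  # also near 0, after oscillating around it
```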

In [None]:
model1.compile(optimizer='',
               loss='',
               metrics=[''])

model2.compile(optimizer='',
               loss='',
               metrics=[''])

model3.compile(optimizer='',
               loss='',
               metrics=[''])

#### Fit the models with epochs = 5 and  batch_size = 150

In [None]:
print('Fitting model 1...')
model1.fit(train_images, train_labels, epochs=5, batch_size=150)
print('Fitting model 2...')
model2.fit(train_images, train_labels, epochs=5, batch_size=150)
print('Fitting model 3...')
model3.fit(train_images, train_labels, epochs=5, batch_size=150)


#### Test the accuracy of the model on the test images and test labels.

In [None]:
test_loss, test_acc = model1.evaluate(test_images, test_labels)
print('model1_test_acc:', test_acc)

test_loss, test_acc = model2.evaluate(test_images, test_labels)
print('model2_test_acc:', test_acc)

test_loss, test_acc = model3.evaluate(test_images, test_labels)
print('model3_test_acc:', test_acc)

# Which optimizer gave the highest accuracy? Write your answer below.



### Using the optimizer that gave the highest accuracy, build 3 different models. The first layer is given to you; add 3 more hidden layers with a varying number of units in each, followed by the output layer.

In [None]:
h1_model = models.Sequential()
h1_model.add(layers.Dense(512, activation='relu',input_shape=(28 * 28,)))
h1_model.add(layers.Dense(, activation=''))
h1_model.add(layers.Dense(, activation=''))
h1_model.add(layers.Dense(, activation=''))
h1_model.add(layers.Dense(, activation=''))

h2_model = models.Sequential()
h2_model.add(layers.Dense(512, activation='relu',input_shape=(28 * 28,)))
h2_model.add(layers.Dense(, activation=''))
h2_model.add(layers.Dense(, activation=''))
h2_model.add(layers.Dense(, activation=''))
h2_model.add(layers.Dense(, activation=''))

h3_model = models.Sequential()
h3_model.add(layers.Dense(512, activation='relu',input_shape=(28 * 28,)))
h3_model.add(layers.Dense(, activation=''))
h3_model.add(layers.Dense(, activation=''))
h3_model.add(layers.Dense(, activation=''))
h3_model.add(layers.Dense(, activation=''))

#### Compile the three models with the best optimizer from above.

In [None]:
h1_model.compile(optimizer='',
                 loss='',
                 metrics=[''])

h2_model.compile(optimizer='',
                 loss='',
                 metrics=[''])

h3_model.compile(optimizer='',
                 loss='',
                 metrics=[''])

#### Fit the models with epochs = 5 and  batch_size = 120

In [None]:
print('Fitting model h1...')
h1_model.fit()
print('Fitting model h2...')
h2_model.fit()
print('Fitting model h3...')
h3_model.fit()

#### Test the accuracy of the 3 models

In [None]:
test_loss, test_acc = h1_model.evaluate(test_images, test_labels)
print('h1_model_test_acc:', test_acc)

test_loss, test_acc = h2_model.evaluate(test_images, test_labels)
print('h2_model_test_acc:', test_acc)

test_loss, test_acc = h3_model.evaluate(test_images, test_labels)
print('h3_model_test_acc:', test_acc)

#### Which model gave the highest accuracy? Write your answer below.