<a href="https://colab.research.google.com/github/jdhaecker/Training/blob/master/IntroToDNNwKeras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**STEP 1**

###Clone the repo to get the slides and the model for the course

In [None]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/IntroToDNNwKeras.git cloned-repo
%cd cloned-repo
!ls

In [None]:
from IPython.display import Image
def page(num):
    return Image("Intro to Deep Neural Networks with Keras ("+str(num)+ ").png")
print("done")

**STEP 2**

In [None]:
from IPython.display import Image
Image("Intro to Deep Neural Networks with Keras (2).png")

In [None]:
from IPython.display import Image
Image("Intro to Deep Neural Networks with Keras (23).png")

##Why are we using Keras?

Keras is one of the most popular frameworks for deep learning. 

In [None]:
page(4)

Keras is a high-level API, which means it is easy to use and there is often little need to debug code. It is great for fast prototyping: you can often create a model in just a few minutes. 

It is not a high performance framework and is better suited to smaller datasets. So in addition to Keras, you may want to learn TensorFlow or PyTorch. 


In [None]:
page(5)

##Install the necessary libraries

https://www.codeastar.com/visualize-convolutional-neural-network/



In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

# Install TensorFlow
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf

!pip install keras

##Load the MNIST dataset from Keras

In [None]:
page(6)

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. <br>

The database is also widely used for training and testing in the field of machine learning. The black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.<br>

The MNIST database contains 60,000 training images and 10,000 testing images.<br> 

There have been a number of scientific papers on attempts to achieve the lowest error rate; one paper, using a hierarchical system of convolutional neural networks, manages to get an error rate on the MNIST database of 0.23%. The original creators of the database keep a list of some of the methods tested on it. In their original paper, they use a support-vector machine to get an error rate of 0.8%. 

An extended dataset similar to MNIST, called EMNIST, was published in 2017. It contains 240,000 training images and 40,000 testing images of handwritten digits and characters.

In [None]:
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print("done")

Print the number of images for the train and test sets. 

There should be 60,000 images in the training set. 
The test set should have 10,000 images. 

In [None]:
page(9)

In [None]:
print('Number of images in x_train', x_train.shape[0])
print('Number of images in x_test', x_test.shape[0])

When we print the shape of the training set, we see it is 60,000 images of size 28 x 28

In [None]:
page(10)

Print out the array for training image #7777. 

The values printed are the values for the pixels in the 28x28 array. 
If you squint your eyes, you can kind of see the non-zero values form a figure '8'. 

In [None]:
image_index = 7777
print(x_train[image_index])

Next, print the label for the chosen image, then plot the 28x28 pixel image in greyscale. 

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline  
print(y_train[image_index]) 

In [None]:
plt.imshow(x_train[image_index], cmap='Greys')

Check the test images by plotting a sample test image.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline  
test_image_index = 2222
# Read the label from the test set, not the training set
print(y_test[test_image_index]) 


In [None]:
# Plot the matching image from the test set
plt.imshow(x_test[test_image_index], cmap='Greys')

##Prepare the data for the model

In [None]:
page(11)

Right now the data is a series of 28x28 arrays. The Keras convolution layers expect 4 dimensions (samples, height, width, channels), so add a channel dimension of size 1 using the reshape function. 

Divide the pixel values by 255. This rescales the range from 0-255 to 0.0-1.0, where 0.0 means 0 (0x00) and 1.0 means 255 (0xFF). 

Normalizing the inputs to a small, consistent range generally helps the model train faster and more reliably.

In [None]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Reshaping the array to 4-dims so that it can work with the Keras API
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
#print(x_train[image_index])
print("done")

##Transform the labels to one-hot encoding

If the labels are left as digits, the model could interpret them as ordered quantities (0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9), which would be incorrect: the digits are categories, not magnitudes. 

To solve this problem, the digits are one-hot encoded.
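
As a rough illustration of what one-hot encoding produces, here is a minimal NumPy sketch. The helper `one_hot` and the sample labels are hypothetical and only for illustration; the notebook itself uses `to_categorical` from Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a 1.0 at each label's index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1.0 in row i at column labels[i]
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample_labels = [3, 0, 9]          # hypothetical digit labels
sample_encoded = one_hot(sample_labels, 10)
print(sample_encoded)

# np.argmax recovers the original digit from each one-hot row
recovered = np.argmax(sample_encoded, axis=1)
print(recovered)
```

Each row contains exactly one 1.0, so no digit is treated as "larger" than another.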

In [None]:
page(12)

In [None]:
from keras.utils import to_categorical
import numpy as np

#The digits are encoded into 10 categories
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
print("done")

In [None]:
#Check the digit for your chosen test 
digit = np.argmax(y_train[image_index])
#print("The one hot encoding for the chosen digit is " + str(y_train[image_index]))
print("The chosen digit is " + str(digit) + "\n")

In [None]:
#This code was added to get inputs for predictions. 
#Delete this code when using your own models

import numpy as np
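# Hold out a few test images to use later as prediction inputs
x_check = x_test[9995:9999]
y_check = y_test[9995:9999] 
# Remove the held-out samples along axis 0 (the sample axis).
# Without axis=0, np.delete would flatten the one-hot label array.
x_test = np.delete(x_test, [9995,9996,9997,9998,9999], axis=0)
y_test = np.delete(y_test, [9995,9996,9997,9998,9999], axis=0)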
print(x_test.shape)
print(y_test.shape)
print(x_check.shape)
print(y_check.shape)

##Understanding of the layers used in Convolutional Neural Networks

In [None]:
page(15)

##Convolution
Convolution filters the image with a smaller pixel filter (also called a kernel). This decreases the size of the image without losing the relationship between pixels. The relationship between pixels is crucial for the model to learn about the image.

In the figures below, the 2x2 kernel is convolved with a 4x4 image. The kernel 'strides' across the image one pixel at a time. <br><br>
In this instance the stride is 1, but the stride can be any positive integer.  The kernel can also be any size, but it is usually smaller than the image and larger than a 1x1 matrix. The most commonly used kernel size is 3x3. <br><br>

To calculate each value of the convolved feature, multiply each image pixel under the kernel by the corresponding kernel value, then sum the products. 
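
The multiply-and-sum step can be sketched in plain NumPy. This is a minimal illustration, not how Keras implements it internally; the function name `convolve2d_valid` and the example image and kernel values are assumptions chosen to match the 4x4 image and 2x2 kernel described above.

```python
import numpy as np

def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # The patch of the image currently under the kernel
            patch = image[r * stride:r * stride + kh, c * stride:c * stride + kw]
            feature[r, c] = np.sum(patch * kernel)
    return feature

image = np.arange(16).reshape(4, 4)   # hypothetical 4x4 image with values 0..15
kernel = np.array([[1, 0],
                   [0, 1]])           # hypothetical 2x2 kernel
feature = convolve2d_valid(image, kernel)
print(feature.shape)  # (3, 3): nine kernel positions
print(feature)
```

With a stride of 1, the 2x2 kernel fits in 3 positions along each axis, which gives the 9 steps.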

In [None]:
Image("Intro to Deep Neural Networks with Keras (116).png",width=690)

In [None]:
Image("Intro to Deep Neural Networks with Keras (16).png",width=690)

This image and kernel require 9 steps to completely filter the image. So after the 9th step, the convolved feature is complete. <br><br>
Notice the convolved feature is smaller than the original image. In this case that is good: the smaller convolved feature makes the network smaller and easier to train. <br><br>
To keep the convolved image the same size as the original image, rows and columns of zeros are added around the original image and the kernel uses these during convolution. These additional rows and columns are called 'padding'. 
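
To see the effect of padding on the output size, here is a small NumPy sketch. It assumes a 3x3 kernel of ones and one ring of zeros around a 4x4 image; the helper `slide` is hypothetical and written only for this illustration.

```python
import numpy as np

def slide(img, k):
    """Unpadded sliding-window multiply-and-sum with a stride of 1."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4x4 image
kernel = np.ones((3, 3))                          # hypothetical 3x3 kernel

# Add one row/column of zeros on every side of the image
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

unpadded_result = slide(image, kernel)
padded_result = slide(padded, kernel)
print(unpadded_result.shape)  # (2, 2): smaller than the image
print(padded_result.shape)    # (4, 4): same size as the image
```

In Keras this choice corresponds to the `padding` argument of `Conv2D` (`'valid'` for no padding, `'same'` to preserve the size).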

In [None]:
page(18)

Keras randomly initializes the kernels before training starts. This is good because it relieves you from writing the code to initialize all the parameters. <br><br>
The downside is that each time the model is trained the initialization values change, so your model performance will change. <br><br>
**It is a good practice to always save your models after training, then select the best performing model to re-use**. 

For your information, there are a number of filters that are used for specific tasks. You may use these when you do certain types of image classification.  <br>

A few of these filters are listed below. 

In [None]:
page(24)

##Pooling

Convolution maps a region of an image to a feature map. This helps the network detect features of the image. <br><br>
The next step after convolution is called 'pooling'. Pooling is used to reduce the resolution of the feature map while retaining the features required for classification. <br><br>
Max pooling and average pooling have no weights to train. During backpropagation (which we will not discuss in this class), the gradients simply pass through the pooling operation. <br><br>
In the figure below, two types of pooling are demonstrated. The top example is max pooling. In max pooling, a filter slides over the image, the maximum value within the filter is saved, and the other pixel values are discarded. <br><br>
With average pooling, the average of the values is saved and the values themselves are discarded. <br><br>
In convolution, the kernel partially overlaps its previous position as it slides across the image. With pooling, there is typically no overlap, because the stride equals the pool size. 
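
Both kinds of pooling can be sketched with NumPy. This is a minimal illustration assuming a 2x2 window with no overlap; the helper `pool2d` and the feature map values are hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: split the map into size x size blocks and reduce each block."""
    fm = np.asarray(feature_map, dtype=float)
    h, w = fm.shape[0] // size, fm.shape[1] // size
    # blocks[i, a, j, b] == fm[i*size + a, j*size + b]
    blocks = fm[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 7, 6, 8],
                        [9, 2, 1, 0],
                        [3, 4, 5, 6]])   # hypothetical 4x4 feature map

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[7. 8.] [9. 6.]]
print(avg_pooled)  # [[4.  5. ] [4.5 3. ]]
```

Either way, the 4x4 map is reduced to 2x2, one value per block.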

In [None]:
page(19)

##Flatten

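Flattening a tensor means collapsing all of its dimensions into one. In Keras, the Flatten layer does this for each sample and keeps the batch dimension: a batch of shape (batch, height, width, channels) becomes (batch, height x width x channels).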
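
The same operation can be sketched with a NumPy reshape. The batch size of 2 and the 3x3x64 feature map shape are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical batch: 2 samples, each a 3x3 feature map with 64 channels
batch = np.zeros((2, 3, 3, 64))

# Keep the batch axis and collapse everything else into one axis
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (2, 576)
```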

In [None]:
page(28)

##Dense Layers

A dense layer is also known as a fully connected layer.<br><br>

A dense layer is a regular layer of neurons in a neural network. Each neuron receives input from every neuron in the previous layer, which is why the layer is called dense. The layer computes a matrix-vector multiplication. 

In [None]:
page(29)

Layers are made of a collection of neurons. <br><br>
A neuron takes a group of weighted inputs, applies an activation function, and returns an output.<br><br>

Inputs to a neuron can either be features from a training set or outputs from a previous layer’s neurons. Weights are applied to the inputs as they travel along synapses to reach the neuron. The neuron then applies an activation function to the “sum of weighted inputs” from each incoming synapse and passes the result on to all the neurons in the next layer.
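
The "sum of weighted inputs" followed by an activation can be written in a few lines of NumPy. This is a sketch under assumed values: the inputs, weights, and bias below are made up, and ReLU is used as the activation only as an example.

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become 0."""
    return np.maximum(z, 0.0)

def dense_forward(inputs, weights, bias, activation=relu):
    """One dense layer: each row of `weights` holds the weights of one neuron."""
    return activation(weights @ inputs + bias)

inputs = np.array([1.0, 2.0, 3.0])            # hypothetical outputs of a previous layer
weights = np.array([[0.5, -1.0, 1.0],         # neuron 1
                    [-0.5, 0.0, -1.0]])       # neuron 2
bias = np.array([0.1, 0.2])

outputs = dense_forward(inputs, weights, bias)
print(outputs)  # neuron 1 fires (about 1.6), neuron 2 is clipped to 0
```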

In [None]:
page(30)

##Activation Functions

In [None]:
page(31)

Activation functions are an extremely important feature of artificial neural networks. They decide whether a neuron should be activated or not, that is, whether the information the neuron is receiving is relevant to the prediction or should be ignored.<br><br>

The activation function is the non-linear transformation applied to the input signal. This transformed output is then sent to the next layer of neurons as input.<br><br>

Can we do without an activation function?<br>
If the activation function increases the complexity so much, can we do without it?<br><br>

Without the activation function, the weights and bias would simply perform a linear transformation. A linear equation is simple to solve but is limited in its capacity to solve complex problems. A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks. We want our neural networks to work on complicated tasks like language translation and image classification. Linear transformations alone would never be able to perform such tasks.
<br><br>
Activation functions also need to be differentiable so that back-propagation can work: the gradients are supplied along with the error to update the weights and biases. Without a differentiable non-linear function, this would not be possible.<br><br>
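
The claim that a network without activation functions is just a linear model can be checked numerically. The sketch below assumes two small layers with random weights (the sizes and the seed are arbitrary) and shows that they collapse into a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                   # hypothetical input vector
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)     # layer 1
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)     # layer 2

# Two layers with no activation function in between
two_linear = W2 @ (W1 @ x + b1) + b2

# The same result from ONE linear layer with combined weights and bias
W_single = W2 @ W1
b_single = W2 @ b1 + b2
one_linear = W_single @ x + b_single

same = np.allclose(two_linear, one_linear)
print(same)  # True

# With a ReLU between the layers, no single linear layer
# reproduces the mapping for every possible input
with_relu = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
print(with_relu)
```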


##Softmax
The Softmax function is an activation function that turns numbers into probabilities that sum to one. The function outputs a vector that represents the probability distribution over a list of potential outcomes.
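
A minimal NumPy sketch of softmax follows. The input scores are made up; subtracting the maximum before exponentiating is a common numerical-stability step and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of scores into probabilities that sum to one."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits)   # avoids overflow in np.exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax([2.0, 1.0, 0.1])        # hypothetical scores for three classes
print(probs)
print(probs.sum())
```

The largest score always receives the largest probability, which is why `np.argmax` on the softmax output gives the predicted class.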

In [None]:
page(25)

In [None]:
page(26)

##Dropout

In [None]:
page(33)

Dropout refers to ignoring neurons during training. The dropped neurons are selected at random. Ignoring means these units are not considered during a particular forward or backward pass.<br><br>

At each training stage, individual nodes are either dropped out of the net or kept, so that a reduced network is left.  The incoming and outgoing edges to a dropped-out node are also removed.<br><br>

Why do we need Dropout?<br>
Dropout is used “to prevent over-fitting”.
A fully connected layer occupies most of the parameters, and hence, neurons develop co-dependency amongst each other during training which curbs the individual power of each neuron leading to over-fitting of training data.
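
Here is a minimal NumPy sketch of dropout during training. It assumes the "inverted dropout" convention, where the kept activations are scaled up by 1 / (1 - rate) so their expected sum is unchanged; the helper name `dropout` and the example values are hypothetical.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones(10)               # hypothetical layer outputs
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # a random mix of 0.0 (dropped) and 2.0 (kept and rescaled)
```

At prediction time no neurons are dropped, so the layer passes its input through unchanged.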

In [None]:
page(35)

###Underfitting a model

An underfit model performs poorly even on the training set. We say that the model has high bias.

According to Andrew Ng, the best methods of dealing with an underfitting model are trying a bigger neural network (adding new layers or increasing the number of neurons in existing layers) or training the model a little bit longer.

###Overfitting a model


Overfitting occurs when the model performs well when it is evaluated using the training set, but cannot achieve good accuracy when the test dataset is used. This kind of problem is called “high variance,” and it usually means that the model cannot generalize the insights from the training dataset.

Andrew Ng suggests that the best solution to overfitting is getting more data and using regularization.

Such a solution is suggested because the model may not have enough training examples to learn the patterns properly, so adding new observations to the training dataset may increase the chance of getting a better model.

On the other hand, it is possible that the neural network is too complicated and because of that, it can deal adequately only with the training set examples. If it were a human writing an exam at school, we would say that he/she has memorized the homework, but he/she did not learn the concept. The same may happen to a machine learning model.

##Create a Deep Neural Network model 

In [None]:
page(13)

The convolutional neural network has 9 layers. 
The first six layers are for feature extraction and the last 3 are for classifying the image. <br><br>
Notice the convolutional layers: what are the kernel size, the activation function, and the number of filters for each layer?





In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Dropout, BatchNormalization

#keras.layers.BatchNormalization()

model = Sequential()
#create the feature extraction layers
#layer1,2,3 (Dropout is not considered a layer)
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu', input_shape=x_train.shape[1:]))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
#layer4,5,6
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
#layer7
model.add(Flatten())
#Create the classification layers
#layer8,9
model.add(Dense(256, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(10, activation='softmax'))

model.compile(
    loss='categorical_crossentropy', 
    optimizer='adam', 
    metrics=['accuracy']
)
print("done")

##Examine the model

In [None]:
page(14)

In [None]:
model.summary()
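
The output shapes and parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the Keras defaults used by the model above: `Conv2D` with no padding and a stride of 1, and `MaxPool2D` with a stride equal to the pool size. The helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width/height of an unpadded convolution with stride 1."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size = 28
size = conv_out(size, 5)    # 24 after the first 5x5 convolution
size = conv_out(size, 5)    # 20 after the second 5x5 convolution
size = size // 2            # 10 after 2x2 max pooling
size = conv_out(size, 3)    # 8 after the first 3x3 convolution
size = conv_out(size, 3)    # 6 after the second 3x3 convolution
size = size // 2            # 3 after 2x2 max pooling
flat_units = size * size * 64   # 576 values enter the first dense layer

total_params = (
    conv_params(5, 1, 32)
    + conv_params(5, 32, 32)
    + conv_params(3, 32, 64)
    + conv_params(3, 64, 64)
    + (flat_units * 256 + 256)  # Dense(256)
    + (256 * 10 + 10)           # Dense(10)
)
print(flat_units, total_params)  # 576 232170
```

Pooling, dropout, and flatten layers contribute no parameters, so they do not appear in the total.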

##Data augmentation for images


In [None]:
Image("Intro to Deep Neural Networks with Keras (100).png", width=700)

This is a large dataset, so you do not need to do any data augmentation. For that reason, the values in the code in the next cell are all zero.  

If you need to increase the size of your dataset, a quick way to do it is with augmentation. An image can be rotated, flipped vertically or horizontally, cropped, or shifted to create additional data instances. 

The code in the cell below is included as an example of how to do data augmentation. 

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

#when doing data augmentation, change the values 
datagen = ImageDataGenerator(
  rotation_range=0,
  zoom_range=0,
  width_shift_range=0,
  height_shift_range=0
)
print("done")
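
As a rough illustration of what augmentation does to a single image, the NumPy sketch below creates three extra variants of one 28x28 array. The rectangle used as a stand-in image is made up, and these NumPy calls are only an approximation of the transformations `ImageDataGenerator` applies. Note that for digits, flips and large rotations can change the meaning of the image (a rotated 6 looks like a 9), so they are not always appropriate.

```python
import numpy as np

# Hypothetical stand-in for a digit: a bright rectangle on a dark background
image = np.zeros((28, 28), dtype="float32")
image[10:18, 8:12] = 1.0

flipped = np.fliplr(image)                 # horizontal flip
shifted = np.roll(image, shift=2, axis=1)  # shift 2 pixels to the right (wraps at the edge)
rotated = np.rot90(image)                  # rotate 90 degrees counterclockwise

# One original image has become four training instances
augmented = np.stack([image, flipped, shifted, rotated])
print(augmented.shape)  # (4, 28, 28)
```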

##Train the model
Training this model takes some time. Even a single epoch would 
take us at least 10 minutes. 

<br>When training a model, you might want to start at 5 or 10 epochs.  For the sake of time, you will load a model that was already trained for 20 epochs. 
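
One epoch is a full pass over the training set, so the number of weight updates per epoch depends on the batch size. The small sketch below assumes the 60,000 MNIST training images and a few commonly tried batch sizes.

```python
num_train_images = 60000   # size of the MNIST training set

# Number of full batches (weight updates) in one epoch for each batch size
steps = {batch_size: num_train_images // batch_size for batch_size in (32, 64, 128)}

for batch_size, steps_per_epoch in steps.items():
    print(batch_size, steps_per_epoch)
# 32 1875
# 64 937
# 128 468
```

Smaller batches mean more updates per epoch, while larger batches mean fewer updates that each use more images.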

In [None]:
#The training code is commented out because we are not training the model. 
#We are loading a trained model.

#One epoch is not enough to train the model. 
#When training your own model, start with at least 5 epochs. 
epochs = 20
#When training, changing the batch size can improve model performance. 
#Batch sizes to try: 32, 64, 128
batch_size = 32
#history = model.fit(x_train, y_train, validation_data=(x_test,y_test), verbose=0)
#In TensorFlow 2, model.fit accepts generators directly (fit_generator is deprecated)
#history = model.fit(datagen.flow(x_train, y_train, batch_size=batch_size), epochs=epochs,
#                    validation_data=(x_test, y_test), steps_per_epoch=x_train.shape[0]//batch_size)
print("done")

##Load the saved model
<br><br>
The saved model was trained on 20 epochs with dropout. 

In [None]:
# load the model

import tensorflow as tf
#model.save("drive/My Drive/modelMNIST-10epochs.h5")
model = tf.keras.models.load_model('modelMNIST20epochsDropOut5.h5')
print("done")

##Visualize the training

In [None]:
#This code was added because we are loading a saved model. 
#Delete this code when training your own model

history={
'loss' : [0.02159241828986839, 0.024359366491305854, 0.024075006447355083, 0.022434302305513597, 0.022502201637798196, 0.02092262319721639, 
        0.022628609957231833, 0.023984545621410218, 0.0229671287720994, 0.022892831495716928, 0.02297513877024704, 0.02238998557519702, 
        0.023669260091022035, 0.021758925482126285, 0.02549580325261595, 0.024763063155473206, 0.02372065919294233, 0.025702197297606425, 
        0.023723514277891884, 0.02281903999319464],
'accuracy' : [0.99403334, 0.9931333, 0.99298334, 0.9931833, 0.9935833, 0.9935667, 0.99338335, 0.99328333, 0.9931167, 0.99343336, 0.9938833, 
            0.99371666, 0.99333334, 0.9942667, 0.9931833, 0.9935333, 0.99355, 0.9932, 0.9934833, 0.9939167],

'val_loss' : [0.026281631059093, 0.028667511040699006, 0.02301221162546947, 0.025282778917519604, 0.024452570569623546, 0.035854992368808306, 
            0.02532163344535977, 0.02743238721277157, 0.02539914317318045, 0.029287450911669267, 0.0317203404111008, 0.02501243616821786, 
            0.02125193558843274, 0.025320859121663195, 0.02633974252908173, 0.028473211738133632, 0.0272574734327402, 0.029718255650349197, 
            0.03425914892460101, 0.025203502631789627],
'val_accuracy' : [0.9931, 0.9928, 0.9938, 0.9943, 0.9946, 0.9927, 0.994, 0.9933, 0.9945, 0.9938, 0.9931, 0.9934, 0.9952, 0.994, 0.9944, 
                0.9934, 0.9935, 0.994, 0.993, 0.9942]
}

#This code was modified because we are loading a saved model. 
#Change the code to: 
#print(history.history.keys())
#print(history.history)
print(history.keys())
print(history)

In [None]:
#This code was modified because we are loading a saved model. 
#Change the code to: 
#plt.plot(history.history['accuracy'], label='training accuracy')
#plt.plot(history.history['val_accuracy'], label='testing accuracy')

import matplotlib.pyplot as plt

axes = plt.gca()
axes.set_ylim([0.9,1])
plt.plot(history['accuracy'], label='training accuracy')
plt.plot(history['val_accuracy'], label='testing accuracy')
plt.title('Accuracy')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend()

In [None]:
#This code was modified because we are loading a saved model. 
#Change the code to: 
#plt.plot(history.history['loss'], label='training loss')
#plt.plot(history.history['val_loss'], label='testing loss')

axes = plt.gca()
axes.set_ylim([.0,.5])
plt.plot(history['loss'], label='training loss')
plt.plot(history['val_loss'], label='testing loss')
plt.title('Loss')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.legend()

##Check the model's prediction capabilities

In [None]:
test_index = test_image_index
test_img = x_test[test_index]
plt.imshow(test_img.reshape(28,28), cmap='gray')
plt.colorbar()
#plt.title("Index:[{}] Value:{}".format(test_index, y_train.values[7777]))
plt.show()

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
import matplotlib as mpl
inp = model.input 

layer_outputs = [layer.output for layer in model.layers]
activation_model = Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(test_img.reshape(1,28,28,1))

def display_activation(activations, col_size, row_size, act_index):
  activation = activations[act_index]
  activation_index=0
  # figsize is (width, height): width scales with columns, height with rows
  fig, ax = plt.subplots(row_size, col_size, figsize=(col_size*2.5,row_size*1.5))
  #plt.colorbar('gray')
  for row in range(0,row_size):
    for col in range(0,col_size):
      ax[row][col].imshow(activation[0, :, :, activation_index], cmap='gray')
      activation_index += 1

print("done")


In [None]:
import numpy as np

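# The last layer (index 11) is the softmax output: one probability per digit
act_dense_3 = activations[11]
y = act_dense_3[0]
x = range(len(y))
# Label the ten bars with the digits 0-9
objects = np.arange(0, 10, step=1)
plt.xticks(x, objects)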
plt.bar(x,y,align='center')
plt.show()

In [None]:
print(activations[11])

###Use the model to predict digits

In [None]:
#Pick a number (0-3) from the check set

choose = 0 
test_img = x_check[choose] 
#print(x_check[choose])
#Determine what is the digit for your chosen test 
digit = np.argmax((y_check[choose]))
print("The chosen digit is " + str(digit))

#Use the model to predict the chosen digit
pred = model.predict(test_img.reshape(1,28,28,1))
#Print the array of probabilities for the digits 0-9
print("\nThe model prediction is: ")
print(pred)

#Print the index of the max value of the probabilities array, 
#This is the predicted digit
answer = np.argmax(pred)
print("Which is " + str(answer))

In [None]:
activation_model = Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(test_img.reshape(1,28,28,1))


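# The last layer (index 11) is the softmax output: one probability per digit
act_dense_3 = activations[11]
y = act_dense_3[0]
x = range(len(y))
# Label the ten bars with the digits 0-9
objects = np.arange(0, 10, step=1)
plt.xticks(x, objects)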
plt.bar(x,y,align='center')
plt.show()

In [None]:
page(36)

#Please give feedback for this course: 

[Intro to DNN with Keras](https://docs.google.com/forms/d/e/1FAIpQLSceD-PVXYfXTDFBRd6UrhtEO0jxPS_iXtSEQEy3POQ3lh8l-A/viewform?usp=pp_url)


##[TensorFlow Playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.50576&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)