# Convolutional Neural Network

So far you have been working with densely connected networks. In principle, with enough training data and enough neurons, a densely connected network could learn to solve a very wide range of problems.

However, a convolutional neural network is especially effective for image classification problems. This is because convolutional networks are good at capturing how nearby values in the data, such as neighboring pixels, relate to each other. The structure of the network is such that it learns which parts of the data matter most and how they are connected. Additionally, for more complicated data, a convolutional network helps keep the number of neurons needed down.

**Convolutional neural networks** are loosely modeled on the structure of the neurons that process images in the brain. They use techniques that reduce the neuron count while maintaining positional relationships in the data.

You are going to build a convolutional network that recognizes handwritten digits from the MNIST data set, and then see what works differently compared with the densely connected network.

## Data Preparation

Since you have already done this part, you can re-use the same code for the data prep. The cell below has that code already created for you. 

1. Run the cell below to import and prepare the MNIST data for your Convolutional Neural Network.

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# helper functions
def show_min_max(array, i):
  random_image = array[i]
  print("min and max value in image: ", random_image.min(), random_image.max())


def plot_image(array, i, labels):
  plt.imshow(np.squeeze(array[i]))
  plt.title(" Digit " + str(labels[i]))
  plt.xticks([])
  plt.yticks([])
  plt.show()


img_rows, img_cols = 28, 28  

num_classes = 10 

(train_images, train_labels), (test_images, test_labels) = mnist.load_data() 
(train_images_backup, train_labels_backup), (test_images_backup, test_labels_backup) = mnist.load_data() 

print(train_images.shape) 
print(test_images.shape) 

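# Add a channel dimension: (samples, 28, 28) -> (samples, 28, 28, 1)
train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, 1)
test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

# Convert to float so the pixel values can be scaled
train_images = train_images.astype('float32')
test_images = test_images.astype('float32')

# Scale pixel values from the 0-255 range down to 0-1
train_images /= 255
test_images /= 255

# One-hot encode the labels (one column per digit class)
train_labels = keras.utils.to_categorical(train_labels, num_classes)
test_labels = keras.utils.to_categorical(test_labels, num_classes)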

print(train_images[1232].shape)

(60000, 28, 28)
(10000, 28, 28)
(28, 28, 1)


## Creating Your Network

Like with the densely connected network, you are going to set up epochs. Remember, the number of epochs is how many training rounds the network will have, that is, how many times it passes over the full training data.

You will also define this network as Sequential. Convolutional networks, like densely connected ones, take the output from one layer to feed in as input in the next layer. However, the type of layers that this network uses will be different. 



In the cell below, add epochs and define the model. 

1. Set the `epochs` variable to `10`
2. Set the model to `Sequential()`

In [3]:
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout

# Set your variables here: 
epochs =  10
model = Sequential()

## Adding Layers

With your densely connected network, you added three fully connected (or dense) layers. The layers in this network are convolutional layers and work a little differently. 

Convolutional layers use small clusters of neurons called filters that are moved across the image and activate based on the pixels they see. These clusters learn to recognize features in the data.

You can adjust the number and size of filters in the layer. Larger filters observe larger areas of the image at once, while smaller filters spot finer detail. A higher filter count allows the layer to recognize a wider range of features.

There are multiple advantages to having filters that work this way:

*  Small filters are more computationally efficient since they only examine a small portion of the image at once.
*  Just like in real life, the filter can do a better job by focusing on a small simple problem and ignoring the distraction of the rest of the image.
*  Because the filters are moved across the entire image, convolutional networks are good at identifying two images that have the same object but in different places in each image.

Your network will use multiple layers of convolutions to learn its task.
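To make the idea of a filter moving across an image concrete, here is a minimal NumPy sketch. The `slide_filter` helper, the 6x6 image, and the hand-made edge filter are all hypothetical and exist only for illustration. In a real convolutional layer the filter values are learned during training rather than written by hand.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide a small filter over a 2D image with no padding and a stride of 1."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    output = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            patch = image[r:r + k_rows, c:c + k_cols]
            # Multiply the patch by the filter element-wise and sum the result.
            output[r, c] = np.sum(patch * kernel)
    return output

# Hypothetical 6x6 image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical hand-made filter that responds to a dark-to-bright vertical edge.
edge_filter = np.array([[-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])

feature_map = slide_filter(image, edge_filter)
print(feature_map.shape)  # (4, 4)
print(feature_map[0])     # [0. 3. 3. 0.]
```

The filter only produces a strong response where its 3x3 window overlaps the edge, no matter which row it is in. This is the sense in which the same filter can find the same feature in different places.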

### Implementing Convolutional Layers

Keras provides functionality to easily create convolutional layers for your neural networks. You'll use the `Conv2D` class to create the first convolutional layer of your network.

1. In the cell below, fill in the following information: 
  * filters: 32
  * kernel_size: (3, 3)
  * activation: 'relu'
  * input_shape: input_shape

In [4]:
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))



> The Conv2D class creates two-dimensional convolutional layers, meaning they scan across a flat surface, like an image.

> There's also a Conv3D class, that scans across a three-dimensional volume.  Can you think of any cases where the Conv3D class would be used?



## Pooling Layers

Processing images with convolutional layers can get computationally intense rather quickly. Successive layers of convolutions increase the number of neurons and computation time required.

> Convolutional networks use **pooling layers** to manage that growth of complexity by simplifying and shrinking the data set.

Pooling layers move a window across the data with a specified stride and reduce the contents of each window to a single value. Max pooling, which you will use here, keeps the largest value in each window. This shrinks the size of the layer's output based on the window's size.

This also helps reduce the network's translation variance, which is how sensitive the network is to an object's exact position in an image.

The most common pooling layer is a 2x2 filter with a stride of 2. This results in the width and the height of the input layer being reduced by half. This simplifies the data without too much loss of specificity in the image.
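The 2x2, stride-2 pooling described above can be sketched in a few lines of NumPy. The `max_pool_2x2` helper and the 4x4 input are hypothetical, and the sketch assumes max pooling on a single 2D feature map.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    rows, cols = feature_map.shape
    # Drop a trailing odd row or column so the map splits evenly into 2x2 windows.
    rows, cols = rows - rows % 2, cols - cols % 2
    trimmed = feature_map[:rows, :cols]
    # Regroup the values so that axes 1 and 3 index positions inside each window.
    windows = trimmed.reshape(rows // 2, 2, cols // 2, 2)
    return windows.max(axis=(1, 3))

# Hypothetical 4x4 feature map.
feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

The 4x4 input becomes a 2x2 output, so both the width and the height are halved while the strongest response in each region is kept.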

### Adding a Pooling Layer

1. In the cell below, set the pool size to `(2,2)` and run the cell.

In [5]:
model.add(MaxPooling2D(pool_size=(2,2)))

Convolutional neural networks often have more than one set of alternating convolutional and pooling layers.

## More Convolutional Layers

By design, convolutional layers can examine the low-level features of an image.  By adding more convolutional layers the network can start to work with higher-level features.

This layer is defined the same way as the last one, except for more filters, 64 here vs. 32 before.  Also, the input shape doesn't need to be defined, since it's inferred from the previous layer.

1. In the cell below, add another `Conv2D` layer to the network, with `64` filters and a kernel size of `(3,3)`

In [6]:
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))

## Dropout Layers

A dropout layer takes a percentage of all the neurons in its input and deactivates them at random during training. This random dropout of neurons forces more of the network to adapt to the task.

Without a dropout layer, larger networks run the risk of growing overdependent on a small set of competent neurons rather than the whole network learning the task. This overdependence contributes to **overfitting**, where the network memorizes its training data instead of learning patterns that carry over to new data, and it can skew the output of your network.
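As a rough illustration of what happens during training, the sketch below zeroes a random 30% of a set of activations. The `dropout` helper is hypothetical. It also rescales the surviving values by `1 / (1 - rate)` so the overall signal strength stays about the same, which is how the Keras `Dropout` layer is documented to behave during training.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the activations and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    # Scale the survivors up so the expected total activation is unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones(10000)  # hypothetical layer output, all ones for clarity

dropped = dropout(activations, rate=0.3, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0))
print("mean activation:", dropped.mean())
```

Roughly 30% of the values come out as zero and the mean stays close to 1. Dropout is only applied while training; when the network makes predictions, all neurons are active.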

1. Set the rate of the dropout layer to `0.3` or 30%
2. Run the cell to add this layer to your network.

In [7]:
model.add(Dropout(rate=0.3))

3. Add another convolutional layer with `32` filters, a kernel size of `(3,3)`, and an activation of `relu`. This will be the last convolutional layer.
4. Run the cell to add this layer to your network. 

In [8]:
# Add your Conv2D layer here:
model.add(Conv2D(32, (3,3), activation='relu'))

## Dense and Flatten Layers

At the end of the convolutional and pooling layers, you'll need to set up some neurons to help make your final classification decision. This will be a standard, fully connected layer of neurons. To connect the convolutional layers to it, you'll first need to flatten their output, which is a stack of 2D feature maps, into a single one-dimensional list of values.

Keras uses the `Flatten` class to create a flattening layer, and the `Dense` class to create dense layers.

`units=32` is how big the output of the Dense layer will be, in this case, 32 neurons.
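The flattening step can be pictured with a plain NumPy reshape. This sketch assumes the last convolutional layer outputs 9 x 9 x 32 values per image, which is the shape reported by the model summary. The array of zeros is only a stand-in for real layer output.

```python
import numpy as np

# Stand-in for the last convolutional layer's output for a batch of one image.
conv_output = np.zeros((1, 9, 9, 32))

# Flatten everything except the batch dimension into one long row of values.
flattened = conv_output.reshape(conv_output.shape[0], -1)
print(flattened.shape)  # (1, 2592)

# A dense layer with 32 units needs one weight per input per unit, plus one bias per unit.
units = 32
dense_params = flattened.shape[1] * units + units
print(dense_params)  # 82976
```

Each of the 32 dense neurons is connected to all 2592 flattened values, which is why this one layer holds more parameters than any of the convolutional layers before it.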

1. Look at the cell below to see the Flatten() layer. 
2. Set the units (`32`) and activation `'relu'` for the Dense layer below.
3. Run the cell to add these layers to your network. 

In [9]:
model.add(Flatten())
model.add(Dense(units=32, activation='relu'))

## Output Layers
Just like with your fully connected network, the final layer needs to shrink the previous layer down to just the number of possible classes. Like before, the decision is represented by the class with the highest output value, which the softmax activation turns into a probability.
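Here is a small sketch of how a softmax output turns into a decision. The raw scores are hypothetical and the `softmax` helper is written out by hand for illustration; in your network, the `'softmax'` activation does this step for you.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores for the ten digit classes 0 through 9.
scores = np.array([0.1, 0.3, 2.5, 0.2, 0.0, 0.4, 0.1, 1.2, 0.3, 0.2])

probabilities = softmax(scores)
predicted_digit = int(np.argmax(probabilities))

print(probabilities.sum())
print(predicted_digit)  # 2
```

The class with the largest probability, found with `np.argmax`, is the network's prediction.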

1. In the cell below, add a Dense layer with these settings: 
  * units: the number of output classes
  * activation: 'softmax'

In [10]:
model.add(Dense(units=10, activation='softmax'))

Finally, print out a summary of your network. 

2. Run the cell below.

In [11]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 dropout (Dropout)           (None, 11, 11, 64)        0         
                                                                 
 conv2d_2 (Conv2D)           (None, 9, 9, 32)          18464     
                                                                 
 flatten (Flatten)           (None, 2592)              0         
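
The output shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes unpadded 3x3 convolutions with a stride of 1 and 2x2 pooling with a stride of 2, which matches the layers added in this notebook. The helper names `conv_size` and `conv_params` are hypothetical.

```python
def conv_size(size, kernel=3):
    """Width or height after an unpadded convolution with a stride of 1."""
    return size - kernel + 1

def conv_params(in_channels, filters, kernel=3):
    """Weights per filter (kernel * kernel * in_channels) plus one bias, times the filter count."""
    return (kernel * kernel * in_channels + 1) * filters

after_conv1 = conv_size(28)           # conv2d
after_pool = after_conv1 // 2         # max_pooling2d halves width and height
after_conv2 = conv_size(after_pool)   # conv2d_1 (dropout leaves the shape unchanged)
after_conv3 = conv_size(after_conv2)  # conv2d_2

print(after_conv1, after_pool, after_conv2, after_conv3)  # 26 13 11 9
print(conv_params(1, 32))    # 320
print(conv_params(32, 64))   # 18496
print(conv_params(64, 32))   # 18464
```

Pooling and dropout layers have no parameters of their own, which is why the summary lists `0` for them.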
                                                        

## Compiling the Network

Just like with your first network, you are going to compile this network. The loss and metrics will be the same as your last network: `categorical_crossentropy` and `accuracy`. This time, however, you are going to use a different training algorithm called **RMSProp**.

RMSProp is one of several different training algorithms Keras defines to do the computation that actually teaches the network how to improve. For the neural network, the goal is to optimize the loss by making it as small as possible. RMSProp is one way the network can do this. 
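For intuition only, here is a simplified sketch of the idea behind RMSProp applied to a single weight. The `rmsprop_step` helper, the toy loss, and the hyperparameter values are assumptions chosen for illustration. Library implementations add further options and may differ in their details.

```python
import math

def rmsprop_step(weight, grad, avg_sq_grad, learning_rate=0.01, rho=0.9, epsilon=1e-7):
    """One simplified RMSProp update for a single weight."""
    # Keep a running average of the squared gradient.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # Divide the step by the root of that average so step sizes stay moderate.
    weight = weight - learning_rate * grad / (math.sqrt(avg_sq_grad) + epsilon)
    return weight, avg_sq_grad

# Toy loss (weight - 3)^2, which is smallest at weight = 3.
weight, avg_sq_grad = 0.0, 0.0
for _ in range(5000):
    grad = 2.0 * (weight - 3.0)  # gradient of the toy loss
    weight, avg_sq_grad = rmsprop_step(weight, grad, avg_sq_grad)

print(weight)  # ends up close to 3
```

Each update moves the weight in the direction that lowers the loss, which is the same job the optimizer does for every weight in your network.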

1. Set each argument in the compile function below. 
  * loss: `'categorical_crossentropy'`
  * optimizer: `'rmsprop'`
  * metrics: `['accuracy']`

In [12]:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',  metrics=['accuracy'])

## Training

The fit function does the actual work of running the training.  Now that the data has been processed and the model has been defined and compiled it is time to actually train the model on the data. In this network, you are going to use validation data to get an idea of how the network is performing with each epoch, instead of just once at the end. 

Let's look at the parameters of this function:

First, train_images and train_labels define the data that this network will be trained on. The images are the input and the labels are the expected output. 

Next, the batch_size allows you to put the data into the network in batches. For this you can set it to `64` for the first training. You can always change this later. 

Then, you will tell the network how many training rounds or epochs this network will be trained with. 

Now, set your validation data to the test_images and test_labels so that the network knows what data to test itself against. 

Finally, you will set the shuffle option to True so that each epoch is different and the model isn't relying on the order of the images to learn about them. 
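To see what the batch size means in practice, this short calculation works out how many batches make up one epoch. It assumes the 60000 training images reported during data preparation and the batch size of 64 suggested above.

```python
import math

num_train_images = 60000  # from the shape printed during data preparation
batch_size = 64

# Every image is used once per epoch, so the final batch may be smaller.
steps_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 938
print(last_batch_size)  # 32
```

A larger batch size means fewer weight updates per epoch, and a smaller one means more.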

1. Run the cell below to start training your network


In [None]:
model.fit(train_images, train_labels, batch_size=64, epochs=epochs, validation_data=(test_images, test_labels), shuffle=True) 

Epoch 1/10
Epoch 2/10
Epoch 3/10

## Evaluating and Returning the Model


Now that you have created your model, compiled it, and trained it, you need to figure out a way to test it and see how accurate it is on data it hasn't seen yet! Just like with your other network, you will need to evaluate your network.

The evaluate function returns the loss followed by the value of each metric the model was compiled with, which here is the accuracy. You pass it the test images and labels so the network knows what data to test itself on.

Just like with the other stages, TensorFlow has some tools to help you out.
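The accuracy metric itself is simple to compute by hand. The sketch below uses a hypothetical set of four predictions over three classes, kept small for readability, and compares each prediction's most likely class against the one-hot label.

```python
import numpy as np

# Hypothetical one-hot labels for four images.
true_labels = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 1, 0]])

# Hypothetical softmax outputs for the same four images.
predictions = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.3, 0.4, 0.3],
                        [0.1, 0.6, 0.3]])

# A prediction is correct when its most likely class matches the label's class.
correct = np.argmax(predictions, axis=1) == np.argmax(true_labels, axis=1)
accuracy = correct.mean()
print(accuracy)  # 0.75
```

Three of the four predictions match their labels, so the accuracy is 0.75.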

1. Add a print statement to see the test accuracy.
2. Run the cell.

In [None]:
# Evaluate once on the test set; the result is [loss, accuracy]
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=0)
# Add a print statement for the test accuracy
print('Test accuracy:', test_acc)

## Saving your Model

Like before, you might want to save a copy of this trained model to use later.

1. Run this cell to save the model to your student folder.

In [14]:
model.save('cnn_model.h5')