# P04 Introduction to Keras and CNN

In this practical, we will learn how to develop a deep learning model using a popular deep learning library called Keras. We will implement a model that achieves close to state-of-the-art performance on the MNIST handwritten digit recognition task.

After completing this tutorial, you will be able to:

* Use Keras to implement a baseline neural network model
* Use Keras to implement a simple Convolutional Neural Network (CNN)
* Implement a close to state-of-the-art deep learning model for MNIST

In [1]:
import numpy as np

---

# <u> The MNIST Handwritten Digit Recognition Problem </u>

MNIST is a dataset for handwritten digit classification that is commonly used to evaluate machine learning models.

* The images have been normalized in size and centered. Very little data cleaning or preparation is required, allowing the developer to focus on the machine learning.
* It comprises images of 10 digits (0 to 9) taken from a variety of scanned documents. The task is to predict the digit (0 to 9) shown in each image.
* There are 60,000 images for training and 10,000 images for testing. Each image has a size of 28 x 28 pixels (784 pixels in total).
* Results are reported using classification error (ratio of wrongly classified images). The lower the better.
* A good machine learning algorithm is expected to achieve a prediction error of less than 1%. State-of-the-art prediction error of approximately 0.2% can be achieved with large Convolutional Neural Networks.
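
The classification error described above can be sketched in a few lines of NumPy. The helper name `classification_error` and the toy labels are made up for illustration; they are not part of Keras or of the MNIST results.

```python
import numpy as np

def classification_error(y_true, y_pred):
    """Return the fraction of samples whose predicted label is wrong."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# Toy labels (made up for illustration, not MNIST results)
y_true = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y_pred = [3, 1, 4, 1, 5, 9, 2, 6, 5, 8]  # one mistake out of ten
error = classification_error(y_true, y_pred)
print("error rate: %.1f%%" % (100 * error))
```

An error of 1% on the 10,000 test images therefore corresponds to 100 misclassified digits.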

### Loading the MNIST dataset in Keras

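The Keras deep learning library provides a convenience function (`keras.datasets.mnist.load_data`) for loading the MNIST dataset. By default, the downloaded file is cached as `~/.keras/datasets/mnist.npz`; the code below passes an explicit path instead, so that the file is read from (or downloaded to) the `data` folder in the current working directory.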

In [2]:
# load (download if necessary) the MNIST dataset
from keras.datasets import mnist
import os
(X_train, y_train), (X_test, y_test) = mnist.load_data(os.sep.join([os.getcwd(),'data', 'mnist.npz']))

Using TensorFlow backend.


**Exercise 1: Explore the dataset**

1. Get the shape of `X_train`, `X_test`, `y_train`, `y_test` 
2. Get the type of `X_train` and `y_train`
3. Get the value range for `X_train`
4. Print out all possible values of `y_train` and their frequencies
5. Display four samples (given for you)

In [3]:
# Your code here
print("X_train: ", X_train.shape)
print("X_test: ", X_test.shape)
print("y_train: ", y_train.shape)
print("y_test: ", y_test.shape)
print("X_train range: ", X_train.min(), " - ", X_train.max())
print("y_train: ", np.bincount(y_train))


X_train:  (60000, 28, 28)
X_test:  (10000, 28, 28)
y_train:  (60000,)
y_test:  (10000,)
X_train range:  0  -  255
y_train:  [5923 6742 5958 6131 5842 5421 5918 6265 5851 5949]
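
Items 2 and 4 of the exercise can also be approached as sketched below, using `dtype` for the element type and `np.unique` for the label frequencies. The `labels` array is a made-up stand-in for `y_train`, so the counts are not the MNIST counts.

```python
import numpy as np

# Toy label vector standing in for y_train (values are made up)
labels = np.array([0, 1, 1, 2, 2, 2, 9], dtype=np.uint8)

# Container type and element type
print(type(labels), labels.dtype)

# Distinct label values and how often each occurs
values, counts = np.unique(labels, return_counts=True)
for v, c in zip(values, counts):
    print("digit %d: %d samples" % (v, c))
```

Unlike `np.bincount`, which also reports a zero count for every absent value below the maximum, `np.unique` lists only the labels that actually occur.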


In the following, we display the first four samples of `X_train` to show what we are working with.

In [4]:
import matplotlib.pyplot as plt
plt.figure()
plt.subplot(221)
plt.axis('off')
plt.imshow(X_train[0], cmap=plt.get_cmap('gray'))
plt.subplot(222)
plt.axis('off')
plt.imshow(X_train[1], cmap=plt.get_cmap('gray'))
plt.subplot(223)
plt.axis('off')
plt.imshow(X_train[2], cmap=plt.get_cmap('gray'))
plt.subplot(224)
plt.axis('off')
plt.imshow(X_train[3], cmap=plt.get_cmap('gray'))
plt.show()

<matplotlib.figure.Figure at 0x21cc6fb1518>

---
# <u> PART 1: Baseline Model with Multi-Layer Perceptrons (MLP) </u>

You can get very good results using a very simple **neural network model** with a **single hidden layer**. 

In this section we will create a simple multi-layer perceptron model that achieves an error rate of under 2% (about 1.8% in the run shown below). We will use this as a baseline for comparing more complex convolutional neural network models.

We shall build the following Neural Network:
* **input layer**: Every input is flattened to form a 784-D vector
* **hidden layer**: 784 units in the hidden layer 
* **output layer**: 10 possible classes
* **loss**: Cross-entropy loss

<img src="imgs\P04_MLP1.png" width="50%">

**<u>TODO</u>: Your task for this part is to understand how to use Keras to build a Neural Network model.**

## (1) Loading and preparing the dataset

In [5]:
# Let’s start off by importing the classes and functions we will need.

%load_ext autoreload
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils

In [6]:
# Initialize the random number generator using a fixed seed to ensure that
# the results of your script are repeatable.
seed = 7
np.random.seed(seed)

# load the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data(os.sep.join([os.getcwd(),'data', 'mnist.npz']))

num_pixels = X_train.shape[1]*X_train.shape[2]


**Exercise 2: Data Preprocessing**
1. Flatten each 28x28 image in `X_train` and `X_test` into a 784-dimensional vector
2. Convert the type from `uint8` to `double`
3. Normalize the inputs (`X_train` and `X_test`) from 0-255 to 0-1.
4. Encode the output vectors (`y_train` and `y_test`) using a one-hot representation. For example, the digit '1' should be encoded as (0, 1, 0, 0, 0, 0, 0, 0, 0, 0). Hint: use `np_utils.to_categorical`.

In [7]:
# Your code here
X_train = X_train.reshape(X_train.shape[0], -1).astype("double")
X_test = X_test.reshape(X_test.shape[0], -1).astype("double")

print(X_train.shape, X_test.shape)

X_train -= X_train.min()
X_train /= X_train.max()
X_test -= X_test.min()
X_test /= X_test.max()

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

(60000, 784) (10000, 784)
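
To make the effect of each preprocessing step visible, here is a minimal sketch on a tiny made-up batch. It uses plain NumPy only: dividing by 255 and indexing `np.eye` are stand-ins chosen for illustration, not the Keras helpers used in the cell above.

```python
import numpy as np

# Toy batch of two 2x2 "images" with uint8 pixels (made-up values)
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 0]]], dtype=np.uint8)
labels = np.array([1, 3])

# Steps 1 and 2: flatten each image and convert to double precision
flat = images.reshape(images.shape[0], -1).astype("float64")  # shape (2, 4)

# Step 3: scale pixel values from 0-255 to 0-1
flat /= 255.0

# Step 4: one-hot encode; row i of the result encodes labels[i]
num_classes = 4  # hypothetical: classes 0..3 in this toy example
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```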


## (2) Constructing the MLP model

We are now ready to create our simple neural network model. The model is a simple neural network with one hidden layer with the same number of neurons as there are inputs (784). 

**The Keras Sequential Model**

We shall build our model as a **`Sequential`** model, which is a linear stack of layers. You can create a `Sequential` model by passing a list of layer instances to the constructor. For example, the following code creates a two-layer network (Dense + ReLU activation -> Dense + softmax activation).

``` Python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
```

* You can also start with an empty `Sequential` model and then add layers via the **`.add()`** method:
``` Python
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
```

In [8]:
model = Sequential()

**The first layer**

* For the *first layer*, we need to specify the *shape of its input* through the argument `input_shape` or `input_dim` so that the model knows the input shape it should expect. 
* For the *following layers*, the shapes would be *automatically* inferred and there is no need to specify its shape
* Notes on `batch size`: 
  * In `input_shape`, the batch dimension should not be included.
  * If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a `batch_size` argument to a layer. If you pass both `batch_size=32` and `input_shape=(6, 8)` to a layer, it will then expect every batch of inputs to have the batch shape  `(32, 6, 8)`.
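
The relationship between `input_shape`, `batch_size` and the full batch shape can be sketched as follows. `batch_shape` is a hypothetical helper written for this illustration; it is not a Keras function.

```python
def batch_shape(input_shape, batch_size=None):
    """Prepend the batch dimension; None means 'any batch size'."""
    return (batch_size,) + tuple(input_shape)

print(batch_shape((784,)))      # (None, 784)
print(batch_shape((6, 8), 32))  # (32, 6, 8)
```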


First, we add the hidden layer to our MLP. The hidden layer is a `dense` (fully connected) layer. A layer is added to the `Sequential` model through the `.add()` method. A rectifier activation function is used for the neurons in the hidden layer.

* The first parameter specifies the number of neurons in the hidden layer. We set the number of neurons in this layer to be the same as the input layer.
* Since this is the first layer, we need to specify the input shape (`input_dim=num_pixels`). 
* Use the ReLU activation (`activation='relu'`).
* Initialize the kernel (weight values) from a normal distribution centered on zero (`kernel_initializer='normal'`).

In [9]:
model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
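
What this hidden layer computes can be sketched in NumPy as `max(0, xW + b)`. This is an illustration of the arithmetic, not the Keras implementation; the standard deviation of 0.05 and the zero bias are assumptions made for the sketch, and the inputs are random stand-ins for flattened images.

```python
import numpy as np

rng = np.random.RandomState(7)

def dense_relu(x, W, b):
    """Fully connected layer followed by ReLU: max(0, xW + b)."""
    return np.maximum(0.0, x @ W + b)

n_in, n_hidden = 784, 784
# Zero-centered normal weights; stddev 0.05 is an assumption for this sketch
W = rng.normal(loc=0.0, scale=0.05, size=(n_in, n_hidden))
b = np.zeros(n_hidden)

x = rng.rand(2, n_in)    # two fake flattened images with values in [0, 1)
h = dense_relu(x, W, b)
print(h.shape)           # one 784-dimensional activation vector per image
```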

**The output layer**

Next, we add the output layer. 
* Since this is not the first layer, the input shape of this layer would be computed automatically. 
* Use the softmax activation on the output layer to turn the output scores into probability-like values and allow one class of the 10 to be selected as the model’s output prediction. 

In [10]:
num_classes = 10
model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
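
The softmax activation can be sketched in NumPy as follows. The scores are made up, and the function is a minimal illustration rather than the Keras implementation.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the max keeps exp() numerically stable."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up output scores for two samples and three classes
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)
predicted = probs.argmax(axis=1)  # index of the most probable class
print(probs)
print(predicted)
```

Each row of `probs` sums to 1, and the predicted class is the index of the largest entry.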

**List of core layers in Keras:**

In this section, we have used only the `Dense` layer. Keras provides the API for many core layers. They can be found [here](https://keras.io/layers/core/):
* Dense
* Activation
* Dropout
* Flatten
* Input
* Reshape
* Permute
* RepeatVector
* Lambda
* ActivityRegularization
* Masking

**Showing the model summary**

After building the model, we can display its structure with the **`summary`** method. 
* The first layer should be a dense layer with output shape `(None, 784)`, where the batch dimension is `None`. This means that the batch size is unknown until runtime.
* The second layer should be a dense layer with output shape `(None, 10)`.

The total number of parameters is also shown (expected value: 623,290). The number of parameters indicates the size of our model. 

In [11]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 784)               615440    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                7850      
Total params: 623,290
Trainable params: 623,290
Non-trainable params: 0
_________________________________________________________________
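
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. `dense_params` is a hypothetical helper written for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 784)  # 615440
output = dense_params(784, 10)   # 7850
total = hidden + output          # 623290
print(hidden, output, total)
```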


## (3) Configure the learning process (Compile)

Before training a model, you need to configure the learning process, which is done via the **`compile`** method. It receives three arguments:

* *An optimizer*. This could be the string identifier of an existing optimizer (such as `rmsprop` or `adagrad`), or an instance of the `Optimizer` class. See: [optimizers](https://keras.io/optimizers/). The optimizers available in Keras include: `sgd`, `adagrad`, `adam`, `adadelta`, `nadam` and `rmsprop`.
* *A loss function*. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as `categorical_crossentropy` or `mse`), or it can be an objective function. See: [losses](https://keras.io/losses/). The loss functions available in Keras include: `mean_squared_error`, `mean_absolute_error`, `hinge`, `squared_hinge`, `categorical_hinge`, `categorical_crossentropy`, `sparse_categorical_crossentropy`, `binary_crossentropy`, `kullback_leibler_divergence`, etc.
* *A list of metrics*. For any classification problem you will want to set this to `metrics=['accuracy']`. A metric could be the string identifier of an existing metric or a custom metric function.

For our model, we shall use *cross entropy* as our loss function and the *Adam* algorithm as our optimizer. Since this is a classification problem, we shall use *accuracy* as our evaluation metric.

In [12]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
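
To see what the cross-entropy loss measures, here is a minimal NumPy sketch for one-hot targets. The clipping constant `eps` and the probabilities are assumptions chosen for illustration; this is not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_prob[k])."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Two samples, three classes, one-hot targets (made-up values)
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
confident = np.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.40, 0.30], [0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions pick the correct class, but the loss is lower when the correct class receives a higher probability.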

## (4) Training

We can now fit and evaluate the model. Keras models are trained on NumPy arrays of input data and labels. For training a model, you will typically use the **`fit`** function.

**`fit`**: Full documentation on the function is available [here](https://keras.io/models/sequential/)

The model is fit over 10 epochs with a batch size of 200, that is, the weights are updated after every 200 images. The test data is used as the validation dataset, allowing you to monitor the performance of the model as it trains. With `verbose=1`, Keras shows a progress bar for each epoch; setting `verbose=2` instead reduces the output to one line per epoch.

Remember that we *must never use the testing set to optimize our model*. Here, we have used the testing set as our validation set since our purpose here is simply to show that the model improves over time, and not to fine-tune the model.
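
When the goal is to tune the model, a validation set can be carved out of the training data instead. The following is a minimal sketch, assuming a random 10% hold-out; `holdout_split` is a hypothetical helper and the arrays are fake data with the same layout as the flattened inputs.

```python
import numpy as np

def holdout_split(X, y, val_fraction=0.1, seed=7):
    """Shuffle once, then reserve the last val_fraction of samples for validation."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    train_idx, val_idx = order[:-n_val], order[-n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])

# Fake data: 100 samples with 784 features each
X = np.zeros((100, 784))
y = np.arange(100)
(X_tr, y_tr), (X_val, y_val) = holdout_split(X, y)
print(X_tr.shape, X_val.shape)
```

The resulting pair could then be passed as `validation_data=(X_val, y_val)`; the `fit` function also accepts a `validation_split` argument that holds out a fraction of the training data.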

In [13]:
# Fit the model

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=1)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
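
The number of weight updates implied by these settings is simple arithmetic, sketched here for concreteness.

```python
import math

n_samples, batch_size, epochs = 60000, 200, 10

# One update per mini-batch; the last batch may be smaller if sizes do not divide evenly
updates_per_epoch = math.ceil(n_samples / batch_size)  # 300
total_updates = updates_per_epoch * epochs             # 3000
print(updates_per_epoch, total_updates)
```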


When the example is run, the accuracy on the training and validation sets is printed for each epoch. We can see that both the training and validation accuracy increase over the epochs, while the training and validation loss decrease.

## (5) Testing and Evaluation

Finally, the test dataset is used to evaluate the model and a classification error rate is printed. The error rate is around 1.8% in the run shown below.

In [14]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Baseline Error: 1.82%


---
# <u>  Part 2 : Simple Convolutional Neural Network for MNIST </u>

In the previous section, we have seen how to load the MNIST dataset and train a simple multi-layer perceptron model on it.  In this section, we shall develop a more sophisticated convolutional neural network or CNN model.

Keras provides extensive support for creating convolutional neural networks. 
* The *convolutional layers* provided by Keras are listed [here](https://keras.io/layers/convolutional/):
  * Conv1D, Conv2D, SeparableConv2D, Conv3D
  * Cropping1D, Cropping2D, Cropping3D
  * UpSampling1D, UpSampling2D, UpSampling3D
  * ZeroPadding1D, ZeroPadding2D, ZeroPadding3D
  

* The *pooling layers* provided by Keras are listed [here](https://keras.io/layers/pooling/):
  * MaxPooling1D, MaxPooling2D, MaxPooling3D
  * AveragePooling1D, AveragePooling2D, AveragePooling3D
  * GlobalMaxPooling1D, GlobalMaxPooling2D
  * GlobalAveragePooling1D, GlobalAveragePooling2D

In this section, we will create a simple CNN for MNIST that demonstrates how to use all of the aspects of a modern CNN implementation, including Convolutional layers, Pooling layers and Dropout layers.

We shall create the following CNN model:
* **input layer**: Every batch of samples has a shape of (?, 28, 28, 1)
* **CONV1 layer**: 32 5x5 filters, stride = 1, padding = 0, activation = ReLU. Output size = (?, 24, 24, 32)  
* **POOL1 layer**: 2x2 filters, stride = default to pool size, padding = 0. Output size = (?, 12, 12, 32)
* **Dropout layer**: Regularization layer. Set the dropout rate to 0.2. Output size = (?, 12, 12, 32)
* **Flatten**: Not a real layer. Flatten the previous 2D layer into a 1D vector. Output size = (?, 4608) 
* **FC1**: Fully connected layer. 128 units. Output size = (?, 128)
* **FC2**: Fully connected layer. 10 units. Output size = (?, 10)

<img src="imgs\P04_SimpleCNN.png" width="50%">

**<u>TODO</u>: Go through this part and understand how to use the Keras library to build a simple CNN model.**

In [15]:
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

In [16]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

## (1) Loading and preprocessing

In [19]:
# load data
def load_preprocess_MNIST():
    (X_train, y_train), (X_test, y_test) = mnist.load_data(os.sep.join([os.getcwd(),'data', 'mnist.npz']))

    # reshape to be [samples][height][width][channels]
    X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32')

    # normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255

    # one hot encode outputs
    y_train = np_utils.to_categorical(y_train)
    y_test = np_utils.to_categorical(y_test)

    return (X_train, y_train), (X_test, y_test)

(X_train, y_train), (X_test, y_test) = load_preprocess_MNIST()
num_classes = y_test.shape[1]

## (2) Constructing the Simple CNN 

**Creating a Sequential Model**

First, we create an empty Sequential model.

In [20]:
model = Sequential()

**Adding CONV1**

First, we add the **CONV1** layer:

* `filters`: The first parameter is the number of filters.
* `kernel_size`: The second parameter is the size of each filter.
* `input_shape = (28, 28, 1)`: Since this is the first layer, the shape of the input must be specified. Here, we do not specify the batch dimension, which allows arbitrary batch sizes during training. 
* `strides = 1`: This argument is not really necessary. The default value for strides is 1. 
* `padding = 'valid'`: This argument is not really necessary. The default value for padding is `valid`, which means that there is no padding. Without padding, the output volume has a smaller spatial size than the input. To keep the spatial size unchanged (with a stride of 1), set padding to `same`. 
* `data_format = 'channels_last'`: indicates the position of the channel dimension. Some backends like TensorFlow expect the channel in the last dimension (28, 28, 1) (set to `'channels_last'`), whereas others like Theano expect the channel in the first dimension (1, 28, 28) (set to `'channels_first'`).
* `activation = 'relu'`: use ReLU activations, which have been shown to work very well for deep models. 
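
The two channel layouts differ only in the order of the axes, so one can be converted into the other with a transpose. A minimal NumPy sketch on a fake batch of four images:

```python
import numpy as np

# One fake batch of 4 grayscale 28x28 images in channels_last layout
batch_last = np.zeros((4, 28, 28, 1), dtype="float32")

# Move the channel axis from the end to position 1 (channels_first)
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

# And back again to channels_last
restored = np.transpose(batch_first, (0, 2, 3, 1))
print(batch_first.shape, restored.shape)
```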

Since the input spatial size N = 28, the filter size F = 5 and stride S = 1, the output spatial size = (N - F)/S + 1 = (28 - 5)/1 + 1 = 24. Since we use 32 filters, the output of CONV1 has a size of (24, 24, 32). 
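
The output-size formula can be wrapped in a small helper. `conv_output_size` is a hypothetical function written for this illustration; the `padding` argument counts the pixels added on each side, which extends the formula above to padded inputs.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output size along one axis for a convolution or pooling window."""
    return (n + 2 * padding - f) // stride + 1

conv1 = conv_output_size(28, 5, stride=1)            # 24
pool1 = conv_output_size(conv1, 2, stride=2)         # 12
same = conv_output_size(28, 5, stride=1, padding=2)  # 28
print(conv1, pool1, same)
```

With a 5x5 filter and a stride of 1, padding 2 pixels on each side keeps the output at 28 x 28.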

In [21]:
model.add(Conv2D(32, 
                 (5, 5), 
                 input_shape=(28, 28, 1), 
                 strides = 1,     
                 padding = 'valid', 
                 data_format = 'channels_last', 
                 activation='relu'))

**Adding POOL1**

Next, we add a max pooling layer:
* `pool_size = (2, 2)`: the first parameter is the size of the pooling window.
* `strides = 2`: This argument is not really necessary. By default, the stride is set to the pool size. 
* `padding = 'valid'`: This argument is not really necessary. The default value for padding is `valid`, which means that there is no padding. Without padding, border rows or columns that do not fill a complete window are dropped; set padding to `same` to pad the input so that they are kept. 


In [22]:
model.add( MaxPooling2D(pool_size=(2, 2),
                       strides = 2, 
                       padding = 'valid'))
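
What 2x2 max pooling with stride 2 does to a single feature volume can be sketched in NumPy. This is an illustration for inputs with even height and width, not the Keras implementation; the input values are made up.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an array of shape (H, W, C); H and W must be even."""
    h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) and take the max within each 2x2 block
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(x)
print(x[:, :, 0])
print(pooled[:, :, 0])
```

Each output value is the maximum of one non-overlapping 2x2 window, so the height and width are halved while the number of channels is unchanged.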

**Adding DROPOUT1**

Then, we apply dropout to the output of POOL1. This helps to prevent overfitting. We set the dropout rate (the fraction of units to be dropped) to 0.2.

In [23]:
model.add(Dropout(0.2))
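
A minimal NumPy sketch of dropout at training time follows. It uses the common "inverted dropout" formulation, in which the surviving units are rescaled by 1 / (1 - rate) so that the expected activation stays the same; the function and its arguments are hypothetical and written for illustration only.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training:
        return x  # dropout is inactive at test time
    keep = rng.rand(*x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.RandomState(7)
x = np.ones((1000, 10))
dropped = dropout(x, rate=0.2, rng=rng)
zero_fraction = float(np.mean(dropped == 0))
print("fraction of units dropped: %.3f" % zero_fraction)
```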

**Flattening and Adding FC1**

First, we flatten the 3D input volume (shape = (12, 12, 32)) into a 1D vector (shape = (4608,)). Then, we add a fully connected layer with 128 units and `ReLU` activation. Therefore, the output is a vector of size 128 per sample.

In [24]:
model.add(Flatten())
model.add(Dense(128, activation='relu'))

**Adding the output layer FC2**

Lastly, we insert the output layer. 
* Since there are 10 classes, the output layer has 10 units. 
* Use the softmax activation on the output layer to turn the output scores into probability-like values and allow one class of the 10 to be selected as the model’s output prediction.

In [25]:
model.add(Dense(num_classes, activation='softmax'))

We are done constructing our model. Let's show what we have built.

In [26]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 24, 24, 32)        832       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 32)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 4608)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               589952    
_________________________________________________________________
dense_4 (Dense)              (None, 10)                1290      
Total params: 592,074
Trainable params: 592,074
Non-trainable params: 0
_________________________________________________________________
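
The parameter counts in this summary can be checked by hand. A convolutional layer has one weight per kernel position, input channel and filter, plus one bias per filter; pooling, dropout and flatten layers have no parameters. Both helpers below are hypothetical and written for this check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

conv1 = conv2d_params(5, 5, 1, 32)     # 832
fc1 = dense_params(12 * 12 * 32, 128)  # 589952
fc2 = dense_params(128, 10)            # 1290
total = conv1 + fc1 + fc2              # 592074
print(conv1, fc1, fc2, total)
```

Almost all of the parameters sit in the first fully connected layer, not in the convolutional layer.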


## (3) Configure the learning process (Compile)

In [27]:
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

## (4) Training the Model

The CNN is fit over 10 epochs with a batch size of 200.

In [29]:
# Fit the model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=1)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## (5) Evaluating the Model

Finally, the test dataset is used to evaluate the model and a classification error rate is printed. The error rate is around 1% (0.98% in the run shown below). This is better than our simple multi-layer perceptron model above.

In [30]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

CNN Error: 0.98%


---
# <u>  Part 3 : Larger Convolutional Neural Network for MNIST </u>

Now that we have seen how to create a simple CNN, let’s take a look at a model capable of close to state-of-the-art results.

In this section, we will create a larger CNN for MNIST with two convolutional layers, two pooling layers, a dropout layer and three fully connected layers.

We shall create the following CNN model:
* **input layer**: Every input image has a shape of (28, 28, 1)
* **CONV1 layer**: 30 5x5 filters, stride = 1, padding = 0, activation = ReLU  
* **POOL1 layer**: 2x2 filters, stride = default to pool size, padding = 0. 
* **CONV2 layer**: 15 3x3 filters, stride = 1, padding = 0, activation = ReLU  
* **POOL2 layer**: 2x2 filters, stride = default to pool size, padding = 0. 
* **Dropout layer**: Regularization layer. Set the dropout rate to 0.2.
* **Flatten**: Not a real layer. Flatten the previous 2D layer into a 1D vector.
* **FC1**: Fully connected layer. 128 units, ReLU. 
* **FC2**: Fully connected layer. 50 units, ReLU. 
* **FC3**: Fully connected layer. 10 units, Softmax.

<img src="imgs\P04_LargerCNN.png" width="50%">

**Exercise #3: Identify the size of the output volume for each layer**
* **input layer**: <u>(?, 28, 28, 1)</u>
* **CONV1 layer**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **POOL1 layer**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **CONV2 layer**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **POOL2 layer**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **Dropout layer**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **Flatten**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **FC1**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **FC2**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
* **FC3**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

The first step is to import the classes and functions needed and to load the dataset.

In [None]:
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load the dataset
(X_train, y_train), (X_test, y_test) = load_preprocess_MNIST()
num_classes = y_test.shape[1]

**Exercise #4: Construct the larger CNN described above.**  

In the following, we shall build the larger CNN model described above. Complete the function `larger_model`. 

In [36]:
def larger_model():
    ############################################################
    # Create the larger CNN model described above
    ############################################################
    model = None
    # Your code here
    model = Sequential()
    model.add(Conv2D(30, (5, 5),
                     input_shape=(28, 28, 1),
                     strides=1,
                     padding='valid',
                     data_format='channels_last',
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2),
                           strides=2,
                           padding='valid'))
    model.add(Conv2D(15, (3, 3),
                     strides=1,
                     padding='valid',
                     data_format='channels_last',
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2),
                           strides=2,
                           padding='valid'))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    ############################################################
    # Add the learning modules (use the crossentropy loss 
    # function, adam optimizer and accuracy metric)
    ############################################################
    # Your code here
    
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

In [37]:
model = larger_model()

In [38]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 24, 24, 30)        780       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 12, 12, 30)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 10, 10, 15)        4065      
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 5, 5, 15)          0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 5, 5, 15)          0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 375)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 128)               48128     
__________
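
The output shapes listed in the summary can be reproduced by tracing the spatial size through the convolution and pooling layers. This sketch follows the filter counts used in `larger_model` (30 filters in CONV1, 15 in CONV2); the helper `window_output` and the `layers` table are hypothetical and written for illustration.

```python
def window_output(n, f, stride):
    """Output size along one axis for a 'valid' convolution or pooling window."""
    return (n - f) // stride + 1

# (name, window size, stride, output channels); None keeps the channel count
layers = [
    ("CONV1", 5, 1, 30),
    ("POOL1", 2, 2, None),
    ("CONV2", 3, 1, 15),
    ("POOL2", 2, 2, None),
]

size, channels = 28, 1  # input images are 28 x 28 with one channel
shapes = {}
for name, f, stride, out_channels in layers:
    size = window_output(size, f, stride)
    channels = out_channels or channels
    shapes[name] = (size, size, channels)
    print(name, shapes[name])

# Dropout does not change the shape; Flatten multiplies the three dimensions
flat_units = size * size * channels
print("Flatten", flat_units)
```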

**Exercise #5: Train the model**

Train the model using 10 epochs and a batch size of 200. Validate your model with the test set. Remember not to perform any model optimization using the test set.

In [39]:
# Your code here
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=1)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


**Exercise #6: Evaluate your model**

Evaluate your model using the test set. You can expect an error rate of around 0.8%.

In [40]:
# Your code here
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

CNN Error: 0.86%
