## Convolutional Neural Network

The neural networks in previous notebooks were all constructed with fully connected layers, also called dense layers. That is, each node in a layer connects to every node in the subsequent layer. Deep dense models are expensive and slow to train: the number of weights grows with the product of adjacent layer sizes, and deep networks can also suffer from the [vanishing gradient](https://en.wikipedia.org/wiki/Vanishing_gradient_problem) problem. A Convolutional Neural Network (CNN) instead slides small filters across the input domain and learns multiple distinct features. Because each filter's weights are reused at every position, the total number of trainable parameters is much lower than in a fully connected network of comparable size. CNNs were initially developed for image classification and were later applied to learning tasks in other fields.
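To make the parameter savings concrete, here is a back-of-the-envelope count for a single layer applied to a 28x28 grayscale image. The layer sizes (128 dense units, 32 filters of size 3x3) are chosen for illustration only; this is arithmetic, not a measurement of any trained model.

```python
# Rough parameter counts for one layer on a 28x28 single-channel input.
# The layer sizes below are illustrative assumptions.
height, width, channels = 28, 28, 1

# Dense layer: every input pixel connects to each of 128 units,
# plus one bias per unit.
dense_units = 128
dense_params = height * width * channels * dense_units + dense_units

# Convolutional layer: 32 filters of size 3x3, each reused at every
# image position, plus one bias per filter.
filters, kernel = 32, 3
conv_params = kernel * kernel * channels * filters + filters

print("dense layer parameters:", dense_params)  # 100480
print("conv layer parameters: ", conv_params)   # 320
```

The convolutional layer's count does not depend on the image size at all, which is why the gap widens for larger inputs.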

A CNN is built from a few main steps:
1. convolution
2. nonlinear activation
3. max-pooling
4. classification

Each step is crucial to carrying out a learning task. The convolution step can be broadly interpreted as edge detection, and max-pooling can be viewed as downsampling, or dimension reduction. As the input passes through one convolution layer after another, the learned edges are combined into shapes that become representations of the class labels. The Keras example below is adapted from https://github.com/keras-team/keras/tree/master/examples.
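Before turning to the Keras example, the first three steps can be sketched by hand in NumPy. This is a minimal illustration, not how Keras implements them: the 6x6 toy image and the vertical-edge filter are made up for the demonstration, and the helper names `conv2d_valid` and `max_pool_2x2` are hypothetical. In a real CNN the filter weights are learned rather than hand-picked.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge filter (illustrative; a CNN learns its own).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)  # 1. convolution, shape (4, 4)
activated = np.maximum(feature_map, 0.0)   # 2. nonlinear activation (ReLU)
pooled = max_pool_2x2(activated)           # 3. max-pooling, shape (2, 2)

print(feature_map[0])  # [0. 3. 3. 0.]
print(pooled.shape)    # (2, 2)
```

The feature map is nonzero only where the filter window straddles the dark-to-bright boundary, which is the sense in which convolution acts as edge detection. Pooling then halves each spatial dimension while keeping the strongest response.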

In [None]:
import keras
from keras.datasets import mnist
from keras.models import Sequential 
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

In [None]:
#hyperparameters
batch_size = 128
num_classes = 10
epochs = 12

#input image dimensions
img_rows, img_cols = 28, 28 

In [None]:
#load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
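#add a trailing channel dimension (1 for grayscale) so each image has shape (rows, cols, channels)
#this assumes the default channels_last image data format
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)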

In [None]:
#convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#scale pixel values from [0, 255] to [0, 1] so inputs are in a range that trains well
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
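The two preprocessing steps above can be reproduced on a tiny made-up sample to see what they do. This is a NumPy sketch of the same idea, not the internals of `keras.utils.to_categorical`; the label and pixel values are hypothetical.

```python
import numpy as np

# Hypothetical integer class labels for three images.
labels = np.array([5, 0, 4])
num_classes = 10

# One-hot encoding: row i of the identity matrix has a 1 in column i,
# so indexing by label yields one "binary class vector" per sample.
one_hot = np.eye(num_classes, dtype="float32")[labels]

# Hypothetical raw pixel intensities stored as unsigned bytes.
pixels = np.array([0, 128, 255], dtype="uint8")

# Convert to float first, then rescale from [0, 255] to [0, 1].
scaled = pixels.astype("float32") / 255

print(one_hot[0])  # 1.0 at index 5, zeros elsewhere
print(scaled.min(), scaled.max())  # 0.0 1.0
```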

### Train Model

In [None]:

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

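#note: Adadelta's default learning rate depends on the Keras version
#(1.0 in older standalone Keras, 0.001 in tf.keras and Keras 3);
#with the smaller default, 12 epochs may leave the model undertrained
model.compile(loss=keras.losses.categorical_crossentropy, 
              optimizer=keras.optimizers.Adadelta(), 
              metrics=['accuracy'])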

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
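The size of the model defined above can be checked by hand. The sketch below walks through the layers and tracks the output shape and parameter count of each one, assuming the Keras defaults used in that code: no padding and stride 1 for `Conv2D`, and a pooling stride equal to the pool size. The helper `conv_layer` is hypothetical and exists only for this calculation.

```python
def conv_layer(size, in_channels, filters, kernel=3):
    """Return (output size, parameter count) for an unpadded, stride-1
    convolution over a square input of side `size`."""
    out_size = size - kernel + 1
    params = kernel * kernel * in_channels * filters + filters
    return out_size, params

size, channels = 28, 1
layer_params = {}

# Conv2D(32, (3, 3)): 28x28x1 -> 26x26x32
size, layer_params["conv1"] = conv_layer(size, channels, 32)
channels = 32

# Conv2D(64, (3, 3)): 26x26x32 -> 24x24x64
size, layer_params["conv2"] = conv_layer(size, channels, 64)
channels = 64

# MaxPooling2D((2, 2)): 24x24x64 -> 12x12x64, no parameters
size //= 2

# Dropout and Flatten add no parameters; Flatten yields 12*12*64 values.
flat = size * size * channels

# Dense(128) and Dense(10): weights plus one bias per unit.
layer_params["dense1"] = flat * 128 + 128
layer_params["dense2"] = 128 * 10 + 10

total = sum(layer_params.values())
print(layer_params)
print("flattened features:", flat)  # 9216
print("total parameters:", total)   # 1199882
```

Almost all of the parameters sit in the first dense layer. The two convolutional layers together contribute fewer than 19,000, which illustrates the weight-sharing savings described at the start of this section. The total should match what `model.summary()` reports.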


In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

### References

http://cs231n.github.io/convolutional-networks/

http://colah.github.io/posts/2014-07-Conv-Nets-Modular/

http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/