## Convolutional Neural Network

<br>

The ANNs in the previous notebooks were all built from fully connected (dense) layers, in which each node in a layer connects to every node in the subsequent layer. For image classification tasks, an ANN built from dense layers often needs millions of parameters. Convolutional Neural Networks (CNNs) instead slide convolution filters across the input space to learn multiple distinct features. A CNN is locally connected: each node connects only to a small subset of nodes in the adjacent layer, and the same filter weights are reused at every position. A convolutional layer therefore typically needs far fewer parameters than a fully connected layer on the same input, which makes the network easier to train. CNNs were initially developed for computer vision tasks and were later applied to other fields.
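To make the parameter savings concrete, here is a minimal sketch that counts the weights and biases of a single dense layer and a single convolutional layer on a 28x28 grayscale image. The helper names `dense_params` and `conv2d_params` are hypothetical, and the layer sizes (128 units, 32 filters of size 3x3) are chosen only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a convolutional layer.

    The count does not depend on the image size because each
    filter is reused at every position of the input.
    """
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Illustrative sizes (assumptions): a flattened 28x28 grayscale image
# feeding 128 dense units, versus 32 filters of size 3x3 on one channel.
dense_first = dense_params(28 * 28, 128)
conv_first = conv2d_params(3, 3, 1, 32)

print("dense layer parameters:", dense_first)
print("conv layer parameters: ", conv_first)
```

Under these assumptions the dense layer needs 100,480 parameters while the convolutional layer needs 320.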

A CNN is constructed from a few main steps:
1. convolution
2. nonlinear activation
3. max-pooling
4. classification

Each step is crucial to carrying out a learning task. The convolution step can be broadly interpreted as edge detection, and max-pooling can be viewed as downsampling (dimension reduction). As we go through one convolution layer after another, the learned edges are combined into shapes that become representations of the class labels. The code example below is adapted from https://github.com/keras-team/keras/tree/master/examples.
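The first three steps can be sketched in plain Python on a toy image. This is an illustration only, not how Keras implements them: the 6x6 image and the vertical-edge filter are hand-picked assumptions, whereas a real CNN learns its filter values during training. The helper names `conv2d`, `relu` and `max_pool` are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    This is cross-correlation, which is what deep learning
    libraries usually call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Nonlinear activation: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# Toy 6x6 image: bright band, dark band, bright band (left to right).
image = [[1, 1, 0, 0, 1, 1] for _ in range(6)]

# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)  # 4x4, negative at the bright-to-dark edge
activated = relu(feature_map)        # keeps only the dark-to-bright response
pooled = max_pool(activated)         # 2x2 summary of where the edge is

print(feature_map[0])
print(pooled)
```

The pooled map is a quarter of the size of the feature map but still records that the dark-to-bright edge sits in the right half of the image.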

In [1]:
import keras
from keras.datasets import mnist
from keras.models import Sequential 
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

Using TensorFlow backend.


In [2]:
#hyperparameters
batch_size = 128
num_classes = 10
epochs = 12

#input image dimensions
img_rows, img_cols = 28, 28 

In [3]:
#load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [4]:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

In [5]:
#convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#normalize pixel values from [0, 255] to [0, 1]
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
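The two preprocessing steps above, converting labels to binary class matrices and scaling pixel values, can be illustrated without Keras. The sketch below is a plain-Python stand-in and not the Keras implementation; `to_one_hot` and `scale_pixels` are hypothetical helper names, and the sample labels and pixel values are made up.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1 at the index of the class
        rows.append(row)
    return rows


def scale_pixels(pixels, max_value=255):
    """Map raw pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in pixels]


one_hot = to_one_hot([3, 0, 9], 10)  # three made-up digit labels
scaled = scale_pixels([0, 51, 255])  # three made-up pixel values

print(one_hot[0])
print(scaled)
```

Each label becomes a vector of length `num_classes`, which matches the 10-unit softmax output that the model is trained against.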


### Train Model from Scratch

To train a Convolutional Neural Network we:
 1. Define the architecture of the CNN model.
 2. Choose an activation function; ReLU is the usual default for CNNs.
 3. Add Dropout where needed to reduce overfitting.
 4. End with a Dense layer, which is most often the last layer.
 5. Train the model and adjust hyperparameters such as the number of epochs, filter size, stride, and optimizer.
 6. Repeat.
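Filter size and stride directly control the spatial size of a convolution's output, so it helps to check their effect before retraining. The sketch below uses the standard output-size formula; the helper name `conv_output_size` is hypothetical, and the specific values tried are illustrative.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one dimension of a convolution.

    n: input length, kernel: filter length,
    stride: step between filter positions,
    padding: zeros added on each side of the input.
    """
    return (n + 2 * padding - kernel) // stride + 1


# A 28-pixel-wide input under a few illustrative settings.
print(conv_output_size(28, 3))             # 3x3 filter, stride 1
print(conv_output_size(28, 5))             # larger filter shrinks the output more
print(conv_output_size(28, 3, stride=2))   # stride 2 roughly halves the output
print(conv_output_size(28, 3, padding=1))  # padding of 1 preserves the size
```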

In [7]:

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy, 
              optimizer=keras.optimizers.Adadelta(), 
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
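To see how the layers above transform the input, the following sketch traces the output shape and parameter count of each layer by hand. It assumes the Keras defaults used in the model (stride 1 and no padding for `Conv2D`, non-overlapping windows for `MaxPooling2D`); the helper functions are hypothetical and do not call Keras. Dropout layers are omitted because they change neither the shape nor the parameter count.

```python
def conv(shape, filters, k):
    """Conv2D with k x k filters, stride 1, no padding."""
    h, w, c = shape
    return (h - k + 1, w - k + 1, filters), k * k * c * filters + filters


def pool(shape, size):
    """MaxPooling2D with non-overlapping size x size windows."""
    h, w, c = shape
    return (h // size, w // size, c), 0


def flatten(shape):
    h, w, c = shape
    return (h * w * c,), 0


def dense(shape, units):
    (n,) = shape
    return (units,), n * units + units


steps = [
    ("Conv2D(32, 3x3)", lambda s: conv(s, 32, 3)),
    ("Conv2D(64, 3x3)", lambda s: conv(s, 64, 3)),
    ("MaxPooling2D(2x2)", lambda s: pool(s, 2)),
    ("Flatten", flatten),
    ("Dense(128)", lambda s: dense(s, 128)),
    ("Dense(10)", lambda s: dense(s, 10)),
]

shape = (28, 28, 1)  # input_shape from the notebook
summary = []
for name, step in steps:
    shape, n_params = step(shape)
    summary.append((name, shape, n_params))

total_params = sum(n for _, _, n in summary)

for name, out_shape, n_params in summary:
    print(f"{name:<18} {str(out_shape):<14} {n_params:>9,}")
print("total parameters:", f"{total_params:,}")
```

Under these assumptions, the two convolutional layers together account for fewer than 20,000 of the roughly 1.2 million parameters; almost all of the rest sit in the first Dense layer after `Flatten`. If Keras is available, `model.summary()` can be used to check the trace.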


In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Overall, MNIST is a relatively easy classification task: the handwritten digits are centered, and there are no other objects in the images. A comparison of different algorithms on the MNIST dataset is available at http://yann.lecun.com/exdb/mnist/; most of the top-performing algorithms are CNNs. The full power of CNNs is better seen in [ImageNet](https://en.wikipedia.org/wiki/ImageNet) classification tasks. Deep learning models that achieved high accuracy there include VGG16, Inception, and ResNet, all of which use convolutional filters but differ in network structure.

### References

http://cs231n.github.io/convolutional-networks/

http://colah.github.io/posts/2014-07-Conv-Nets-Modular/

http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/