# Convolutional neural networks (CNNs)

The simplest way to build a deep neural network is to connect every node in each layer to every node in the subsequent layer, so that all the layers in the network are fully connected. This model is called a multilayer perceptron (MLP), and it can be represented graphically as follows:

![](../img/multilayer-perceptron.png)

However, this can require a lot of parameters! If our input were a 256x256 color image (still quite small for a photograph), and our network had 1,000 nodes in the first hidden layer, then our first weight matrix would require (256x256x3)x1000 parameters. That’s nearly 200 million. Moreover, the hidden layer would ignore all the spatial structure in the input image, even though we know that local structure is a powerful source of prior knowledge.
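
To make the arithmetic concrete, here is a small sketch that counts the weights in that fully connected layer. The convolutional layer used for comparison (20 filters of size 3x3) is a hypothetical choice for illustration, not something fixed by the text above.

```python
# Weights in a fully connected first layer: every input value
# connects to every hidden node (biases are ignored here).
height, width, channels = 256, 256, 3
hidden_nodes = 1000
dense_weights = height * width * channels * hidden_nodes

# Hypothetical comparison: a convolutional layer with 20 filters,
# each 3x3 and spanning all 3 input channels.
num_filters, kernel_h, kernel_w = 20, 3, 3
conv_weights = num_filters * channels * kernel_h * kernel_w

print(dense_weights)  # 196608000
print(conv_weights)   # 540
```

The convolutional count does not depend on the image size, because the same small kernel is reused at every position.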

Convolutional neural networks incorporate convolutional layers. These layers associate each of their nodes with a small window, called a receptive field, in the previous layer, instead of connecting to the full layer. This allows us to first learn local features via transformations that are applied in the same way for the top right corner as for the bottom left. Then we collect all this local information to predict global qualities of the image (like whether or not it depicts a dog).

![](../img/depthcol.jpeg)
<pre>         (Image credit: Stanford cs231n <a>http://cs231n.github.io/assets/cnn/depthcol.jpeg</a>)</pre>

In short, there are two new concepts you need to grok here. First, we’ll be introducing convolutional layers. Second, we’ll be interleaving them with pooling layers.

## Parameters

Each node in a convolutional layer is associated with a 3D block (height x width x channel) of the input tensor. Moreover, the convolutional layer itself has multiple output channels. So the layer is parameterized by a four-dimensional weight tensor, commonly called a convolutional kernel.

The output tensor is produced by sliding the kernel across the input image, skipping locations according to a pre-defined stride (we’ll assume a stride of 1 throughout this tutorial).
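
The sliding operation can be sketched in plain Python for a single input channel and a single filter. This is a minimal illustration, not BigDL's implementation; the function name `conv2d_single_channel` and the sample values are made up for the example. Like most deep learning libraries, it applies the kernel without flipping it (strictly speaking, a cross-correlation).

```python
def conv2d_single_channel(image, kernel, stride=1):
    """Slide `kernel` over `image` (lists of lists) without padding."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # Number of positions where the kernel fits entirely inside the image.
    out_h = (img_h - k_h) // stride + 1
    out_w = (img_w - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Weighted sum of the window under the kernel.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[top + di][left + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, -1]]

stride1 = conv2d_single_channel(image, kernel)
stride2 = conv2d_single_channel(image, kernel, stride=2)
print(stride1)  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
print(stride2)  # [[-4, 2], [6, 6]]
```

With a stride of 1 the 2x2 kernel fits at 3x3 positions in the 4x4 image. A stride of 2 skips every other position and leaves a 2x2 output.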

## Layers

A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers. Let's see how succinctly we can use BigDL to express a CNN.

In [11]:
from __future__ import print_function
import numpy as np
from bigdl.nn.keras.topology import Sequential
from bigdl.nn.keras.layer import *

### Convolutional Layer

In BigDL, we use `Convolution2D()` to apply a 2D convolution over an input image composed of several input planes. This function takes a few important parameters: the number of convolutional filters (`nbFilter`), the number of rows in the convolutional kernel (`nbRow`), the number of columns in the convolutional kernel (`nbCol`), the string name of the activation function (`activation`), and the shape of the input (`input_shape`).

In [40]:
model = Sequential()
model.add(Convolution2D(20, 3, 3, activation="relu", input_shape=(1, 28, 28)))
input = np.random.random([2, 1, 28, 28])
output = model.forward(input)

creating: createKerasSequential
creating: createKerasConvolution2D


Note the shape of the output: (2, 20, 26, 26). The number of examples (2) remains unchanged. The number of channels (also called filters) has increased from 1 to 20. And because the (3,3) kernel can only be applied at 26 different vertical and horizontal positions (without the kernel extending past the image border), the output height and width are 26,26. Padding can be used when we want the input and output to have the same height and width, but we won’t get into that now.
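
The shape arithmetic for the convolution cell above can be checked with a small helper. This is a sketch that assumes channels-first input, no padding and a stride of 1. The helper name `conv2d_output_shape` and the weight layout shown are illustrative assumptions, not BigDL functions.

```python
def conv2d_output_shape(input_shape, nb_filter, nb_row, nb_col, stride=1):
    """Output shape of an unpadded 2D convolution on (batch, channels, h, w)."""
    batch, _channels, height, width = input_shape
    out_h = (height - nb_row) // stride + 1
    out_w = (width - nb_col) // stride + 1
    return (batch, nb_filter, out_h, out_w)


input_shape = (2, 1, 28, 28)
output_shape = conv2d_output_shape(input_shape, nb_filter=20, nb_row=3, nb_col=3)

# One plausible layout for the four-dimensional kernel (an assumption):
# (output channels, input channels, kernel rows, kernel columns).
weight_shape = (20, input_shape[1], 3, 3)
num_weights = weight_shape[0] * weight_shape[1] * weight_shape[2] * weight_shape[3]

print(output_shape)  # (2, 20, 26, 26)
print(num_weights)   # 180
```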

### Pooling Layer

The other new component of this model is the pooling layer. Pooling gives us a way to downsample in the spatial dimensions. Early convnets typically used average pooling, but max pooling tends to give better results. By default, the input is downscaled by a factor of (2, 2), that is, by 2 both vertically and horizontally.

In [31]:
model = Sequential()
model.add(MaxPooling2D(input_shape = (1, 28, 28)))
input = np.random.random([2, 1, 28, 28])
output = model.forward(input)

creating: createKerasSequential
creating: createKerasMaxPooling2D


Note that the batch and channel components of the shape are unchanged but that the height and width have been downsampled from (28,28) to (14,14).
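
To see what pooling does to the values themselves, here is a plain-Python sketch of 2x2 pooling on a single feature map, in both max and average mode. The function `pool2d` and the sample values are illustrative only and are not part of BigDL.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows of a 2D list."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current window.
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [4, 2, 0, 1],
               [3, 1, 5, 9]]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[7, 8], [4, 9]]
print(avg_pooled)  # [[4.0, 5.0], [2.5, 3.75]]
```

Each 2x2 window collapses to one value, so the height and width are halved while the number of channels stays the same.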

### Flatten Layer

The Flatten layer flattens each input sample into a one-dimensional vector without affecting the batch size.

In [37]:
model = Sequential()
model.add(Flatten(input_shape=(2, 2, 3)))
input = np.random.random([2, 2, 2, 3])
output = model.forward(input)
print(input)
output

creating: createKerasSequential
creating: createKerasFlatten
[[[[ 0.68587314  0.8255428   0.85463511]
   [ 0.44956081  0.07858544  0.15813265]]

  [[ 0.57645196  0.83081943  0.13454646]
   [ 0.75277687  0.3152822   0.78203292]]]


 [[[ 0.42641051  0.3740926   0.03601874]
   [ 0.38221221  0.68646959  0.55002778]]

  [[ 0.22202366  0.88417323  0.01115601]
   [ 0.71851614  0.65895761  0.78885971]]]]


array([[ 0.68587315,  0.82554281,  0.85463512,  0.44956082,  0.07858545,
         0.15813264,  0.57645196,  0.83081943,  0.13454646,  0.75277686,
         0.3152822 ,  0.78203291],
       [ 0.42641053,  0.37409261,  0.03601874,  0.38221222,  0.68646961,
         0.55002779,  0.22202367,  0.88417321,  0.01115601,  0.71851611,
         0.6589576 ,  0.78885972]], dtype=float32)
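
The same reordering can be reproduced without any library. The sketch below flattens each sample of a nested list in row-major order, which matches the element order in the output above. The helper `flatten_sample` and the integer data are made up for illustration.

```python
def flatten_sample(sample):
    """Recursively flatten a nested list into a flat list (row-major order)."""
    if not isinstance(sample, list):
        return [sample]
    flat = []
    for item in sample:
        flat.extend(flatten_sample(item))
    return flat


# A batch of 2 samples, each of shape (2, 2, 3).
batch = [
    [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]],
    [[[13, 14, 15], [16, 17, 18]], [[19, 20, 21], [22, 23, 24]]],
]

# Flatten every sample separately so the batch dimension is preserved.
flattened = [flatten_sample(sample) for sample in batch]
print(len(flattened), len(flattened[0]))  # 2 12
print(flattened[0])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```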

# Solve A Practical Problem

Now we can apply what we have learned to a classic problem, handwritten digit classification, using a convolutional neural network built with BigDL.

## Get MNIST Data

In [56]:
from bigdl.dataset import mnist
from bigdl.util.common import *

mnist_path = "datasets/mnist"
(X_train, Y_train), (X_test, Y_test) = mnist.load_data(mnist_path)

print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)

('Extracting', 'datasets/mnist/train-images-idx3-ubyte.gz')
('Extracting', 'datasets/mnist/train-labels-idx1-ubyte.gz')
('Extracting', 'datasets/mnist/t10k-images-idx3-ubyte.gz')
('Extracting', 'datasets/mnist/t10k-labels-idx1-ubyte.gz')
(60000, 28, 28, 1)
(10000, 28, 28, 1)
(60000,)
(10000,)


## Define The Model

Now we’re ready to define our model.

In [65]:
num_fc = 512
num_outputs = 10
model = Sequential()
model.add(Reshape((1, 28, 28), input_shape=(28, 28, 1)))
model.add(Convolution2D(20, 3, 3, activation="relu", input_shape=(1, 28, 28)))
model.add(MaxPooling2D())
model.add(Convolution2D(50, 3, 3, activation="relu", name="conv2_3x3"))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(num_fc, activation="relu", name="fc1"))
model.add(Dense(num_outputs, activation="softmax", name="fc2"))

print(model.get_input_shape())
print(model.get_output_shape())

creating: createKerasSequential
creating: createKerasReshape
creating: createKerasConvolution2D
creating: createKerasMaxPooling2D
creating: createKerasConvolution2D
creating: createKerasMaxPooling2D
creating: createKerasFlatten
creating: createKerasDense
creating: createKerasDense
(None, 28, 28, 1)
(None, 10)
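
It can help to trace how the per-sample shape changes through this model. The sketch below does that by hand. It assumes unpadded convolutions with a stride of 1, 2x2 pooling that rounds down, and one bias per output unit. The helper names are hypothetical, and the resulting parameter count is a back-of-the-envelope figure rather than something reported by BigDL.

```python
def conv_shape(shape, nb_filter, kernel):
    """(channels, h, w) after an unpadded, stride-1 convolution."""
    _channels, h, w = shape
    return (nb_filter, h - kernel + 1, w - kernel + 1)


def pool_shape(shape, size=2):
    """(channels, h, w) after non-overlapping pooling, rounding down."""
    c, h, w = shape
    return (c, h // size, w // size)


shapes = [(1, 28, 28)]                           # after Reshape
shapes.append(conv_shape(shapes[-1], 20, 3))     # (20, 26, 26)
shapes.append(pool_shape(shapes[-1]))            # (20, 13, 13)
shapes.append(conv_shape(shapes[-1], 50, 3))     # (50, 11, 11)
shapes.append(pool_shape(shapes[-1]))            # (50, 5, 5)

c, h, w = shapes[-1]
flat_size = c * h * w                            # input size of the first Dense layer

# Weights plus biases per layer, under the assumptions stated above.
params = {
    "conv1": 20 * 1 * 3 * 3 + 20,
    "conv2": 50 * 20 * 3 * 3 + 50,
    "fc1": flat_size * 512 + 512,
    "fc2": 512 * 10 + 10,
}
total_params = sum(params.values())

print(shapes)
print(flat_size)     # 1250
print(total_params)  # 654892
```

Under these assumptions almost all of the parameters sit in the first fully connected layer, which echoes the point made at the start of this section.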


## Configure Training

In [66]:
from bigdl.nn.criterion import *

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

creating: createDefault
creating: createSGD
creating: createClassNLLCriterion
creating: createTop1Accuracy
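
For intuition about the loss and metric chosen here, the sketch below computes a sparse categorical cross-entropy and a top-1 accuracy by hand on three made-up predictions. It illustrates the general definitions with zero-based integer labels. It is not BigDL's code, and the function names are only for the example.

```python
import math


def sparse_categorical_crossentropy(probabilities, labels):
    """Mean negative log-probability assigned to the true class."""
    losses = [-math.log(p[label]) for p, label in zip(probabilities, labels)]
    return sum(losses) / len(losses)


def top1_accuracy(probabilities, labels):
    """Fraction of samples whose most probable class equals the label."""
    correct = sum(1 for p, label in zip(probabilities, labels)
                  if p.index(max(p)) == label)
    return correct / len(labels)


# Made-up softmax outputs for 3 samples and 3 classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4]]
labels = [0, 1, 0]  # integer class ids rather than one-hot vectors

loss = sparse_categorical_crossentropy(probs, labels)
accuracy = top1_accuracy(probs, labels)
print(loss)
print(accuracy)
```

The labels are plain integers, which is what "sparse" refers to. The third sample is misclassified, so the accuracy is two out of three, and that sample contributes the largest term to the loss.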


## Execute Training

*Note: See the loss and accuracy in the terminal. We will provide performance visualization in later topics.*

In [67]:
model.fit(X_train, Y_train, batch_size=8, nb_epoch=1,
          validation_data=(X_test, Y_test))
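
As a rough guide to how much work one epoch involves, the sketch below splits the 60,000 training samples into mini-batches of 8. It only illustrates the bookkeeping. How BigDL actually schedules and distributes batches is not covered here, and `iterate_minibatches` is a hypothetical helper.

```python
def iterate_minibatches(num_samples, batch_size):
    """Yield (start, end) index pairs covering the dataset once."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


batches = list(iterate_minibatches(60000, 8))
print(len(batches))             # 7500
print(batches[0], batches[-1])  # (0, 8) (59992, 60000)
```

With these settings, a single epoch corresponds to 7,500 parameter updates if every mini-batch triggers one update.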