# Convolutional Neural Network

In this second exercise-notebook we will play with Convolutional Neural Network (CNN). 

As you should have seen, a CNN is a feed-forward neural network tipically composed of Convolutional, MaxPooling and Dense layers. 

If the task implemented by the CNN is a classification task, the last Dense layer should use the **Softmax** activation, and the loss should be the **categorical crossentropy**.

Reference: [https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py]()

## Network Topology Model

A simple CNN, with one input branch and one output branch can be defined using a [Sequential](http://keras.io/models/#sequential) model and stacking together all its layers. 

In this exercise we want to build a (_quite shallow_) network which contains two 
[Convolution, Convolution, MaxPooling] stages, and two Dense layers.

To test a different optimizer, we will use [AdaDelta](http://keras.io/optimizers/), which is a bit more complex than the simple Vanilla SGD with momentum.

In [1]:
import os
from __future__ import print_function
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.convolutional import Conv2D
from keras.optimizers import Adadelta
from keras.optimizers import SGD
from keras.layers.normalization import BatchNormalization
os.environ["CUDA_VISIBLE_DEVICES"]="1"


Using TensorFlow backend.


In [2]:

input_shape = (3, 32, 32)
nb_classes = 10


nb_epoch = 10 # kept very low! Please increase if you have GPU

batch_size = 64
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
nb_pool = 2
# convolution kernel size
nb_conv = 3

# sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
adadelta = Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)

## [conv@32x3x3+relu]x2 --> MaxPool@2x2 --> DropOut@0.25 -->
## [conv@64x3x3+relu]x2 --> MaxPool@2x2 --> DropOut@0.25 -->
## Flatten--> FC@512+relu --> DropOut@0.5 --> FC@nb_classes+SoftMax
## NOTE: each couple of Conv filters must have `border_mode="same"` and `"valid"`, respectively

## your code here



In [None]:
nb_classes = 10
img_rows, img_cols = 32, 32
shape_ord = (img_rows, img_cols, 3)
nb_filters_small = 4
nb_pool_small = 4
model = Sequential()

model.add(Conv2D(nb_filters_small, (nb_conv, nb_conv), padding='valid', 
                 input_shape=shape_ord))  # note: the very first layer **must** always specify the input_shape

model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool_small, nb_pool_small)))

model.add(Flatten())
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

In [17]:
# %load solutions/sol_223.py
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import Adadelta

input_shape = (32, 32, 3)
nb_classes = 10

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer=Adadelta(),
              metrics=['accuracy'])

In [None]:
img_rows, img_cols = 32, 32
shape_ord = (img_rows, img_cols, 3)
nb_filters_small = 4
nb_pool_small = 4
model = Sequential()

model.add(Conv2D(32, (3, 3), padding='valid', 
                 input_shape=shape_ord))  # note: the very first layer **must** always specify the input_shape

# model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool_small, nb_pool_small)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), padding='valid', 
                 input_shape=shape_ord))  # note: the very first layer **must** always specify the input_shape

# model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool_small, nb_pool_small)))

model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adadelta,
              metrics=['accuracy'])

### Understanding layer shapes

An important feature of Keras layers is that each of them has an `input_shape` attribute, which you can use to visualize the shape of the input tensor, and an `output_shape` attribute, for inspecting the shape of the output tensor.

As we can see, the input shape of the first convolutional layer corresponds to the `input_shape` attribute (which must be specified by the user). 

In this case, it is a `32x32` image with three color channels. 

Since this convolutional layer has the `border_mode` set to `same`, its output width and height will remain the same, and the number of output channel will be equal to the number of filters learned by the layer, 16. 

The following convolutional layers, instead, have the default `border_mode`, and therefore reduce width and height by $(k-1)$, where $k$ is the size of the kernel. 

MaxPooling layers, instead, reduce width and height of the input tensor, but keep the same number of channels. Activation layers, of course, don't change the shape.

In [4]:
for i, layer in enumerate(model.layers):
    print ("Layer", i, "\t", layer.input_shape, "\t", layer.output_shape)

Layer 0 	 (None, 32, 32, 3) 	 (None, 32, 32, 32)
Layer 1 	 (None, 32, 32, 32) 	 (None, 32, 32, 32)
Layer 2 	 (None, 32, 32, 32) 	 (None, 30, 30, 32)
Layer 3 	 (None, 30, 30, 32) 	 (None, 30, 30, 32)
Layer 4 	 (None, 30, 30, 32) 	 (None, 15, 15, 32)
Layer 5 	 (None, 15, 15, 32) 	 (None, 15, 15, 32)
Layer 6 	 (None, 15, 15, 32) 	 (None, 15, 15, 64)
Layer 7 	 (None, 15, 15, 64) 	 (None, 15, 15, 64)
Layer 8 	 (None, 15, 15, 64) 	 (None, 13, 13, 64)
Layer 9 	 (None, 13, 13, 64) 	 (None, 13, 13, 64)
Layer 10 	 (None, 13, 13, 64) 	 (None, 6, 6, 64)
Layer 11 	 (None, 6, 6, 64) 	 (None, 6, 6, 64)
Layer 12 	 (None, 6, 6, 64) 	 (None, 2304)
Layer 13 	 (None, 2304) 	 (None, 512)
Layer 14 	 (None, 512) 	 (None, 512)
Layer 15 	 (None, 512) 	 (None, 512)
Layer 16 	 (None, 512) 	 (None, 10)
Layer 17 	 (None, 10) 	 (None, 10)


### Understanding weights shape
In the same way, we can visualize the shape of the weights learned by each layer. In particular, Keras lets you inspect weights by using the `get_weights` method of a layer object. This will return a list with two elements, the first one being the weight tensor and the second one being the bias vector.

Of course, MaxPooling layer don't have any weight tensor, since they don't have learnable parameters. Convolutional layers, instead, learn a $(n_o, n_i, k, k)$ or **$(k, k, n_i, n_o)$** weight tensor, where $k$ is the size of the kernel, $n_i$ is the number of channels of the input tensor, and $n_o$ is the number of filters to be learned. For each of the $n_o$ filters, a bias is also learned. Dense layers learn a $(n_i, n_o)$ weight tensor, where $n_o$ is the output size and $n_i$ is the input size of the layer. Each of the $n_o$ neurons also has a bias.

In [5]:
for i, layer in enumerate(model.layers):
    if len(layer.get_weights()) > 0:
        print("Layer", i, "\t", layer.get_weights()[0].shape, "\t", layer.get_weights()[1].shape)

Layer 0 	 (3, 3, 3, 32) 	 (32,)
Layer 2 	 (3, 3, 32, 32) 	 (32,)
Layer 6 	 (3, 3, 32, 64) 	 (64,)
Layer 8 	 (3, 3, 64, 64) 	 (64,)
Layer 13 	 (2304, 512) 	 (512,)
Layer 16 	 (512, 10) 	 (10,)


In [6]:
from keras import backend as K
K.image_data_format()

'channels_last'

# Training the network

We will train our network on the **CIFAR10** [dataset](https://www.cs.toronto.edu/~kriz/cifar.html), which contains `50,000` 32x32 color training images, labeled over 10 categories, and 10,000 test images. 

As this dataset is also included in Keras datasets, we just ask the `keras.datasets` module for the dataset.

Training and test images are normalized to lie in the $\left[0,1\right]$ interval.

In [7]:
from keras.datasets import cifar10
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
X_train /= 255
X_test /= 255

In [None]:
nb_epoch = 40
batch_size = 2048
hist = model.fit(X_train, Y_train, batch_size=batch_size, 
                 epochs=nb_epoch, verbose=1, 
                 validation_data=(X_test, Y_test))

In [None]:
% matplotlib inline
from matplotlib import pyplot as plt
imgplot = plt.imshow(X_train[100])

In [None]:
X_train.shape

To reduce the risk of overfitting, we also apply some image transformation, like rotations, shifts and flips. All these can be easily implemented using the Keras [Image Data Generator](http://keras.io/preprocessing/image/).

#### Warning: The following cells may be computational Intensive....

In [8]:
from keras.preprocessing.image import ImageDataGenerator

generated_images = ImageDataGenerator(
    featurewise_center=True,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=True,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images
    vertical_flip=False)  # randomly flip images

generated_images.fit(X_train)

Now we can start training. 

At each iteration, a batch of 500 images is requested to the `ImageDataGenerator` object, and then fed to the network.

In [9]:
X_train.shape

(50000, 32, 32, 3)

In [14]:
gen = generated_images.flow(X_train, Y_train, batch_size=1024, shuffle=True)
X_batch, Y_batch = next(gen)

In [15]:
X_batch.shape

(1024, 32, 32, 3)

In [None]:
from keras.utils import generic_utils

n_epochs = 20
for e in range(n_epochs):
    print('Epoch', e)
    print('Training...')
    progbar = generic_utils.Progbar(X_train.shape[0])
    
    for i, (X_batch, Y_batch) in enumerate(generated_images.flow(X_train, Y_train, batch_size=500, shuffle=True)):
        
        loss = model.train_on_batch(X_batch, Y_batch)
        
        if i % 10 ==0:
            print (i, 'loss', loss[0], 'acc', loss[1])
        
#         progbar.add(X_batch.shape[0], values=[('train loss', loss[0])] )
#         progbar.add(X_batch.shape[0], values=[('accuracy', loss[1])] )
        

Epoch 0
Training...
0 loss 2.33742 acc 0.088
10 loss 2.15112 acc 0.2
20 loss 2.12803 acc 0.21
30 loss 2.05215 acc 0.238
40 loss 2.01737 acc 0.282
50 loss 1.95301 acc 0.294
60 loss 1.9382 acc 0.272
70 loss 1.94697 acc 0.294
80 loss 1.91359 acc 0.268
90 loss 1.87365 acc 0.324
100 loss 1.9896 acc 0.278
110 loss 1.85196 acc 0.3
120 loss 1.75253 acc 0.348
130 loss 1.77503 acc 0.356
140 loss 1.75843 acc 0.36
150 loss 1.76229 acc 0.362
160 loss 1.78028 acc 0.338
170 loss 1.7559 acc 0.352
180 loss 1.77524 acc 0.36
190 loss 1.78416 acc 0.37
200 loss 1.72367 acc 0.382
210 loss 1.74691 acc 0.396
220 loss 1.71814 acc 0.378
230 loss 1.60921 acc 0.432
240 loss 1.63372 acc 0.412
250 loss 1.75295 acc 0.338
260 loss 1.63457 acc 0.406
270 loss 1.59478 acc 0.4
280 loss 1.56465 acc 0.444
290 loss 1.5996 acc 0.402
300 loss 1.61516 acc 0.432
310 loss 1.74865 acc 0.358
320 loss 1.73145 acc 0.386
330 loss 1.60641 acc 0.412
340 loss 1.59576 acc 0.446
350 loss 1.58065 acc 0.406
360 loss 1.55378 acc 0.432
370 lo

In [None]:
print loss

In [None]:
loss = model.train_on_batch(X_batch, Y_batch)

In [None]:
loss

In [None]:
model.metrics_names