**Group-08**<br/>
<font style="color:red"> **Belhassen Ghoul <br/> Robin Ehrensperger <br/> Dominic Diedenhofen**</font>

<font style="color:green"><h1>Exercise 2 Benefit of BatchNorm with CIFAR10 Classification</h1></font>

In [1]:
#Library
import tensorflow as tf
import keras as K
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Conv2D, Dropout, BatchNormalization, Flatten, Dense, MaxPooling2D
from keras.layers import LeakyReLU
from keras.initializers import HeNormal, GlorotNormal

from keras.datasets import cifar10

In [2]:
(xTrain,yTrain),(xTest,yTest) = cifar10.load_data()

In [5]:
#reshape and normalise data
xTrain = xTrain.reshape(xTrain.shape[0],32,32,3).astype("float32")/127.5-1.0
xTest = xTest.reshape(xTest.shape[0],32,32,3).astype("float32")/127.5-1.0
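
As a quick check of the scaling used above: dividing by 127.5 and subtracting 1 maps the 8-bit pixel range [0, 255] linearly onto [-1, 1]. The helper name `scale_pixel` is hypothetical and only serves this illustration.

```python
def scale_pixel(value):
    """Map an 8-bit pixel value from [0, 255] to [-1.0, 1.0]."""
    return value / 127.5 - 1.0


# Darkest, an intermediate, and the brightest pixel value
for v in (0, 64, 255):
    print(v, "->", scale_pixel(v))
```

Centring the inputs around zero in this way is a common choice for tanh-based networks, whose outputs are also centred around zero.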

In [6]:
nClasses = 10
yTrain = K.utils.to_categorical(yTrain, nClasses)
yTest = K.utils.to_categorical(yTest, nClasses)
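
`to_categorical` turns each integer class label into a one-hot row of length `nClasses`, which is the target format that `categorical_crossentropy` expects. A minimal pure-Python sketch of the same idea (the helper `one_hot` is hypothetical and is not the Keras implementation):

```python
def one_hot(label, n_classes):
    """Return a list of length n_classes with a 1.0 at index `label`."""
    row = [0.0] * n_classes
    row[label] = 1.0
    return row


# CIFAR10 label 3 encoded for 10 classes
print(one_hot(3, 10))
```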

In [7]:
batchSize = 256
nepochs = 10
inputShape = (32,32,3)

a) Implement a mixed CNN/MLP architecture (at least 3 conv and at least 2 fully connected
hidden layers, ReLU) for classifying CIFAR10 images - without any batchnorm layers or
regularisation. To give an example, this may look as follows: (Conv2d, ReLU, MaxPool),
(Conv2d, ReLU, MaxPool), (Linear, ReLU), (Linear, ReLU), Linear.


In [31]:
def buildModel(activation):
    # Baseline CNN: three conv blocks followed by a fully connected head
    # (no batchnorm, no regularisation)
    model = Sequential()
    model.add(Conv2D(32, kernel_size=3, strides=2, padding="same", activation=activation, input_shape=inputShape))
    model.add(MaxPooling2D(pool_size=2))

    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same", activation=activation))
    model.add(MaxPooling2D(pool_size=2))

    model.add(Conv2D(128, kernel_size=3, padding="same", activation=activation))
    model.add(MaxPooling2D(pool_size=2))

    # Flatten the feature maps before the fully connected layers
    model.add(Flatten())
    model.add(Dense(128, activation=activation))
    model.add(Dense(10, activation="softmax"))

    return model

In [32]:
buildModel("relu").summary()

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_14 (Conv2D)          (None, 16, 16, 32)        896       
                                                                 
 max_pooling2d_12 (MaxPoolin  (None, 8, 8, 32)         0         
 g2D)                                                            
                                                                 
 conv2d_15 (Conv2D)          (None, 4, 4, 64)          18496     
                                                                 
 max_pooling2d_13 (MaxPoolin  (None, 2, 2, 64)         0         
 g2D)                                                            
                                                                 
 conv2d_16 (Conv2D)          (None, 2, 2, 128)         73856     
                                                                 
 max_pooling2d_14 (MaxPoolin  (None, 1, 1, 128)      
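
The output shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes the settings used in `buildModel` (3x3 kernels, `padding="same"` for the convolutions, and 2x2 max pooling with its default `"valid"` padding); the helper names are hypothetical.

```python
import math


def conv_same(size, stride):
    """Spatial output size of a convolution with padding="same"."""
    return math.ceil(size / stride)


def pool_valid(size, pool=2):
    """Spatial output size of max pooling with "valid" padding, stride == pool."""
    return (size - pool) // pool + 1


def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return (kernel * kernel * in_channels + 1) * out_channels


def trace_shapes(size=32, in_channels=3):
    """Return (conv size, pooled size, conv parameters) for each conv block."""
    rows = []
    for out_channels, stride in [(32, 2), (64, 2), (128, 1)]:
        params = conv_params(3, in_channels, out_channels)
        conv_size = conv_same(size, stride)
        size = pool_valid(conv_size)
        rows.append((conv_size, size, params))
        in_channels = out_channels
    return rows


for conv_size, pooled_size, params in trace_shapes():
    print(f"conv: {conv_size}x{conv_size}, pooled: {pooled_size}x{pooled_size}, params: {params}")
```

Because the first two convolutions use a stride of 2 in addition to the pooling, the feature map shrinks to 1x1 after the third block, so only 128 values reach the fully connected layers.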

b) Train this over a suitable number of epochs until you see stable test performance or before
you observe overfitting (by using Adam with default settings). Use this as baseline and
remember train and test curves (cost, accuracy) for later reference.


In [33]:
model = buildModel("relu")
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
log = model.fit(xTrain,yTrain,batch_size=batchSize,epochs=nepochs,validation_data=(xTest,yTest))

KeyboardInterrupt: 

In [None]:
f = plt.figure(figsize=(12,4))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
ax1.plot(log.history['loss'], label='Training loss')
ax1.plot(log.history['val_loss'], label='Testing loss')
ax1.legend()
ax1.grid()
ax2.plot(log.history['accuracy'], label='Training acc')
ax2.plot(log.history['val_accuracy'], label='Testing acc')
ax2.set_ylim(0.0,1.0)
ax2.legend()
ax2.grid()

loss_train, metric_train = model.evaluate(xTrain, yTrain, verbose=0)
print('Train accuracy:', metric_train)

loss_test, metric_test = model.evaluate(xTest, yTest, verbose=0)
print('Test accuracy:', metric_test)
print('Test loss:', loss_test)
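
`model.fit` returns a `History` object whose `history` dict holds one list per metric, with one entry per epoch. One simple way to decide where to stop "before overfitting" is to look for the epoch with the lowest validation loss. The following is only a sketch; `best_epoch` is a hypothetical helper and the numbers are made-up placeholders, not results from this notebook.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_loss)), key=lambda i: val_loss[i])
    return best_index + 1


# Made-up placeholder values, NOT measurements from this notebook.
example_val_loss = [1.60, 1.35, 1.22, 1.18, 1.21, 1.27]
print("Lowest validation loss at epoch", best_epoch(example_val_loss))
```

With a real run, the same function could be applied to `log.history['val_loss']`.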

c) Do the same (with the same number of epochs as used above), but now using tanh instead
of ReLU.


In [None]:
model = buildModel("tanh")
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
log = model.fit(xTrain,yTrain,batch_size=batchSize,epochs=nepochs,validation_data=(xTest,yTest))

In [None]:
f = plt.figure(figsize=(12,4))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
ax1.plot(log.history['loss'], label='Training loss')
ax1.plot(log.history['val_loss'], label='Testing loss')
ax1.legend()
ax1.grid()
ax2.plot(log.history['accuracy'], label='Training acc')
ax2.plot(log.history['val_accuracy'], label='Testing acc')
ax2.set_ylim(0.0,1.0)
ax2.legend()
ax2.grid()

loss_train, metric_train = model.evaluate(xTrain, yTrain, verbose=0)
print('Train accuracy:', metric_train)

loss_test, metric_test = model.evaluate(xTest, yTest, verbose=0)
print('Test accuracy:', metric_test)
print('Test loss:', loss_test)

d) Now add batchnorm after each Conv2d and Linear layer (before the non-linearity). Again
perform the training (over the same number of epochs). Do this twice: with ReLU and
Tanh.


<font style="color:red">Prof. Melchior told us during the lecture that it doesn't matter when we apply the batchnorm, and to avoid code duplication I added a parameter to select the activation :) </font>

In [34]:
def buildModel(activation):
    # Same CNN as the baseline, with BatchNormalization added to each conv block.
    # Note: batchnorm is applied here after the activation and the pooling.
    model = Sequential()
    model.add(Conv2D(32, kernel_size=3, strides=2, padding="same", activation=activation, input_shape=inputShape))
    model.add(MaxPooling2D(pool_size=2))
    model.add(BatchNormalization())

    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same", activation=activation))
    model.add(MaxPooling2D(pool_size=2))
    model.add(BatchNormalization())

    model.add(Conv2D(128, kernel_size=3, padding="same", activation=activation))
    model.add(MaxPooling2D(pool_size=2))
    model.add(BatchNormalization())

    # Flatten the feature maps before the fully connected layers
    model.add(Flatten())
    model.add(Dense(128, activation=activation))
    model.add(Dense(10, activation="softmax"))

    return model
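
To make explicit what each `BatchNormalization` layer computes during training, here is a minimal pure-Python sketch for a single feature across one mini-batch. It is an illustration, not the Keras implementation: `batch_norm` is a hypothetical helper, `gamma` and `beta` stand for the learned scale and shift, and `eps=1e-3` mirrors the Keras default `epsilon`. Running averages for inference are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalise one feature over a mini-batch using the batch statistics."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# One feature observed for a mini-batch of four samples
batch = [1.0, 2.0, 3.0, 4.0]
normalised = batch_norm(batch)
print(normalised)
```

The ordering that the task describes (batchnorm before the non-linearity) would typically be expressed in Keras as a layer without an activation, followed by `BatchNormalization()` and then a separate `Activation(...)` layer; the model above normalises after the activation and pooling instead.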

In [None]:
buildModel("relu").summary()

In [None]:
model = buildModel("relu")
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
log = model.fit(xTrain,yTrain,batch_size=batchSize,epochs=nepochs,validation_data=(xTest,yTest))

In [None]:
f = plt.figure(figsize=(12,4))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
ax1.plot(log.history['loss'], label='Training loss')
ax1.plot(log.history['val_loss'], label='Testing loss')
ax1.legend()
ax1.grid()
ax2.plot(log.history['accuracy'], label='Training acc')
ax2.plot(log.history['val_accuracy'], label='Testing acc')
ax2.set_ylim(0.0,1.0)
ax2.legend()
ax2.grid()

loss_train, metric_train = model.evaluate(xTrain, yTrain, verbose=0)
print('Train accuracy:', metric_train)

loss_test, metric_test = model.evaluate(xTest, yTest, verbose=0)
print('Test accuracy:', metric_test)
print('Test loss:', loss_test)

In [None]:
model = buildModel("tanh")
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
log = model.fit(xTrain,yTrain,batch_size=batchSize,epochs=nepochs,validation_data=(xTest,yTest))

In [None]:
f = plt.figure(figsize=(12,4))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
ax1.plot(log.history['loss'], label='Training loss')
ax1.plot(log.history['val_loss'], label='Testing loss')
ax1.legend()
ax1.grid()
ax2.plot(log.history['accuracy'], label='Training acc')
ax2.plot(log.history['val_accuracy'], label='Testing acc')
ax2.set_ylim(0.0,1.0)
ax2.legend()
ax2.grid()

loss_train, metric_train = model.evaluate(xTrain, yTrain, verbose=0)
print('Train accuracy:', metric_train)

loss_test, metric_test = model.evaluate(xTest, yTest, verbose=0)
print('Test accuracy:', metric_test)
print('Test loss:', loss_test)

e) Now study the impact of adding dropout regularisation before each fully connected
layer (not CNN). Do this for the architecture without and with batchnorm. Perform
the corresponding training runs. How far can you bring up the test accuracy by continuing
the training, possibly over more epochs?


In [None]:
def buildModel():
    # Baseline CNN (ReLU, no batchnorm) with dropout before each fully connected layer
    model = Sequential()
    model.add(Conv2D(32, kernel_size=3, strides=2, padding="same", activation="relu", input_shape=inputShape))
    model.add(MaxPooling2D(pool_size=2))

    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=2))

    model.add(Conv2D(128, kernel_size=3, padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=2))

    # Flatten the feature maps before the fully connected layers
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation="softmax"))

    return model

In [None]:
buildModel().summary()

In [None]:
model = buildModel()
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
log = model.fit(xTrain,yTrain,batch_size=batchSize,epochs=nepochs,validation_data=(xTest,yTest))

In [None]:
f = plt.figure(figsize=(12,4))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
ax1.plot(log.history['loss'], label='Training loss')
ax1.plot(log.history['val_loss'], label='Testing loss')
ax1.legend()
ax1.grid()
ax2.plot(log.history['accuracy'], label='Training acc')
ax2.plot(log.history['val_accuracy'], label='Testing acc')
ax2.set_ylim(0.0,1.0)
ax2.legend()
ax2.grid()

loss_train, metric_train = model.evaluate(xTrain, yTrain, verbose=0)
print('Train accuracy:', metric_train)

loss_test, metric_test = model.evaluate(xTest, yTest, verbose=0)
print('Test accuracy:', metric_test)
print('Test loss:', loss_test)

In [None]:
def buildModel():
    # CNN with batchnorm (ReLU) and dropout before each fully connected layer
    model = Sequential()
    model.add(Conv2D(32, kernel_size=3, strides=2, padding="same", activation="relu", input_shape=inputShape))
    model.add(MaxPooling2D(pool_size=2))
    model.add(BatchNormalization())

    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=2))
    model.add(BatchNormalization())

    model.add(Conv2D(128, kernel_size=3, padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=2))
    model.add(BatchNormalization())

    # Flatten the feature maps before the fully connected layers
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation="softmax"))

    return model

In [None]:
buildModel().summary()

In [None]:
model = buildModel()
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
log = model.fit(xTrain,yTrain,batch_size=batchSize,epochs=nepochs,validation_data=(xTest,yTest))

In [None]:
f = plt.figure(figsize=(12,4))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
ax1.plot(log.history['loss'], label='Training loss')
ax1.plot(log.history['val_loss'], label='Testing loss')
ax1.legend()
ax1.grid()
ax2.plot(log.history['accuracy'], label='Training acc')
ax2.plot(log.history['val_accuracy'], label='Testing acc')
ax2.set_ylim(0.0,1.0)
ax2.legend()
ax2.grid()

loss_train, metric_train = model.evaluate(xTrain, yTrain, verbose=0)
print('Train accuracy:', metric_train)

loss_test, metric_test = model.evaluate(xTest, yTest, verbose=0)
print('Test accuracy:', metric_test)
print('Test loss:', loss_test)
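
For reference, the `Dropout(0.5)` layers used in part e) zero each input with probability `rate` during training and scale the remaining inputs by `1 / (1 - rate)`, so the expected sum stays unchanged; at inference time the layer passes its input through. A minimal pure-Python sketch of this "inverted dropout" idea (the helper `dropout` and its `rng` argument are hypothetical):

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: drop each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, 0.5, rng)
print(dropped)
```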

f) Create one or several comparison plots with the learning curves with/without batchnorm,
also under the different other settings: with/without regularisation and with ReLU or
Tanh. Estimate a factor of speedup when using batchnorm (with/without reg, with ReLU
or Tanh). Discuss your findings and make a statement about whether the results are as
you expect.

<font style="color:red"> <h2>See plots above</h2> As the plots and the evaluation show, the model starts to overfit around epoch 10, and we achieve a better accuracy with the ReLU activation. To reduce the overfitting and generalise better, it is recommended to use dropout with a rate between 0.3 and 0.5, which gives an accuracy of around 68%.</font>
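
One possible way to estimate the speedup factor that the task asks for is to compare how many epochs each model needs to reach the same target accuracy. The sketch below is an assumption about how this could be done, not the method used in this notebook; the helper names are hypothetical and the accuracy curves are made-up placeholders, not measurements.

```python
def epochs_to_reach(accuracies, target):
    """Return the first 1-based epoch whose accuracy is >= target, or None."""
    for epoch, acc in enumerate(accuracies, start=1):
        if acc >= target:
            return epoch
    return None


def speedup(baseline_acc, batchnorm_acc, target):
    """Ratio of epochs needed without vs. with batchnorm; None if never reached."""
    base = epochs_to_reach(baseline_acc, target)
    bn = epochs_to_reach(batchnorm_acc, target)
    if base is None or bn is None:
        return None
    return base / bn


# Made-up placeholder curves, NOT measurements from this notebook.
baseline = [0.35, 0.45, 0.52, 0.57, 0.60, 0.62]
with_bn = [0.45, 0.56, 0.61, 0.64, 0.66, 0.67]
print("Estimated speedup:", speedup(baseline, with_bn, target=0.60))
```

With real runs, the lists could be taken from `log.history['val_accuracy']` of the respective models, provided each `log` is stored under its own name instead of being overwritten.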