# Implementing Batch Normalization on CIFAR10

## Batch Normalization
(See [5_Neural_Network/2_Tensorflow.ipynb](../../5_Neural_Network/2_Tensorflow.ipynb))

* The distribution of each layer's inputs keeps changing from mini-batch to mini-batch as the earlier layers are updated.
* To keep up with this shifting distribution, the model deviates a little from the correct update direction, which slows training.
* **Batch Normalization** normalizes activations with the mean and variance of the current mini-batch, so they are not biased toward + or -, and gradients backpropagate more smoothly.
* In other words, **Batch Normalization** normalizes the output of an intermediate layer so that the next layer learns from normalized input.

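$$BN(x_i) = \gamma \cdot \frac{x_i - \mu_B}{\sqrt{{\sigma_B}^2 + \epsilon}} + \beta$$

Here $\mu_B$ and ${\sigma_B}^2$ are the mean and variance of the mini-batch, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are a learned scale and shift.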
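The formula can be sketched in a few lines of NumPy. This is a minimal training-mode illustration, not the Keras implementation: the function name `batch_norm_forward` and the sample data are assumptions made for the example, and it includes the learnable shift $\beta$ that the original paper and the default Keras layer use alongside $\gamma$.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # roughly zero mean, unit variance
    return gamma * x_hat + beta            # learned scale and shift

# Hypothetical mini-batch: 32 samples, 4 features, far from zero mean / unit variance
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))

out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

With `gamma = 1` and `beta = 0` the output is simply the standardized batch. During training the layer learns other values when a different scale or offset suits the next layer better.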

In Keras: `model.add(tf.keras.layers.BatchNormalization())`

[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)

With batch normalization, accuracy on CIFAR10 improves from 74% to 79%.

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras import datasets 
from tensorflow.keras.utils import to_categorical

# load CIFAR10 Dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
Y_train = to_categorical(y_train)
Y_test = to_categorical(y_test)

print("Length of train set:", len(Y_train))
print("Shape of x_train:", x_train.shape[1:])

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Length of train set: 50000
Shape of x_train: (32, 32, 3)


In [2]:
img_rows, img_cols, channel = x_train.shape[1:]

# Make the (samples, rows, cols, channels) layout explicit
# (CIFAR10 images already have this shape, so the reshape changes nothing)
X_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, channel)
X_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, channel)
input_shape = (img_rows, img_cols, channel)

# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# Labels were already one-hot encoded above with to_categorical
print(Y_train[0])
num_classes = 10
batch_size = 32
print(input_shape)

[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
(32, 32, 3)
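To make the printed label vector concrete, here is a small NumPy sketch of one-hot encoding. The helper `one_hot` and the three sample labels are hypothetical and only stand in for what `to_categorical` does with CIFAR10's integer labels of shape `(n, 1)`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids of shape (n,) or (n, 1) into one-hot rows."""
    labels = np.asarray(labels).reshape(-1)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype="float32")[labels]

y = np.array([[6], [9], [0]])  # hypothetical labels, shaped like y_train
Y = one_hot(y, num_classes=10)
print(Y[0])  # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
```

A label of 6 becomes a vector with a single 1 at index 6, which is the form `categorical_crossentropy` expects.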


In [3]:
x = layers.Input(shape=input_shape, name='input')
h = layers.BatchNormalization()(x) # Batch Normalization
h = layers.Conv2D(32, kernel_size=(3, 3), activation='relu', name='conv1')(h)
h = layers.Dropout(0.2)(h)
h = layers.BatchNormalization()(h) # Batch Normalization
h = layers.Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same', name='conv2')(h)
h = layers.MaxPooling2D(pool_size=(2, 2), name='pool1')(h)
h = layers.BatchNormalization()(h) # Batch Normalization
h = layers.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same', name='conv3')(h)
h = layers.MaxPooling2D(pool_size=(2, 2), name='pool2')(h)
h = layers.Flatten()(h)
h = layers.Dropout(0.2)(h)
h = layers.BatchNormalization()(h) # Batch Normalization
h = layers.Dense(512, activation='relu', name='hidden')(h)
h = layers.Dropout(0.2)(h)
h = layers.BatchNormalization()(h) # Batch Normalization
y = layers.Dense(num_classes, activation='softmax', name='output')(h)

model = models.Model(x, y)
model.summary()  # summary() prints the table itself and returns None

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(None, 32, 32, 3)]       0         
_________________________________________________________________
batch_normalization (BatchNo (None, 32, 32, 3)         12        
_________________________________________________________________
conv1 (Conv2D)               (None, 30, 30, 32)        896       
_________________________________________________________________
dropout (Dropout)            (None, 30, 30, 32)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 30, 30, 32)        128       
_________________________________________________________________
conv2 (Conv2D)               (None, 30, 30, 32)        9248      
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 15, 15, 32)        0     
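The parameter counts in the summary can be checked by hand. The sketch below uses two hypothetical helper functions and assumes the Keras defaults: each convolution filter has one bias, and each batch normalization channel has four values (trainable gamma and beta, plus the non-trainable moving mean and moving variance).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel of size kernel_h x kernel_w x in_channels plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def batch_norm_params(channels):
    # gamma and beta (trainable) + moving mean and moving variance (non-trainable)
    return 4 * channels

print(batch_norm_params(3))         # 12   -> batch_normalization
print(conv2d_params(3, 3, 3, 32))   # 896  -> conv1
print(batch_norm_params(32))        # 128  -> batch_normalization_1
print(conv2d_params(3, 3, 32, 32))  # 9248 -> conv2
```

Only half of each batch normalization layer's parameters are updated by backpropagation. The moving statistics are tracked during training and used at inference time.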

In [4]:
epochs = 25
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, Y_train, batch_size=batch_size,
          epochs=epochs, validation_split=0.1, verbose=2)

Epoch 1/25
1407/1407 - 39s - loss: 1.3825 - accuracy: 0.5258 - val_loss: 0.9884 - val_accuracy: 0.6540
Epoch 2/25
1407/1407 - 7s - loss: 0.9660 - accuracy: 0.6599 - val_loss: 0.8193 - val_accuracy: 0.7170
Epoch 3/25
1407/1407 - 7s - loss: 0.8282 - accuracy: 0.7101 - val_loss: 0.7393 - val_accuracy: 0.7464
Epoch 4/25
1407/1407 - 7s - loss: 0.7304 - accuracy: 0.7437 - val_loss: 0.7472 - val_accuracy: 0.7486
Epoch 5/25
1407/1407 - 7s - loss: 0.6470 - accuracy: 0.7731 - val_loss: 0.7050 - val_accuracy: 0.7680
Epoch 6/25
1407/1407 - 7s - loss: 0.5865 - accuracy: 0.7929 - val_loss: 0.6452 - val_accuracy: 0.7888
Epoch 7/25
1407/1407 - 6s - loss: 0.5271 - accuracy: 0.8147 - val_loss: 0.6438 - val_accuracy: 0.7852
Epoch 8/25
1407/1407 - 7s - loss: 0.4902 - accuracy: 0.8264 - val_loss: 0.6415 - val_accuracy: 0.7892
Epoch 9/25
1407/1407 - 7s - loss: 0.4498 - accuracy: 0.8424 - val_loss: 0.6542 - val_accuracy: 0.7956
Epoch 10/25
1407/1407 - 7s - loss: 0.4174 - accuracy: 0.8515 - val_loss: 0.6628 -
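The `1407/1407` in each log line is the number of mini-batches per epoch. As a quick check, it follows from the settings above, given that `validation_split=0.1` holds out the last 10% of the 50000 training images:

```python
import math

num_train = 50000        # size of the CIFAR10 training set
validation_split = 0.1   # fraction held out by model.fit
batch_size = 32

num_val = round(num_train * validation_split)      # 5000 images for validation
num_fit = num_train - num_val                      # 45000 images used for training
steps_per_epoch = math.ceil(num_fit / batch_size)  # the last batch is partial
print(steps_per_epoch)  # 1407
```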

In [5]:
score = model.evaluate(X_test, Y_test)
print()
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Test loss: 0.7623740434646606
Test accuracy: 0.7896000146865845
