## Chapter 11 - Training Deep Neural Networks

In [1]:
import numpy as np
import pandas as pd
from tensorflow import keras
from sklearn.model_selection import train_test_split

Loading the CIFAR10 dataset and splitting it into training, validation, and test sets.

In [2]:
(X_train , y_train) , (X_test , y_test) = keras.datasets.cifar10.load_data()

In [3]:
X_train , X_val , y_train , y_val = train_test_split(X_train , y_train , train_size=40000 , random_state=42)
print(X_train.shape , X_val.shape , X_test.shape)

(40000, 32, 32, 3) (10000, 32, 32, 3) (10000, 32, 32, 3)


Building a DNN with 20 hidden layers of 100 neurons each, using He initialization and the ELU activation function. The Nadam optimizer is set when the model is compiled.

In [4]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (32,32,3))
])
for layer in range(20):
    model.add(keras.layers.Dense(100 , kernel_initializer = "he_normal" ,
                                activation = 'elu'))

model.add(keras.layers.Dense(10 , activation = "softmax"))
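
To make the two ingredients named above concrete, here is a minimal NumPy sketch of He initialization and ELU. It is an illustration under assumptions, not what Keras runs internally: the helper names `he_normal` and `elu` are made up for this example, and it samples from a plain Gaussian, whereas Keras's "he_normal" uses a truncated normal.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    # He initialization: zero-mean Gaussian with variance 2 / fan_in
    # (plain normal here; assumption made for simplicity).
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def elu(z, alpha=1.0):
    # ELU: identity for z > 0, alpha * (exp(z) - 1) otherwise.
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

rng = np.random.default_rng(42)
W = he_normal(3072, 100, rng)      # weights of the first hidden layer
he_std = W.std()                   # should be close to sqrt(2 / 3072)
elu_values = elu(np.array([-2.0, 0.0, 3.0]))
```

Negative inputs are squashed toward `-alpha` instead of being clipped to zero, which keeps a non-zero gradient for them.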

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 3072)              0         
_________________________________________________________________
dense (Dense)                (None, 100)               307300    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_2 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_3 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_4 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_5 (Dense)              (None, 100)               1
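
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. A small sketch (the helper `dense_params` is a name made up for this example):

```python
def dense_params(n_inputs, n_units):
    # weights (n_inputs * n_units) plus one bias per unit
    return n_inputs * n_units + n_units

n_features = 32 * 32 * 3                       # 3072 values after Flatten
first_hidden = dense_params(n_features, 100)   # first hidden layer
other_hidden = dense_params(100, 100)          # each of the 19 remaining hidden layers
output_layer = dense_params(100, 10)           # softmax output layer
total = first_hidden + 19 * other_hidden + output_layer
```

The first two values match the 307300 and 10100 shown in the summary; almost two thirds of the parameters sit in the first hidden layer.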

In [6]:
early_stop = keras.callbacks.EarlyStopping(patience=5)
model.compile(loss = "sparse_categorical_crossentropy" , optimizer = "nadam" , metrics=["accuracy"])

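Training the DNN with early stopping (patience of 5 epochs), we got a validation loss of 1.6636 and a validation accuracy of 0.3944 at epoch 26; training stopped at epoch 31 because the validation loss did not improve during the following 5 epochs.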

In [7]:
history = model.fit(X_train , y_train , validation_data=(X_val, y_val) , callbacks=[early_stop] ,
                    epochs = 50 , batch_size = 64)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
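
The relation between the best epoch and the epoch where training stops can be sketched in plain Python. This is a simplified model of patience-based early stopping, not the Keras implementation; the function name and the validation losses below are made up for illustration.

```python
def early_stopping_epochs(val_losses, patience):
    """Return (best_epoch, stopped_epoch), both 1-indexed."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# made-up validation losses, for illustration only
toy_losses = [2.0, 1.8, 1.7, 1.75, 1.72, 1.71, 1.74, 1.73, 1.9, 2.0]
best_epoch, stopped_epoch = early_stopping_epochs(toy_losses, patience=5)
```

Here the best epoch is 3 and training stops at epoch 8, that is `patience` epochs later. Note that `keras.callbacks.EarlyStopping` has a `restore_best_weights` argument that defaults to `False`, so with the callback as configured above the model keeps the weights of the last epoch, not of the best one.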


Training the same DNN architecture, but this time adding a batch normalization layer after the input (Flatten) layer and after each hidden Dense layer, before its activation function.

In [12]:
model2 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (32,32,3)) ,
    keras.layers.BatchNormalization()
])
for layer in range(20):
    model2.add(keras.layers.Dense(100 , kernel_initializer = "he_normal"))
    model2.add(keras.layers.BatchNormalization())
    model2.add(keras.layers.Activation("elu"))
               
model2.add(keras.layers.Dense(10 , activation = "softmax"))
               
model2.compile(loss = "sparse_categorical_crossentropy" , optimizer = "nadam" , metrics=["accuracy"])               

Training the DNN, we got a validation loss of 1.3837 and a validation accuracy of 0.5299 after 16 epochs, so with batch normalization the model reaches a much better result in fewer epochs (16 instead of 26).

In [13]:
history2 = model2.fit(X_train,y_train,validation_data=(X_val,y_val) , batch_size = 64,epochs = 50 , 
                     callbacks=[early_stop])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50


This time we will use a self-normalizing DNN: we standardize the inputs and use LeCun normal initialization with the SELU activation function (the network must also be a plain sequential stack of Dense layers, which is the case here).

In [14]:
X_mean = X_train.mean(axis=0)
X_std = X_train.std(axis=0)
X_train_stand = (X_train - X_mean) / X_std
X_val_stand= (X_val - X_mean) / X_std
X_test_stand = (X_test - X_mean) / X_std

In [15]:
model3 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (32,32,3)) ,
])
for layer in range(20):
    model3.add(keras.layers.Dense(100 , kernel_initializer = "lecun_normal" , activation = "selu"))
               
model3.add(keras.layers.Dense(10 , activation = "softmax"))
               
model3.compile(loss = "sparse_categorical_crossentropy" , optimizer = "nadam" , metrics=["accuracy"])               

Training the DNN, we got a validation loss of 1.4888 and a validation accuracy of 0.4906 after 14 epochs, so this DNN performs fairly similarly to (though slightly below) the DNN with batch normalization.

In [16]:
history3 = model3.fit(X_train_stand , y_train , validation_data=(X_val_stand , y_val) , batch_size = 64 , epochs = 50,
                     callbacks=[early_stop])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50


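Adding AlphaDropout layers with a rate of 0.1 after each hidden Dense layer of the previous DNN (AlphaDropout is the dropout variant designed for self-normalizing networks, since it preserves the mean and standard deviation of its inputs).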

In [21]:
model4 = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (32,32,3)) ,
])
for layer in range(20):
    model4.add(keras.layers.Dense(100 , kernel_initializer = "lecun_normal" , activation = "selu"))
    model4.add(keras.layers.AlphaDropout(0.1,seed=42))
               
model4.add(keras.layers.Dense(10 , activation = "softmax"))
               
model4.compile(loss = "sparse_categorical_crossentropy" , optimizer = "nadam" , metrics=["accuracy"])   

Training the DNN, we got a validation loss of 1.9014 and a validation accuracy of 0.3631 after 5 epochs. With this fairly small early stopping patience, training stops much sooner, but with worse results.

In [22]:
history4 = model4.fit(X_train_stand , y_train , validation_data=(X_val_stand , y_val) , batch_size = 64 , epochs = 50,
                     callbacks=[early_stop])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
