# Exercise 1 - LeNet-5 on Fashion MNIST
## Stav Bar-Sheshet and Shlomo Spitzer

**Initialize and import Fashion MNIST data**

In [0]:
import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Add a trailing channel axis and one-hot encode the labels to match the Keras model
x_train = np.expand_dims(x_train, -1)
y_train = tf.keras.utils.to_categorical(y_train, 10)
x_test = np.expand_dims(x_test, -1)
y_test = tf.keras.utils.to_categorical(y_test, 10)

print("x_train shape:", x_train.shape, "y_train shape:", y_train.shape)

# Normalize the data so each pixel lies in [0,1] instead of [0,255]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
x_train shape: (60000, 28, 28, 1) y_train shape: (60000, 10)
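The preprocessing steps can be checked on a toy array without downloading the dataset. The following is a minimal NumPy sketch; `toy_images`, `toy_labels` and `one_hot` are illustrative names, and the `np.eye` indexing is assumed to mirror what `to_categorical` produces for integer labels.

```python
import numpy as np

# Hypothetical stand-in for two 28x28 grayscale images with pixel values in [0, 255]
toy_images = np.random.randint(0, 256, size=(2, 28, 28), dtype=np.uint8)
toy_labels = np.array([3, 9])

# Add a trailing channel axis: (2, 28, 28) -> (2, 28, 28, 1)
toy_images = np.expand_dims(toy_images, -1)

# One-hot encode the integer labels into 10 classes
one_hot = np.eye(10, dtype='float32')[toy_labels]

# Scale pixels from [0, 255] to [0, 1]
toy_images = toy_images.astype('float32') / 255

print(toy_images.shape, one_hot.shape)  # (2, 28, 28, 1) (2, 10)
```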


# Full Features Net 

Including batch normalization, weight decay and dropout.

**Dropout note** - Keras uses inverted dropout: during training, the activations that are kept are scaled up by 1/p, where p is the keep probability (p = 1 - 0.25 = 0.75 for `Dropout(0.25)`), and no transformation is applied at test time.
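The inverted-dropout behaviour can be sketched in NumPy. This is a minimal illustration, not Keras's implementation; `inverted_dropout` is a hypothetical helper, and `rate` is the drop probability (as in `Dropout(0.25)`), so the surviving activations are scaled by 1/(1 - rate).

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """Drop each activation with probability `rate`; rescale survivors during training."""
    if not training:
        return x  # inference: identity, no rescaling needed
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
train_out = inverted_dropout(activations, rate=0.25, training=True, rng=rng)
test_out = inverted_dropout(activations, rate=0.25, training=False, rng=rng)

print(train_out.mean())  # close to 1.0: the expected activation is preserved
print(test_out.mean())   # 1.0: the input passes through unchanged
```

Because the rescaling happens during training, the test-time forward pass needs no correction factor.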

**Network construction**


In [0]:
regularizers = tf.keras.regularizers  # stay within tf.keras instead of mixing in standalone keras
model1 = tf.keras.models.Sequential()
# Input - 28x28x1, normalized

# 1st Conv Layer - 64 filters, 5x5 kernel, 2x2 max pooling
model1.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu', input_shape=(28,28,1), kernel_regularizer=regularizers.l2(0.01), bias_regularizer=regularizers.l2(0.01)))
model1.add(tf.keras.layers.MaxPooling2D(pool_size=2))
model1.add(tf.keras.layers.Dropout(0.25))
model1.add(tf.keras.layers.BatchNormalization())

# 2nd Conv Layer - 32 filters, 5x5 kernel, 2x2 max pooling
model1.add(tf.keras.layers.Conv2D(32, (5, 5), padding='same', activation='relu', kernel_regularizer=regularizers.l2(0.01), bias_regularizer=regularizers.l2(0.01)))
model1.add(tf.keras.layers.MaxPooling2D(pool_size=2))
model1.add(tf.keras.layers.Dropout(0.25))
model1.add(tf.keras.layers.BatchNormalization())

# 1st Fully Connected Layer - 256 neurons
model1.add(tf.keras.layers.Flatten())
model1.add(tf.keras.layers.Dense(256))
model1.add(tf.keras.layers.Activation('relu'))
model1.add(tf.keras.layers.Dropout(0.25))
model1.add(tf.keras.layers.BatchNormalization())

# 2nd Fully Connected Layer - 128 neurons (the input is already flat, so no second Flatten is needed)
model1.add(tf.keras.layers.Dense(128))
model1.add(tf.keras.layers.Activation('relu'))
model1.add(tf.keras.layers.Dropout(0.25))
model1.add(tf.keras.layers.BatchNormalization())

# Output Layer - 10 classes, softmax
model1.add(tf.keras.layers.Dense(10))
model1.add(tf.keras.layers.Activation('softmax'))
model1.summary()



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 64)        1664      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 64)        0         
_________________________________________________________________
batch_normalization (BatchNo (None, 14, 14, 64)        256       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 32)        51232     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 32)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 7, 7, 32)          0         
__________
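The parameter counts listed in the summary can be reproduced by hand. A minimal sketch, assuming the usual counting rules: a convolution has kernel_h * kernel_w * in_channels * filters weights plus one bias per filter, and batch normalization keeps four values per channel (gamma, beta, moving mean, moving variance). The helper names are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

print(conv2d_params(5, 5, 1, 64))    # 1664  (conv2d)
print(batchnorm_params(64))          # 256   (batch_normalization)
print(conv2d_params(5, 5, 64, 32))   # 51232 (conv2d_1)
```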

**Compile** 

Log loss (categorical cross-entropy) is chosen.

The Adam optimizer is used with its default parameters (lr=0.001).

The net's performance is judged by its accuracy.

Note: the same compile settings are used for all the networks.
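To make the loss and the metric concrete, here is a minimal NumPy sketch of categorical cross-entropy and accuracy on two toy predictions. The function name, the clipping constant `eps` and the toy arrays are assumptions made for illustration; Keras computes the same quantities internally.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# Two samples, three classes: one-hot targets and softmax-like predictions
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype='float32')
y_pred = np.array([[0.1, 0.2, 0.7], [0.5, 0.3, 0.2]], dtype='float32')

loss = categorical_crossentropy(y_true, y_pred)
accuracy = float(np.mean(np.argmax(y_pred, axis=-1) == np.argmax(y_true, axis=-1)))
print(loss, accuracy)  # loss is (-ln 0.7 - ln 0.5) / 2, roughly 0.525; accuracy is 1.0
```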


In [0]:
model1.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

**Train**

In [0]:
# Save a checkpoint only when the validation loss improves (tf.keras callback, to match the tf.keras model)
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath='model1.weights.best.hdf5', verbose=1, save_best_only=True)
history1 = model1.fit(x_train,
         y_train,
         batch_size=100,
         epochs=15,
         validation_data=(x_test, y_test),
         callbacks=[checkpointer])

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 00001: val_loss improved from inf to 0.59170, saving model to model1.weights.best.hdf5
Epoch 2/15
Epoch 00002: val_loss improved from 0.59170 to 0.40216, saving model to model1.weights.best.hdf5
Epoch 3/15
Epoch 00003: val_loss did not improve from 0.40216
Epoch 4/15
Epoch 00004: val_loss improved from 0.40216 to 0.38308, saving model to model1.weights.best.hdf5
Epoch 5/15
Epoch 00005: val_loss did not improve from 0.38308
Epoch 6/15
Epoch 00006: val_loss improved from 0.38308 to 0.35647, saving model to model1.weights.best.hdf5
Epoch 7/15
Epoch 00007: val_loss improved from 0.35647 to 0.32827, saving model to model1.weights.best.hdf5
Epoch 8/15
Epoch 00008: val_loss did not improve from 0.32827
Epoch 9/15
Epoch 00009: val_loss did not improve from 0.32827
Epoch 10/15
Epoch 00010: val_loss improved from 0.32827 to 0.31656, saving model to model1.weights.best.hdf5
Epoch 11/15
Epoch 00011: val_loss did not improve from 0.
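With `save_best_only=True` the checkpoint file is overwritten only when the monitored quantity (`val_loss` by default) improves, which is what the log reports. A minimal sketch of that rule in plain Python; the loss values are illustrative, not taken from the run, and `best_only_epochs` is a hypothetical helper.

```python
def best_only_epochs(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint would be written."""
    best = float('inf')
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved

illustrative_losses = [0.60, 0.40, 0.45, 0.38, 0.39]
print(best_only_epochs(illustrative_losses))  # [1, 2, 4]
```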

**Evaluate**

In [0]:
# Load the checkpoint with the best validation loss (the quantity ModelCheckpoint monitors by default)
model1.load_weights('model1.weights.best.hdf5')

# Evaluate the model on the training and test sets
train_score = model1.evaluate(x_train, y_train, verbose=0)
test_score = model1.evaluate(x_test, y_test, verbose=0)

# Print train and test accuracy
print('Train accuracy:', train_score[1])
print('Test accuracy:', test_score[1])

**Plot Accuracy and Loss**

In [0]:
import matplotlib.pyplot as plt

def plot_acc_lss(history):
    # The accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones
    acc_key = 'acc' if 'acc' in history.history else 'accuracy'
    # summarize history for accuracy
    plt.plot(history.history[acc_key])
    plt.plot(history.history['val_' + acc_key])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
  
plot_acc_lss(history1)


# Clean Net

No batch normalization, weight decay or dropout.

**Network construction**

In [0]:
model2 = tf.keras.models.Sequential()
# Input - 28x28x1, normalized

# 1st Conv Layer - 64 filters, 5x5 kernel, 2x2 max pooling
model2.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu', input_shape=(28,28,1)))
model2.add(tf.keras.layers.MaxPooling2D(pool_size=2))

# 2nd Conv Layer - 32 filters, 5x5 kernel, 2x2 max pooling
model2.add(tf.keras.layers.Conv2D(32, (5, 5), padding='same', activation='relu'))
model2.add(tf.keras.layers.MaxPooling2D(pool_size=2))

# 1st Fully Connected Layer - 256 neurons
model2.add(tf.keras.layers.Flatten())
model2.add(tf.keras.layers.Dense(256))
model2.add(tf.keras.layers.Activation('relu'))

# 2nd Fully Connected Layer - 128 neurons (the input is already flat, so no second Flatten is needed)
model2.add(tf.keras.layers.Dense(128))
model2.add(tf.keras.layers.Activation('relu'))

# Output Layer - 10 classes, softmax
model2.add(tf.keras.layers.Dense(10))
model2.add(tf.keras.layers.Activation('softmax'))
model2.summary()

In [0]:
model2.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

**Train**

In [0]:
# Save a checkpoint only when the validation loss improves (tf.keras callback, to match the tf.keras model)
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath='model2.weights.best.hdf5', verbose=1, save_best_only=True)

# Fit the model
history2 = model2.fit(x_train,
         y_train,
         batch_size=100,
         epochs=15,
         validation_data=(x_test, y_test),
         callbacks=[checkpointer])

**Evaluate**

In [0]:
# Load the checkpoint with the best validation loss (the quantity ModelCheckpoint monitors by default)
model2.load_weights('model2.weights.best.hdf5')

# Evaluate the model on the training and test sets
train_score = model2.evaluate(x_train, y_train, verbose=0)
test_score = model2.evaluate(x_test, y_test, verbose=0)

# Print train and test accuracy
print('Train accuracy:', train_score[1])
print('Test accuracy:', test_score[1])

**Plot Accuracy and Loss**

In [0]:
plot_acc_lss(history2)

# Dropout Net

Clean network including only dropout.

**Dropout note** - Keras uses inverted dropout: during training, the activations that are kept are scaled up by 1/p, where p is the keep probability (p = 1 - 0.25 = 0.75 for `Dropout(0.25)`), and no transformation is applied at test time.

**Network construction**


In [0]:
model3 = tf.keras.models.Sequential()
# Input - 28x28x1, normalized

# 1st Conv Layer - 64 filters, 5x5 kernel, 2x2 max pooling
model3.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu', input_shape=(28,28,1)))
model3.add(tf.keras.layers.MaxPooling2D(pool_size=2))
model3.add(tf.keras.layers.Dropout(0.25))

# 2nd Conv Layer - 32 filters, 5x5 kernel, 2x2 max pooling
model3.add(tf.keras.layers.Conv2D(32, (5, 5), padding='same', activation='relu'))
model3.add(tf.keras.layers.MaxPooling2D(pool_size=2))
model3.add(tf.keras.layers.Dropout(0.25))

# 1st Fully Connected Layer - 256 neurons
model3.add(tf.keras.layers.Flatten())
model3.add(tf.keras.layers.Dense(256))
model3.add(tf.keras.layers.Activation('relu'))
model3.add(tf.keras.layers.Dropout(0.25))

# 2nd Fully Connected Layer - 128 neurons (the input is already flat, so no second Flatten is needed)
model3.add(tf.keras.layers.Dense(128))
model3.add(tf.keras.layers.Activation('relu'))
model3.add(tf.keras.layers.Dropout(0.25))

# Output Layer - 10 classes, softmax
model3.add(tf.keras.layers.Dense(10))
model3.add(tf.keras.layers.Activation('softmax'))
model3.summary()

**Compile** 

Log loss (categorical cross-entropy) is chosen.

The Adam optimizer is used with its default parameters (lr=0.001).

The net's performance is judged by its accuracy.

Note: the same compile settings are used for all the networks.


In [0]:
model3.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

**Train**

In [0]:
# Save a checkpoint only when the validation loss improves (tf.keras callback, to match the tf.keras model)
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath='model3.weights.best.hdf5', verbose=1, save_best_only=True)
history3 = model3.fit(x_train,
         y_train,
         batch_size=100,
         epochs=15,
         validation_data=(x_test, y_test),
         callbacks=[checkpointer])

**Evaluate**

In [0]:
# Load the checkpoint with the best validation loss (the quantity ModelCheckpoint monitors by default)
model3.load_weights('model3.weights.best.hdf5')

# Evaluate the model on the training and test sets
train_score = model3.evaluate(x_train, y_train, verbose=0)
test_score = model3.evaluate(x_test, y_test, verbose=0)

# Print train and test accuracy
print('Train accuracy:', train_score[1])
print('Test accuracy:', test_score[1])

**Plot Accuracy and Loss**

In [0]:
plot_acc_lss(history3)

# Weight Decay Net 

Clean network including only weight decay.
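The weight-decay term can be written out directly. A minimal NumPy sketch, assuming the usual Keras definition of `regularizers.l2(0.01)` as 0.01 times the sum of squared entries, added to the data loss for each regularized kernel and bias; the arrays and the `data_loss` value are toy numbers.

```python
import numpy as np

def l2_penalty(weights, factor=0.01):
    """L2 penalty: factor * sum of squared entries."""
    return factor * float(np.sum(np.square(weights)))

toy_kernel = np.array([[1.0, -2.0], [0.5, 0.0]])
toy_bias = np.array([0.1, -0.1])
data_loss = 0.30  # hypothetical cross-entropy value

# The training objective is the data loss plus the penalty of every regularized tensor
total_loss = data_loss + l2_penalty(toy_kernel) + l2_penalty(toy_bias)
print(total_loss)  # 0.30 + 0.0525 + 0.0002, about 0.3527
```

Because the penalty is part of the reported loss, the loss values of this network are not directly comparable with those of the unregularized networks; the accuracy values are.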

**Network construction**


In [0]:
regularizers = tf.keras.regularizers  # stay within tf.keras instead of mixing in standalone keras
model4 = tf.keras.models.Sequential()
# Input - 28x28x1, normalized

# 1st Conv Layer - 64 filters, 5x5 kernel, 2x2 max pooling
model4.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu', input_shape=(28,28,1), kernel_regularizer=regularizers.l2(0.01), bias_regularizer=regularizers.l2(0.01)))
model4.add(tf.keras.layers.MaxPooling2D(pool_size=2))

# 2nd Conv Layer - 32 filters, 5x5 kernel, 2x2 max pooling
model4.add(tf.keras.layers.Conv2D(32, (5, 5), padding='same', activation='relu', kernel_regularizer=regularizers.l2(0.01), bias_regularizer=regularizers.l2(0.01)))
model4.add(tf.keras.layers.MaxPooling2D(pool_size=2))

# 1st Fully Connected Layer - 256 neurons
model4.add(tf.keras.layers.Flatten())
model4.add(tf.keras.layers.Dense(256))
model4.add(tf.keras.layers.Activation('relu'))

# 2nd Fully Connected Layer - 128 neurons (the input is already flat, so no second Flatten is needed)
model4.add(tf.keras.layers.Dense(128))
model4.add(tf.keras.layers.Activation('relu'))

# Output Layer - 10 classes, softmax
model4.add(tf.keras.layers.Dense(10))
model4.add(tf.keras.layers.Activation('softmax'))
model4.summary()



**Compile** 

Log loss (categorical cross-entropy) is chosen.

The Adam optimizer is used with its default parameters (lr=0.001).

The net's performance is judged by its accuracy.

Note: the same compile settings are used for all the networks.


In [0]:
model4.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

**Train**

In [0]:
# Save a checkpoint only when the validation loss improves (tf.keras callback, to match the tf.keras model)
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath='model4.weights.best.hdf5', verbose=1, save_best_only=True)
history4 = model4.fit(x_train,
         y_train,
         batch_size=100,
         epochs=15,
         validation_data=(x_test, y_test),
         callbacks=[checkpointer])

**Evaluate**

In [0]:
# Load the checkpoint with the best validation loss (the quantity ModelCheckpoint monitors by default)
model4.load_weights('model4.weights.best.hdf5')

# Evaluate the model on the training and test sets
train_score = model4.evaluate(x_train, y_train, verbose=0)
test_score = model4.evaluate(x_test, y_test, verbose=0)

# Print train and test accuracy
print('Train accuracy:', train_score[1])
print('Test accuracy:', test_score[1])

**Plot Accuracy and Loss**

In [0]:
plot_acc_lss(history4)

# Batch Normalization Net 

Clean network including only batch normalization.
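The training-time computation of a batch normalization layer can be sketched in NumPy. This is a minimal illustration under stated assumptions: `batch_norm_train` is a hypothetical helper, `eps=1e-3` matches the Keras default epsilon, and `gamma` and `beta` stand for the learned scale and shift. At inference Keras uses moving averages of the mean and variance instead of the batch statistics, which this sketch does not model.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # toy batch: 100 samples, 4 features
out = batch_norm_train(batch, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # each feature is centred near 0
print(out.std(axis=0))   # each feature has a standard deviation near 1
```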

**Network construction**


In [0]:
model5 = tf.keras.models.Sequential()
# Input - 28x28x1, normalized

# 1st Conv Layer - 64 filters, 5x5 kernel, 2x2 max pooling
model5.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu', input_shape=(28,28,1)))
model5.add(tf.keras.layers.MaxPooling2D(pool_size=2))
model5.add(tf.keras.layers.BatchNormalization())

# 2nd Conv Layer - 32 filters, 5x5 kernel, 2x2 max pooling
model5.add(tf.keras.layers.Conv2D(32, (5, 5), padding='same', activation='relu'))
model5.add(tf.keras.layers.MaxPooling2D(pool_size=2))
model5.add(tf.keras.layers.BatchNormalization())

# 1st Fully Connected Layer - 256 neurons
model5.add(tf.keras.layers.Flatten())
model5.add(tf.keras.layers.Dense(256))
model5.add(tf.keras.layers.Activation('relu'))
model5.add(tf.keras.layers.BatchNormalization())

# 2nd Fully Connected Layer - 128 neurons (the input is already flat, so no second Flatten is needed)
model5.add(tf.keras.layers.Dense(128))
model5.add(tf.keras.layers.Activation('relu'))
model5.add(tf.keras.layers.BatchNormalization())

# Output Layer - 10 classes, softmax
model5.add(tf.keras.layers.Dense(10))
model5.add(tf.keras.layers.Activation('softmax'))
model5.summary()


**Compile** 

Log loss (categorical cross-entropy) is chosen.

The Adam optimizer is used with its default parameters (lr=0.001).

The net's performance is judged by its accuracy.

Note: the same compile settings are used for all the networks.


In [0]:
model5.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

**Train**

In [0]:
# Save a checkpoint only when the validation loss improves (tf.keras callback, to match the tf.keras model)
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath='model5.weights.best.hdf5', verbose=1, save_best_only=True)
history5 = model5.fit(x_train,
         y_train,
         batch_size=100,
         epochs=15,
         validation_data=(x_test, y_test),
         callbacks=[checkpointer])

**Evaluate**

In [0]:
# Load the checkpoint with the best validation loss (the quantity ModelCheckpoint monitors by default)
model5.load_weights('model5.weights.best.hdf5')

# Evaluate the model on the training and test sets
train_score = model5.evaluate(x_train, y_train, verbose=0)
test_score = model5.evaluate(x_test, y_test, verbose=0)

# Print train and test accuracy
print('Train accuracy:', train_score[1])
print('Test accuracy:', test_score[1])

**Plot Accuracy and Loss**

In [0]:
plot_acc_lss(history5)