## Deep Learning and Computer Vision

### Training a small network

Shani Israelov

Jean Monnet University, 2023

The aim of this exercise is to train a small network with dense layers for the classification of
handwritten digits. We use the MNIST dataset, composed of 70,000 images: 60,000 for
training and 10,000 for testing. This is a classification problem with 10 categories, one per digit.

0/ Run the provided code (Keras and PyTorch).

In [31]:
from keras.datasets import mnist     
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.utils import np_utils
from keras.optimizers import gradient_descent_v2
import matplotlib.pyplot as plt
import numpy as np

# Parameters
Sbatch=128
Nepochs=10
lr=1

# Load the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Image Preprocessing
X_train = X_train.astype('float32')  
X_test = X_test.astype('float32')
X_train /= 255                     
X_test /= 255

# Labels
nb_classes = 10
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# Create the Network
model = Sequential()
model.add(Flatten())
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

# Loss and optimizer
model.compile(loss='categorical_crossentropy', optimizer=gradient_descent_v2.SGD(learning_rate=lr), metrics=['accuracy'])

# Training
model.fit(X_train, Y_train, batch_size=Sbatch, epochs=Nepochs, verbose=1)

# Test
score = model.evaluate(X_test, Y_test)
print('Test accuracy:', score[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.9217000007629395


Questions about the Keras code (if you have time, repeat them afterwards on the PyTorch code):

1/ What is the size of each image?

In [None]:
print("The size of each image is:", X_train[0,:].shape)

2/ Display some labels before and after the function ‘to_categorical’.

In [None]:
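for i in range(10):
    # y_train holds the integer labels, Y_train the one-hot vectors from to_categorical
    print("Label for image", i, "is y =", y_train[i], "and after to_categorical Y =", Y_train[i])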
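To see what ‘to_categorical’ does to such labels without loading Keras, here is a minimal NumPy sketch of the same one-hot encoding. The helper name `one_hot` and the three sample labels are illustrative assumptions, not part of the provided code.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a 1 at each label index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a single 1 in column labels[i]
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample_labels = [5, 0, 4]                 # hypothetical integer labels
sample_onehot = one_hot(sample_labels, 10)
print(sample_onehot)
```

Each row contains exactly one 1, at the position of the digit it encodes, which is the format that categorical_crossentropy expects.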

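3/ What is the aim of the ‘Flatten’ function?

*Answer:*

Flatten reshapes each input sample into a one-dimensional vector and does not affect the batch dimension. For example, a batch of shape (1, 10, 64) becomes (1, 640) after Flatten(). Here each 28x28 image becomes a vector of 784 values, so that the Dense layer sees all the pixels of an image at once.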
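The effect on the shapes can be reproduced with a plain NumPy reshape. This is only a sketch of the idea, not the Keras implementation, and the all-zero batch of 128 images is a hypothetical stand-in for real data.

```python
import numpy as np

# Hypothetical batch of 128 grayscale images of 28x28 pixels
batch = np.zeros((128, 28, 28), dtype="float32")

# Keep the batch axis (axis 0) and merge all remaining axes into one
flat = batch.reshape(batch.shape[0], -1)

print(batch.shape, "->", flat.shape)
```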

4/ How many layers do we have in the current network?

*Answer:*

Only 1 layer with trainable weights, the Dense layer. Flatten and Activation are also layer objects in Keras, but they have no parameters to learn:

Sequential() groups a linear stack of layers into a tf.keras.Model,

Flatten() flattens the input,

Dense() is a regular densely-connected NN layer,

Activation('softmax') maps the 10 outputs to values in the range (0,1) that sum to 1.

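5/ How many weights are to be learned?

*Answer:*

Dense() receives the output of Flatten() as its input. Since each image is 28x28, the flattened vector has 784 values. Each of the 10 output neurons has one weight per input plus one bias, so there are 784 x 10 = 7,840 weights and 10 biases, i.e. 7,850 learned parameters in total.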
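The parameter count of a fully connected layer can be checked with a few lines of arithmetic. The helper `dense_param_count` below is a hypothetical name introduced for illustration. In the notebook, `model.summary()` should report the same total once the model has been built.

```python
def dense_param_count(n_inputs, n_units, use_bias=True):
    """Number of trainable parameters in one fully connected layer."""
    weights = n_inputs * n_units          # one weight per (input, unit) pair
    biases = n_units if use_bias else 0   # one bias per unit
    return weights + biases

n_pixels = 28 * 28                        # size of the flattened image
total = dense_param_count(n_pixels, 10)
print(total)
```

The same helper can be summed over several layers when FC layers are inserted later in the exercise.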

6/ What are the loss function, the optimization algorithm and its parameter(s)?


*Answer:*

model.compile(loss='categorical_crossentropy', optimizer=gradient_descent_v2.SGD(learning_rate=lr), metrics=['accuracy'])

loss='categorical_crossentropy'

The loss function computes the quantity that the model seeks to minimize during training. For regression models, the most common loss is the mean squared error, while for classification models that predict probabilities it is the cross-entropy.
categorical_crossentropy is the loss used for multi-class classification with two or more classes. It expects the labels as one-hot vectors of 0s and 1s. Labels given as integers are converted to this encoding with the Keras to_categorical function.
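For a single sample, the categorical cross-entropy is minus the log of the probability assigned to the true class. The sketch below computes it in plain Python. The function name, the clipping constant `eps` and the example probability vectors are assumptions made for illustration, and Keras additionally averages the value over the batch.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted probability vector."""
    # eps is an assumed guard against log(0)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 0.0, 1.0]          # hypothetical one-hot label for class 2
confident = [0.05, 0.05, 0.90]    # hypothetical softmax output, mostly right
unsure = [0.40, 0.40, 0.20]       # hypothetical softmax output, mostly wrong

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is small when the network puts most of the probability on the correct class and grows as that probability shrinks.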


optimizer=gradient_descent_v2.SGD(learning_rate=lr)

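An optimizer is one of the two arguments required for compiling a Keras model.
SGD is the stochastic gradient descent optimizer. The only parameter set here is the learning rate, lr = 1; the momentum keeps its default value of 0.
The update rule for a parameter w with gradient g when momentum is 0 is:
w = w - learning_rate * g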
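The update rule can be tried on a toy problem. The sketch below minimizes the one-dimensional loss f(w) = (w - 3)^2, whose gradient is 2(w - 3). The loss, the starting point and the learning rate of 0.1 are assumptions chosen for illustration and are unrelated to the MNIST network.

```python
def sgd_step(w, g, learning_rate):
    """One plain gradient descent update (momentum = 0)."""
    return w - learning_rate * g

w = 0.0                 # assumed starting value
toy_lr = 0.1            # assumed learning rate for this toy problem
for step in range(50):
    g = 2.0 * (w - 3.0)             # gradient of (w - 3)^2 at the current w
    w = sgd_step(w, g, toy_lr)

print(w)                # close to the minimizer w = 3
```

On this toy loss, a learning rate above 1 makes each step overshoot by more than the current distance to the minimum, so the iterates diverge. This is the kind of behaviour to watch for when changing the learning rate.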


7/ What does ‘469/469’ mean in the output results?


*Answer:*

It is the number of batches processed in each epoch: training_data_size divided by batch_size, rounded up.

The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.

In each epoch we go through the whole training set: 468 batches of 128 samples plus a final batch with the remaining 96 samples, which gives 469 weight updates per epoch.



In [41]:
X_train.shape[0]/Sbatch


468.75
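The division does not give an integer because 60,000 is not a multiple of 128. Rounding up gives the number of batches shown in the progress bar. A small check, assuming the training set size and batch size used in this notebook:

```python
import math

n_train = 60000        # number of MNIST training images
batch_size = 128       # value of Sbatch in the notebook

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch)
```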

8/ Observe the prediction for the first test image and compare it with the actual label. Display the
first test image.


9/ Display the learned weights of each neuron as an image.


10/ Insert FC layers (no convolution) and observe the results.


11/ Change the learning rate and observe the impact on the results. Do not touch the batch size or
the epoch number yet.

12/ Apply a 5-fold cross validation to tune the learning rate.


13/ Provide your best architecture and the number of learned weights.


14/ Try different batch sizes and explain the influence on accuracy and training time.