
In [0]:
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.datasets import mnist
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras import losses, optimizers
from sklearn.preprocessing import Normalizer
from keras import backend as K


In [0]:
batch_size = 128
num_classes = 10
epochs = 12

In [136]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
image_vector_size = x_train.shape[1]*x_train.shape[2]
X_tot = x_train.reshape(x_train.shape[0], image_vector_size)
x_train.shape, y_train.shape, x_test.shape, y_test.shape

((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))

In [137]:
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.083333, random_state=42)
x_train.shape, y_train.shape, x_val.shape, y_val.shape

((55000, 28, 28), (55000,), (5000, 28, 28), (5000,))
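
The split sizes above follow from simple arithmetic. The sketch below assumes, as scikit-learn does to the best of my knowledge, that a fractional `test_size` is rounded up to a whole number of samples; the helper name `holdout_size` is made up for illustration.

```python
import math

def holdout_size(n_samples, test_fraction):
    """Number of held-out samples for a fractional test size, rounded up."""
    return math.ceil(n_samples * test_fraction)

n_total = 60000                           # size of the original MNIST training set
n_val = holdout_size(n_total, 0.083333)   # 60000 * 0.083333 = 4999.98, rounded up
n_train = n_total - n_val
print(n_train, n_val)
```

Passing an integer such as `test_size=5000` would request the same split without relying on rounding.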

In [0]:
# Flatten the images
x_train = x_train.reshape(x_train.shape[0], image_vector_size)
x_val = x_val.reshape(x_val.shape[0], image_vector_size)
x_test = x_test.reshape(x_test.shape[0], image_vector_size)

# Convert to "one-hot" vectors using the to_categorical function
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
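
The one-hot conversion above can be pictured with a small NumPy sketch. The helper `one_hot` is a hypothetical stand-in written for illustration, not the Keras implementation, and it assumes integer labels in the range `[0, num_classes)`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set the column matching each label to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([5, 0, 9], num_classes=10)
print(example.shape)
print(example.argmax(axis=1))   # recovers the original labels
```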

### (c) Classify the dataset using a feed-forward neural network. Vary the hyperparameters as follows:

In [0]:
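def build_model(layer_sizes, activation='sigmoid'):
  """Build a feed-forward classifier with one Dense hidden layer per entry in layer_sizes."""
  model = Sequential()
  input_shape = (image_vector_size,)
  # First hidden layer: takes the flattened image and uses the requested activation
  model.add(Dense(layer_sizes[0], input_shape=input_shape, activation=activation))
  # Remaining hidden layers, if any
  for size in layer_sizes[1:]:
    model.add(Dense(size, activation=activation))
  # Output layer: one softmax unit per digit class
  model.add(Dense(num_classes, activation='softmax'))
  return model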

def evaluate_network(model=None, batch_size=batch_size, epochs=epochs, x_train=x_train, x_val=x_val,
                     alpha=0.1):
  """Train model with SGD (learning rate alpha) and return [loss, accuracy] on the validation set."""
  # Labels are taken from the module-level y_train and y_val.
  sgd = optimizers.SGD(lr=alpha)
  model.compile(loss=losses.categorical_crossentropy,
                optimizer=sgd,
                metrics=['accuracy'])
  model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0, validation_data=(x_val, y_val))
  score = model.evaluate(x_val, y_val, verbose=0)
  return score

# Part (i)

In [140]:
model = build_model([32])
score = evaluate_network(model=model)

print('Validation loss:', score[0])
print('Validation accuracy:', score[1])

Validation loss: 0.41615091037750246
Validation accuracy: 0.8778


# Part (ii)

In [141]:
transformer = Normalizer().fit(X_tot)
x_train_normalized = transformer.transform(x_train)
x_val_normalized = transformer.transform(x_val)

model = build_model([32])
score = evaluate_network(model=model, x_train=x_train_normalized, x_val=x_val_normalized)

print('Normalized Validation loss:', score[0])
print('Normalized Validation accuracy:', score[1])


Normalized Validation loss: 0.5958734323501587
Normalized Validation accuracy: 0.8448


### Conclusion of Part (ii)

---


The normalized version takes longer to train and reaches a lower validation accuracy (0.8448) than the unnormalized version (0.8778), so we will stick with the unnormalized version.
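
For context on what was compared here: scikit-learn's `Normalizer` with its default settings rescales each sample (each row) to unit L2 norm independently, and its `fit` step learns nothing from the data. The sketch below reproduces that row-wise scaling with NumPy on made-up pixel values and contrasts it with dividing by 255, a common alternative that this notebook does not try.

```python
import numpy as np

# Two made-up "images" of 4 pixels each, with raw intensities in [0, 255].
pixels = np.array([[0.0, 255.0, 0.0, 0.0],
                   [0.0, 128.0, 128.0, 64.0]])

# Row-wise L2 normalization: each sample is divided by its own Euclidean norm.
row_norms = np.linalg.norm(pixels, axis=1, keepdims=True)
l2_normalized = pixels / row_norms

# Alternative (an assumption, not part of the notebook): divide by the maximum intensity.
scaled = pixels / 255.0

print(np.linalg.norm(l2_normalized, axis=1))   # every row now has unit length
print(scaled.max())                            # largest value after scaling
```

With row-wise normalization, a bright image and a dim image of the same digit end up with the same overall magnitude; with division by 255 their relative brightness is preserved.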

# Part (iii) 

In [142]:
deepmodel2 = build_model([32, 32])
deepmodel3 = build_model([32, 32, 32])
score2 = evaluate_network(model=deepmodel2)
score3 = evaluate_network(model=deepmodel3)

print('With 2 hidden layers, Validation loss:', score2[0])
print('with 2 hidden layers, Validation accuracy:', score2[1])

print('With 3 hidden layers, Validation loss:', score3[0])
print('with 3 hidden layers, Validation accuracy:', score3[1])

With 2 hidden layers, Validation loss: 0.4897109190940857
with 2 hidden layers, Validation accuracy: 0.8572
With 3 hidden layers, Validation loss: 0.6286976434707642
with 3 hidden layers, Validation accuracy: 0.797


### Conclusion of Part (iii)

---


Accuracy with three hidden layers (0.797) is much poorer than with one (0.8778) or two (0.8572) hidden layers, which are comparable. Since the dataset is not that large, using two layers might cause overfitting, so we can stick with a single hidden layer.

# Part (iv)

In [143]:
for alpha in [0.001, 0.0001] :
  model = build_model([32])
  score = evaluate_network(model=model, alpha=alpha)
  print("Validation loss with alpha", alpha, ":", score[0])
  print("Validation accuracy with alpha", alpha, ":", score[1])

Validation loss with alpha 0.001 : 0.8669310432434082
Validation accuracy with alpha 0.001 : 0.8264
Validation loss with alpha 0.0001 : 1.788726180267334
Validation accuracy with alpha 0.0001 : 0.4522


### Conclusion of Part (iv)

---


Accuracy with learning rates 0.001 and 0.0001 (0.8264 and 0.4522) is lower than with learning rate 0.1 (0.8778). With a smaller learning rate each update moves the weights less, so the algorithm needs more epochs to converge to a minimum. For the current number of epochs (12), we can stick with learning rate 0.1.
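
The effect of the step size can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an illustrative stand-in and not the network's loss; the function name `gradient_descent` and the choice of 12 steps (echoing the 12 epochs, although a real epoch contains many updates) are assumptions made for the example.

```python
def gradient_descent(learning_rate, steps, start=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w              # derivative of w**2
        w -= learning_rate * grad   # one update step
    return w

for alpha in [0.1, 0.001, 0.0001]:
    final_w = gradient_descent(alpha, steps=12)
    print(alpha, final_w)
```

After the same number of steps, the run with the largest learning rate ends much closer to the minimum at 0, while the two smaller rates have barely moved from the starting point.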

# Part (v)

In [144]:
for layer_size in [64, 128] :
  model = build_model([layer_size])
  score = evaluate_network(model=model)
  print("Validation loss with hidden layer size", layer_size, ":", score[0])
  print("Validation accuracy with layer_size", layer_size, ":", score[1])
  print()

Validation loss with hidden layer size 64 : 0.3408801975250244
Validation accuracy with layer_size 64 : 0.8966

Validation loss with hidden layer size 128 : 0.3146684859752655
Validation accuracy with layer_size 128 : 0.9062



### Conclusion of Part (v)

---


Accuracy increases noticeably with the hidden layer size: 0.8778 with 32 units, 0.8966 with 64, and 0.9062 with 128. A larger hidden layer helps the model fit the data better, so we may opt for a larger hidden layer size, say 128.
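
To get a sense of how much capacity each width adds, the number of trainable parameters of a fully connected network can be counted by hand. The helper `dense_param_count` below is a hypothetical utility written for this illustration; it assumes every layer is a dense layer with a bias term, as in `build_model`.

```python
def dense_param_count(input_dim, hidden_sizes, output_dim):
    """Count weights and biases of a fully connected network."""
    sizes = [input_dim] + list(hidden_sizes) + [output_dim]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out   # weight matrix plus bias vector
    return total

# 784 inputs (28 x 28 pixels) and 10 output classes, one hidden layer of each width
for width in [32, 64, 128]:
    print(width, dense_param_count(784, [width], 10))
```

Because the input layer dominates, doubling the hidden width roughly doubles the parameter count.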

# Part (vi)

In [151]:
lrelu = lambda x: keras.activations.relu(x, alpha=0.1)     # leaky relu
activationToStr = {K.tanh: 'tanh', K.relu: 'relu', lrelu: 'leaky relu'}
for activation in [K.tanh, K.relu, lrelu] :
  model = build_model([128], activation=activation)
  score = evaluate_network(model=model)
  print("Validation loss with", activationToStr[activation], ":", score[0])
  print("Validation accuracy with", activationToStr[activation], ":", score[1])
  print()

Validation loss with tanh : 0.2668101182937622
Validation accuracy with tanh : 0.9206

Validation loss with relu : 0.3048706015110016
Validation accuracy with relu : 0.9106

Validation loss with leaky relu : 0.2887902254104614
Validation accuracy with leaky relu : 0.9184



### Conclusion of part (vi)

---


The accuracies for ReLU (0.9106), leaky ReLU (0.9184) and tanh (0.9206) all appear to be better than sigmoid (0.9062 with the same 128 hidden units). Between ReLU and tanh, I would choose ReLU. The biggest advantage of ReLU is that its gradient does not saturate for positive inputs, which greatly accelerates the convergence of stochastic gradient descent compared to the sigmoid and tanh functions.
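
The saturation argument can be checked numerically. The sketch below evaluates the derivatives of the three activations at a few inputs using NumPy; the function names are made up for this example and are not Keras APIs.

```python
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)            # derivative of the sigmoid

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # derivative of tanh

def relu_grad(x):
    return (x > 0).astype(float)    # 1 for positive inputs, 0 otherwise

x = np.array([-10.0, 0.0, 10.0])
print("sigmoid:", sigmoid_grad(x))
print("tanh:   ", tanh_grad(x))
print("relu:   ", relu_grad(x))
```

For large positive inputs the sigmoid and tanh derivatives shrink towards zero, while the ReLU derivative stays at 1. For negative inputs the ReLU derivative is 0, which is the case leaky ReLU is meant to soften.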

# Part (vii)

Among all the configurations tried above, I opt for the following configuration:
  * Only one hidden layer with 128 hidden units
  * Learning rate = 0.1
  * Activation function for hidden layer = ReLU
  * No normalization

> This configuration was chosen by comparing the validation results obtained for each setting above.

# Part (viii)

Among all the models, I choose the model with the configuration from part (vii). The choice is based on the model's performance on the validation data (validation loss and validation accuracy).

# Run Model on test data 

In [153]:
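model = build_model([128], activation=K.relu)
evaluate_network(model=model)                      # trains the model; the returned validation score is not used
score = model.evaluate(x_test, y_test, verbose=0)  # evaluate on the held-out test set
print("Test loss :", score[0])
print("Test accuracy :", score[1])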

Test loss : 0.29768350949287414
Test accuracy : 0.9134
