*CSE 555 - Intro to Pattern Recognition**

# **Problem Set 4**

## **Problem Statement**

In this problem, we will train a neural network to identify the digit on a image in the MNIST dataset using Tensorflow (or) Keras. This neural network (NN) has 10 softmax outpput notes generating $log(t=m|x;w)$ where $m=0,1,...,9$. Let $x_n \in R^{28\times28}$ be the $28\times28$, $t_n$ be the label of the image $x_n$, $w$ be the synaptic weights of the NN, an $n$ be the index of a pattern in the training dataset.

2. 
- Build a NN with 1 hidden layer of 30 sigmoid nodes, and an output layer 10 softmax nodes from 1000 training images (100 images per digit). Train the network for 30 complete wpochs, using mini-batches of 10 training examples at a time, a learning rate $\eta = 0.1$. Plot the training error, testing error, criterion function on training dataset, criterion function on testing dataset of a separate 1000 testing images (100 images per digit), and the learning speed of the hidden layer (the average absolute changes of weights divided by the values of the weights).

- Repeat 2(a) with 2 hidden layers of 30 sigmoid nodes each, 3 hidden layers of 30 sigmoid nodes each, and with & without L2 regularization $\lambda|w|^2$ and $\lambda=5$. (You will repeat w(a) for 5 times: 1 for 2 hidden layer network; 1 for 3 hidden layer network; and once each for 1, 2, 3 hidden layers with regularization).

- Construct and train convolutional NN (CNN) for MNIST classification. Regularize the training of the NN through dropout. Regularize the training of NN through augment your selection of 1000 images by rotating them for 1-3 degrees clockwise and counter clockwise, and shifting them for 3 pixels in 8 different directions. You can find many tutorials on those techniques, and our emphasize is that we understand those techniques.

### **2 a & b**

Importing requried libraries

In [0]:

import numpy as np

import tensorflow as tf
import keras
from keras.preprocessing.image import ImageDataGenerator

from sklearn.metrics import accuracy_score

from matplotlib import pyplot as plt


Loading MNIST training and test data

In [0]:

mnist_data = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist_data.load_data()


Splitting training and validation data

In [0]:

validation_x = x_train[-10000:]
validation_y = y_train[-10000:]

x_train = x_train[:-10000]
y_train = y_train[:-10000]


Seggregating 100 images per category

In [0]:

categories = range(10)
categorical_data = {i:[] for i in categories}

for i in range(x_train.shape[0]):
  categorical_data[y_train[i]].append(x_train[i])

temp_x = []
temp_y = []

for i in categories:
  for j in range(100):
    temp_x.append(categorical_data[i][j])
    temp_y.append(i)

x_train = np.asarray(temp_x)
y_train = np.asarray(temp_y)


Initializing class for properties of required neural network

In [0]:

class model_specs:
  batch = 10
  eta = 0.1
  epoch = 30
  no_sigmoids = 30
  no_softmaxs = 10
  reg_lambda = 5


Initializing class for neural network (NN)

In [0]:

class nn:
  def __init__(self, reg, layer):
    self.model = keras.models.Sequential()
    self.model.add(keras.layers.Flatten())

    for each_layer in range(layer):
      if(reg):
        self.model.add(keras.layers.Dense(units = model_specs.no_sigmoids, activation = "sigmoid", kernel_regularizer = keras.regularizers.l2(model_specs.reg_lambda)))
      else:
        self.model.add(keras.layers.Dense(units = model_specs.no_sigmoids, activation = "sigmoid"))

    self.pred = []

    self.model.compile(optimizer = "rmsprop", loss = "sparse_categorical_crossentropy", lr = model_specs.eta, metrics = ["accuracy"])

  def predict(self, data, target):
    return np.argmax(self.model.predict(data), axis = 1)
    
  def fit(self, data, target, valid_data):
    return self.model.fit(data, target, validation_data = valid_data, epochs = model_specs.epoch, batch_size = model_specs.batch, verbose = 0)


Neural network with 1 hidden layer

In [7]:

nn_model = nn(reg= False, layer= 1)
training_model = nn_model.fit(data = x_train, target = y_train, valid_data = (validation_x, validation_y))
predicted_value = nn_model.predict(x_test, y_test)
accuracy = accuracy_score(y_test, predicted_value)

print("The accuracy of this model is "+str(accuracy*100)+"%")


The accuracy of this model is 53.6%


Neural network with 1 hidden layer and L2 regularization

In [8]:

nn_model = nn(reg= True, layer= 1)
training_model = nn_model.fit(data = x_train, target = y_train, valid_data = (validation_x, validation_y))
predicted_value = nn_model.predict(x_test, y_test)
accuracy = accuracy_score(y_test, predicted_value)

print("The accuracy of this model is "+str(accuracy*100)+"%")


The accuracy of this model is 37.25%


Neural network with 2 hidden layers

In [9]:

nn_model = nn(reg= False, layer= 2)
training_model = nn_model.fit(data = x_train, target = y_train, valid_data = (validation_x, validation_y))
predicted_value = nn_model.predict(x_test, y_test)
accuracy = accuracy_score(y_test, predicted_value)

print("The accuracy of this model is "+str(accuracy*100)+"%")


The accuracy of this model is 82.74000000000001%


Neural network with 2 hidden layers and L2 regularization

In [10]:

nn_model = nn(reg= True, layer= 2)
training_model = nn_model.fit(data = x_train, target = y_train, valid_data = (validation_x, validation_y))
predicted_value = nn_model.predict(x_test, y_test)
accuracy = accuracy_score(y_test, predicted_value)

print("The accuracy of this model is "+str(accuracy*100)+"%")


The accuracy of this model is 9.8%


Neural network with 3 hidden layers

In [11]:

nn_model = nn(reg= False, layer= 3)
training_model = nn_model.fit(data = x_train, target = y_train, valid_data = (validation_x, validation_y))
predicted_value = nn_model.predict(x_test, y_test)
accuracy = accuracy_score(y_test, predicted_value)

print("The accuracy of this model is "+str(accuracy*100))


The accuracy of this model is 79.54


Neural networks with 3 hidden layers and L2 regularization

In [12]:

nn_model = nn(reg= True, layer= 3)
training_model = nn_model.fit(data = x_train, target = y_train, valid_data = (validation_x, validation_y))
predicted_value = nn_model.predict(x_test, y_test)
accuracy = accuracy_score(y_test, predicted_value)

print("The accuracy of this model is "+str(accuracy*100))


The accuracy of this model is 9.82


### **2 c**

Convoluted Neural Networks (CNN)

In [46]:

cnn_model = keras.models.Sequential()
cnn_model.add(keras.layers.Conv2D(64, kernel_size=(3,3), activation= "relu", input_shape=(28,28,1)))
cnn_model.add(keras.layers.Conv2D(32, kernel_size=(3,3), activation= "relu"))
cnn_model.add(keras.layers.Flatten())
cnn_model.add(keras.layers.Dense(10, activation= "softmax"))

cnn_model.compile(optimizer= "adam", loss= "categorical_crossentropy", metrics= ["accuracy"])

train_manipulate = ImageDataGenerator(featurewise_center= True, featurewise_std_normalization= True, rotation_range= 3, width_shift_range= 3, height_shift_range= 3, horizontal_flip= True)

train_manipulate.fit(x_train.reshape(1000, 28, 28, 1))

cnn_model.fit_generator(train_manipulate.flow(x_train.reshape(1000, 28, 28, 1), keras.utils.to_categorical(y_train), batch_size= model_specs.batch), steps_per_epoch= len(x_train.reshape(1000,28,28,1))/model_specs.batch, epochs= model_specs.epoch)


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.callbacks.History at 0x7f767d4195c0>

Loss and accuracy of a Convoluted neural network

In [47]:

cnn_pred = cnn_model.predict(x_test.reshape(10000, 28, 28, 1))
cnn_loss, cnn_accuracy = cnn_model.evaluate(x_test.reshape(10000, 28, 28, 1), keras.utils.to_categorical(y_test))

print("The loss incurred by CNN is "+str(cnn_loss))
print("The accuracy of CNN model is "+str(round(cnn_accuracy*100, 2))+"%")


The loss incurred by CNN is 47.87280726280213
The accuracy of CNN model is 78.61%


1. Demonstrate that a NN maximizes the log likelihood of label is one that has softmax output nodes and maximizes the criterion function of the negative log probability of training dataset: $J_0(w)=-\log(\{(x_n,t_n):n=1,2...\};w)=-\log \prod_n \prod_{m=0}^9 p(t_n=m|x_n;w)$. Demonstrate that a NN maximizes the a posterior likelihood of observing the training data given a gaussian prior of the weight distribution $p(w;\alpha)=N(0,\alpha l)$ is one that maximizes the criterion function with L2 regularization $J(w)=J_0(w)-\log p(w;\alpha^{-1})$.