
# A Basic Fully Connected Neural Network for MNIST Digit Classification

---


## -- Keras provides built-in support for many datasets
## -- good documentation is available at https://keras.io/
## -- one such dataset is MNIST (Modified National Institute of Standards and Technology database) @ http://yann.lecun.com/exdb/mnist/
- a database of handwritten digits
- used extensively in optical character recognition and machine learning research
- a training set of 60,000 examples and a test set of 10,000 examples
- digits have been size-normalized and centered in a fixed-size image
- grayscale digits (pixel values from 0 to 255)
- 28 x 28 pixels
- Keras provides a method to load the MNIST dataset

> Refer to https://keras.io/datasets/#mnist-database-of-handwritten-digits for more details

In [None]:
# load MNIST data set
from keras.datasets import mnist	 	#importing dataset

(X_train, Y_train), (X_test, Y_test) = mnist.load_data() 	#Keras function to load and split dataset into training and test data

print ("mnist data downloaded...")

In [None]:
# this code cell is for visualization only

import matplotlib.pyplot as plt			# to plot images

plt.imshow(X_train[50], cmap=plt.get_cmap('gray'))	# plotting the 51st image (index 50) of the training set
#plt.imshow(X_test[244], cmap=plt.get_cmap('gray'))	# plotting the 245th image (index 244) of the test set
plt.show()

In [None]:
# Print the shape of the dataset: a single tuple of three values, namely the number of images, the height and the width, i.e. (60000, 28, 28)

print (X_train.shape)

In [None]:

# Step 3: Preprocess input data for Keras

X_temp = X_test		# keep an unflattened copy of the test images for visualization later

# flatten each 28*28 image into a 784-element vector and convert the pixel values to 32-bit floats
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')

# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

# Step 4: Preprocess class labels
# check shape of our class label data

print (Y_train.shape)
#We should have 10 different classes, one for each digit, but it looks like we only have a 1-dimensional array.
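The flattening and normalization in Step 3 can be tried on a tiny stand-in batch. This is a minimal NumPy sketch; the three random 28x28 `uint8` "images" are hypothetical placeholders for `X_train`, not MNIST data.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 random 28x28 uint8 "images"
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-element row vector of 32-bit floats
num_pixels = images.shape[1] * images.shape[2]
flat = images.reshape(images.shape[0], num_pixels).astype('float32')

# Scale pixel values from [0, 255] to [0, 1]
flat /= 255

print(flat.shape, flat.dtype)   # (3, 784) float32
# Row-major flattening: pixel (row r, column c) lands at column r*28 + c
print(np.isclose(flat[1, 5 * 28 + 7], images[1, 5, 7] / 255))
```

Each image becomes one row, so a Dense layer can treat the 784 pixels as independent input features.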


In [None]:
#check labels for the first 10 training samples:
print (Y_train[:10])
# output of the form [5 0 4 1 9 2 1 3 1 4]
#The Y_train and Y_test data are not split into 10 distinct class labels, but rather are represented as a single array with the class values.

In [None]:

from keras.utils import to_categorical		# for one-hot encoding the class labels (np_utils is not available in recent Keras versions)

# Convert 1-dimensional class arrays to 10-dimensional class matrices (one-hot encoding)
Y_train = to_categorical(Y_train)
Y_test = to_categorical(Y_test)
num_classes = Y_test.shape[1]

# check again	
print (Y_train.shape)
# (60000, 10)
print (Y_train[:5])
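One-hot encoding simply picks rows of an identity matrix. The NumPy sketch below reproduces the idea behind `to_categorical` for the first five training labels `[5 0 4 1 9]`; it is an illustration, not the Keras implementation.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])   # first five MNIST training labels
num_classes = 10

# Row k of the 10x10 identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot.shape)           # (5, 10)
print(one_hot[0])              # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
# argmax along each row recovers the original labels
print(one_hot.argmax(axis=1))  # [5 0 4 1 9]
```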


### A very simple model is created in the next few lines. Designing a good network is a crucial step.

# Keras Models

> Core data structure of Keras

> A way to organize layers

There are three ways to create Keras models:

> Sequential model
  >> A simple list of layers

  >> Stacked

  >> Single-input and single-output stacks of layers

> The Functional API

> Model subclassing

> Refer to https://keras.io/api/models/ for more details

### Use sequential model
> Details @ https://keras.io/guides/sequential_model/

>  A Sequential model is declared as
>>        model = Sequential()

> and then Dense layers are added to it


> Dense implements the operation: output = activation(dot(input, kernel) + bias)

  >> activation is the element-wise activation function passed as the activation argument

  >> kernel is a weights matrix created by the layer

  >> bias is a bias vector created by the layer (only applicable if use_bias is True)
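The Dense operation above is a matrix product plus a bias, followed by an element-wise activation. This NumPy sketch computes it for a batch of 4 inputs with sizes matching `Dense(32, input_dim=784)`; the random inputs, the normal kernel and the zero bias are illustrative assumptions, not weights Keras would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, input_dim, units = 4, 784, 32                    # sizes match Dense(32, input_dim=784)
inputs = rng.random((batch, input_dim), dtype=np.float32)
kernel = rng.normal(0.0, 0.05, size=(input_dim, units)).astype(np.float32)  # weights matrix
bias = np.zeros(units, dtype=np.float32)                # bias vector (zeros here, as an example)

def relu(x):
    return np.maximum(x, 0.0)

# output = activation(dot(input, kernel) + bias)
output = relu(np.dot(inputs, kernel) + bias)
print(output.shape)   # (4, 32)
```

Each of the 32 output units is a weighted sum of all 784 inputs, which is why the layer is called fully connected.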
      
> Layers can be passed to the constructor as a list:
>>         model = Sequential([Dense(32, input_shape=(784,)), Activation('relu'), Dense(10), Activation('softmax')])

> Or added one at a time with add():

>>         model.add(Dense(32, input_dim=784))
>>         model.add(Activation('relu'))

> Generally, all layers in Keras need to know the shape of their inputs in order to create their weights

>> The first layer in a Sequential model (and only the first, because the following layers can infer their shapes automatically) needs to receive information about its input shape


> Dense(32, input_dim=784) specifies that
  >> it is the first layer, so it receives the input directly

  >> the output dimension is 32 (the $1^{st}$ argument, the number of units)

  >> the input dimension is 784

> kernel_initializer: initializers define how the initial random weights of Keras layers are set.

   >> kernel_initializer='normal' names the initialization function for the layer's weights; 'normal' draws the values at random from a normal distribution.

   >> Many more initializers are available: Zeros, Ones, Constant, RandomNormal, RandomUniform, and so on
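To see what these initializers produce, here is a minimal NumPy analogue of three of them. It assumes the Keras `RandomNormal` defaults (mean 0.0, standard deviation 0.05); the helper functions are hypothetical and only mimic the idea, they are not the Keras classes.

```python
import numpy as np

def random_normal(shape, mean=0.0, stddev=0.05, seed=None):
    """Hypothetical NumPy analogue of RandomNormal (assumed defaults: mean=0.0, stddev=0.05)."""
    rng = np.random.default_rng(seed)
    return rng.normal(mean, stddev, size=shape).astype('float32')

def zeros(shape):
    """Analogue of Zeros: every weight starts at 0."""
    return np.zeros(shape, dtype='float32')

def constant(shape, value):
    """Analogue of Constant(value=...): every weight starts at the same value."""
    return np.full(shape, value, dtype='float32')

kernel = random_normal((784, 32), seed=42)
print(kernel.shape)               # (784, 32)
print(float(kernel.std()))        # approximately 0.05
print(constant((2, 2), 5))        # like the commented-out Constant(value=5) line in the model cell
```

Constant initializers give every unit identical weights, so all hidden units compute the same thing; random initializers break that symmetry.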
    
> If no activation function is specified, no activation is applied (i.e. "linear" activation: a(x) = x).


  >> Activations can be used either through an Activation layer or through the activation argument supported by all forward layers

  >> Many activation functions are available in Keras: relu, softmax, sigmoid, tanh, and so on
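The activation functions listed above are short formulas. This NumPy sketch writes them out; the max-subtraction in `softmax` is a common numerical-stability trick and is an implementation choice of this sketch.

```python
import numpy as np

def linear(x):    # used when no activation is given: a(x) = x
    return x

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softmax(x):
    # subtract the row max for numerical stability; each row then sums to 1
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.array([[-2.0, 0.0, 3.0]])
print(relu(x))        # [[0. 0. 3.]]
print(softmax(x))     # probabilities; the largest input gets the largest share
```

This is why the model uses relu in the hidden layer and softmax in the output layer: softmax turns the 10 output scores into a probability distribution over the digits.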

In [None]:
# Define model architecture

from keras.models import Sequential		#model
from keras.layers import Dense			#layer
from keras.layers import Dropout		#layer
from keras import initializers      # for importing initializers of keras

model = Sequential()
model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer=initializers.RandomNormal(), activation='relu')) #only one hidden layer with relu as activation function
#model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer=initializers.Constant(value=5), activation='relu')) #only one hidden layer with relu as activation function	
model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))					#output layer with softmax as activation function


#print(model.summary())

print ("Congrats, model defined...")

**Additional**

---



=> Improving Performance of Simple Network: an additional hidden layer with relu activation (add one more Dense layer before the output layer)
 
     model.add(Dense(num_classes, kernel_initializer='normal', activation='relu'))
     
=> Improving Performance of Simple Network: an additional hidden layer with tanh activation (add one more Dense layer before the output layer)

     model.add(Dense(num_classes, kernel_initializer='normal', activation='tanh'))

=> Improving Performance of Simple Network: introducing a dropout layer (for example, right after the hidden layer)

      model.add(Dropout(0.2))
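Dropout(0.2) randomly zeroes 20% of its inputs during training and scales the survivors by 1/(1 - 0.2) so the expected sum stays the same; at inference time it passes inputs through unchanged. The NumPy sketch below follows that description; the `dropout` helper is hypothetical, not the Keras layer.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Hypothetical inverted-dropout helper: during training, zero each unit with
    probability `rate` and scale the rest by 1/(1 - rate); at inference, return x unchanged."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate          # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 784))
y = dropout(x, 0.2, training=True, rng=rng)
print((y == 0).mean())   # fraction of dropped units, close to 0.2
print(y.mean())          # close to 1.0, because the survivors are scaled by 1/0.8
```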


In [None]:
# Once a model is "built", summary() method can be used to display its contents:
model.summary()
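The parameter counts in the summary can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. The small helper below is hypothetical and assumes the model defined above (784 inputs, a 784-unit hidden layer, a 10-unit output layer).

```python
def dense_params(input_dim, units, use_bias=True):
    """Trainable parameters of a Dense layer: input_dim * units weights plus `units` biases."""
    return input_dim * units + (units if use_bias else 0)

num_pixels, num_classes = 784, 10
hidden = dense_params(num_pixels, num_pixels)    # Dense(num_pixels, input_dim=num_pixels)
output = dense_params(num_pixels, num_classes)   # Dense(num_classes)
print(hidden, output, hidden + output)           # 615440 7850 623290
```

Almost all parameters sit in the large hidden layer, which is worth keeping in mind when adding more hidden neurons.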


### Before training, configure the learning process using the compile() method, which takes three arguments:
> loss function: the objective function that the model tries to minimize
  >> examples: categorical_crossentropy, mean_squared_error, mean_squared_logarithmic_error, ...

> optimizer: training an ANN is an optimization task whose aim is to find a set of weights that minimizes some objective function
  >> determines how the weights are updated
  >> examples: adam (Adaptive moment estimation), sgd (Stochastic gradient descent)

> list of metrics: used to judge the performance of the model; similar to the objective function, but not used for training

### optimizer, loss function, metrics => very important choices that determine the performance of your network
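Categorical cross-entropy compares the one-hot label with the predicted probabilities: for each sample it is minus the log of the probability assigned to the true class, averaged over samples. A minimal NumPy sketch; the clipping constant 1e-7 is an assumption chosen to avoid log(0), and the toy predictions are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[0, 0, 1], [0, 1, 0]], dtype=float)   # one-hot labels
good = np.array([[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]])       # confident and correct
bad = np.array([[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]])        # confident and wrong
print(categorical_crossentropy(y_true, good))   # -log(0.8), about 0.223
print(categorical_crossentropy(y_true, bad))    # -log(0.1), about 2.303
```

Confident wrong predictions are penalized heavily, which is what pushes the softmax outputs toward the correct digit during training.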

In [None]:
# Compile model: Configures the model for training.

model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

print ("Compilation done ...")
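For intuition on what `optimizer='Adam'` does, here is the standard Adam update rule written out in NumPy, with the Keras default hyperparameters (learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7). The toy function f(w) = (w - 3)^2 and the larger learning rate 0.1 are illustrative assumptions; Keras applies the same kind of update to every weight of the network.

```python
import numpy as np

def adam_minimize(grad, w0, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7, steps=1000):
    """Plain Adam update rule applied to a single parameter (or array of parameters)."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)   # running mean of the gradient (first moment)
    v = np.zeros_like(w)   # running mean of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g ** 2
        m_hat = m / (1 - beta_1 ** t)   # bias correction for the zero initialization
        v_hat = v / (1 - beta_2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return w

# minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = adam_minimize(lambda w: 2 * (w - 3.0), w0=0.0, lr=0.1, steps=500)
print(w)   # close to 3.0
```

Swapping in plain SGD would replace the last update line with `w = w - lr * g`; trying different optimizers is one of the improvement options listed at the end.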

# Training, Validation, and Test Data

> training set => learning the parameters (e.g. weights)

> validation set => hyperparameter tuning (e.g. architecture)

> test set => final evaluation

> epoch: one complete pass of the learning algorithm through the entire training data

> batch size: number of samples processed before the weights are updated

> verbose = 0, 1 or 2 controls how the training progress is displayed for each epoch (0: silent, 1: animated progress bar, 2: one line per epoch)

In [None]:
#Train model

batch_size = 100
epochs = 10

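# validation_split=0.2 holds out the last 20% of the training samples (taken before shuffling) for validation
history = model.fit(X_train, Y_train, validation_split=0.2, epochs=epochs, batch_size=batch_size)

print ("training done...")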
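With the settings in the training cell, the number of weight updates follows from simple arithmetic: 20% of the 60,000 training images go to validation, and each epoch makes one update per batch of the rest. A small sketch of that calculation.

```python
import math

num_samples = 60000        # X_train.shape[0]
validation_split = 0.2
batch_size = 100
epochs = 10

# the validation data is taken from the last samples of X_train
num_train = int(num_samples * (1 - validation_split))
num_val = num_samples - num_train

# one weight update per batch; a final partial batch would still count as a step
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_train, num_val)                          # 48000 12000
print(steps_per_epoch, steps_per_epoch * epochs)   # 480 4800
```

Halving the batch size doubles the number of updates per epoch, which is one reason batch size is worth experimenting with.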

In [None]:
# names of the metrics recorded during training
history.history.keys()


In [None]:
# Accuracy with the epochs

plt.plot(history.history['accuracy'],'r')
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['training'], loc='center right')
plt.show()

In [None]:
# Loss with epochs

plt.plot(history.history['loss'],'g')
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['training'], loc='upper right')
plt.show()

In [None]:
# Step 8: Evaluate model
scores = model.evaluate(X_test, Y_test)		# returns [test loss, test accuracy]
print("Error: %.2f%%" % (100-scores[1]*100))


In [None]:
# print the names of the values returned by evaluate()
print(model.metrics_names)

In [None]:
# print summary of the model
model.summary()		# summary() prints directly and returns None, so it is not wrapped in print()

# for more on model visualization you may refer to https://keras.io/visualization/


In [None]:
#pop method: Removes the last layer in the model
#model.pop()


In [None]:
plt.imshow(X_temp[2], cmap=plt.get_cmap('gray'))	# the 3rd test image, from the unflattened copy X_temp
plt.show()

In [None]:

predictions = model.predict(X_test)
predictions[2]
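Each prediction is a vector of 10 softmax probabilities; the predicted digit is the index of the largest one. The sketch below uses hypothetical predictions for 4 images to show how comparing `argmax` of predictions with `argmax` of the one-hot labels gives an accuracy, and how the error percentage printed in Step 8 follows from it.

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images (each row sums to 1)
predictions = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.70, 0.05, 0.05, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.05, 0.80, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01],
    [0.10, 0.10, 0.10, 0.40, 0.10, 0.05, 0.05, 0.05, 0.03, 0.02],
])
Y_true = np.eye(10)[[2, 0, 1, 5]]   # hypothetical one-hot labels: digits 2, 0, 1, 5

predicted_digits = predictions.argmax(axis=1)   # most probable class per image
true_digits = Y_true.argmax(axis=1)

accuracy = np.mean(predicted_digits == true_digits)
print(predicted_digits)                           # [2 0 1 3]
print("Error: %.2f%%" % (100 - accuracy * 100))   # Error: 25.00%
```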


=> Improving Performance of Simple Network: using different optimizers: SGD, Adagrad, Adam, ...

=> Improving Performance of Simple Network: training for more epochs (e.g. 20)

=> Other options to explore


> a different learning rate for the optimizer

> the number of neurons in the hidden layer

> the batch size

> different optimizers

> increasing the number of hidden neurons


=> Which steps lead to an efficient image classifier?

> a lot of experimentation and testing to find the optimal structure and parameters