# 205229118
# Mahalakshmi s

## PDL Lab10 Tutorial: Multi-class Classification using MNIST dataset


### Part-I

In this notebook we will build a neural network for multi-class classification on a dataset popularly known as **'MNIST'**.


In [None]:
!pip install tensorflow

In [None]:
import tensorflow as tf                       # deep learning library
import numpy as np                            # for matrix operations
import matplotlib.pyplot as plt               # for visualization
%matplotlib inline

## Loading Data
The MNIST dataset ships with TensorFlow (in `tf.keras.datasets`), so it can be loaded with a single function call. Let's load the data:

In [None]:
from tensorflow.keras.datasets.mnist import load_data    # To load the MNIST digit dataset

(X_train, y_train) , (X_test, y_test) = load_data()      # Loading data

## Basic EDA

In [None]:
print("There are ", len(X_train), "images in the training dataset")     # checking total number of records / data points available in the X_train dataset
print("There are ", len(X_test), "images in the test dataset")     # checking total number of records / data points available in the X_test dataset

In [None]:
# Checking the shape of one image
X_train[0].shape

Each image in the dataset is a 28×28 array of numbers (i.e. pixels).

In [None]:
# Take a look at what one image looks like
X_train[0]

Only numbers! It is hard to tell which digit they represent.

Matplotlib has a function called `matshow()` that displays an array of numbers as an image.

In [None]:
plt.matshow(X_train[0])

In [None]:
# we can use y_train to cross check
y_train[0]

Now one can easily say that the number above is 5. We want to build a model that tells us which digit a given 28×28 array represents.

In [None]:
# code to view the images
num_rows, num_cols = 2, 5
f, ax = plt.subplots(num_rows, num_cols, figsize=(12,5),
                     gridspec_kw={'wspace':0.03, 'hspace':0.01}, 
                     squeeze=True)

for r in range(num_rows):
    for c in range(num_cols):
      
        image_index = r * num_cols + c     # row-major position in the grid
        ax[r,c].axis("off")
        ax[r,c].imshow( X_train[image_index], cmap='gray')
        ax[r,c].set_title('No. %d' % y_train[image_index])
plt.show()
plt.close()

## Data Preprocessing

In [None]:

X_train = X_train / 255
X_test = X_test / 255

"""
Why divide by 255?
MNIST images are grayscale, so each pixel is a single 8-bit intensity in the range 0 - 255
(not an RGB triple). Dividing by 255 rescales every pixel to the range 0 - 1. """

Now if you look at the data, each pixel value lies in the range 0 to 1.

In [None]:
X_train[0]
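
To see what the division does in isolation, here is a minimal sketch on a small made-up array. The name `fake_image` and its values are illustrative and not taken from MNIST; the sketch assumes the input is `uint8`, which is the dtype `load_data()` returns for the images.

```python
import numpy as np

# Hypothetical 2x3 "image" with 8-bit grayscale intensities (not real MNIST data)
fake_image = np.array([[0, 51, 102],
                       [153, 204, 255]], dtype=np.uint8)

# True division promotes uint8 to float64 and maps 0..255 onto 0..1
scaled = fake_image / 255

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Note that the result is a floating-point array, so the scaled dataset takes more memory than the original integer one.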

**Flatten the Data**

We simply convert each 2-dimensional image into a 1-dimensional vector.

Why flatten the data?

Before answering that, let's check the shape of the data.

In [None]:
X_train.shape

The data is 3-dimensional. The first value, 60000, is the number of records (images in this case). The second and third dimensions describe each individual image, i.e. each image has shape 28×28.

Most supervised learning algorithms for classification and regression, as well as some deep learning models built for these purposes, expect two-dimensional input (one row per sample, one column per feature). Since our data is three-dimensional, we need to flatten each image so that the dataset becomes two-dimensional.

In [None]:
X_train_flattened = X_train.reshape(len(X_train), 28*28)    # converting each 2D image array into a 1D vector of 784 values
X_test_flattened = X_test.reshape(len(X_test), 28*28)

Now if you check the shape of our data, it should be 2-dimensional.

In [None]:
X_train_flattened.shape

In [None]:
X_test_flattened.shape
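
The same reshape can be tried on a tiny array to see exactly how the pixels are rearranged. This is a sketch with made-up data; the array `images` is hypothetical and stands in for `X_train`.

```python
import numpy as np

# Hypothetical stack of 4 tiny 2x3 "images" (MNIST would be 60000 x 28 x 28)
images = np.arange(4 * 2 * 3).reshape(4, 2, 3)

# Keep the first axis (one row per image) and merge the remaining axes;
# -1 lets NumPy infer the row length (2 * 3 = 6 here, 28 * 28 = 784 for MNIST)
flat = images.reshape(len(images), -1)

print(flat.shape)   # (4, 6)
print(flat[1])      # the second image, read row by row
```

Flattening only changes the layout, not the values: reshaping a row of `flat` back to 2x3 recovers the original image.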

**Define the model**

In [None]:
# Defining the Model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(784,), activation='sigmoid')     # The input shape is 784. 
])

In [None]:
model.summary()

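For multi-class classification, softmax is generally the suggested output activation, because it turns the 10 outputs into probabilities that sum to 1, whereas the sigmoid used above scores each class independently. Later you can try both and keep the one that gives better performance.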
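
The difference between the two activations can be sketched in plain NumPy. The functions below are hand-written illustrations of the standard formulas, not the Keras implementations, and the `logits` values are made up.

```python
import numpy as np

def sigmoid(z):
    # Squashes each score independently into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the max is a common trick for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores for 3 classes
logits = np.array([2.0, 1.0, 0.1])

print(sigmoid(logits).sum())   # generally not 1
print(softmax(logits).sum())   # 1, up to floating-point rounding
print(np.argmax(sigmoid(logits)) == np.argmax(softmax(logits)))  # True
```

Both functions are increasing in each score, so the largest score wins either way; what changes is whether the outputs can be read as one probability distribution over the classes.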

**Compile the model**

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

*  **adam** is an optimization algorithm that often converges faster than plain Stochastic Gradient Descent. If you remember from the learning material of Day 4 (i.e. working of neural networks), we know that Stochastic Gradient Descent (SGD in short) is just a type of Gradient Descent algorithm.

*  **sparse_categorical_crossentropy** is a loss function similar to **binary_crossentropy** (discussed in the Binary Classification Notebook). The difference is that if the target variable is binary we use binary_crossentropy, but if the target values are integer class labels with more than two classes, we use sparse categorical crossentropy. Why not use **categorical_crossentropy**, you may ask? That loss expects one-hot encoded labels rather than integers; [this article](https://jovianlin.io/cat-crossentropy-vs-sparse-cat-crossentropy/) will help you understand the difference.

*  The metric used to evaluate the model is **accuracy**, which measures how often the model's predictions are correct.
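
The relationship between the sparse and the one-hot form of the loss can be checked by hand. The following is a sketch of the textbook cross-entropy formula in NumPy, not the Keras implementation; `probs` and `labels` are made-up values.

```python
import numpy as np

# Hypothetical predicted probabilities for 2 samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

# Sparse form: labels are plain integers, as in y_train
labels = np.array([0, 2])
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Categorical form: the same labels written as one-hot rows
one_hot = np.eye(3)[labels]
categorical_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_loss, categorical_loss))  # True
```

Both forms compute the same number; the sparse variant simply skips building the one-hot matrix, which is convenient when the labels are already integers.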

**Fit the model**

In [None]:
model.fit(X_train_flattened, y_train, epochs=5)

You can experiment with different numbers of epochs.

**Evaluate the model on unseen data (i.e. X_test_flattened)**

In [None]:
model.evaluate(X_test_flattened, y_test)

This very simple model with no hidden layer reaches an accuracy of 92.6% on the test set. Not bad!

**predict for the X_test**

In [None]:
y_predicted = model.predict(X_test_flattened)
y_predicted[0]

The numbers above are the probability values for the different digits. The highest one identifies the predicted digit for the first image in X_test.

The value at index 0 of the array is the probability of the digit being 0.

**Generalize:** The value at index n of the array is the probability of the digit being n.

**np.argmax finds the maximum element of an array and returns its index**

In [None]:
np.argmax(y_predicted[0])

The predicted digit is 7.

Let's look at the original digit at the first index in X_test. We can view it using the matshow() function.

In [None]:
plt.matshow(X_test[0])

Hence the prediction is correct.
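
The same idea extends from one image to a whole batch: taking the argmax along each row gives one predicted digit per image, which can then be compared with the true labels. The arrays below are made up for illustration and are not actual model output.

```python
import numpy as np

# Hypothetical model outputs for 3 images over 10 digits (not real predictions)
y_prob = np.zeros((3, 10))
y_prob[0, 7] = 0.9
y_prob[1, 2] = 0.8
y_prob[2, 1] = 0.6

# Hypothetical true labels; the last one disagrees with the prediction
y_true = np.array([7, 2, 4])

# axis=1 picks the most probable digit for each row (image)
y_label = np.argmax(y_prob, axis=1)
accuracy = np.mean(y_label == y_true)

print(y_label)    # [7 2 1]
print(accuracy)   # 2 of 3 correct
```

With the notebook's variables, the analogous call would be `np.argmax(y_predicted, axis=1)` compared against `y_test`.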

### Exercise: Now use softmax activation function to create the model, compile, predict and check your results

In [None]:
# Defining the Model1
model1 = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(784,), activation='softmax')     # The input shape is 784. 
])

In [None]:
model1.summary()

In [None]:
model1.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])       ### compiling the model1

In [None]:
model1.fit(X_train_flattened, y_train, epochs=3)  ### fit the model1

In [None]:
model1.evaluate(X_test_flattened, y_test)   ### evaluate the model1

In [None]:
y_predicted = model1.predict(X_test_flattened)      ### predict with model1
y_predicted[0]

### Building a Neural Network Model Using Hidden Layers

In [None]:
# Defining the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, input_shape=(784,), activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),     # input size is inferred from the previous layer
    tf.keras.layers.Dense(10, activation='softmax')
])
model.summary()
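
The parameter counts reported by `model.summary()` can be worked out by hand, since a dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced for this sketch.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit
    return n_inputs * n_units + n_units

# Layer sizes of the network defined above: 784 -> 100 -> 100 -> 10
sizes = [784, 100, 100, 10]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]

print(per_layer)        # [78500, 10100, 1010]
print(sum(per_layer))   # 89610
```

For comparison, the earlier model with no hidden layer has `dense_params(784, 10)`, i.e. 7850 parameters.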


In [None]:
# Compiling the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model
model.fit(X_train_flattened, y_train, batch_size= 128,epochs=5)

In [None]:
# Evaluate the model
model.evaluate(X_test_flattened,y_test)

**Try yourself**: 
Change the values of epochs and try adding more hidden layers. Are you able to increase the accuracy above 97.5%?
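
When experimenting, it can help to know how many weight updates one epoch performs, since `model.fit` makes one update per batch. The sketch below is simple arithmetic; `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math

n_train = 60000  # number of MNIST training images

def steps_per_epoch(n_samples, batch_size):
    # The last batch may be smaller, hence the ceiling
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(n_train, 32))    # 1875 (32 is the Keras default batch size)
print(steps_per_epoch(n_train, 128))   # 469
```

A larger batch size therefore means fewer, less noisy updates per epoch, which is one reason the number of epochs and the batch size are usually tuned together.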

# Saving and loading the model

In [None]:
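# saving the model
import os

save_dir = "/results/"
model_name = 'keras_mnist.h5'
model_path = os.path.join(save_dir, model_name)
os.makedirs(save_dir, exist_ok=True)     # create the target folder if it does not exist
model.save(model_path)                   # save to the same path that is reported below
print('Saved trained model at %s ' % model_path)

# loading the model back
loaded_model = tf.keras.models.load_model(model_path)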

# Summary
*  We learned why we need to normalize and flatten the data.
*  We compared a very simple neural network with no hidden layer against a network with two hidden layers of 100 neurons each. The latter model performed better than the former.

## Exercises

#### Perform at least 10 modifications and submit a table containing the changes made and the outputs observed

In [None]:
# code to view the images
num_rows, num_cols = 3, 6            ### I changed the number of rows and columns to 3 and 6
f, ax = plt.subplots(num_rows, num_cols, figsize=(8,7),
                     gridspec_kw={'wspace':0.05, 'hspace':0.03}, 
                     squeeze=True)   ### Here I changed figsize to (8,7) and wspace and hspace to 0.05 and 0.03

for r in range(num_rows):
    for c in range(num_cols):

        image_index = r * num_cols + c     # use num_cols so that no image is shown twice
        ax[r,c].axis("off")
        ax[r,c].imshow( X_train[image_index], cmap='pink')   ### I changed the cmap to pink.
        ax[r,c].set_title('No. %d' % y_train[image_index])
plt.show()
plt.close()

In [None]:
model.fit(X_test_flattened, y_test, epochs=4)    ### I changed model.fit to train on X_test_flattened, y_test and set epochs to 4.

In [None]:
model.evaluate(X_train_flattened, y_train) ### Here I changed model.evaluate to use X_train_flattened, y_train.

In [None]:
y_predicted1 = model.predict(X_test_flattened)
y_predicted1[0]

In [None]:
np.argmin(y_predicted1[0])  ### Here I changed argmax to argmin, which returns the index of the smallest element (the least likely digit).

In [None]:
plt.matshow(X_test[5])  ### I changed matshow to display X_test[5].

In [None]:
# Compiling the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model
model.fit(X_train_flattened, y_train, batch_size=100, epochs=3) ### Here I changed the batch size to 100 and epochs to 3