# MNIST Digit Classification

### Introduction and General Information

* We use the Naive Bayes classification technique to classify handwritten digits from the MNIST database.

* The Naive Bayes classifier is a probabilistic machine learning model based on Bayes' theorem.

* Bayes' theorem states that P(y|X) = (P(X|y) * P(y)) / P(X). Here y is the class label (digit 0 or 1) and X represents the image features.

* Using this theorem, we can find the probability of each class given a set of features. The "naive" assumption is that all features are independent of each other given the class.
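
As a small numeric illustration of the theorem, the sketch below computes class posteriors from priors and likelihoods. The function name `posterior` and all the numbers are made up for illustration; they are not taken from MNIST.

```python
def posterior(priors, likelihoods):
    """Return P(y|X) for each class y, given P(y) and P(X|y)."""
    # Numerator of Bayes' theorem for each class: P(X|y) * P(y)
    joint = {y: likelihoods[y] * priors[y] for y in priors}
    # The denominator P(X) is the sum of the numerators over all classes
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}

# Hypothetical values, for illustration only
priors = {0: 0.5, 1: 0.5}
likelihoods = {0: 0.03, 1: 0.01}
result = posterior(priors, likelihoods)
print(result)
```

With equal priors, the class whose likelihood is three times larger ends up with three times the posterior probability.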

### Objectives

* For this project, we consider only a subset of the MNIST dataset. We filter the training and testing sets to keep only the images of digits 0 and 1.

* The main objective is to apply the Naive Bayes classification technique to this subset of MNIST and evaluate the performance of the classifier.

### Data Used
(Data Source: http://yann.lecun.com/exdb/mnist/)

* We use the MNIST dataset, which contains images of handwritten digits from 0 to 9.

* Each image is 28 x 28 pixels, 784 pixels in total. The dataset is divided into a training set of 60,000 images and a testing set of 10,000 images.

* We build the model on the training set and use the testing set to evaluate the performance of the classification model.

### Approach and Methodology

* Load and filter the MNIST dataset to obtain the images of digits 0 and 1.

* Convert the dataset from 3D Numpy format to a 2D pandas dataframe, so that each row represents a single image.

* Extract two features from each image: average brightness and standard deviation of the pixel values. Do this for the train and test datasets of digits 0 and 1.

* Compute the density parameters, mean and variance, for each of the extracted features.

* Apply the Naive Bayes classifier to label each image in the test dataset as 0 or 1, using the density parameters computed above.

* Evaluate the performance of the classifier by computing the accuracy of its predictions.

### 1. Importing libraries

In [1]:
import pathlib
from math import exp, pi, sqrt

import numpy as np
import pandas as pd
from IPython.display import display
from mlxtend.data import loadlocal_mnist

### 2. Loading and Filtering MNIST Dataset

* Download the data from the source link given above and define the paths used to access it.
* Filter the desired digit (0 or 1) from the dataset.
* The dataset is described as being in 3D Numpy format, so convert it to a 2D pandas dataframe.
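
The filter-and-flatten idea can be sketched in plain Python, assuming the images arrive as a stack of 2D pixel grids with a parallel list of labels. The helper `flatten_and_filter` and the toy 2x2 images are hypothetical and only illustrate the shape change; the notebook itself relies on `loadlocal_mnist` and NumPy indexing.

```python
def flatten_and_filter(images, labels, digit):
    """Keep images whose label equals `digit`, flattening each grid to one row."""
    rows = []
    for image, label in zip(images, labels):
        if label == digit:
            # Concatenate the pixel rows of the grid into one flat list
            rows.append([pixel for pixel_row in image for pixel in pixel_row])
    return rows

# Two toy 2x2 "images" (real MNIST images are 28x28, giving 784 columns)
images = [[[0, 255], [255, 0]], [[0, 128], [0, 128]]]
labels = [0, 1]
zeros = flatten_and_filter(images, labels, 0)
print(zeros)  # [[0, 255, 255, 0]]
```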

In [5]:
BASE_PATH = str(pathlib.Path().absolute())

In [6]:
TRAIN_IMAGE_PATH = BASE_PATH + '/mnist_dataset/train_digit_images_dataset'
TRAIN_LABEL_PATH = BASE_PATH + '/mnist_dataset/train_digit_labels_dataset'
TEST_IMAGE_PATH = BASE_PATH + '/mnist_dataset/test_digit_images_dataset'
TEST_LABEL_PATH = BASE_PATH + '/mnist_dataset/test_digit_labels_dataset'

#### 2.1 Method to load the MNIST data and extract the data for a particular digit:
* This method takes the digit to extract and the data paths as parameters.
* After extracting the image data of that digit from the entire dataset, we convert it into a pandas dataframe.

In [14]:
def converting_mnist_dataset_to_dataframe(digit, digit_image_path, digit_label_path):
    mnist_data_image, mnist_data_label = loadlocal_mnist(digit_image_path, digit_label_path)
    
    indices_of_desired_digit = np.where((mnist_data_label == digit))
    filtered_image_data = mnist_data_image[indices_of_desired_digit]
    
    filtered_image_data_df = pd.DataFrame(data=filtered_image_data, index=None, columns=None)

    return filtered_image_data_df

In [34]:
train_0_df = converting_mnist_dataset_to_dataframe(0, TRAIN_IMAGE_PATH, TRAIN_LABEL_PATH)
test_0_df = converting_mnist_dataset_to_dataframe(0, TEST_IMAGE_PATH, TEST_LABEL_PATH)
print("Size of digit 0 train dataset: {}".format(train_0_df.shape[0]))
print("Size of digit 0 test dataset: {}".format(test_0_df.shape[0]))

Size of digit 0 train dataset: 5923
Size of digit 0 test dataset: 980


In [33]:
train_1_df = converting_mnist_dataset_to_dataframe(1, TRAIN_IMAGE_PATH, TRAIN_LABEL_PATH)
test_1_df = converting_mnist_dataset_to_dataframe(1, TEST_IMAGE_PATH, TEST_LABEL_PATH)
print("Size of digit 1 train dataset: {}".format(train_1_df.shape[0]))
print("Size of digit 1 test dataset: {}".format(test_1_df.shape[0]))

Size of digit 1 train dataset: 6742
Size of digit 1 test dataset: 1135


### 3. Feature Extraction

* Each row of the pandas dataframe represents an image. From each image we extract two features: Average Brightness and Standard Deviation.
* Average Brightness is the mean of all pixel values in a row.
* Standard Deviation is the standard deviation of all pixel values in a row.
* Extract these features for digits 0 and 1 separately and then combine them in a single dataframe.
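
For a single image row, the two features can be sketched with the standard library as below. The helper name `extract_features` and the four-pixel "image" are illustrative assumptions. `statistics.stdev` is the sample standard deviation (dividing by n - 1), which matches the default used by pandas `std`.

```python
import statistics

def extract_features(pixels):
    """Return (average brightness, standard deviation) for one flattened image."""
    avg = statistics.mean(pixels)
    std = statistics.stdev(pixels)  # sample standard deviation (n - 1 denominator)
    return avg, std

# Toy row of four pixel values; a real MNIST row has 784 values
pixels = [0, 0, 255, 255]
avg, std = extract_features(pixels)
print(avg, std)
```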

#### 3.1 Method to extract features from the dataframe:

* We perform feature extraction separately on the train_0, train_1, test_0 and test_1 dataframes.
* This method takes a dataframe and the desired digit as parameters.
* Average Brightness and Standard Deviation are computed for each image, i.e. each row of the input dataframe.
* A class vector column holding the digit label is also added to the extracted features dataframe.

In [20]:
def feature_extraction(df, digit):
    avg_brightness = df.agg('mean', axis = 'columns')
    avg_std = df.agg('std', axis = 'columns')
    extracted_features_df = pd.concat([avg_brightness, avg_std], axis=1)
    extracted_features_df.columns = ['averageBrightness', 'standardDeviation']
    extracted_features_df['classVector'] = digit
    
    return extracted_features_df

In [69]:
train_0_extracted_features_df = feature_extraction(train_0_df, 0)
print("Size of digit 0 extracted features from train dataframe: {}".format(train_0_extracted_features_df.shape[0]))

Size of digit 0 extracted features from train dataframe: 5923


In [70]:
print(train_0_extracted_features_df.round(3).head())

   averageBrightness  standardDeviation  classVector
0             39.662             83.941            0
1             45.195             89.087            0
2             46.565             91.800            0
3             47.533             91.750            0
4             58.091             99.273            0


In [71]:
print(train_0_extracted_features_df.round(3).tail())

      averageBrightness  standardDeviation  classVector
5918             36.849             83.442            0
5919             30.084             73.061            0
5920             39.562             83.475            0
5921             45.062             88.404            0
5922             44.422             88.973            0


In [72]:
train_1_extracted_features_df = feature_extraction(train_1_df, 1)
print("Size of digit 1 extracted features from train dataframe: {}".format(train_1_extracted_features_df.shape[0]))

Size of digit 1 extracted features from train dataframe: 6742


In [76]:
print(train_1_extracted_features_df.round(3).head())

   averageBrightness  standardDeviation  classVector
0             21.856             66.121            1
1             22.508             67.885            1
2             13.870             52.649            1
3             14.824             54.617            1
4             21.144             64.798            1


In [77]:
print(train_1_extracted_features_df.round(3).tail())

      averageBrightness  standardDeviation  classVector
6737             15.491             55.311            1
6738             20.932             64.326            1
6739             15.911             55.794            1
6740             19.302             61.458            1
6741             12.279             47.491            1


In [75]:
test_0_extracted_features_df = feature_extraction(test_0_df, 0)
print("Size of digit 0 extracted features from test dataframe: {}".format(test_0_extracted_features_df.shape[0]))

Size of digit 0 extracted features from test dataframe: 980


In [78]:
print(test_0_extracted_features_df.round(3).head())

   averageBrightness  standardDeviation  classVector
0             47.212             92.463            0
1             37.960             81.688            0
2             37.509             81.939            0
3             67.417            105.710            0
4             43.903             88.954            0


In [79]:
print(test_0_extracted_features_df.round(3).tail())

     averageBrightness  standardDeviation  classVector
975             49.635             92.342            0
976             42.518             84.580            0
977             36.282             80.183            0
978             56.047             98.186            0
979             69.921            107.090            0


In [80]:
test_1_extracted_features_df = feature_extraction(test_1_df, 1)
print("Size of digit 1 extracted features from test dataframe: {}".format(test_1_extracted_features_df.shape[0]))

Size of digit 1 extracted features from test dataframe: 1135


In [81]:
print(test_1_extracted_features_df.round(3).head())

   averageBrightness  standardDeviation  classVector
0             12.591             48.898            1
1             17.672             59.554            1
2             20.662             66.499            1
3             16.311             57.236            1
4             13.804             52.370            1


In [82]:
print(test_1_extracted_features_df.round(3).tail())

      averageBrightness  standardDeviation  classVector
1130             18.744             60.695            1
1131             17.869             58.316            1
1132             21.606             65.197            1
1133             22.036             67.327            1
1134             21.989             66.490            1


#### 3.2 Method to combine extracted features of digits 0 and 1 into a single dataframe:

* This method combines the extracted features of digit 0 and 1 from the training set into single dataframe.

In [83]:
def create_combined_train_features_df():    
    combined_train_features_df = pd.concat([train_0_extracted_features_df, train_1_extracted_features_df])
    
    return combined_train_features_df

In [84]:
combined_train_features_df = create_combined_train_features_df()
print("Size of combined train dataset: {}".format(combined_train_features_df.shape[0]))

Size of combined train dataset: 12665


In [85]:
print(combined_train_features_df.round(3).head())

   averageBrightness  standardDeviation  classVector
0             39.662             83.941            0
1             45.195             89.087            0
2             46.565             91.800            0
3             47.533             91.750            0
4             58.091             99.273            0


In [86]:
print(combined_train_features_df.round(3).tail())

      averageBrightness  standardDeviation  classVector
6737             15.491             55.311            1
6738             20.932             64.326            1
6739             15.911             55.794            1
6740             19.302             61.458            1
6741             12.279             47.491            1


### 4. Density Estimation

* We assume that the two features extracted above are independent of each other and that, within each class, each feature follows a normal distribution.
* A normal distribution is characterized by its mean and variance.
* We compute the mean and variance of both features, per class, from the combined dataframe.

#### 4.1 Method to compute Mean and Variance for each feature:

* This method takes a dataframe as parameter.
* It groups the data in the dataframe by class vector and computes Mean and Variance for each column of the dataframe.
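
The grouping step can be illustrated without pandas as follows. This is a minimal sketch on made-up values; `class_parameters` is a hypothetical helper, and `statistics.variance` uses the same n - 1 denominator as the pandas `var` default.

```python
import statistics
from collections import defaultdict

def class_parameters(rows):
    """rows: iterable of (feature_value, class_label) pairs.
    Returns {label: (mean, variance)} for the feature."""
    grouped = defaultdict(list)
    for value, label in rows:
        grouped[label].append(value)
    return {
        label: (statistics.mean(values), statistics.variance(values))
        for label, values in grouped.items()
    }

# Made-up brightness values for two classes
rows = [(40.0, 0), (44.0, 0), (48.0, 0), (18.0, 1), (20.0, 1), (22.0, 1)]
params = class_parameters(rows)
print(params)
```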

In [87]:
def normal_distribution_parameters(df):
    return df.groupby('classVector').agg(['mean','var']).round(2)

In [88]:
statistics = normal_distribution_parameters(combined_train_features_df)
display(statistics)

            averageBrightness          standardDeviation
                         mean      var              mean      var
classVector
0                       44.22   115.29             87.49   101.63
1                       19.38    31.45             61.41    82.81


### 5. Implementing Naive Bayes Classifier

* The classifier predicts the class label of previously unseen data, i.e. the test dataset.

* Extracted features of the test dataset will be used here.

* Each row of the extracted features dataframe contains Average Brightness and Standard Deviation values of a single image.

* We find the class probabilities of the image.
    * Let D = data representing Average Brightness and Standard Deviation of a particular image.
    * Compute P(class = 0 | D)
    * Compute P(class = 1 | D)
    
* Compare the values of P(class = 0 | D) and P(class = 1 | D). The class with the higher value is taken as the label for that image.

#### Formula for computing class probabilities:

* P(class = 0 | D) ∝ P(D | class = 0) * P(class = 0)
              = P(averageBrightness | class = 0) * P(standardDeviation | class = 0) * P(class = 0)

* P(class = 1 | D) ∝ P(D | class = 1) * P(class = 1)
              = P(averageBrightness | class = 1) * P(standardDeviation | class = 1) * P(class = 1)

* The denominator P(D) from Bayes' theorem is the same for both classes, so it is dropped; it does not affect which class scores higher.

#### Gaussian Formula for computing probability of continuous feature attributes:

* P(x | class) can be computed by the gaussian formula shown below.
    * x is averageBrightness or standardDeviation
    * class is 0 or 1

$$P(x \mid \text{class}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Here, 
* 𝜇 - mean of the distribution
* 𝜎 - standard deviation of the distribution
* 𝜎^2 - variance of the distribution
* x = averageBrightness or standardDeviation of the image.
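
As a worked instance of the Gaussian formula, the sketch below evaluates the density of one brightness value under each class, using the rounded averageBrightness parameters from the statistics table in section 4. The helper `gaussian_density` is illustrative and takes the variance directly.

```python
from math import exp, pi, sqrt

def gaussian_density(x, mean, variance):
    """Normal density at x for the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)

# averageBrightness of the first digit-0 test image shown in section 3
x = 47.212
p_given_0 = gaussian_density(x, mean=44.22, variance=115.29)
p_given_1 = gaussian_density(x, mean=19.38, variance=31.45)
print(p_given_0 > p_given_1)  # True: this brightness is far more typical of digit 0
```

These values are densities rather than probabilities, so only their relative size matters when the classes are compared.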







#### 5.1 Method to calculate Gaussian probability distribution for Average Brightness and Standard Deviation of each row of the combined features dataframe:

* This method takes the feature value (can be either of Average Brightness and Standard Deviation), mean and standard deviation of the feature distribution as parameters.
* This method is used for computation of -
    * P(averageBrightness | class = 0)
    * P(standardDeviation | class = 0)
    * P(averageBrightness | class = 1)
    * P(standardDeviation | class = 1)
    

In [89]:
def calc_gaussian_probability(feature_val, mean, stdev):
    exponent = exp(-((feature_val-mean)**2 / (2 * stdev**2 )))
    return (1 / (sqrt(2 * pi) * stdev)) * exponent

#### 5.2 Method to compute class probabilities for each row of the combined features dataframe:

* Each row passed to this method represents one image as [Average Brightness, Standard Deviation].
* This method computes scores proportional to P(class = 0 | D) and P(class = 1 | D), where D is the Average Brightness and Standard Deviation of a particular image.
* The scores are stored in a dictionary of the form {0: probability value, 1: probability value}.

In [2]:
def calc_class_probability(row):
    # row is a list: [averageBrightness, standardDeviation] of one image
    total_images = combined_train_features_df.shape[0]
    features = ['averageBrightness', 'standardDeviation']
    classVector = [0,1]
    class_probabilities = dict()
    # Per-class mean and standard deviation of each feature, from the training data
    stats = combined_train_features_df.groupby('classVector').agg(['mean','std'])

    for class_value in classVector:
        # Start from the prior P(class): fraction of training images in this class
        class_probabilities[class_value] = (combined_train_features_df.classVector == class_value).sum()/total_images
        for i in range(len(features)):
            feature_df = stats[features[i]]
            feature_mean = feature_df.loc[class_value]['mean']
            feature_std = feature_df.loc[class_value]['std']
            # Multiply by the Gaussian likelihood P(feature | class)
            class_probabilities[class_value] *= calc_gaussian_probability(row[i], feature_mean, feature_std)
            
    return class_probabilities
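
The dictionary returned above holds unnormalized scores (prior times likelihoods). If actual posterior probabilities are wanted, the scores can be divided by their sum. The sketch below shows this under stated assumptions: it hard-codes the training counts and the rounded mean/variance values printed earlier instead of reading the dataframe, and `normalized_posteriors` is a hypothetical helper, not part of the notebook.

```python
from math import exp, pi, sqrt

# Training counts and rounded (mean, variance) values from earlier sections
counts = {0: 5923, 1: 6742}
params = {
    0: {"averageBrightness": (44.22, 115.29), "standardDeviation": (87.49, 101.63)},
    1: {"averageBrightness": (19.38, 31.45), "standardDeviation": (61.41, 82.81)},
}

def gaussian_density(x, mean, variance):
    return exp(-((x - mean) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)

def normalized_posteriors(image):
    total = sum(counts.values())
    scores = {}
    for label, feature_params in params.items():
        score = counts[label] / total  # prior P(class)
        for name, (mean, variance) in feature_params.items():
            score *= gaussian_density(image[name], mean, variance)
        scores[label] = score
    evidence = sum(scores.values())  # plays the role of P(D)
    return {label: s / evidence for label, s in scores.items()}

# First digit-0 test image from section 3
image = {"averageBrightness": 47.212, "standardDeviation": 92.463}
posteriors = normalized_posteriors(image)
print(posteriors)
```

Normalizing does not change which class wins; it only rescales the two scores so that they sum to 1.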

#### 5.3 Method to predict the class for a given previously unknown dataset:

* We now have the class probabilities for a particular image in the form of a dictionary.
* In this method, we compare the probabilities of the image belonging to class = 0 and class = 1. The class with the higher probability becomes the predicted label for the image.

In [3]:
def predict_class_for_dataset(row):
    class_probabilities = calc_class_probability(row)
    predicted_label, highest_prob = None, -1
    
    for class_val, probability in class_probabilities.items():
        if predicted_label is None or probability > highest_prob:
            highest_prob = probability
            predicted_label = class_val
            
    return predicted_label
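
A common variation, not used in this notebook, is to compare sums of log-probabilities instead of products, which avoids numerical underflow when many small densities are multiplied. The sketch below assumes the rounded parameters and training counts reported earlier; `log_gaussian` and `predict_log_space` are hypothetical names.

```python
from math import log, pi

# Log priors from the training counts (5923 zeros, 6742 ones)
log_priors = {0: log(5923 / 12665), 1: log(6742 / 12665)}
# (mean, variance) per class for [averageBrightness, standardDeviation]
params = {
    0: [(44.22, 115.29), (87.49, 101.63)],
    1: [(19.38, 31.45), (61.41, 82.81)],
}

def log_gaussian(x, mean, variance):
    """Logarithm of the normal density at x."""
    return -0.5 * log(2 * pi * variance) - (x - mean) ** 2 / (2 * variance)

def predict_log_space(row):
    log_scores = {}
    for label in params:
        log_scores[label] = log_priors[label] + sum(
            log_gaussian(x, mean, variance)
            for x, (mean, variance) in zip(row, params[label])
        )
    # The class with the largest log score is the prediction
    return max(log_scores, key=log_scores.get)

print(predict_log_space([47.212, 92.463]))  # 0
print(predict_log_space([12.591, 48.898]))  # 1
```

Because the logarithm is monotonic, this picks the same class as comparing the raw products would.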

#### 5.4 Method to apply the Naive Bayes classifier model:

* This method takes a dataframe as parameter.
* We pass it the extracted features of digits 0 and 1 from the test dataset; the model parameters themselves come from the training data.
* Iterate through the dataframe and convert each row to a list. Store all such lists in rows_list.
* Then iterate through rows_list and predict the class of each row.
* Store the predictions for all rows of the dataframe in a list.

In [4]:
def naive_bayes_classifier(test_df):
    class_predictions = []
    rows_list = []

    for index, rows in test_df.iterrows():
        test_row =[rows.averageBrightness, rows.standardDeviation]
        rows_list.append(test_row)

    for row in rows_list:
        predicted_class = predict_class_for_dataset(row)
        class_predictions.append(predicted_class)

    return class_predictions

### 6. Evaluating Classifier Model Performance

* In order to evaluate the model performance, we will compute the accuracy of the class predictions made by the model. 

* Accuracy for predicting digits 0 and 1 is computed separately.


#### 6.1 Method to calculate accuracy of the naive bayes classifier model:

* In this method, the model predictions for one digit (0 or 1) are compared with the ground truth.
* The number of correct predictions is counted against the total number of images in the test dataset of that digit.
* Accuracy is given by (Number of correct predictions / Total number of observations) * 100
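
The accuracy formula can be checked on a toy example. The helper `accuracy_percent` and the label lists below are made up for illustration.

```python
def accuracy_percent(actual, predicted):
    """Percentage of positions where the prediction matches the ground truth."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return round(correct / len(actual) * 100, 2)

# Four images of digit 0, one of which is misclassified as 1
actual = [0, 0, 0, 0]
predicted = [0, 0, 1, 0]
print(accuracy_percent(actual, predicted))  # 75.0
```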

In [6]:
def calc_accuracy(digit, test_df):
    class_count = test_df.shape[0]
    actual = [digit]*class_count
    predicted = naive_bayes_classifier(test_df)
    correctly_predicted = 0

    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correctly_predicted += 1

    accuracy = round(correctly_predicted/len(actual) * 100,2)
    
    return accuracy

In [94]:
prediction_accuracy_0 = calc_accuracy(0, test_0_extracted_features_df)
prediction_accuracy_1 = calc_accuracy(1, test_1_extracted_features_df)

print("Accuracy for digit 0: {} %".format(prediction_accuracy_0))
print("Accuracy for digit 1: {} %".format(prediction_accuracy_1))

Accuracy for digit 0: 91.43 %
Accuracy for digit 1: 92.42 %
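
The two per-class figures can be combined into a single overall accuracy by weighting each by its test-set size. The sketch below recovers the correct-prediction counts from the rounded percentages, which is an assumption rather than something the notebook computes; the variable names are illustrative.

```python
# Test-set sizes and per-class accuracies reported above
test_sizes = {0: 980, 1: 1135}
accuracies = {0: 91.43, 1: 92.42}

# Recover the number of correct predictions per class (assumes the
# reported percentages were rounded from whole counts)
correct = {d: round(accuracies[d] / 100 * test_sizes[d]) for d in test_sizes}

# Overall accuracy: all correct predictions over all test images
overall = sum(correct.values()) / sum(test_sizes.values()) * 100
print("Overall accuracy: {:.2f} %".format(overall))
```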


### 7. Conclusion

* We built a Naive Bayes classification model to classify digits as 0 or 1 on a subset of the MNIST testing dataset.
* The model classifies digit 0 with 91.4 % accuracy and digit 1 with 92.4 % accuracy.