# Challenge2: MNIST Digit Recognition | #100MLProjects | #Laxmena

**Author: Lakshmanan Meiyappan**

100MLProjects Pre Project Update: [Blog: Challenge-2 MNIST Digit Recognition](https://medium.com/@laxmena/project2-of-100mlprojetcs-classification-mnist-digit-recognition-d9208856f1f2)

- LinkedIn: https://www.linkedin.com/in/lakshmanan-meiyappan/
- Github: https://github.com/laxmena/

## SVM Classification


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

In [3]:
import random
import gzip

## Load MNIST Dataset

The MNIST dataset can be downloaded from: http://yann.lecun.com/exdb/mnist/

File names of Training and Test sets:

- train-images-idx3-ubyte.gz -  training set images (9912422 bytes)
- train-labels-idx1-ubyte.gz -  training set labels (28881 bytes)
- t10k-images-idx3-ubyte.gz  -  test set images (1648877 bytes)
- t10k-labels-idx1-ubyte.gz  -  test set labels (4542 bytes)


**load_mnist**: a function that loads MNIST data from a gzipped file into a NumPy array.

**Parameters:**
- *filename* : Name of the MNIST '.gz' file, including the extension
- *type* : 'image' or 'label', specifying the kind of data in the file
- *n_datapoints* : Number of datapoints to read from the start of the file

In [24]:
def load_mnist(filename, type, n_datapoints):
    # MNIST images are 28 x 28 pixels
    image_size = 28
    # Use a context manager so the file is closed after reading
    with gzip.open(filename) as f:
        if type == 'image':
            f.read(16)    # Skip the 16-byte image file header
            buf = f.read(n_datapoints * image_size * image_size)
            data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
            data = data.reshape(n_datapoints, image_size, image_size, 1)
        elif type == 'label':
            f.read(8)     # Skip the 8-byte label file header
            buf = f.read(n_datapoints)
            data = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
            data = data.reshape(n_datapoints, 1)
        else:
            # Fail clearly instead of returning an undefined variable
            raise ValueError("type must be 'image' or 'label'")
    return data
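
The 16 and 8 bytes skipped in `load_mnist` are the IDX file headers: a 4-byte magic number followed by one big-endian 32-bit size per dimension. As a minimal sketch, the header could be parsed rather than skipped, so the shape comes from the file itself. The helper name `read_idx_header` is hypothetical, and the example runs on a tiny in-memory file built for illustration, not on the real MNIST files.

```python
import gzip
import io
import struct

import numpy as np


def read_idx_header(f):
    """Read an IDX header from an open binary file and return the dimension sizes."""
    # Magic number: two zero bytes, a data-type code (0x08 = unsigned byte),
    # and the number of dimensions (3 for images, 1 for labels).
    _zero, _dtype_code, n_dims = struct.unpack('>HBB', f.read(4))
    # One big-endian unsigned 32-bit integer per dimension.
    return struct.unpack('>' + 'I' * n_dims, f.read(4 * n_dims))


# Build a small fake image file in memory: 2 images of 28 x 28 zero pixels.
header = struct.pack('>HBBIII', 0, 0x08, 3, 2, 28, 28)
pixels = np.zeros((2, 28, 28), dtype=np.uint8)
fake_file = io.BytesIO(gzip.compress(header + pixels.tobytes()))

with gzip.open(fake_file) as f:
    dims = read_idx_header(f)
    images = np.frombuffer(f.read(), dtype=np.uint8).reshape(dims)

print(len(header), dims, images.shape)
```

For an image file the header is 4 + 3 * 4 = 16 bytes, and for a label file it is 4 + 1 * 4 = 8 bytes, which matches the two `f.read` skips above.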

In [25]:
# Load the first train_size training samples and the first test_size test samples
train_size = 6000
test_size = 1000
# dirpath = '/content/drive/My Drive/02 MNIST Digit Recognition/'
dirpath = ''
X = load_mnist(dirpath + 'train-images-idx3-ubyte.gz', 'image', train_size)
y = load_mnist(dirpath + 'train-labels-idx1-ubyte.gz', 'label', train_size)
X_test = load_mnist(dirpath + 't10k-images-idx3-ubyte.gz', 'image', test_size)
y_test = load_mnist(dirpath + 't10k-labels-idx1-ubyte.gz', 'label', test_size)

In [6]:
# Take the first 600 samples and flatten each 28x28 image to a 784-vector
X_small = X[:train_size // 10].reshape(-1, 28 * 28)
y_small = y[:train_size // 10]
X_train, X_valid, y_train, y_valid = train_test_split(X_small, y_small, test_size=0.25, random_state=28)
print(X_train.shape, X_valid.shape, y_train.shape, y_valid.shape)

(450, 784) (150, 784) (450, 1) (150, 1)
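
The validation split above is not used by the later cells shown here. One common use for a held-out validation set is comparing hyperparameter settings before touching the test set. The sketch below illustrates that idea under stated assumptions: it uses the small 8x8 digits dataset bundled with scikit-learn as a stand-in for MNIST, and the candidate `C` values are arbitrary choices for illustration, not settings taken from this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images with pixel values in 0..16, scaled to [0, 1].
digits = load_digits()
X_demo = digits.data / 16.0
y_demo = digits.target

X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=28
)

# Fit one model per candidate C and record its accuracy on the validation set.
scores = {}
for C in (0.1, 1.0, 10.0):
    model = SVC(C=C, gamma='scale')
    model.fit(X_tr, y_tr)
    scores[C] = model.score(X_val, y_val)

# Keep the setting with the best validation accuracy.
best_C = max(scores, key=scores.get)
print(scores, best_C)
```

The test set is then reserved for a single final evaluation of the chosen model.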


## Build SVM Model

In [37]:
classifier = SVC()

In [38]:
%%time
print('Training the Model')
classifier = classifier.fit(X.reshape(X.shape[0], 28*28), y.ravel())  # ravel() passes the 1-D labels SVC expects

Training the Model

Wall time: 1min 22s


## Predict 

In [34]:
y_pred = classifier.predict(X_test.reshape(X_test.shape[0],28*28))

In [35]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00        85
           1       0.13      1.00      0.22       126
           2       0.00      0.00      0.00       116
           3       0.00      0.00      0.00       107
           4       0.00      0.00      0.00       110
           5       0.00      0.00      0.00        87
           6       0.00      0.00      0.00        87
           7       0.00      0.00      0.00        99
           8       0.00      0.00      0.00        89
           9       0.00      0.00      0.00        94

    accuracy                           0.13      1000
   macro avg       0.01      0.10      0.02      1000
weighted avg       0.02      0.13      0.03      1000





In [36]:
confusion_matrix(y_test, y_pred)

array([[  0,  85,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0, 126,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0, 116,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0, 107,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0, 110,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,  87,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,  87,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,  99,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,  89,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,  94,   0,   0,   0,   0,   0,   0,   0,   0]], dtype=int64)
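
Every test sample lands in the column for digit 1, so the model predicts 1 for all inputs. As a worked check, the figures in the classification report can be recomputed from this matrix with NumPy. The matrix is rebuilt here from the per-class supports shown above, and the variable names are illustrative.

```python
import numpy as np

# Rows are true digits, columns are predicted digits.
# Per-class supports copied from the classification report above.
support = [85, 126, 116, 107, 110, 87, 87, 99, 89, 94]
cm = np.zeros((10, 10), dtype=np.int64)
cm[:, 1] = support  # every sample was predicted as digit 1

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal over row sums (true counts).
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal over column sums (predicted counts).
# It is undefined (0/0) for classes that were never predicted; report 0 there.
predicted_counts = cm.sum(axis=0)
precision = np.divide(
    np.diag(cm), predicted_counts,
    out=np.zeros(10), where=predicted_counts > 0,
)

print(accuracy, recall[1], precision[1])
```

Accuracy is 126 / 1000, recall for digit 1 is 126 / 126, and precision for digit 1 is 126 / 1000, which round to the 0.13, 1.00, and 0.13 in the report.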

In [27]:
X[0].shape

(28, 28, 1)

In [29]:
(X.reshape(X.shape[0], 28*28)).shape

(6000, 784)
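
The reshape checked above flattens each image to 784 values but leaves pixel intensities in the raw 0 to 255 range. One plausible cause of the single-class predictions seen earlier, not verified in this notebook, is that the default RBF-kernel `SVC` is sensitive to feature scale. A minimal sketch of a helper that flattens and scales in one step follows. The name `flatten_and_scale` is hypothetical, and the demo array is random data standing in for MNIST images.

```python
import numpy as np


def flatten_and_scale(images):
    """Flatten (n, 28, 28, 1) images to (n, 784) and scale pixels to [0, 1]."""
    flat = images.reshape(images.shape[0], -1)
    return flat.astype(np.float32) / 255.0


# Random stand-in for four MNIST images with pixel values in 0..255.
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(4, 28, 28, 1)).astype(np.float32)

scaled = flatten_and_scale(demo)
print(scaled.shape, scaled.min(), scaled.max())
```

With the notebook's variables this would be used as `classifier.fit(flatten_and_scale(X), y.ravel())` and `classifier.predict(flatten_and_scale(X_test))`. Whether it changes the scores reported above would need to be checked by rerunning the cells.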