An algorithm that classifies 8x8 images of handwritten digits.
Example demo using basic Python, NumPy, pandas, scikit-learn, and matplotlib.


In [1]:
# Adapted from an example on the scikit-learn website

# Standard scientific Python imports
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

The digits dataset is available from the sklearn library. More information about this dataset can be found here: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html

In [2]:
# The digits dataset
digits = datasets.load_digits()
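
To get a feel for what load_digits returns, it can help to look at the array shapes first. The short sketch below is not part of the original demo; it only reads the images, data, and target attributes of the returned object.

```python
from sklearn import datasets

digits = datasets.load_digits()

# images: one 8x8 array per sample
print(digits.images.shape)
# data: the same pixels, already flattened to 64 values per sample
print(digits.data.shape)
# target: one known digit label per sample
print(digits.target.shape)

# the distinct labels are the ten digit classes
labels = sorted(set(digits.target.tolist()))
print(labels)
```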

In [3]:
# The dataset consists of 8x8 arrays of floats, each representing a
# grey-scale image. The digit shown in each image is known beforehand
# and stored in 'target'. Let's visualize the first 4 digit images.

# This can be done with pandas or with standard Python and NumPy.
# The benefit of pandas is its table-like view of the contents,
# which makes the data easier to work with.
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)

plt.show()


In [None]:
# make dataframe from list
# show some standard information of the pandas dataframe
df_images_and_labels = pd.DataFrame(images_and_labels)
print(df_images_and_labels.head())
df_images_and_labels.info()


The same plots can be produced from the pandas DataFrame instead of the NumPy arrays.

In [None]:
#same plots produced using pandas dataframe.
for index in np.arange(4):
    plt.subplot(2,4, index + 1)
    plt.axis('off')
    plt.imshow(df_images_and_labels.iloc[index][0], cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % df_images_and_labels.iloc[index][1])

plt.show()





Before the algorithm is applied, a preprocessing step is needed. The algorithm cannot take the 8x8 image directly, but it can use the same float values flattened into a one-dimensional array of 64 values. This data transformation step is called 'preprocessing'.

In [None]:
# To apply the algorithm to the digits, each image must be passed in
# as a 1-d array of floats, but each entry of digits.images is a 2-d array
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# pandas to visualize data transformation
df = pd.DataFrame(data)
df.head()
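
As a sanity check on the flattening step, the sketch below (an addition, not part of the original demo) confirms that reshape lays the pixels out row by row, and that the result matches the ready-made digits.data attribute of the dataset. The pixel position (2, 5) is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
n_samples = len(digits.images)

# flatten each 8x8 image into one row of 64 values
data = digits.images.reshape((n_samples, -1))

# row-major flattening: pixel (row, col) ends up at position row * 8 + col
row, col = 2, 5
same_pixel = data[0, row * 8 + col] == digits.images[0, row, col]
print(same_pixel)

# the flattened array should equal the dataset's own flattened copy
matches_builtin = np.array_equal(data, digits.data)
print(matches_builtin)
```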



The popular scikit-learn algorithm used here is a Support Vector Machine (SVM). It is a classifier: it assigns each input to one of several classes, which here are the digits 0-9.

In [None]:
# create the classifier
classifier = svm.SVC(gamma=0.001)

# train the algorithm on first half of digit images
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])

# Now predict the value of the digits on the second half:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

# built-in scikit-learn report for this classifier
print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
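
The report and confusion matrix can be condensed into a single accuracy figure. The sketch below is an addition to the demo: it repeats the same half/half split, then computes the accuracy with metrics.accuracy_score and again from the confusion matrix, assuming the usual layout in which rows are the true digits and columns the predicted ones. The names half, cm, and accuracy_from_cm are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets, svm, metrics

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
half = n_samples // 2

# same setup as in the demo: train on the first half, test on the second
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:half], digits.target[:half])
expected = digits.target[half:]
predicted = classifier.predict(data[half:])

# overall fraction of correctly classified test images
accuracy = metrics.accuracy_score(expected, predicted)

# the diagonal of the confusion matrix counts the correct predictions,
# so diagonal sum / total gives the same fraction
cm = metrics.confusion_matrix(expected, predicted)
accuracy_from_cm = np.trace(cm) / cm.sum()

print("accuracy: %.3f" % accuracy)
```

Off-diagonal entries of cm show which digits get confused with which, something a single accuracy number hides.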


This demo is an example of machine learning. The complete 'intelligence' of this implementation resides in the classifier object after training/learning. The variable predicted contains the digits the algorithm predicts for the unseen digit images.
The predicted digits are shown as titles above the corresponding test images.

In [None]:
# plot the first 4 test images (second half) with their predicted digits
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)

plt.show()
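
The four plotted images are only a small sample of the test set. To find the cases where the classifier went wrong, the predictions can be compared with the known labels. This is a minimal sketch added to the demo, repeating the same training setup; the names half and wrong are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
half = n_samples // 2

# same setup as in the demo: train on the first half, test on the second
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:half], digits.target[:half])
expected = digits.target[half:]
predicted = classifier.predict(data[half:])

# positions in the test set where prediction and known label differ
wrong = np.flatnonzero(predicted != expected)
print("misclassified: %d of %d" % (len(wrong), len(expected)))

# list the first few mistakes
for i in wrong[:5]:
    print("test image %d: expected %d, predicted %d"
          % (i, expected[i], predicted[i]))
```

The indices in wrong refer to the test half, so the matching image is digits.images[half + i].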