# Recognizing hand-written digits

An example showing how scikit-learn can be used to recognize images of
hand-written digits.

This example is discussed in the [tutorial section](http://scikit-learn.org/stable/tutorial/basic/tutorial.html#introduction) of the user manual.

**Author**: Gael Varoquaux - gael.varoquaux@normalesup.org  
**License**: BSD 3 clause

**Total running time of the example**: 0.97 seconds

Migrated to Jupyter for the Orlando Python ML Workshop.

## Intro

In this notebook, we use a Support Vector Machine (SVM) as our supervised classification model to identify handwritten digits, each represented as a 2D matrix of pixel values.

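Our dataset is not the full [MNIST handwritten digit database](http://yann.lecun.com/exdb/mnist/). It is the much smaller UCI ML hand-written digits dataset, which contains 1,797 images of 8x8 pixels, compared with MNIST's 70,000 images of 28x28 pixels. It is a standard teaching dataset and ships with scikit-learn's `datasets` module, so `datasets.load_digits()` reads it from the local installation without downloading anything.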

In [None]:
# Standard scientific Python imports
import matplotlib.pyplot as plt

# Tell Jupyter to render graphs inside the notebook
%matplotlib inline

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

# The digits dataset
digits = datasets.load_digits()
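Inspecting the array shapes and value range is a quick way to confirm what `load_digits()` returned. The sketch below is minimal, and the comments give the values we expect from the bundled dataset: 1,797 samples, 8x8 images with integer intensities from 0 to 16, and labels from 0 to 9.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# 3D stack of images: (n_samples, height, width)
print(digits.images.shape)                   # (1797, 8, 8)
# The same pixels, already flattened to one row per image
print(digits.data.shape)                     # (1797, 64)
# Pixel intensities are counts in the range 0..16 (stored as floats)
print(digits.data.min(), digits.data.max())  # 0.0 16.0
# Class labels are the digits 0..9
print(np.unique(digits.target).tolist())     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```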

![MNIST example digits with labels](https://longhowlam.files.wordpress.com/2015/11/digits.jpeg?w=527&h=391)

The data we are interested in consists of 8x8 images of digits. Let's look at the first 4 images, which are stored in the `images` attribute of the dataset. If we were working from image files, we could load them with `matplotlib.pyplot.imread`. Every image must have the same size, because the classifier expects each sample to have the same number of features. For these images, we know which digit each one represents: the label is stored in the `target` attribute of the dataset.
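According to the dataset's UCI description, each 8x8 image was made by dividing a 32x32 binary bitmap into non-overlapping 4x4 blocks and counting the "on" pixels in each block. That is why the values range from 0 to 16. The NumPy sketch below reproduces this reduction. The helper name `bitmap_to_digit_features` is hypothetical, and it assumes the input has already been turned into a 32x32 bitmap (resizing and thresholding a real scan is not shown). Running every input through the same reduction is one way to guarantee that all images end up the same size.

```python
import numpy as np

def bitmap_to_digit_features(bitmap):
    """Reduce a 32x32 bitmap to an 8x8 grid of 4x4 block counts (hypothetical helper)."""
    bitmap = np.asarray(bitmap)
    if bitmap.shape != (32, 32):
        raise ValueError("expected a 32x32 bitmap, got %r" % (bitmap.shape,))
    # Treat any nonzero pixel as "on"
    on = (bitmap > 0).astype(int)
    # Split rows and columns into 8 blocks of 4 each, then count "on" pixels per block
    return on.reshape(8, 4, 8, 4).sum(axis=(1, 3))

# Example: a vertical bar covering columns 12..19 (column blocks 3 and 4)
bitmap = np.zeros((32, 32))
bitmap[:, 12:20] = 1
features = bitmap_to_digit_features(bitmap)
print(features.shape)      # (8, 8)
print(features[0].tolist())  # [0, 0, 0, 16, 16, 0, 0, 0]
```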

In [None]:
images_and_labels = list(zip(digits.images, digits.target))
for index, (image, label) in enumerate(images_and_labels[:4]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)

## Model

Now for the fun part: training a [Support Vector Machine](http://scikit-learn.org/stable/modules/svm.html) classifier.
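By default, `svm.SVC` uses a radial basis function (RBF) kernel, $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. The `gamma=0.001` used below controls how quickly the similarity between two images decays with their squared pixel distance. The NumPy sketch below computes this kernel for two toy 64-pixel "images". It illustrates the formula only and is not scikit-learn's internal implementation.

```python
import numpy as np

def rbf_kernel_matrix(X, Y, gamma):
    """Pairwise RBF similarities exp(-gamma * ||x - y||^2) between rows of X and Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared Euclidean distances via broadcasting: (n_x, 1, d) - (1, n_y, d)
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Two toy flattened images: all zeros, and all pixels set to 4
a = np.zeros(64)
b = np.full(64, 4.0)
X = np.vstack([a, b])
K = rbf_kernel_matrix(X, X, gamma=0.001)
print(np.round(K, 4))
```

The squared distance between `a` and `b` is 64 * 4^2 = 1024, so their similarity is exp(-1.024), roughly 0.359. The diagonal is 1 because every image is identical to itself. Pixel values go up to 16 across 64 features, so squared distances between real digits can reach the thousands. A small `gamma` keeps the kernel values from collapsing to zero at those distances.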

### Data Preprocessing

To apply a classifier to this data, we need to flatten each image so that the data becomes a (samples, features) matrix: each 8x8 image turns into a row of 64 pixel values.

In [None]:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
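`reshape((n_samples, -1))` keeps the first (sample) axis and lets NumPy infer the size of the remaining one, so each image becomes a single row in row-major order. The toy example below applies the same operation to two 3x3 "images". For this dataset, `digits.data` already holds exactly this flattened array.

```python
import numpy as np

# Two tiny 3x3 "images" stacked as (n_samples, height, width)
images = np.arange(18).reshape((2, 3, 3))
n_samples = len(images)

# Keep the sample axis and flatten everything else; -1 means "infer this size"
data = images.reshape((n_samples, -1))

print(data.shape)        # (2, 9)
print(data[1].tolist())  # [9, 10, 11, 12, 13, 14, 15, 16, 17]
```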

### Create and Fit

We train the model on the first half of the digits.

In [None]:
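# Support vector classifier (RBF kernel by default)
classifier = svm.SVC(gamma=0.001)
# Fit on the first half; // keeps the split index an integer
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])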
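In Python 3, `/` always returns a float, and a float cannot be used as a slice index. Splitting at `n_samples / 2` therefore raises a `TypeError`, while floor division `//` gives an integer split point. The snippet below shows this on a plain list. It also shows that when the number of samples is odd, the extra sample lands in the second (test) half.

```python
n_samples = 7
items = list(range(n_samples))

# Floor division gives an integer split point
half = n_samples // 2
train, test = items[:half], items[half:]
print(len(train), len(test))  # 3 4

# True division gives 3.5, which is not a valid slice index
try:
    items[:n_samples / 2]
    slice_failed = False
except TypeError as exc:
    slice_failed = True
    print("TypeError:", exc)
```

As an alternative, `sklearn.model_selection.train_test_split(..., test_size=0.5, shuffle=False)` should produce an equivalent ordered split.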

### Test

Now predict the digits in the second half, which the model did not see during training.

In [None]:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

print("Classification report for classifier {}:\n{}\n".format(
    classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n{}".format(metrics.confusion_matrix(expected, predicted)))
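In scikit-learn's confusion matrix, row *i* counts samples whose true label is *i*, and column *j* counts samples predicted as *j*, so correct predictions sit on the diagonal. The per-class precision and recall in the classification report can be read off the same matrix. The NumPy sketch below rebuilds both on made-up labels for three classes. It shows the arithmetic only and is not a replacement for `metrics.confusion_matrix`. The helper name `confusion_matrix_np` is hypothetical.

```python
import numpy as np

def confusion_matrix_np(expected, predicted, n_classes):
    """Rows = true class, columns = predicted class (same layout as scikit-learn)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, pred_label in zip(expected, predicted):
        cm[true_label, pred_label] += 1
    return cm

# Made-up labels for three classes
expected = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix_np(expected, predicted, n_classes=3)

# Precision: of everything predicted as class j, the fraction truly in j (column-wise)
precision = np.diag(cm) / cm.sum(axis=0)
# Recall: of everything truly in class i, the fraction predicted as i (row-wise)
recall = np.diag(cm) / cm.sum(axis=1)

print(cm.tolist())                 # [[2, 1, 0], [0, 2, 0], [1, 0, 2]]
print(np.round(precision, 2).tolist())
print(np.round(recall, 2).tolist())
```

This sketch does not guard against a class that is never predicted. That class's column sums to zero, and the division yields `nan`.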

Pretty simple, right?

## Classified Digits

Let's create a final figure showing the first few test images together with the digit the model predicted for each.

In [None]:
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)

plt.show()
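The four images above are simply the first four test samples, so they may not include any mistakes. To see where the model struggles, one option is to select the test positions where `expected` and `predicted` disagree. Below is a minimal NumPy sketch of that selection on made-up label arrays, using a hypothetical helper name. In the notebook, you would pass the real `expected` and `predicted`, then plot `digits.images[n_samples // 2 + i]` for each returned index `i`.

```python
import numpy as np

def misclassified_indices(expected, predicted):
    """Positions (within the test set) where the prediction differs from the true label."""
    expected = np.asarray(expected)
    predicted = np.asarray(predicted)
    return np.flatnonzero(expected != predicted)

# Made-up true and predicted labels for five test samples
expected = [3, 8, 1, 7, 4]
predicted = [3, 3, 1, 7, 9]
wrong = misclassified_indices(expected, predicted)
print(wrong.tolist())  # [1, 4]
```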