# Scikit-learn Intro

We want to teach a machine to recognize handwritten digits.

Scikit-learn ships with some built-in datasets.

In [None]:
from sklearn import datasets
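To get a feel for what "built-in" means, here is a small sketch that loads a few of the bundled toy datasets and prints their shapes. The choice of `load_iris`, `load_wine` and `load_breast_cancer` is just for illustration; these loaders read files included with scikit-learn, so no download is needed.

```python
from sklearn import datasets

# A few of the small "toy" datasets bundled with scikit-learn.
# Each loader returns a Bunch whose .data is a 2-D array (n_samples, n_features).
loaders = {
    "iris": datasets.load_iris,
    "wine": datasets.load_wine,
    "breast_cancer": datasets.load_breast_cancer,
    "digits": datasets.load_digits,
}
shapes = {name: load().data.shape for name, load in loaders.items()}
for name, shape in shapes.items():
    print(f"{name:>14}: {shape}")
```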

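We'll use the handwritten digits dataset. It looks like a miniature version of MNIST, the most famous dataset in ML, but it is a different and much smaller dataset: 1,797 images of 8x8 pixels, versus MNIST's 70,000 images of 28x28 pixels.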

In [None]:
digits = datasets.load_digits()
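`load_digits()` does not return a bare array. It returns a `Bunch`, a dictionary whose keys can also be read as attributes. A quick sketch to list what is inside:

```python
from sklearn import datasets

digits = datasets.load_digits()

# load_digits() returns a Bunch: a dictionary whose keys are also attributes,
# so digits["data"] and digits.data refer to the same array.
fields = sorted(digits.keys())
print(fields)
print(digits.target_names)  # the class labels, 0 through 9
```

The fields we use below are `data`, `target` and `images`; `DESCR` holds a plain-text description of the dataset.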

Let's take a look at the data briefly. 

In [None]:
print(digits.data) 

That was not very helpful. We need to study it in a bit more detail. 

In [None]:
digits.data.shape
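The shape follows scikit-learn's usual convention: one row per sample and one column per feature. Here a sample is an image and a feature is a pixel, as this sketch spells out:

```python
from sklearn import datasets

digits = datasets.load_digits()

# Rows are samples (one image each), columns are features (one pixel each).
n_samples, n_features = digits.data.shape
print(n_samples, "images,", n_features, "pixels per image")

# 64 features come from an 8x8 grid of pixels.
side = int(round(n_features ** 0.5))
print(f"each image is {side}x{side}")
```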

In [None]:
digits.data[0]
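That row of 64 numbers is an 8x8 image that has been flattened. Reshaping it recovers the grid, which the dataset also stores ready-made in `digits.images`:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each row of digits.data is an 8x8 image flattened into 64 numbers.
# Reshaping it recovers the same grid stored in digits.images.
first = digits.data[0].reshape(8, 8)
print(np.array_equal(first, digits.images[0]))  # True

# Pixel values are grey levels from 0 (blank) to 16 (fully inked).
print(digits.data.min(), digits.data.max())
```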

And what are we trying to predict? 

In [None]:
digits.target

In [None]:
digits.target[-10:]

In [None]:
digits.target.shape
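Each image has exactly one label, and the ten digits appear roughly equally often (about 180 images each). A short sketch using `np.bincount` to check both:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# There is exactly one label per image...
assert digits.target.shape[0] == digits.data.shape[0]

# ...and the ten classes are roughly balanced.
counts = np.bincount(digits.target)
for digit, count in enumerate(counts):
    print(digit, count)
```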

# Learning the digits

Let's load a built-in classifier: an object that decides which digit an image shows.

In [None]:
from sklearn import svm

Don't worry about the exact values we've chosen for the parameters `gamma` and `C`. They're not important for us at the moment.

In [None]:
clf = svm.SVC(gamma=0.001, C=100.)
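For the curious: `SVC` uses an RBF kernel by default. Roughly speaking, `gamma` sets how far the influence of a single training image reaches (larger means shorter reach), and `C` sets how strongly misclassified training images are penalized (larger means fewer mistakes are tolerated). This sketch just asks the classifier which settings it is using:

```python
from sklearn import svm

clf = svm.SVC(gamma=0.001, C=100.)

# get_params() reports every hyperparameter, including the defaults we didn't set.
params = clf.get_params()
print(params["kernel"], params["gamma"], params["C"])
```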

Let's tell the classifier to learn from the data. We'll show it every image along with the digit it represents, except for the last one, which we'll keep secret.

In [None]:
clf.fit(digits.data[:-1], digits.target[:-1])  
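`fit` trains the classifier in place and returns the classifier itself. By scikit-learn convention, anything learned during fitting is stored in attributes ending with an underscore. A sketch that peeks at two of them:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)

# fit() trains the model in place and returns the same object.
fitted = clf.fit(digits.data[:-1], digits.target[:-1])
print(fitted is clf)  # True

# Attributes learned during fit end in an underscore.
print(clf.classes_)                 # the labels it saw: 0..9
print(clf.support_vectors_.shape)   # (number of support vectors, 64)
```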

The machine has learned the digits, or so it thinks. Let's test it on the one digit we never showed it. Notice that we're not giving it the `target` this time.

In [None]:
clf.predict(digits.data[-1:])
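Why `digits.data[-1:]` and not `digits.data[-1]`? `predict` expects a 2-D array of shape `(n_samples, n_features)`, even when there is only one sample. The slice keeps that second dimension, while plain indexing drops it. A sketch comparing the two, and the prediction with the true label:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])

# predict() expects a 2-D array of shape (n_samples, n_features).
print(digits.data[-1].shape)   # (64,)   a single flat row: predict rejects this
print(digits.data[-1:].shape)  # (1, 64) a batch containing one image

prediction = clf.predict(digits.data[-1:])
print("predicted:", prediction[0], "actual:", digits.target[-1])
```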

Let's see what that image looks like and whether the prediction makes sense. 

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

# Show the held-out (last) image as an 8x8 grid of grey levels.
plt.figure(figsize=(2, 2))
plt.imshow(digits.images[-1], interpolation='nearest', cmap=plt.cm.binary)
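A single image can only tell us whether the classifier was right once. A more convincing check holds back many images and measures the fraction it gets right. The sketch below does this with `train_test_split`; the 25% test size, `random_state=0` and `stratify` are arbitrary choices for illustration, not part of the original walkthrough.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold back 25% of the images instead of just one. random_state fixes the
# shuffle so the split is reproducible; stratify keeps class proportions
# roughly the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0, stratify=digits.target
)

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

# score() returns the fraction of held-out images classified correctly.
accuracy = clf.score(X_test, y_test)
print(f"accuracy on {len(y_test)} unseen images: {accuracy:.3f}")
```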

Let's go back and see where all the learning happened. 