# `scikit-learn`

Today, we are going to take a *quick* look at the world of Machine Learning and Data Science using `scikit-learn`.

_REMINDER_! You can download these notebooks by typing the following in your Terminal:
```
git clone https://github.com/icme/cme193
```

You can also click the download link in the top-right corner on `nbviewer`!

Well, here we go: another whirlwind tour!

About many of the topics in this class, I have said that "there could be an entire course about this topic." For this one, there are something like five classes at Stanford devoted *only* to it. If the material in this lecture interests you, I recommend you look into them!

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# -- we will only use linear models today...
from sklearn import datasets, linear_model

In [None]:
digits = datasets.load_digits()
images_and_labels = list(zip(digits.images, digits.target))
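Before modeling, it helps to check what `load_digits` actually returns. The sketch below prints the array shapes, the pixel range, and how many examples of each digit there are. It uses only the `images` and `target` attributes that the lecture code already relies on.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each sample is an 8x8 grayscale image, and each label is a digit from 0 to 9
print(digits.images.shape)   # (n_samples, 8, 8)
print(digits.target.shape)   # (n_samples,)

# Pixel intensities are whole numbers (stored as floats) ranging from 0 to 16
print(digits.images.min(), digits.images.max())

# Count the examples of each digit
labels, counts = np.unique(digits.target, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```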

In [None]:
print(images_and_labels[0])

In [None]:
for index, (image, label) in enumerate(images_and_labels[:10]):
    plt.subplot(2, 5, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)

Now, let's manipulate our data into a NumPy/scikit-learn-friendly format!

We are going to use logistic regression for this digit-recognition example. For each digit $d\in\{0, 1, \ldots, 8, 9\}$, we want to learn a model that, given an image stored as a NumPy array $x$, predicts the probability that the image shows the number $d$. We estimate 10 models of the form

$$
P(d \mid x) = \frac{1}{1+e^{-\theta_d^T x}}, \quad d\in\{0, 1, \ldots, 8, 9\}
$$

and, for a given $x$, the predicted class is 

$$
\arg\max_{i} P(i | x)
$$

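That is, we predict that an image shows the number $d$ if $d$ is the digit with the highest probability! (Recent versions of scikit-learn fit a single multinomial, or softmax, model over all ten digits by default, rather than ten separate one-vs-rest models. The prediction is still this argmax.)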
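To make the prediction rule concrete, here is a small NumPy sketch of the ten-model, one-vs-rest picture described above. The weights `theta` and the image `x` are random placeholders, not learned values. `sigmoid` and `predict_digit` are hypothetical helper names chosen for illustration. Because the sigmoid is increasing, taking the argmax of the probabilities picks the same digit as taking the argmax of the raw scores $\theta_d^T x$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_digit(theta, x):
    """One-vs-rest prediction rule (hypothetical helper).

    theta: shape (10, 64); row d holds the weights theta_d
    x:     shape (64,);    one flattened image
    Returns the argmax class and the 10 per-class probabilities.
    """
    probs = sigmoid(theta @ x)          # P(d | x) for d = 0, ..., 9
    return int(np.argmax(probs)), probs

# Random placeholder weights and image, only to exercise the functions
rng = np.random.default_rng(0)
theta = 0.1 * rng.normal(size=(10, 64))
x = rng.uniform(0.0, 1.0, size=64)

d, probs = predict_digit(theta, x)
print(d, probs.round(3))
print(probs.sum())   # one-vs-rest probabilities need not sum to 1
```

Note that the ten one-vs-rest probabilities are computed independently, so unlike a softmax they do not have to add up to 1. Only their ordering matters for the argmax.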

In [None]:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
print(data.shape)
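The `reshape((n_samples, -1))` call flattens each 8x8 image into one row of 64 numbers. The toy example below uses a made-up array in place of `digits.images`. It shows that NumPy's default row-major order puts pixel $(i, j)$ of an image in column $8i + j$ of that image's row.

```python
import numpy as np

# Toy stand-in for digits.images: 3 "images", each 8x8
images = np.arange(3 * 8 * 8).reshape((3, 8, 8))

# -1 tells NumPy to infer the remaining dimension (here 8 * 8 = 64)
flat = images.reshape((len(images), -1))
print(flat.shape)  # (3, 64)

# Row-major order: pixel (i, j) of image k ends up in column 8*i + j of row k
k, i, j = 1, 2, 5
print(flat[k, 8 * i + j] == images[k, i, j])  # True
```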

In [None]:
classifier = linear_model.LogisticRegressionCV(Cs=20, cv=10, verbose=True, 
                                               penalty='l2', max_iter=10000, 
                                               n_jobs=6)
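Note that `Cs=20` does not mean $C = 20$. According to the scikit-learn documentation, an integer `Cs` requests that many values of $C$ on a logarithmic grid between $10^{-4}$ and $10^{4}$. Here $C$ is the *inverse* of the regularization strength, so a small $C$ means strong L2 regularization. The snippet below rebuilds that grid with `np.logspace`, which should match the fitted `Cs_` attribute. It also counts the (fold, $C$) combinations that `cv=10` implies.

```python
import numpy as np

# An integer Cs=20 corresponds to 20 values of C spaced evenly on a log scale
Cs = np.logspace(-4, 4, 20)
print(Cs[0], Cs[-1])
print(Cs[1] / Cs[0])   # constant ratio between neighbours, 10**(8/19)

# With cv=10, every candidate C is scored on each of the 10 validation folds
n_folds = 10
print(len(Cs) * n_folds)  # 200
```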

In [None]:
# Train on the first half of the data (// is integer division in Python 3)
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
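Training on the first half of the data and testing on the second half works here, but the result depends on the order of the samples in the dataset. A common alternative is scikit-learn's `train_test_split` with shuffling (its default) and `stratify=y`, which keeps the proportion of each digit the same in both halves. The sketch below shows this option, which the notebook itself does not use. The `random_state=0` value is an arbitrary choice for reproducibility.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Shuffle, hold out half the data, and keep class proportions equal in both halves
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
print(X_train.shape, X_test.shape)
```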

In [None]:
print(classifier.scores_[0].shape)  # (n_folds, n_Cs)
print(classifier.Cs_.shape)
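Each value in `scores_` is a grid of validation accuracies with one row per fold and one column per candidate $C$. With the settings above, `scores_[0]` should therefore have shape `(10, 20)`. A value of $C$ is chosen by averaging over the folds and taking the best column. The sketch below does this by hand on a deliberately smaller search (`Cs=5`, `cv=3`, chosen only to keep it fast) and compares the result with the fitted `C_` attribute.

```python
import numpy as np
from sklearn import datasets, linear_model

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target
half = len(X) // 2

# A smaller search than in the notebook (5 values of C, 3 folds) so it runs quickly
clf = linear_model.LogisticRegressionCV(Cs=5, cv=3, max_iter=5000)
clf.fit(X[:half], y[:half])

scores = clf.scores_[0]              # shape (n_folds, n_Cs) = (3, 5)
mean_scores = scores.mean(axis=0)    # average accuracy over folds for each C
best_C = clf.Cs_[np.argmax(mean_scores)]
print(scores.shape, best_C, clf.C_[0])
```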

In [None]:
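# Mean cross-validated accuracy (averaged over the folds) for each value of C.
# In recent scikit-learn versions the model is multinomial, so every class
# shares the same scores and the curves lie on top of each other.
for k, v in classifier.scores_.items():
    plt.plot(classifier.Cs_, v.mean(axis=0), label='Digit: {}'.format(k))
plt.xscale('log')
plt.xlabel('C (inverse regularization strength)')
plt.ylabel('Mean CV accuracy')
plt.legend()
plt.show()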

In [None]:
# Predict on the held-out second half of the data
yhat = classifier.predict(data[n_samples // 2:])

In [None]:
# .mean() gives the fraction of correct predictions; the :.2% format shows it as a percentage
print('Prediction Accuracy: {:.2%}'.format((yhat == digits.target[n_samples // 2:]).mean()))
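A single accuracy number hides which digits get confused with each other. The sketch below uses `sklearn.metrics.accuracy_score` and `confusion_matrix` to break the result down. To keep it quick and self-contained, it fits a plain `LogisticRegression` with a fixed `C=1.0` as a stand-in for the cross-validated classifier. That substitution is an assumption, not the notebook's setup.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import accuracy_score, confusion_matrix

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target
half = len(X) // 2

# Stand-in for the cross-validated classifier: one fixed C, no CV, so it fits quickly
clf = linear_model.LogisticRegression(C=1.0, max_iter=5000)
clf.fit(X[:half], y[:half])
yhat = clf.predict(X[half:])

acc = accuracy_score(y[half:], yhat)    # fraction correct, same as (yhat == y).mean()
cm = confusion_matrix(y[half:], yhat)   # cm[i, j] = number of true i's predicted as j
print('Accuracy: {:.2%}'.format(acc))
print(cm)
```

Large off-diagonal entries in `cm` point to specific pairs of digits that the linear model tends to mix up.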

Now, what did we learn?

In [None]:
print(classifier.coef_.shape)

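This means that we have one $\theta_d$ for each class, as in our model: `coef_` has shape `(10, 64)`, so row $d$ holds the 64 weights of $\theta_d$, one per pixel. scikit-learn also fits an intercept for each class, which it stores separately in `classifier.intercept_`.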
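As a check on the argmax rule, the sketch below recomputes the scores $\theta_d^T x$ plus the fitted intercept by hand from `coef_` and `intercept_`. It then confirms that these scores reproduce `predict`. To keep it quick, it fits a plain `LogisticRegression` with a fixed `C=1.0` as a stand-in for the cross-validated model. That choice is an assumption made for speed.

```python
import numpy as np
from sklearn import datasets, linear_model

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target
half = len(X) // 2

clf = linear_model.LogisticRegression(C=1.0, max_iter=5000).fit(X[:half], y[:half])

# theta_d^T x + b_d for every test image and every class d: shape (n_test, 10)
scores = X[half:] @ clf.coef_.T + clf.intercept_
manual = clf.classes_[np.argmax(scores, axis=1)]

print(clf.coef_.shape, clf.intercept_.shape)    # (10, 64) (10,)
print(np.array_equal(manual, clf.predict(X[half:])))
```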

In [None]:
for index, w in enumerate(classifier.coef_):
    mx = np.max(np.abs(w))
    plt.subplot(2, 5, index + 1)
    plt.axis('off')
    plt.imshow(w.reshape((8, 8)), cmap=plt.cm.RdYlBu_r, interpolation='nearest', vmax=mx, vmin=-mx)
    plt.title('Filter: %i' % index)