**Confusion Matrix**   
One good way to evaluate the performance of a classifier is to look at the **confusion matrix**. The general idea is to count the number of times instances of class A are classified as class B.   
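To make the counting idea concrete, here is a minimal sketch that builds a confusion matrix by hand and compares it with scikit-learn's `confusion_matrix`. The helper `count_confusions` and the three-class `cat`/`dog`/`bird` labels are hypothetical illustrations, not part of the MNIST walkthrough. Entry `[i, j]` counts the instances whose actual class is `labels[i]` and whose predicted class is `labels[j]`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def count_confusions(y_true, y_pred, labels):
    """Hypothetical helper: entry [i, j] counts instances of class labels[i]
    that were predicted as class labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[index[actual], index[predicted]] += 1  # row = actual, column = predicted
    return cm


# Made-up labels for illustration only
y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "cat", "cat", "bird", "dog", "dog"]
labels = ["bird", "cat", "dog"]

manual = count_confusions(y_true, y_pred, labels)
print(manual)
# The hand-rolled counts match scikit-learn's implementation
print(np.array_equal(manual, confusion_matrix(y_true, y_pred, labels=labels)))
```

Here one `cat` was predicted as `dog` and one `dog` as `cat`, so those two errors land off the diagonal, while every correct prediction lands on it.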

My purpose here is to show how to read a confusion matrix. To do so, we will load the MNIST dataset and turn it into a binary problem by creating training and test targets that only indicate whether an image is a 5. So we will have two classes: 5s and not-5s. Then we will train a Stochastic Gradient Descent (SGD) classifier and look at its confusion matrix to measure its performance:

In [1]:
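from sklearn.datasets import fetch_openml

# fetch_mldata() has been removed from scikit-learn (mldata.org is offline);
# OpenML hosts the same 70,000 MNIST images as 'mnist_784'
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

In [2]:
import numpy as np

X, y = mnist['data'], mnist['target']
y = y.astype(np.uint8)  # OpenML returns the labels as strings ('5'), so convert them to integers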

In [3]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

In [11]:
import numpy as np

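# Shuffle the training set so every cross-validation fold gets a similar mix of digits.
# The same permutation is applied to X_train and y_train, so images stay paired with their labels.
# No seed is set, so the exact counts below will vary slightly from run to run.
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]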

Let's create the binary target vectors:

In [5]:
y_train_5 = (y_train == 5) #True for all 5s, False for all other digits.
y_test_5 = (y_test == 5)
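To see how these boolean targets map onto the rows and columns of the matrix, here is a small sketch on a handful of made-up labels (`y_digits` and `y_pred` are hypothetical values, not MNIST data). When no `labels` argument is given, `confusion_matrix` uses the labels it finds in sorted order, so `False` (not-5) becomes row/column 0 and `True` (5) becomes row/column 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical digit labels standing in for y_train
y_digits = np.array([5, 0, 4, 1, 9, 5, 3, 5])
y_is_5 = (y_digits == 5)  # True for 5s, False for every other digit
print(y_is_5)
print(np.unique(y_is_5))  # sorted label order used by confusion_matrix

# Hypothetical predictions: one non-5 flagged as a 5, one 5 missed
y_pred = np.array([True, False, True, False, False, False, False, True])

# Rows: actual False, actual True; columns: predicted False, predicted True
cm = confusion_matrix(y_is_5, y_pred)
print(cm)
```

The single non-5 flagged as a 5 shows up in the top-right cell, and the missed 5 in the bottom-left cell.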

Let's train the model:

In [6]:
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42)

sgd_clf.fit(X_train, y_train_5)



SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
       eta0=0.0, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='hinge', max_iter=None, n_iter=None,
       n_jobs=1, penalty='l2', power_t=0.5, random_state=42, shuffle=True,
       tol=None, verbose=0, warm_start=False)

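To compute the confusion matrix, you first need a set of predictions that can be compared to the actual targets. You could make predictions on the test set, but let's keep it untouched for now (remember that you want to use the test set only at the very end of your project, once you have a classifier that you are ready to launch). Instead, you can use the `cross_val_predict()` function. Like `cross_val_score()`, it performs K-fold cross-validation, but instead of returning evaluation scores, it returns the predictions made on each test fold. This means you get a clean prediction for every instance in the training set: each one is predicted by a model that never saw it during training.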

In [7]:
from sklearn.model_selection import cross_val_predict

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
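The following is a minimal sketch of what `cross_val_predict()` does conceptually, assuming its documented default behaviour: for a classifier with an integer `cv` and a binary target, it splits the data with `StratifiedKFold` (no shuffling). The synthetic `make_classification` data and the helper `manual_cross_val_predict` are hypothetical stand-ins for `X_train`/`y_train_5`, chosen so the cell runs quickly on its own.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for (X_train, y_train_5): 300 samples with a boolean target
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)
y_demo = y_demo.astype(bool)


def manual_cross_val_predict(estimator, X, y, n_splits=3):
    """Hypothetical helper producing out-of-fold predictions: each instance is
    predicted by a model trained only on the other folds."""
    y_pred = np.empty_like(y)
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        model = clone(estimator)  # fresh, unfitted copy for each fold
        model.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = model.predict(X[test_idx])
    return y_pred


sgd = SGDClassifier(random_state=42)
manual = manual_cross_val_predict(sgd, X_demo, y_demo)
builtin = cross_val_predict(sgd, X_demo, y_demo, cv=3)
print(np.array_equal(manual, builtin))
```

Because the folds and the estimator's `random_state` are identical, the hand-written loop and the library call should yield the same predictions; the library version additionally handles parallelism, groups, and other prediction methods.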



In [8]:
from sklearn.metrics import confusion_matrix

confusion_matrix(y_train_5, y_train_pred)

array([[53875,   704],
       [ 1680,  3741]])

Each row in a confusion matrix represents an *actual class*, while each column represents a *predicted class*. The first row of this matrix considers non-5 images (the *negative class*): 53,875 of them were correctly classified as non-5s (they are called **true negatives**), while the remaining 704 were wrongly classified as 5s (**false positives**).  

The second row considers the images of 5s (*the positive class*): 1,680 were wrongly classified as non-5s (**false negatives**), while the remaining 3,741 were correctly classified as 5s (**true positives**).   
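For a binary problem, scikit-learn's documentation shows that flattening the matrix with `.ravel()` yields the four counts in the order `tn, fp, fn, tp`. This sketch copies the matrix printed above into a standalone cell and checks the reading: the row sums give the number of actual non-5s and 5s, the diagonal holds the correct predictions, and the off-diagonal cells hold the errors.

```python
import numpy as np

# The confusion matrix printed above, copied in so this cell runs on its own
cm = np.array([[53875,   704],
               [ 1680,  3741]])

tn, fp, fn, tp = cm.ravel()  # row-major order: [[TN, FP], [FN, TP]]
print(tn, fp, fn, tp)        # 53875 704 1680 3741

print(cm.sum(axis=1))        # actual non-5s, actual 5s: [54579  5421]
print(np.trace(cm))          # correctly classified (diagonal): 57616
print(fp + fn)               # misclassified (off-diagonal): 2384
```

The row sums (54,579 non-5s and 5,421 5s) are exactly the diagonal of the perfect-classifier matrix further down, since a perfect classifier puts every instance of each class on the diagonal.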

A perfect classifier would have only true positives and true negatives, so its confusion matrix would have nonzero values only on its main diagonal (top left to bottom right):

In [9]:
y_train_perfect_predictions = y_train_5

In [10]:
confusion_matrix(y_train_5, y_train_perfect_predictions)

array([[54579,     0],
       [    0,  5421]])