## How to create a confusion matrix  

When building a classification model, there are many situations where you will want to take a closer look at the predictions your model made: specifically, the true positives, false positives, true negatives, and false negatives (i.e., when the true class of an example was $X$, how often did my model predict $X$ vs. $Y, Z, \ldots, K$?). This is very useful for seeing which classes your model predicts easily and which classes it finds more difficult. Below is an example of what a confusion matrix looks like.
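Before the pandas walkthrough, here is what that counting amounts to. The sketch below builds the matrix by hand with NumPy, assuming the classes are integer labels `0..K-1`. This is a hand-rolled alternative, not the method the walkthrough uses, and `confusion_counts` is a hypothetical helper name for illustration only. Rows are actual classes and columns are predicted classes, the same layout as the pandas table that follows.

```python
import numpy as np

def confusion_counts(actual, predicted, n_classes):
    """Hypothetical helper: count how often each (actual, predicted) pair occurs.

    Assumes labels are integers in range(n_classes).
    Row i = actual class i, column j = predicted class j.
    """
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates over repeated (row, col) pairs; plain fancy-index
    # assignment (matrix[actual, predicted] += 1) would count each pair only once.
    np.add.at(matrix, (actual, predicted), 1)
    return matrix

actual = [0, 0, 1, 2, 2, 1, 0, 0, 1, 2]
predicted = [0, 0, 1, 2, 1, 1, 2, 1, 1, 2]
print(confusion_counts(actual, predicted, n_classes=3))
```

Every example adds exactly one to one cell, so the entries always sum to the number of examples.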

```python
import pandas as pd
import numpy as np
```

Let's say you have 10 examples that fall into three classes. Here is the actual class of each example:

```python
actual_classes_of_examples = np.array([0,0,1,2,2,1,0,0,1,2])
```

Now you've created your amazing model that you think is awesome at predicting, and you want to see which classes it was good at predicting and which ones not so much. You get the predictions from your model; if you're using scikit-learn, you'll use the `predict()` method. Let's say this is the array it returned:

```python
predicted_classes_of_examples = np.array([0,0,1,2,1,1,2,1,1,2])
```

Using pandas' `crosstab` function, we'll create the matrix:

```python
# Create confusion matrix
pd.crosstab(actual_classes_of_examples
            ,predicted_classes_of_examples
            ,rownames=['Actual']
            ,colnames=['Predicted'])
```

| Actual \ Predicted | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 2 | 1 | 1 |
| 1 | 0 | 3 | 0 |
| 2 | 0 | 1 | 2 |
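`pd.crosstab` has a couple of options that are handy here. `margins=True` appends row and column totals (labelled `All` by default), and `normalize='index'` divides each row by its total, so each cell becomes the fraction of that actual class that was predicted as each class. A short sketch using the same arrays:

```python
import numpy as np
import pandas as pd

actual = np.array([0, 0, 1, 2, 2, 1, 0, 0, 1, 2])
predicted = np.array([0, 0, 1, 2, 1, 1, 2, 1, 1, 2])

# Raw counts plus a totals row and column, both named "All"
with_totals = pd.crosstab(actual, predicted,
                          rownames=['Actual'], colnames=['Predicted'],
                          margins=True)

# Each row divided by its row total, so every row sums to 1
row_fractions = pd.crosstab(actual, predicted,
                            rownames=['Actual'], colnames=['Predicted'],
                            normalize='index')

print(with_totals)
print(row_fractions.round(2))
```

In the row-normalized table, the diagonal entry of each row is the share of that class the model got right.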


Ideally, the only nonzero numbers in the matrix would be on the diagonal and the rest would be zeros, which would mean your model predicted every class perfectly. Maybe that's not ideal, though, because it probably means you're overfitting :) The matrix also gives you a sense of each class's precision, $\text{True Positives} / (\text{True Positives} + \text{False Positives})$, by looking down its column (a column holds everything the model predicted as that class), and its recall, $\text{True Positives} / (\text{True Positives} + \text{False Negatives})$, by looking across its row (a row holds everything that actually belongs to that class).
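Reading precision and recall off the matrix can be sketched in a few lines. The sketch treats each class one-vs-rest: the diagonal cell is its true positives, the rest of its column are false positives, the rest of its row are false negatives, and everything else counts as true negatives. It assumes every class appears as both an actual and a predicted label, so the crosstab is square with rows and columns in the same order.

```python
import numpy as np
import pandas as pd

actual = np.array([0, 0, 1, 2, 2, 1, 0, 0, 1, 2])
predicted = np.array([0, 0, 1, 2, 1, 1, 2, 1, 1, 2])
cm = pd.crosstab(actual, predicted, rownames=['Actual'], colnames=['Predicted'])

# Assumes a square matrix with matching row/column label order
counts = cm.to_numpy()
tp = np.diag(counts)              # correct predictions sit on the diagonal
fp = counts.sum(axis=0) - tp      # rest of each column: predicted as the class, but wrong
fn = counts.sum(axis=1) - tp      # rest of each row: actually the class, but missed
tn = counts.sum() - tp - fp - fn  # everything else, treating each class one-vs-rest

precision = tp / (tp + fp)        # diagonal / column total
recall = tp / (tp + fn)           # diagonal / row total

summary = pd.DataFrame({'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn,
                        'precision': precision, 'recall': recall},
                       index=cm.index)
print(summary.round(2))
```

If a class is never predicted, its column total is zero and its precision comes out as `nan` (with a NumPy warning), so you may want to handle that case explicitly.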