# Confusion matrix

The confusion matrix is a basic tool for evaluating classification models: it shows how many observations of each actual class were assigned to each predicted class.

## Idea

Consider a binary classification problem. We have two classes, Positive and Negative.

We use the following notation:

- $P$ is the number of positive observations in the sample;
- $N$ is the number of negative observations in the sample.

Now suppose we have trained a classifier. Its predictions split the observations into four groups:

- True positive - observations that were positive in the sample and that we correctly predicted as positive. We will denote their number as $TP$;
- True negative - observations that were negative in the sample and that we correctly predicted as negative. We will denote their number as $TN$;
- False positive - observations that were negative in the sample but that we mistakenly predicted as positive. We will denote their number as $FP$;
- False negative - observations that were positive in the sample but that we mistakenly predicted as negative. We will denote their number as $FN$.

Placing the actual classes in the rows and the predicted classes in the columns gives the confusion matrix.

<table>
  <thead>
    <tr>
      <th></th>
      <th>Predicted $N$</th>
      <th>Predicted $P$</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Actual $N$</td>
      <td>$TN$</td>
      <td>$FP$</td>
    </tr>
    <tr>
      <td>Actual $P$</td>
      <td>$FN$</td>
      <td>$TP$</td>
    </tr>
  </tbody>
</table>
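
As a minimal sketch of how the four counts can be obtained, the snippet below computes them with NumPy for a small made-up sample. The arrays `y_true` and `y_pred` and the helper name `confusion_counts` are illustrative assumptions, not part of the original text; labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return the 2x2 confusion matrix [[TN, FP], [FN, TP]].

    Hypothetical helper for illustration; assumes labels are 0 (negative)
    and 1 (positive). Rows are actual classes, columns are predicted classes.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)

    tp = int(np.sum(y_true & y_pred))    # actual positive, predicted positive
    tn = int(np.sum(~y_true & ~y_pred))  # actual negative, predicted negative
    fp = int(np.sum(~y_true & y_pred))   # actual negative, predicted positive
    fn = int(np.sum(y_true & ~y_pred))   # actual positive, predicted negative

    return np.array([[tn, fp], [fn, tp]])


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

matrix = confusion_counts(y_true, y_pred)
print(matrix)
```

The row totals of the result recover the class sizes: the first row sums to $N = TN + FP$ and the second row to $P = TP + FN$.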

The confusion matrix can also be expressed in relative values.

We use the following notation:

- $P^* = TP + FP$ - number of observations in the sample predicted as positive;
- $N^* = TN + FN$ - number of observations in the sample predicted as negative;
- $TNR = TN/N$ - true negative rate, the proportion of actual negative observations that are correctly predicted as negative;
- $FNR = FN/P$ - false negative rate, the proportion of actual positive observations that are mistakenly predicted as negative;
- $TPR = TP/P$ - true positive rate, the proportion of actual positive observations that are correctly predicted as positive;
- $FPR = FP/N$ - false positive rate, the proportion of actual negative observations that are mistakenly predicted as positive.

Here $N = TN + FP$ and $P = TP + FN$ are the row totals of the confusion matrix, while $N^*$ and $P^*$ are its column totals. Each rate is normalized by the size of the actual class, so $TNR + FPR = 1$ and $FNR + TPR = 1$.


With this notation, the confusion matrix can also be written as:

<table>
  <thead>
    <tr>
      <th></th>
      <th>Predicted $N$</th>
      <th>Predicted $P$</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Actual $N$</td>
      <td>$TNR$</td>
      <td>$FPR$</td>
    </tr>
    <tr>
      <td>Actual $P$</td>
      <td>$FNR$</td>
      <td>$TPR$</td>
    </tr>
  </tbody>
</table>
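
The sketch below turns a matrix of counts into rates. It assumes the row-wise convention, in which each count is divided by the number of observations in its actual class ($N$ for the first row, $P$ for the second), so that each row sums to 1. The function name `confusion_rates` and the counts are made up for illustration.

```python
import numpy as np


def confusion_rates(counts):
    """Divide each row of [[TN, FP], [FN, TP]] by its row total.

    Hypothetical helper: the row totals are the actual class sizes N and P,
    so the result is laid out as [[TNR, FPR], [FNR, TPR]].
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)  # [[N], [P]]
    return counts / row_totals


# Made-up counts: TN=30, FP=10, FN=5, TP=15.
counts = [[30, 10], [5, 15]]
rates = confusion_rates(counts)
print(rates)
```

If one of the classes is absent from the sample, its row total is zero and the rates in that row are undefined, so this sketch should only be applied when both classes are present.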

## Confusion table

Many classification models can return a score that indicates the probability that a particular object belongs to the positive class. You can choose a threshold above which the object is considered positive. Different threshold values will therefore produce different confusion matrices.
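
To illustrate how the threshold changes the result, the sketch below applies two thresholds to the same made-up scores and counts the outcomes. The scores, the labels, and the helper name `counts_at_threshold` are assumptions for illustration; an observation is treated as positive when its score is greater than or equal to the threshold, which is one common convention.

```python
import numpy as np


def counts_at_threshold(y_true, y_score, threshold):
    """Return (TN, FP, FN, TP) for one threshold.

    Hypothetical helper: scores >= threshold are predicted positive.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score) >= threshold

    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tp = int(np.sum(y_true & y_pred))
    return tn, fp, fn, tp


# Made-up labels and scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]

low = counts_at_threshold(y_true, y_score, threshold=0.3)
high = counts_at_threshold(y_true, y_score, threshold=0.7)
print("threshold 0.3:", low)
print("threshold 0.7:", high)
```

In this sample, raising the threshold from 0.3 to 0.7 removes both false positives but turns one true positive into a false negative.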

We will call the table that maps each selected threshold to the corresponding confusion matrix the confusion table.

| threshold  | $TNR$ | $FPR$ | $FNR$ | $TPR$ |
|:-----------|:-----:|:-----:|:-----:|:-----:|
| $t_1$      | $TNR_1$| $FPR_1$| $FNR_1$| $TPR_1$|
| $t_2$      | $TNR_2$| $FPR_2$| $FNR_2$| $TPR_2$|
| ...        | ...    | ...    | ...    | ...    |
| $t_i$      | $TNR_i$| $FPR_i$| $FNR_i$| $TPR_i$|
| ...        | ...    | ...    | ...    | ...    |
| $t_n$      | $TNR_n$| $FPR_n$| $FNR_n$| $TPR_n$|

In [7]:
import numpy as np
def confusion_table(
    y_true,
    y_score,
    tresholds=None
):
    if tresholds is None:
        tresholds = y_score