Here are my notes on the [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix). That page is very detailed, so I found it useful to work through a specific example.

<!-- TEASER_END -->

## Simple `knn` model

To create a confusion matrix, we will use a `k`-nearest neighbors model to predict whether a customer buys a caravan insurance policy (the `Purchase` variable) in the `Caravan` data set from `ISLR`. See section 4.6 in the [ISLR book](http://www-bcf.usc.edu/~gareth/ISL/) for details. We only need the actual vs. predicted classes from the model.

```r
library(ISLR)
library(class)

## Load data and split into test and training
X <- scale(Caravan[, -86])  # standardize the predictors; column 86 is Purchase
test <- 1:1000
trX <- X[-test,]  # training data
teX <- X[test,]   # test data
trY <- Caravan$Purchase[-test]  # training target
teY <- Caravan$Purchase[test]   # test target

## knn uses some randomness, e.g. to break ties; set seed for reproducibility
set.seed(1)
```

For the target variable, 'Yes' means the customer bought a caravan insurance policy. For reference, the percentage of 'Yes' in the test data is:

```r
cat(mean(teY == "Yes")*100, "%\n")
```

```
5.9 %
```


Fit a model using all predictors and 3 nearest neighbors.

```r
model <- knn(trX, teX, trY, k = 3)

confusion_matrix <- function(actual, predicted) {
  # use the function arguments, not the global teY/model
  x <- addmargins(table(actual = actual, predicted = predicted))
  # jupyter notebooks drop some dimnames when printing because
  # 'matrix' gets added to the classes after a call to addmargins;
  # just reset the class to 'table' and it is fine
  class(x) <- "table"
  x
}

confusion_matrix(actual = teY, predicted = model)
```

```
      predicted
actual   No  Yes  Sum
   No   921   20  941
   Yes   54    5   59
   Sum  975   25 1000
```

## Confusion matrix

The last result is a confusion matrix: it cross-tabulates actual vs. predicted classes. Here is a generalized version of a confusion matrix with the variable names we will use below. `Neg` and `Pos` are somewhat arbitrary names that come from terminology like "false positive". In the example above, "No" is the negative class and "Yes" is the positive class.

|                | Predicted Neg | Predicted Pos | Total |
|----------------|---------------|---------------|-------|
| **Actual Neg** | TN            | FP            | N     |
| **Actual Pos** | FN            | TP            | P     |
| **Total**      | N\*           | P\*           | T     |

**True negatives** (TN) are predicted negative and are actually negative. **True positives** (TP) are predicted positive and are actually positive.

**False negatives** (FN) are predicted negative but are actually positive. **False positives** (FP) are predicted positive but are actually negative.
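To make the four cells concrete, here is a minimal sketch that pulls them out of a 2x2 `table` by name rather than by position. The toy `actual` and `predicted` vectors are made up for illustration, and the sketch assumes the levels are `"No"` (negative) and `"Yes"` (positive), as in `Caravan$Purchase`.

```r
# Hypothetical toy labels; "No" is the negative class, "Yes" the positive class
lv <- c("No", "Yes")
actual    <- factor(c("No", "No", "No", "Yes", "Yes", "No"), levels = lv)
predicted <- factor(c("No", "Yes", "No", "Yes", "No", "No"), levels = lv)

tab <- table(actual = actual, predicted = predicted)

# Index by dimnames so the result does not depend on the order of the levels
TN <- tab["No",  "No"]   # predicted negative, actually negative
FP <- tab["No",  "Yes"]  # predicted positive, actually negative
FN <- tab["Yes", "No"]   # predicted negative, actually positive
TP <- tab["Yes", "Yes"]  # predicted positive, actually positive

c(TN = TN, FP = FP, FN = FN, TP = TP)  # TN = 3, FP = 1, FN = 1, TP = 1
```

Indexing by name is a design choice: positional indexing like `tab[1, 1]` silently swaps the meaning of the cells if the factor levels are ever reordered.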

The **accuracy** is the fraction of samples classified correctly whereas the **error** is the fraction of samples misclassified. Hence, $\text{accuracy} = 1 - \text{error}$.
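As a quick arithmetic check, here is a small sketch using the counts from the `k = 3` confusion matrix above (TN = 921, FP = 20, FN = 54, TP = 5):

```r
# Counts copied from the k = 3 confusion matrix above
TN <- 921; FP <- 20; FN <- 54; TP <- 5
total <- TN + FP + FN + TP      # 1000

accuracy <- (TN + TP) / total   # 926 / 1000 = 0.926
error    <- (FN + FP) / total   #  74 / 1000 = 0.074

# Every sample is either correct or misclassified, so the two sum to 1
stopifnot(isTRUE(all.equal(accuracy, 1 - error)))
```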

The **specificity**, aka true negative rate, is the fraction of actual negative cases that were predicted to be negative.

The **sensitivity**, aka true positive rate, recall, or power, is the fraction of actual positive cases that were predicted to be positive.

A **Type I error** occurs when we predict positive but it is actually negative. The false positive rate is the fraction of actually negative cases that were predicted to be positive. This is also $1 - \text{specificity}$.

A **Type II error** occurs when we predict negative but it is actually positive. The false negative rate is the fraction of actually positive cases that were predicted to be negative. This is also $1 - \text{sensitivity}$.
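With the same counts, here is a sketch of the rates computed over actual negatives (N = 941) and actual positives (P = 59), along with their complements. Note how low the sensitivity is: the model finds only 5 of the 59 actual 'Yes' cases.

```r
# Counts copied from the k = 3 confusion matrix above
TN <- 921; FP <- 20; FN <- 54; TP <- 5
N <- TN + FP   # actual negatives: 941
P <- FN + TP   # actual positives: 59

specificity <- TN / N   # true negative rate, 921/941 ~ 0.979
sensitivity <- TP / P   # true positive rate, 5/59 ~ 0.085
fpr <- FP / N           # false positive (Type I error) rate, 20/941 ~ 0.021
fnr <- FN / P           # false negative (Type II error) rate, 54/59 ~ 0.915

# Rates over the same actual class are complements of each other
stopifnot(isTRUE(all.equal(fpr, 1 - specificity)),
          isTRUE(all.equal(fnr, 1 - sensitivity)))
```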

The **precision** is the fraction of predicted positive cases that are actually positive.

The **false discovery rate (FDR)** is the fraction of predicted positive cases that are actually negative.
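Both ratios share the denominator P\*, the number of predicted positives (25 in the `k = 3` matrix above), so precision and FDR always sum to 1. A small sketch:

```r
# Counts copied from the k = 3 confusion matrix above
FP <- 20; TP <- 5
PP <- FP + TP   # predicted positives, P* = 25

precision <- TP / PP   # 5/25  = 0.2
fdr       <- FP / PP   # 20/25 = 0.8

stopifnot(isTRUE(all.equal(fdr, 1 - precision)))
```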

Here is a table that summarizes these quantities and their aliases.

| Formula    | Alt.     | Name                      | Aliases                                            |
|------------|----------|---------------------------|----------------------------------------------------|
| (TN+TP)/T  |          | Accuracy                  |                                                    |
| (FN+FP)/T  |          | Error                     |                                                    |
| TN/(TN+FP) | TN / N   | True negative rate        | Specificity, 1 - Type I error rate                 |
| FP/(TN+FP) | FP / N   | False positive rate       | 1 - Specificity, Type I error rate                 |
| FN/(FN+TP) | FN / P   | False negative rate       | 1 - Sensitivity, Type II error rate                |
| TP/(FN+TP) | TP / P   | True positive rate        | Sensitivity, Recall, 1 - Type II error rate, Power |
| TN/(TN+FN) | TN / N\* | Negative predictive value |                                                    |
| FN/(TN+FN) | FN / N\* | False omission rate       |                                                    |
| FP/(FP+TP) | FP / P\* | False discovery rate      | 1 - Precision                                      |
| TP/(FP+TP) | TP / P\* | Positive predictive value | Precision                                          |
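Every row of the summary table is a ratio of the four cell counts, so they can all be computed in one place. The helper below, `rates_from_counts`, is hypothetical (not part of any package); it is a minimal sketch that takes the four counts directly rather than a fitted model.

```r
# Hypothetical helper: every rate in the summary table, from the four cells
rates_from_counts <- function(TN, FP, FN, TP) {
  N  <- TN + FP   # actual negatives
  P  <- FN + TP   # actual positives
  PN <- TN + FN   # predicted negatives, N*
  PP <- FP + TP   # predicted positives, P*
  total <- N + P
  c(ACC = (TN + TP) / total,  # accuracy
    ERR = (FN + FP) / total,  # error
    TNR = TN / N,             # specificity
    FPR = FP / N,             # 1 - specificity
    FNR = FN / P,             # 1 - sensitivity
    TPR = TP / P,             # sensitivity, recall, power
    NPV = TN / PN,            # negative predictive value
    FOR = FN / PN,            # false omission rate
    FDR = FP / PP,            # 1 - precision
    PPV = TP / PP)            # precision
}

# Counts from the k = 3 confusion matrix above
round(rates_from_counts(TN = 921, FP = 20, FN = 54, TP = 5), 4)
```

If a model never predicts one of the classes (N\* = 0 or P\* = 0), the ratios with that denominator come out as `NaN` in R, since `0/0` is `NaN`.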

Finally, here is an R function that summarizes our discussion.

```r
confusion_stats <- function(actual, predicted) {
  tab <- table(actual = actual, predicted = predicted)

  # assumes the first factor level is the negative class ("No")
  # and the second is the positive class ("Yes")
  TN <- tab[1,1]  # true negatives
  TP <- tab[2,2]  # true positives
  FN <- tab[2,1]  # false negatives
  FP <- tab[1,2]  # false positives
  N <- TN + FP    # actual negatives
  P <- FN + TP    # actual positives
  PN <- TN + FN   # predicted negatives, N*
  PP <- FP + TP   # predicted positives, P*
  T <- sum(tab)   # total (shadows the built-in T only inside this function)

  x <- data.frame(
      rbind(c("Accuracy",    "(TN+TP)/T", sprintf("%s/%s", TN+TP, T), sprintf("%.2f%%", 100*(TN+TP)/T)),
            c("Error",       "(FN+FP)/T", sprintf("%s/%s", FN+FP, T), sprintf("%.2f%%", 100*(FN+FP)/T)),
            c("Specificity", "TN / N",    sprintf("%s/%s", TN, N),    sprintf("%.2f%%", 100*TN/N)),
            c("Sensitivity", "TP / P",    sprintf("%s/%s", TP, P),    sprintf("%.2f%%", 100*TP/P)),
            c("False Disc.", "FP / P*",   sprintf("%s/%s", FP, PP),   sprintf("%.2f%%", 100*FP/PP)),
            c("Precision",   "TP / P*",   sprintf("%s/%s", TP, PP),   sprintf("%.2f%%", 100*TP/PP))))
  colnames(x) <- c("Name", "Formula", "Result", "Numeric")
  x
}

# repeat the confusion matrix here for reference
confusion_matrix(actual = teY, predicted = model)
confusion_stats(actual = teY, predicted = model)
```

```
      predicted
actual   No  Yes  Sum
   No   921   20  941
   Yes   54    5   59
   Sum  975   25 1000
```

| Name        | Formula   | Result   | Numeric |
|-------------|-----------|----------|---------|
| Accuracy    | (TN+TP)/T | 926/1000 | 92.60%  |
| Error       | (FN+FP)/T | 74/1000  | 7.40%   |
| Specificity | TN / N    | 921/941  | 97.87%  |
| Sensitivity | TP / P    | 5/59     | 8.47%   |
| False Disc. | FP / P\*  | 20/25    | 80.00%  |
| Precision   | TP / P\*  | 5/25     | 20.00%  |
