# Confusion Matrix
p. 8 - 20

Wikipedia says that a confusion matrix "is a specific table layout that allows visualization of the performance 
of an algorithm."
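Each cell of the table is a count of how a test's results line up with reality. Here is a minimal Python sketch of how those counts could be tallied. It assumes the true conditions and the test results are given as two equal-length lists of booleans. The function name `count_confusion` is hypothetical and does not come from the book.

```python
# A minimal sketch: tally a binary confusion matrix from paired labels.
# Assumption: True means "positive" in both lists; count_confusion is a hypothetical name.

def count_confusion(actual, predicted):
    """Return (tp, fp, fn, tn) for two equal-length sequences of booleans."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for is_positive, tested_positive in zip(actual, predicted):
        if is_positive and tested_positive:
            tp += 1          # really positive, test says positive
        elif not is_positive and tested_positive:
            fp += 1          # really negative, test says positive
        elif is_positive and not tested_positive:
            fn += 1          # really positive, test says negative
        else:
            tn += 1          # really negative, test says negative
    return tp, fp, fn, tn

actual    = [True, True, False, False, False, True]
predicted = [True, False, True, False, False, True]
print(count_confusion(actual, predicted))  # (2, 1, 1, 2)
```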

In the book, the author uses a binary (two-by-two) confusion matrix to show true positives, false negatives, 
false positives, and true negatives. One example from p. 8:

|  | Have Cancer (1,000) | Don't have Cancer (99,000) |
| --- | --- | --- |
| Test Positive | 80% of those that have cancer<br> (true positives) <br> 800 | 10% of those that don't have cancer <br>  (false positives) <br>  9,900 |
| Test Negative | 20% of those that have cancer<br> (false negatives) <br> 200 | 90% of those that don't have cancer <br> (true negatives)<br>  89,100 |

Of the 10,700 people who tested positive (800 + 9,900), only 800 had cancer, or about 7%.
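That 7% is the positive predictive value: the true positives divided by everyone who tested positive. The same number can also be obtained with Bayes' theorem using only the rates. Here the prevalence is $p = 0.01$, the sensitivity is $0.80$, and the specificity is $0.90$. The symbols are labels chosen here, not the book's notation.

$$
P(\text{cancer} \mid \text{positive}) = \frac{TP}{TP + FP} = \frac{800}{800 + 9{,}900} \approx 0.0748
$$

$$
P(\text{cancer} \mid \text{positive}) = \frac{\text{sens} \cdot p}{\text{sens} \cdot p + (1 - \text{spec})(1 - p)} = \frac{0.80 \cdot 0.01}{0.80 \cdot 0.01 + 0.10 \cdot 0.99} = \frac{0.008}{0.107} \approx 0.0748
$$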

The code below calculates several of the binary confusion matrices from the book's Introduction.

```python
# This uses Python 3.9.6, but it should work on any Python 3.6 or later (it relies on f-strings).
# Calculating the different binary confusion matrices.
# This version only works when we know the total population, population positive rate, sensitivity, and specificity.

class ConfusionMatrix:

    def __init__(self, page, total_population, population_positive_rate, sensitivity, specificity):
        self.page = page
        self.total_population = total_population
        self.population_positive_rate = population_positive_rate
        self.sensitivity = sensitivity
        self.specificity = specificity

    def population_positive(self):
        return self.total_population * self.population_positive_rate

    def population_negative(self):
        return self.total_population * (1 - self.population_positive_rate)

    def true_positive(self):
        return self.population_positive() * self.sensitivity

    def false_negative(self):
        return self.population_positive() * (1 - self.sensitivity)

    def false_positive(self):
        return self.population_negative() * (1 - self.specificity)

    def true_negative(self):
        return self.population_negative() * self.specificity

    # Positive predictive value: of everyone who tests positive, the percentage who are actually positive.
    def chance_if_tested_positive_of_actually_being_positive(self):
        return 100 * self.true_positive() / (self.true_positive() + self.false_positive())

    # Negative predictive value: of everyone who tests negative, the percentage who are actually negative.
    def chance_if_tested_negative_of_actually_being_negative(self):
        return 100 * self.true_negative() / (self.true_negative() + self.false_negative())

    def print_details(self):
        citpoabp = self.chance_if_tested_positive_of_actually_being_positive()
        citnoabn = self.chance_if_tested_negative_of_actually_being_negative()

        print(f"This confusion matrix on page {self.page} has a population of {self.total_population}")
        print(f"Positive rate of {self.population_positive_rate * 100}%")
        print(f"Sensitivity rate of {self.sensitivity * 100}%")
        print(f"Specificity rate of {self.specificity * 100}%")
        print("Confusion Matrix:")
        print(f"True Positives: {self.true_positive():.0f}")
        print(f"False Positives: {self.false_positive():.0f}")
        print(f"False Negatives: {self.false_negative():.0f}")
        print(f"True Negatives: {self.true_negative():.0f}")
        print(f"The chance of testing positive and actually being positive is {citpoabp:.2f}%.")
        print(f"The chance of testing negative and actually being negative is {citnoabn:.2f}%.")
        print("")
```
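The class needs the total population to produce counts. The two percentages, however, do not depend on it, because the population appears in both the numerator and the denominator of each ratio and cancels out. The following minimal sketch builds on that observation. It applies Bayes' theorem using only the positive rate, the sensitivity, and the specificity. The function names are hypothetical and do not come from the book.

```python
# A sketch, not the book's code: the percentages printed by ConfusionMatrix
# depend only on the rates, since the total population cancels in each ratio.
# Function names are hypothetical.

def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(actually positive | tested positive), via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

def negative_predictive_value(prevalence, sensitivity, specificity):
    """P(actually negative | tested negative), via Bayes' theorem."""
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_neg / (true_neg + false_neg)

# Same rates as the first page-8 matrix; no population size is needed.
ppv = positive_predictive_value(0.01, 0.80, 0.90)
npv = negative_predictive_value(0.01, 0.80, 0.90)
print(f"{ppv * 100:.2f}% {npv * 100:.2f}%")  # 7.48% 99.78%
```

Because the population cancels, these functions should give the same percentages as `ConfusionMatrix` for any positive population size.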

```python
# First one on page 8.
confusion_matrix_1 = ConfusionMatrix("8 (First one)", 100000.0, 0.01, 0.80, 0.90)
confusion_matrix_1.print_details()
```

```
This confusion matrix on page 8 (First one) has a population of 100000.0
Positive rate of 1.0%
Sensitivity rate of 80.0%
Specificity rate of 90.0%
Confusion Matrix:
True Positives: 800
False Positives: 9900
False Negatives: 200
True Negatives: 89100
The chance of testing positive and actually being positive is 7.48%.
The chance of testing negative and actually being negative is 99.78%.
```



```python
# Second one on page 8.
confusion_matrix_2 = ConfusionMatrix("8 (Second one)", 100000.0, 0.10, 0.80, 0.90)
confusion_matrix_2.print_details()
```

```
This confusion matrix on page 8 (Second one) has a population of 100000.0
Positive rate of 10.0%
Sensitivity rate of 80.0%
Specificity rate of 90.0%
Confusion Matrix:
True Positives: 8000
False Positives: 9000
False Negatives: 2000
True Negatives: 81000
The chance of testing positive and actually being positive is 47.06%.
The chance of testing negative and actually being negative is 97.59%.
```
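The two page-8 matrices differ only in the positive rate (1% versus 10%). Even so, the chance that a positive test is real jumps from 7.48% to 47.06%. The short sketch below sweeps the positive rate while holding the page-8 sensitivity (80%) and specificity (90%) fixed. The function name and the sweep values are illustrative choices.

```python
# A sketch: keep sensitivity and specificity at the page-8 values and vary
# only the positive rate (prevalence). The sweep values are illustrative.

def ppv_percent(prevalence, sensitivity=0.80, specificity=0.90):
    """Chance (in %) that a positive test is a true positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return 100 * true_pos / (true_pos + false_pos)

for prevalence in (0.001, 0.01, 0.05, 0.10, 0.25, 0.50):
    print(f"positive rate {prevalence:>6.1%}: PPV {ppv_percent(prevalence):6.2f}%")
```

The 1% and 10% rows should reproduce the 7.48% and 47.06% printed above. The rest of the sweep shows that the value keeps climbing as the condition becomes more common.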



```python
# Page 11.
confusion_matrix_3 = ConfusionMatrix("11", 1000000.0, 0.03, 0.95, 0.95)
confusion_matrix_3.print_details()
```

```
This confusion matrix on page 11 has a population of 1000000.0
Positive rate of 3.0%
Sensitivity rate of 95.0%
Specificity rate of 95.0%
Confusion Matrix:
True Positives: 28500
False Positives: 48500
False Negatives: 1500
True Negatives: 921500
The chance of testing positive and actually being positive is 37.01%.
The chance of testing negative and actually being negative is 99.84%.
```



```python
# Page 13.
confusion_matrix_4 = ConfusionMatrix("13", 1000000.0, 0.001, 0.99, 0.99)
confusion_matrix_4.print_details()
```

```
This confusion matrix on page 13 has a population of 1000000.0
Positive rate of 0.1%
Sensitivity rate of 99.0%
Specificity rate of 99.0%
Confusion Matrix:
True Positives: 990
False Positives: 9990
False Negatives: 10
True Negatives: 989010
The chance of testing positive and actually being positive is 9.02%.
The chance of testing negative and actually being negative is 100.00%.
```
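The page-13 result prints 100.00%, but that is an artifact of the `.2f` format. Of the 989,020 people who test negative, 10 are false negatives, so the true value is just under 100%. The sketch below uses the standard-library `fractions` module to compute the exact ratio from the counts above.

```python
from fractions import Fraction

# Page-13 counts, copied from the output above.
true_negatives = 989_010
false_negatives = 10

# Exact ratio, so no floating-point rounding creeps in.
npv = Fraction(true_negatives, true_negatives + false_negatives)
print(f"{float(npv) * 100:.5f}%")   # 99.99899%
print(f"misses: {1 - npv}")         # misses: 1/98902
```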



## Possible expansions

- [x] Change Markdown file and Python file to Jupyter Notebook
- [ ] Add data visualizations
- [ ] Make the numbers more human readable.
- [ ] Add the other confusion matrices. These didn't have the same inputs as the others.
- [ ] Plot out the confusion matrices.
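The "more human readable" item could start with Python's format specification mini-language. A `,` adds thousands separators, and `%` formats a rate as a percentage directly. Multiplying by 100 by hand can leave float noise for some rates (for example, `0.07 * 100` evaluates to `7.000000000000001`). This is only a sketch using values from the first page-8 matrix. It does not change the class above.

```python
# A sketch for readable output: ',' adds thousands separators and '%'
# multiplies by 100 and appends a percent sign. Values are from the first page-8 matrix.
population = 100000.0
positive_rate = 0.01
true_negatives = 89100.0

print(f"Population: {population:,.0f}")          # Population: 100,000
print(f"Positive rate: {positive_rate:.1%}")     # Positive rate: 1.0%
print(f"True Negatives: {true_negatives:,.0f}")  # True Negatives: 89,100
```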