# Confusion Matrix

- As mentioned, one way to view the various classification metrics is the **confusion matrix**.
- Let's explore the basics of the confusion matrix to help us understand precision and recall better.

- In a classification problem, during the testing phase you will have two categories:
    - True Condition
        - A text message is SPAM
    - Predicted Condition
        - ML Model predicted SPAM\*
        
> \*Keep in mind the model could also have incorrectly predicted it as HAM.

This means if you have two possible classes you should have 4 separate groups at the end of testing:

- Correctly classified to Class 1: TRUE HAM

> So that means you had a message that was HAM and you positively identified it as HAM.

- Correctly classified to Class 2: TRUE SPAM

> So something was truly SPAM and then you correctly identified it as SPAM.

- **Incorrectly** classified to Class 1: FALSE HAM
- **Incorrectly** classified to Class 2: FALSE SPAM
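
As a minimal sketch of how these four groups could be tallied, assume the true and predicted labels are available as two equal-length Python lists. The toy labels and variable names below are made up for illustration and do not come from the course data.

```python
from collections import Counter

# Hypothetical test results: the true label of each message and the model's prediction.
true_labels      = ["HAM", "HAM", "SPAM", "SPAM", "HAM", "SPAM"]
predicted_labels = ["HAM", "SPAM", "SPAM", "HAM", "HAM", "SPAM"]

# Count every (true, predicted) pair; with two classes there are four possible pairs.
groups = Counter(zip(true_labels, predicted_labels))

true_ham   = groups[("HAM", "HAM")]    # HAM correctly classified as HAM
true_spam  = groups[("SPAM", "SPAM")]  # SPAM correctly classified as SPAM
false_ham  = groups[("SPAM", "HAM")]   # SPAM incorrectly classified as HAM
false_spam = groups[("HAM", "SPAM")]   # HAM incorrectly classified as SPAM

print(true_ham, true_spam, false_ham, false_spam)  # 2 2 1 1
```

The four counts always add up to the number of test messages, since every message falls into exactly one group.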

If we were to map these out in a little grid we'd have something that looks like this:

![](../imgs/i09.png)

This is the confusion matrix. If you look it up on [Wikipedia](https://en.wikipedia.org/wiki/Confusion_matrix), which has a really helpful article on this, you'll find it mapped out in a grid that looks like this.

If we simplify it for our particular example about text messages:

![](../imgs/i10.png)

- Again we have the real condition and the predicted condition.
- Along the rows on the left-hand side we have the two real conditions:
    - real condition HAM
    - real condition SPAM
- Along the columns we have the predicted condition:
    - predicted HAM
    - predicted SPAM

- You'll notice that if the real condition is HAM and we predicted HAM, then we have a **true positive**.
- Moving along the predicted condition, we have a **false negative**. That means the real condition was HAM but our machine learning model incorrectly predicted it to be SPAM.

> Here we are relabeling HAM as positive and SPAM as negative.

- We also have the case where the real condition is SPAM and we predicted HAM. That's known as a **false positive**: falsely assigning something to the positive class, which in this case is HAM.
- Finally, we have a **true negative**: correctly assigning something to the negative class, that is, predicting SPAM for a SPAM text message.
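
The naming rule above can be sketched as a small function. This is only an illustration: the function name `outcome` and its `positive` parameter are hypothetical, and HAM is treated as the positive class to match the relabeling used here.

```python
def outcome(real, predicted, positive="HAM"):
    """Name the confusion-matrix cell for one (real, predicted) pair."""
    # "true"/"false" says whether the prediction matched the real condition.
    correct = "true" if real == predicted else "false"
    # "positive"/"negative" always refers to what the model predicted.
    sign = "positive" if predicted == positive else "negative"
    return f"{correct} {sign}"

print(outcome("HAM", "HAM"))    # true positive
print(outcome("HAM", "SPAM"))   # false negative
print(outcome("SPAM", "HAM"))   # false positive
print(outcome("SPAM", "SPAM"))  # true negative
```

Passing `positive="SPAM"` instead swaps the names of the cells, which is why it matters to state which class is being treated as positive.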

If we come back to that other confusion matrix that we saw earlier, we can expand on this to calculate quite a wide variety of different metrics: things like **true positive rate**, **false positive rate**, **positive likelihood ratio**, **false omission rate**, etc. But really we're just concerned with a few of these.

We're concerned with **Recall**, **Accuracy** and **Precision**.

![](../imgs/i11.png)
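
For reference, these three metrics are commonly written in terms of the four cells of the confusion matrix, where $TP$, $TN$, $FP$ and $FN$ abbreviate true positives, true negatives, false positives and false negatives. These are the standard textbook definitions, stated here as an aid rather than transcribed from the figure:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$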

- The main point to remember about the confusion matrix and the various calculated metrics is that they are all fundamentally ways of comparing the predicted values against the true values.
- What constitutes "good" metrics will really depend on the specific situation. In some situations 99% accuracy is fantastic. In other situations 99% accuracy may not be good enough for whatever you are trying to predict, because it may come at the cost of really poor precision and poor recall.

> So we can't just say that there are certain good values for particular metrics. Obviously if you get 100% across precision, accuracy and recall then you have a really good model. But in the real world you're probably not going to get 100% on all of those.





So let's go ahead and use a confusion matrix to evaluate a model.

Let's imagine now we're testing for a disease.

So in this example we're going to test for the presence of a disease, and we have actual patients come in.

Remember this is supervised learning.

So before we run them through the testing program, we already know the true condition of each patient: whether or not they have the disease.

So you can imagine that we're testing a new diagnostic tool.

So in this example of testing for the presence of a disease, we'll say NO is a negative test, or false, which is often encoded as 0. YES is a positive test, or true, which is often encoded as 1.

So in this particular example the total number of patients we have for this new diagnostic test is 165, so we say n = 165.

Maybe we're testing for something like cancer in this particular example. The results are as follows, starting with the people who did not have the condition.

- Predicted NO, actual NO: 50. These are people who didn't have cancer and whom we correctly predicted not to have it.
- Predicted YES, actual NO: 10. We predicted these people to have the disease, but they actually did not, so those 10 predictions were incorrect.
- Predicted NO, actual YES: 5.
- Predicted YES, actual YES: 100.

Some basic terminology here: these four groups are the true positives (100), true negatives (50), false positives (10) and false negatives (5).

You can see on the grid how that lines up. Keep in mind you can flip the order of NO versus YES on either the columns or the rows, which flips the quadrants of negatives versus positives. The original confusion matrix we showed in this lecture was actually flipped. So you may see the confusion matrix both ways, but the information presented is still the same. It's just that the quadrants are rearranged.
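
To make the flipping concrete, here is a small sketch that lays the same four counts out in both label orders. The `counts` dictionary and the `as_grid` helper are illustrative names, not part of any library.

```python
# Counts from the disease example, keyed by (actual, predicted).
counts = {
    ("no", "no"): 50,     # true negatives
    ("no", "yes"): 10,    # false positives
    ("yes", "no"): 5,     # false negatives
    ("yes", "yes"): 100,  # true positives
}

def as_grid(counts, labels):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

no_first = as_grid(counts, ["no", "yes"])   # [[50, 10], [5, 100]]
yes_first = as_grid(counts, ["yes", "no"])  # [[100, 5], [10, 50]]

print(no_first)
print(yes_first)
print(sum(counts.values()))  # 165
```

Both grids hold the same four numbers and the same total. Only the position of each quadrant changes, so it is worth checking the row and column labels before reading off any cell.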

So if we're trying to answer a question like "What is the accuracy of this test?", we ask ourselves how often it is correct. Accuracy is just true positives plus true negatives, divided by the total. It is essentially asking: how many did I classify correctly out of all my examples? In this case we get 150 divided by the total number, which was 165, and that means our test was about 91 percent accurate.
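
The arithmetic can be checked with a few lines of Python, using the four counts from the disease example (the variable names are just shorthand for this sketch):

```python
# Counts from the disease example.
tp, tn, fp, fn = 100, 50, 10, 5

total = tp + tn + fp + fn     # 165 patients
accuracy = (tp + tn) / total  # 150 / 165

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.909
```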

Now, is 91 percent accuracy good enough? That really depends on the situation. If you're dealing with something as high stakes as cancer, 91 percent accuracy may not be good enough.

You also have to take precision and recall into account. Notice that a really important statistic here is the false negative. A false negative means that the person going through the test actually had the disease, but the machine learning model predicted them as not having it. That's an extremely dangerous situation to be in when the stakes are very high, as with a cancer diagnostic, because you're telling someone who actually has a disease that they do not have it.

So you have to keep in mind the entire context of what your machine learning model is trying to achieve.

There's always going to be a bit of a tradeoff between false negatives and false positives. Ideally, for something like this where we're dealing with a really high-stakes situation, we want to make sure we minimize the false negatives. In a diagnostic sense it doesn't matter too much if we accept a larger number of false positives in order to lower our false negatives, because we would rather tell a patient that they have a disease when they actually don't, and then put them in line for further diagnostic tests.

Again, it depends on the context of what the next steps are. What we'd really like to avoid when it comes to disease is telling someone they're clear of the disease when they actually have it.

So again, there's no right or wrong answer as far as what your false positive rate or false negative rate should be. It really depends on the context of the situation and how important each of these is in the overall study.
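
For the disease example, the two rates can be computed from the four counts. This sketch uses the standard definitions, where each rate is taken over the patients who share the same actual condition:

```python
# Counts from the disease example.
tp, tn, fp, fn = 100, 50, 10, 5

# Of the patients who actually do not have the disease, what fraction did we flag as positive?
false_positive_rate = fp / (fp + tn)  # 10 / 60
# Of the patients who actually have the disease, what fraction did we miss?
false_negative_rate = fn / (fn + tp)  # 5 / 105

print(f"FPR = {false_positive_rate:.3f}")  # FPR = 0.167
print(f"FNR = {false_negative_rate:.3f}")  # FNR = 0.048
```

Here the test misses about 5 percent of the patients who have the disease, while about 17 percent of the patients without it get a positive result.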

Now there are other things you can calculate, such as the misclassification rate, or error rate. That's essentially the reverse of accuracy: it's just asking, overall, how often am I wrong? So that's false positives plus false negatives divided by the total, or 100 percent minus the accuracy.
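
A quick sketch with the disease counts shows both ways of getting the error rate, and that they agree:

```python
# Counts from the disease example.
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn

error_rate = (fp + fn) / total  # 15 / 165
accuracy = (tp + tn) / total    # 150 / 165

# Every prediction is either right or wrong, so the two rates sum to 1.
assert abs(error_rate + accuracy - 1.0) < 1e-12

print(f"error rate = {error_rate:.3f}")  # error rate = 0.091
```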

So in this case we have a 9 percent error rate.

Another common thing to keep in mind is that in statistics, false positives and false negatives are often referred to as Type I errors and Type II errors, respectively.

And here you can see kind of a funny example to help keep in mind the difference between the two. In the false positive, we're telling a man that he's pregnant when clearly he cannot be. In the false negative, the woman is clearly pregnant but we're telling her she's not.

So keep in mind that in statistics you may see Type I error and Type II error instead of the terms false positive and false negative.

If you're still confused by the confusion matrix, don't worry too much about it. I would encourage you to check out the Wikipedia page for it. It has a really good diagram, which we saw during this lecture, with the formulas for all the metrics.

Throughout the training, what we're going to be doing is just printing out metrics: for example, just printing the accuracy, or printing a confusion matrix, or printing what's known as a classification report, which reports back precision, recall and F1 score.
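
To give a feel for the numbers such a report contains, here is a library-free sketch that computes precision, recall and F1 score by hand for the disease example, treating YES as the positive class. It uses the standard definitions and is not a substitute for the report scikit-learn prints (its `sklearn.metrics` module provides `confusion_matrix` and `classification_report` for that).

```python
# Counts from the disease example, with YES as the positive class.
tp, tn, fp, fn = 100, 50, 10, 5

precision = tp / (tp + fp)  # 100 / 110: of those predicted YES, how many really were YES
recall = tp / (tp + fn)     # 100 / 105: of those really YES, how many we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision = {precision:.3f}")  # precision = 0.909
print(f"recall    = {recall:.3f}")     # recall    = 0.952
print(f"f1        = {f1:.3f}")         # f1        = 0.930
```

Note that recall is the complement of the false negative rate, so a model tuned to avoid false negatives is a model tuned for high recall.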

Again, it takes time to really get an understanding of these metrics and, more importantly, an intuition behind them. Check out the resource links for this lecture to help your understanding of things like precision, recall and accuracy.

All right. Coming up next, we're going to discuss a primer on using Python's scikit-learn machine learning library. We'll see you at the next lecture.

