# Evaluating Classification Model Performance
There are multiple methods to evaluate classification model performance.

# False Positives and False Negatives
We cannot always assume that the classification is correct, even for our training data set.

<img src="images/evaluation/false_positives_negatives.png" height="75%" width="75%"></img>
- The "red" data points are the testing data points, classified as either 0 or 1
- The "blue" data points are the predicted classifications of the red (testing) data points
- The "gray" data points are the classification of the blue (predicted) data points

As we can see, some of the predictions for the testing set were wrong.
- Type I error (false positive, predicted 1 when the actual class is 0) at data point #3
- Type II error (false negative, predicted 0 when the actual class is 1) at data point #2

# Confusion Matrix
<img src="images/evaluation/confusion_matrix.png" height="75%" width="75%"></img>

We can summarize the false positives and false negatives described above in a confusion matrix. In this example, there are 100 data points:
- 40 data points were actually 0
    - 35 data points predicted 0
    - 5 data points predicted 1
- 60 data points were actually 1
    - 10 data points predicted 0
    - 50 data points predicted 1
    
Therefore, there were 5 + 10 = 15 incorrect predictions and 35 + 50 = 85 correct predictions.
- 85% accuracy rate
- 15% error rate
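
The arithmetic above can be reproduced with a minimal Python sketch. The layout of the nested list (rows for actual classes, columns for predicted classes) and the variable names are assumptions made for illustration; only the four counts come from the example.

```python
# Confusion matrix counts from the example above.
# Assumed layout: rows are actual classes, columns are predicted classes.
confusion = [
    [35, 5],   # actually 0: predicted 0, predicted 1
    [10, 50],  # actually 1: predicted 0, predicted 1
]

true_negatives, false_positives = confusion[0]
false_negatives, true_positives = confusion[1]

total = sum(sum(row) for row in confusion)
correct = true_negatives + true_positives
incorrect = false_positives + false_negatives

accuracy = correct / total
error_rate = incorrect / total

print(f"accuracy = {accuracy:.0%}, error rate = {error_rate:.0%}")
# prints: accuracy = 85%, error rate = 15%
```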

# Accuracy Paradox
Let's say you have a training set of 100 people, and the model predicts if they have cancer.
- In reality, only 2 of them have cancer

You create 2 classification models.

### Model 1:
Your model simply assumes nobody has cancer.

You find out 2 people actually had cancer. The model missed both of them, so it made 2 incorrect predictions (both false negatives), and its accuracy is 98%.

### Model 2:
You train a random forest classifier, and it predicts that 5 people have cancer.

You find out that, of those 5, 2 people actually had cancer and 3 did not. The model made 3 incorrect predictions (all false positives), so its accuracy is 97%.

### Comparison of Models
Even though Model 2 is less accurate, it's a far more useful algorithm than just assuming no one has cancer: it finds both people who actually have cancer, while Model 1 finds neither.

In conclusion, accuracy alone is not a reliable measure of a classification model. We need to delve deeper and find out exactly what the model is doing.
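
As a rough illustration of "delving deeper", the sketch below compares the two models on accuracy and on recall (the fraction of actual cancer cases the model finds). Recall is not named in the text above; it is used here as one possible extra measure. The counts assume that both actual cancer cases are among the 5 people Model 2 flags, as the example implies, and the function names are hypothetical.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Fraction of actual positives that the model finds."""
    return tp / (tp + fn)


# Model 1: predicts that nobody has cancer (100 people, 2 actual cases).
model_1 = {"tp": 0, "fp": 0, "fn": 2, "tn": 98}
# Model 2: flags 5 people, assumed to include both actual cases.
model_2 = {"tp": 2, "fp": 3, "fn": 0, "tn": 95}

for name, m in [("Model 1", model_1), ("Model 2", model_2)]:
    acc = accuracy(m["tp"], m["fp"], m["fn"], m["tn"])
    rec = recall(m["tp"], m["fn"])
    print(f"{name}: accuracy = {acc:.0%}, recall = {rec:.0%}")
# prints:
# Model 1: accuracy = 98%, recall = 0%
# Model 2: accuracy = 97%, recall = 100%
```

Model 1 wins on accuracy by one percentage point but finds none of the actual cases, which is the paradox in numbers.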

# Cumulative Accuracy Profile (CAP) Curve
A model that ranks samples by a category (such as age group) can find samples of the positive class faster than selecting samples at random.

The CAP curve shows how much each model gains over that random scenario.
- Hit Ratio: Return on investment

### CAP Curve Example
Let's say a new video game was just released, and we use age group as the category to predict whether a person does or does not purchase the video game (a binary classification problem).
- People ages 0 to 10 are more likely to purchase this game compared to random ages
- People ages 10 to 20 are very likely to purchase this game compared to random ages

<img src="images/evaluation/cap_model.png" height="75%" width="75%"></img>

The "random" model contacts people of all ages.  
The "poor" model contacts people ages 0 to 10.  
The "good" model contacts people ages 10 to 20.  
The "crystal ball" model is when we can see the future and predict exactly who purchases the game.
- Beware: if a model somehow matches the "crystal ball" model, it has most likely overfitted to the data set

In the plot, we can see that if we contact people ages 10 to 20, we will have more purchases than if we contact people ages 0 to 10. Therefore, a company may focus on contacting only people ages 10 to 20 to maximize profits.
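
One way the points of a CAP curve could be computed is sketched below. It assumes the model gives each person a score (higher means more likely to purchase) and that we contact people in descending score order. The function `cap_curve` and the toy data are hypothetical and are not taken from the plot above.

```python
def cap_curve(scores, purchased):
    """Return (fraction contacted, fraction of purchasers reached) points."""
    # Contact the people with the highest predicted scores first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_purchasers = sum(purchased)
    points = [(0.0, 0.0)]
    found = 0
    for rank, i in enumerate(order, start=1):
        found += purchased[i]
        points.append((rank / len(scores), found / total_purchasers))
    return points


# Hypothetical toy data: 10 people, model scores, and outcomes (1 = purchased).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
purchased = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

points = cap_curve(scores, purchased)
for contacted, reached in points:
    print(f"{contacted:.0%} contacted -> {reached:.0%} of purchasers reached")
```

A random model would reach purchasers in proportion to the people contacted (the diagonal), so the further a model's points sit above the diagonal, the more it gains.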

### CAP Analysis
One option is to compute the area under each curve and compare the areas as a ratio. Unfortunately, this is complicated, so there is a simpler way to evaluate CAP model performance.

#### Look at the 50% point of the independent variable(s):
<img src="images/evaluation/cap_analysis.png" height="75%" width="75%"></img>
- If Purchased < 60%, then the model is rubbish, you can make a better model
- If 60% < Purchased < 70%, then the model is poor, you can make a better model
- If 70% < Purchased < 80%, then the model is good
- If 80% < Purchased < 90%, then the model is very good, but there may be overfitting to the data set
- If 90% < Purchased < 100%, then the model is too good, and there most likely is overfitting to the data set
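
The rule of thumb above can be written as a small helper. This is a sketch only: the function name `rate_cap_model` is hypothetical, and because the list uses strict inequalities, assigning an exact boundary value (such as 70%) to the higher band is an assumption.

```python
def rate_cap_model(purchased_at_50):
    """Rate a model from the fraction of purchases reached at the 50% point.

    `purchased_at_50` is a fraction between 0.0 and 1.0.
    Assumption: exact boundary values fall into the higher band.
    """
    if not 0.0 <= purchased_at_50 <= 1.0:
        raise ValueError("purchased_at_50 must be between 0.0 and 1.0")
    if purchased_at_50 < 0.6:
        return "rubbish"
    if purchased_at_50 < 0.7:
        return "poor"
    if purchased_at_50 < 0.8:
        return "good"
    if purchased_at_50 < 0.9:
        return "very good (possible overfitting)"
    return "too good (likely overfitting)"


for value in (0.55, 0.65, 0.75, 0.85, 0.95):
    print(f"{value:.0%} purchased at the 50% point: {rate_cap_model(value)}")
```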