# Section 19: Evaluations for Classification

### Lecture 126: False Positives and Negatives
- False Positive (Type 1 error)
    - We predicted a positive outcome, but the actual outcome was negative
- False Negative (Type 2 error)
    - We predicted a negative outcome, but the actual outcome was positive
- A Type 1 error is usually less dangerous than a Type 2 error
- A Type 1 error is a warning
- A Type 2 error is a red flag
- Example:
    - A Type 1 error would be predicting an earthquake that never happened
    - A Type 2 error would be NOT predicting an earthquake that then HAPPENED

### Lecture 127: Confusion Matrix
- Look below for the code
- **Accuracy Rate: Correct/Total**
- **Error Rate: Wrong/Total**


- PROBLEMS
    - When your data has more than 2 classes. With 3 or more classes you may get a classification accuracy of 80%, but you don’t know whether that is because all classes are being predicted equally well or because one or two classes are being neglected by the model.
    - When your classes are imbalanced (they do not have a similar number of records). You may achieve an accuracy of 90% or more, but this is not a good score if 90 out of every 100 records belong to one class, because you can achieve it by always predicting the most common class.
        - Andrew Ng explains that if your sample is weighted heavily towards one class, always predicting that class can give a high accuracy without the model having learned anything useful
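
To make the first problem concrete, here is a minimal sketch with made-up labels for three classes (`a`, `b`, `c` are hypothetical, not from the course). The overall accuracy is 80%, yet one class is never predicted correctly, which only shows up once accuracy is broken down per class.

```python
from collections import Counter

# Hypothetical labels for 10 records across three classes.
y_true = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"]
y_pred = ["a", "a", "a", "a", "b", "b", "b", "b", "a", "b"]

# Overall accuracy: correct predictions over all predictions.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall: for each actual class, the fraction predicted correctly.
totals = Counter(y_true)
hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
per_class_recall = {label: hits[label] / totals[label] for label in totals}

print(f"accuracy: {accuracy:.2f}")
for label, recall in per_class_recall.items():
    print(f"recall for class {label}: {recall:.2f}")
```

Class `c` is completely neglected here even though the headline accuracy looks acceptable.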


- Alternatives
    - **Precision**: Out of all the patients that we predicted to have cancer (or 1), what fraction actually have cancer?
        - TRUE POSITIVES/PREDICTED POSITIVES -> TRUE POSITIVES/(TRUE POSITIVES + FALSE POSITIVES)
        - In a confusion matrix whose rows are the predicted classes, this is computed along the predicted-positive row. Note that scikit-learn's `confusion_matrix` puts actual classes in rows and predicted classes in columns, so there it is the predicted-positive column.
    - **Recall**: Out of all the patients that actually have cancer (or 1), what fraction did we correctly detect as having cancer?
        - TRUE POSITIVES/ACTUAL POSITIVES -> TRUE POSITIVES/(TRUE POSITIVES + FALSE NEGATIVES)
        - In a confusion matrix whose rows are the predicted classes, this is computed along the actual-positive column. In scikit-learn's layout it is the actual-positive row.
    - **Trade-Off**
        - Precision only looks at the predictions we made as positive, not at all the actual positives. So even if we missed many of the actual positives, precision is still good as long as the positive predictions we did make were correct.
        - Recall, however, looks at all the actual positives (the 1's) and checks how many of them we detected. It doesn't care how many false positives we produced: if we predicted 1 for everything, recall would be perfect because every actual 1 was predicted as 1, even though precision would be poor.
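
The trade-off can be sketched by moving a decision threshold over predicted probabilities. The labels, scores, and the helper `precision_recall` below are made up for illustration. A low threshold predicts 1 for almost everything (high recall, low precision), while a high threshold predicts 1 only for the most confident cases (high precision, low recall).

```python
# Hypothetical true labels (1 = has cancer) and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]


def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when predicting 1 for scores >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.0, 0.5, 0.85):
    precision, recall = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With this toy data, raising the threshold moves precision from 0.50 to 1.00 while recall drops from 1.00 to 0.25.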


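```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Random binary predictions and ground-truth labels (each value is 0 or 1)
pred = np.random.randint(low=0, high=2, size=10)
true = np.random.randint(low=0, high=2, size=10)

# Rows are actual classes, columns are predicted classes.
# labels=[0, 1] keeps the matrix 2x2 even if a class is missing from the random data.
cm = confusion_matrix(true, pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(
    "true pos: {0}\n"
    "false pos: {1}\n"
    "true neg: {2}\n"
    "false neg: {3}\n".format(tp, fp, tn, fn))
```

Example output (the counts vary between runs because the inputs are random):

```
true pos: 1
false pos: 2
true neg: 3
false neg: 4
```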
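
As a small follow-up sketch, the four counts are enough to compute the accuracy rate, error rate, precision, and recall defined in this lecture. The numbers below are simply the counts from the example output above; substitute your own.

```python
# Counts taken from the example output above (they change with each random run).
tp, fp, tn, fn = 1, 2, 3, 4
total = tp + fp + tn + fn

accuracy = (tp + tn) / total   # Correct / Total
error_rate = (fp + fn) / total  # Wrong / Total
precision = tp / (tp + fp)     # of the predicted positives, how many were right
recall = tp / (tp + fn)        # of the actual positives, how many were found

print(f"accuracy:   {accuracy:.2f}")
print(f"error rate: {error_rate:.2f}")
print(f"precision:  {precision:.2f}")
print(f"recall:     {recall:.2f}")
```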



### Lecture 128: Confusion Matrix Paradox
- We cannot judge a model on accuracy alone, because we might get a higher accuracy by always predicting a single class (the one with the larger sample) than by using the model. This is known as the accuracy paradox.
- <img src="../archive/AZMachineLearning_Per1.png">
- <img src="../archive/AZMachineLearning_Per2.png">
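
A minimal sketch of the paradox, assuming a made-up data set with 90 negatives and 10 positives: a "model" that ignores its input and always predicts the majority class reaches 90% accuracy while detecting none of the positives.

```python
# Hypothetical imbalanced data: 90 negatives and 10 positives.
y_true = [0] * 90 + [1] * 10

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the positive class shows what accuracy hides.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn)

print(f"accuracy: {accuracy:.2f}")
print(f"recall:   {recall:.2f}")
```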

### Lecture 128: CAP Curve
- The straight "LINE" is the random model: the average conversion rate based on historical data
    - In the example: 10% of customers respond to our offers in our retail business
    - <img src="../archive/AZMachineLearning_Per4.png">
- You can compare it with other models. The curve tells us how well our model is performing. The closer the curve is to the top, the better the model.
- We have to perform better than our historic performance
- The Crystal Ball Model (the ideal one) would contact the responders first: by the time we have contacted 10% of the total, we have already reached all of the customers who respond. We would be using our time and money as resourcefully as possible.
- <img src="../archive/AZMachineLearning_Per5.png">
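
The points of a CAP curve can be sketched as follows: sort customers by the model's score, contact them in that order, and track what fraction of all responders has been captured so far. The function name `cap_curve` and the data (20 customers, 2 responders, i.e. a 10% response rate) are assumptions made for illustration.

```python
def cap_curve(y_true, scores):
    """Return (x, y): fraction of customers contacted vs. fraction of responders captured."""
    # Contact customers from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    x, y, captured = [0.0], [0.0], 0
    for rank, i in enumerate(order, start=1):
        captured += y_true[i]
        x.append(rank / len(order))
        y.append(captured / total_pos)
    return x, y


# Hypothetical data: 20 customers, 2 of whom respond (10%).
y_true = [1, 0, 0, 1] + [0] * 16
scores = [1 - i / 20 for i in range(20)]  # strictly decreasing model scores

x, y = cap_curve(y_true, scores)
for rank in (1, 2, 4, 10, 20):
    print(f"contacted {x[rank]:.0%} -> captured {y[rank]:.0%} of responders")
```

The random model corresponds to the line `y = x`, and the Crystal Ball Model would reach 100% of responders after contacting 10% of customers; this toy model gets there after 20%.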

### CAP Curve VS. ROC Curve
- <img src="../archive/AZMachineLearning_Per8.png" alt="Drawing" style="width: 300px;"/>
- In the image below
    - To plot the ROC curve, we use the False Positive Rate (FPR) and the True Positive Rate (TPR)
    - Given a threshold, the TPR is the fraction of all actual positives (the red graph) that lie to the right of the threshold
    - The FPR is the fraction of all actual negatives (the blue graph) that lie to the right of the threshold, i.e. the negatives that are wrongly predicted as positive
<img src="../archive/AZMachineLearning_Per9.png" alt="Drawing" style="width: 400px;"/>
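
As a rough sketch of the description above, each threshold gives one (FPR, TPR) point, and sweeping the threshold traces the ROC curve. The labels, scores, and the helper `tpr_fpr` are hypothetical; "to the right of the threshold" is taken to mean a score greater than or equal to it.

```python
# Hypothetical true labels and model scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]


def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when predicting 1 for scores >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Each threshold yields one point of the ROC curve.
for threshold in (1.0, 0.85, 0.5, 0.0):
    tpr, fpr = tpr_fpr(y_true, scores, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

scikit-learn's `sklearn.metrics.roc_curve` computes these points for every distinct threshold at once.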


- A GOOD ROC Curve is one that hugs the upper left corner
<img src="../archive/AZMachineLearning_Per10.png" alt="Drawing" style="width: 300px;"/>

- A BAD ROC Curve is one that stays close to the diagonal line, which corresponds to random guessing
<img src="../archive/AZMachineLearning_Per11.png" alt="Drawing" style="width: 300px;"/>

- AUC: Area under the Curve
    - Our goal is to have an AUC close to 1
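
One way to sketch the AUC without plotting anything uses the fact that it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties count as half). The function `auc` and the data below are illustrative assumptions, not the course's code.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and model scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

print(f"AUC: {auc(y_true, scores):.2f}")
```

A model that ranks every positive above every negative gets an AUC of 1, and random scores land around 0.5. For real data, `sklearn.metrics.roc_auc_score(y_true, scores)` is the usual choice.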

- **Important Notes**
    - ROC Curves are useful even when your predictions are not "properly calibrated", because the curve depends only on how the scores rank the examples
    - ROC Curves can be extended to problems with three or more classes
        - class 1 vs classes 2 & 3
        - class 2 vs classes 1 & 3
        - class 3 vs classes 1 & 2
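
A minimal sketch of this one-vs-rest idea, assuming the model outputs one score per class for every example: each class in turn is treated as the positive class and the other two as negative, giving one AUC per class. The labels, the scores in `class_scores`, and the helper `auc` are all made up for illustration.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical true classes for six examples.
labels = [1, 2, 3, 1, 2, 3]

# Hypothetical predicted probability of each class for the same six examples.
class_scores = {
    1: [0.8, 0.1, 0.2, 0.6, 0.3, 0.1],
    2: [0.1, 0.7, 0.3, 0.3, 0.25, 0.2],
    3: [0.1, 0.2, 0.5, 0.1, 0.45, 0.7],
}

ovr_auc = {}
for cls in (1, 2, 3):
    # Relabel: the current class is positive (1), all other classes are negative (0).
    binary = [1 if label == cls else 0 for label in labels]
    ovr_auc[cls] = auc(binary, class_scores[cls])
    print(f"class {cls} vs rest: AUC = {ovr_auc[cls]:.2f}")
```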

### Lecture 129: CAP Analysis
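- We find the area between the Crystal Ball Model and the average (random) model
- We find the area between our model and the average (random) model
- Our model's curve lies between these two constraints, and we evaluate it as a percentage: the second area divided by the first. The closer the ratio is to 1, the better the model.
<img src="../archive/AZMachineLearning_Per6.png" alt="Drawing" style="width: 400px;"/>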
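
This ratio can be sketched numerically, assuming the areas are measured with the trapezoid rule and the random model is the diagonal (area 0.5). The helper `cap_area` and the data (20 customers, 2 responders) are hypothetical.

```python
def cap_area(y_true, scores):
    """Area under the CAP curve, computed with the trapezoid rule."""
    # Contact customers from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    n = len(order)
    area, prev_y, captured = 0.0, 0.0, 0
    for i in order:
        captured += y_true[i]
        cur_y = captured / total_pos
        area += (prev_y + cur_y) / 2 / n  # one trapezoid of width 1/n
        prev_y = cur_y
    return area


# Hypothetical data: 20 customers, 2 of whom respond (10%).
y_true = [1, 0, 0, 1] + [0] * 16
scores = [1 - i / 20 for i in range(20)]  # strictly decreasing model scores

model_area = cap_area(y_true, scores)
# Crystal Ball Model: scoring by the true label puts every responder first.
perfect_area = cap_area(y_true, y_true)
random_area = 0.5  # area under the diagonal

accuracy_ratio = (model_area - random_area) / (perfect_area - random_area)
print(f"accuracy ratio: {accuracy_ratio:.2f}")
```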

- Another way to evaluate is to take the 50% mark on the x-axis and read off the value that our model's curve reaches on the y-axis
<img src="../archive/AZMachineLearning_Per7.png" alt="Drawing" style="width: 400px;"/>


### Lecture 131: How do I know which model to choose for my problem?

Same as for regression models, you first need to figure out whether your problem is linear or non linear. You will learn how to do that in Part 10 - Model Selection. Then:

If your problem is linear, you should go for Logistic Regression or SVM.

If your problem is non linear, you should go for K-NN, Naive Bayes, Decision Tree or Random Forest.

Then from a business point of view, you would rather use:

- Logistic Regression or Naive Bayes when you want to rank your predictions by their probability. For example if you want to rank your customers from the highest probability that they buy a certain product, to the lowest probability. Eventually that allows you to target your marketing campaigns. And of course for this type of business problem, you should use Logistic Regression if your problem is linear, and Naive Bayes if your problem is non linear.

- SVM when you want to predict which segment your customers belong to. Segments can be any kind of segments, for example some market segments you identified earlier with clustering.

- Decision Tree when you want to have a clear interpretation of your model results.

- Random Forest when you are just looking for high performance with less need for interpretation. 
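
The "rank by probability" use case can be sketched with scikit-learn's `LogisticRegression` and its `predict_proba` method. The single feature (imagine a count of past purchases) and the labels are made-up data, and the default model settings are used only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one feature per customer and whether they bought (1) or not (0).
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Column 1 of predict_proba holds the probability of class 1 (a purchase).
proba = model.predict_proba(X)[:, 1]

# Customer indices ordered from the highest purchase probability to the lowest.
ranking = np.argsort(-proba)
for i in ranking:
    print(f"customer {i}: P(buy) = {proba[i]:.2f}")
```

A marketing campaign could then target the customers at the top of `ranking` first.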



