# Evaluation Metrics in Classification  

<img src='images/evaluation.png' width='600' height='450'>

## Confusion Matrix for Performance Evaluation  

### 1. Introduction  
A **confusion matrix** is a table that summarizes the performance of a classification model by counting how often each actual class was predicted as each class. It shows not only how many errors the model makes but also which kinds of errors they are.  

### 2. Structure of the Confusion Matrix  

<img src='images/roc.png' width=250>


| Predicted \ Actual | Positive (1) | Negative (0) |
|--------------------|-------------|-------------|
| Positive (1)      | **TP**      | **FP**      |
| Negative (0)      | **FN**      | **TN**      |

- **True Positive (TP):** Model correctly predicts a positive class.  
- **False Positive (FP):** Model incorrectly predicts positive instead of negative (Type I Error).  
- **False Negative (FN):** Model incorrectly predicts negative instead of positive (Type II Error).  
- **True Negative (TN):** Model correctly predicts a negative class.  


### 3. Example with Dummy Data  
Assume we have **5 samples** where a binary classification model predicts whether a person has a disease (1) or not (0).  

#### Actual vs. Predicted Data  
| Sample No. | Actual Value | Predicted Value |
|------------|-------------|----------------|
| 1          | 1           | 1              |
| 2          | 0           | 0              |
| 3          | 1           | 0              |
| 4          | 0           | 1              |
| 5          | 1           | 1              |

#### Constructing the Confusion Matrix  

| Predicted \ Actual | Positive (1) | Negative (0) |
|--------------------|-------------|-------------|
| Positive (1)      | 2 (TP)     | 1 (FP)     |
| Negative (0)      | 1 (FN)     | 1 (TN)     |
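
The counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `confusion_counts` is an illustrative choice, not something defined elsewhere in this notebook.

```python
# Labels copied from the five-sample table above (1 = disease, 0 = no disease).
actual = [1, 0, 1, 0, 1]
predicted = [1, 0, 0, 1, 1]


def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels, treating 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(actual, predicted)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=2 FP=1 FN=1 TN=1
```

If scikit-learn is available, `sklearn.metrics.confusion_matrix(actual, predicted)` returns the same counts, but note that it puts actual classes in rows and predicted classes in columns, which is the transpose of the layout used in the tables here.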


## Accuracy in Classification  

### Definition  
Accuracy is one of the most basic and commonly used evaluation metrics for classification models. It measures how many predictions a model made correctly out of the total predictions.  

### Formula  
<img src='images/accuracy.png' width='700'>

Where:  
- TP (True Positive): Correctly predicted positive instances  
- TN (True Negative): Correctly predicted negative instances  
- FP (False Positive): Negative instances incorrectly predicted as positive  
- FN (False Negative): Positive instances incorrectly predicted as negative  

### Example  
Suppose we have a dataset with 100 samples, and a classification model predicts the labels. The confusion matrix is:  

| Predicted \ Actual | Positive (1) | Negative (0) |
|--------------------|-------------|-------------|
| Positive (1)      | 45 (TP)     | 5 (FP)      |
| Negative (0)      | 10 (FN)     | 40 (TN)     |

Now, calculating accuracy:  

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (45 + 40) / (45 + 40 + 10 + 5) = 85 / 100 = 0.85 (85%)
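
The same calculation can be written as a small function. This is a sketch under the assumption that the four confusion-matrix counts are already known; the function name `accuracy` is chosen here for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined when there are no samples")
    return (tp + tn) / total


# Counts from the 100-sample confusion matrix above.
acc = accuracy(tp=45, tn=40, fp=5, fn=10)
print(f"Accuracy: {acc:.2f}")  # Accuracy: 0.85
```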



---

### Cases Where Accuracy is NOT a Good Metric  

#### 1. When Test Data is Imbalanced  
- If the dataset is highly imbalanced (one class is much more frequent than the other), accuracy can be misleading.  

##### Example:  
- Suppose we have a dataset of **1000 samples**, where **950 are class 0 (negative)** and only **50 are class 1 (positive)**.  
- A model predicts **all instances as class 0** (negative).  
- The model gets all 950 negatives right and all 50 positives wrong, so: Accuracy = 950 / 1000 = 0.95 (95%)
- Even though the accuracy is **very high (95%)**, the model never detects a single class 1 instance, so it is useless for that task.  
- In such cases, we should use **Precision, Recall, or F1-score** instead.  
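
The imbalance example can be checked directly. The sketch below rebuilds the 950/50 dataset, applies an "always predict 0" model, and compares accuracy with recall on class 1; the variable names are illustrative.

```python
# 950 negatives followed by 50 positives, as in the example above.
actual = [0] * 950 + [1] * 50
# A degenerate model that predicts class 0 for every sample.
predicted = [0] * len(actual)

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)

# Recall on class 1: share of actual positives that the model found.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.95
print(f"Recall:   {recall:.2f}")    # Recall:   0.00
```

Accuracy looks excellent while recall is zero, which is why a single accuracy figure should not be trusted on imbalanced data.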

#### 2. When the Model Returns Probability Scores Instead of Class Labels  
- Some models (like Logistic Regression, Neural Networks) output probability scores instead of directly predicting classes.  

<img src='images/prob score.png' width=700>

##### Example:  
- Suppose a model predicts **0.7 probability for class 1** and **0.3 for class 0**, and we set a threshold of **0.5** to classify instances. With this threshold the instance is classified as class 1.  
- If the threshold is raised from 0.5 to 0.8, the same instance is classified as class 0, so the accuracy depends on the chosen threshold.  
- Accuracy looks only at the final labels and ignores how confident the model was, so threshold-independent or probability-based metrics such as **AUC-ROC** or **Log Loss** are better suited in such cases.  
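
To see the threshold effect concretely, the sketch below scores six samples and measures accuracy at two thresholds. The labels and probability scores are made-up values for illustration only, not output from a real model.

```python
# Hypothetical data: true labels and predicted probabilities of class 1.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.55, 0.4, 0.2]


def accuracy_at(threshold):
    """Accuracy after labeling every score >= threshold as class 1."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


acc_05 = accuracy_at(0.5)  # 5 of 6 correct
acc_08 = accuracy_at(0.8)  # 4 of 6 correct
print(f"threshold 0.5 -> accuracy {acc_05:.3f}")
print(f"threshold 0.8 -> accuracy {acc_08:.3f}")
```

The model and its scores are identical in both cases; only the cut-off changed, yet the reported accuracy differs.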

## Precision in Classification  

### 1. Definition  
Precision is a performance metric used in classification models to measure the accuracy of **positive predictions**. It tells us how many of the instances predicted as positive were actually correct.  

### 2. Formula  
Precision = TP / (TP + FP)

Where:  
- **TP (True Positive):** Correctly predicted positive cases.  
- **FP (False Positive):** Negative cases incorrectly predicted as positive.  

### 3. Example  

**Scenario:** Spam Email Classification  
A model predicts whether an email is spam (1) or not spam (0). Given 100 emails, the confusion matrix is:  

| Predicted \ Actual | Spam (1) | Not Spam (0) |
|--------------------|---------|-------------|
| Spam (1)          | 40 (TP) | 15 (FP)     |
| Not Spam (0)      | 10 (FN) | 35 (TN)     |

**Calculating Precision:**  
Precision = TP / (TP + FP) = 40 / (40 + 15) = 40 / 55 ≈ 0.727 (72.7%)
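
A minimal Python sketch of the precision formula, applied to the spam counts above. Returning 0.0 when the model predicts no positives is a convention chosen for this sketch; other tools may warn or return a different value in that case.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # assumption: treat "no positive predictions" as precision 0
    return tp / predicted_positive


spam_precision = precision(tp=40, fp=15)
print(f"Precision: {spam_precision:.3f}")  # Precision: 0.727
```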


### 4. Interpretation  
- **High Precision (close to 1):** The model makes very few false positive errors, meaning it rarely misclassifies non-spam emails as spam.  
- **Low Precision (close to 0):** The model has many false positives, meaning it often marks important emails as spam.  


### 5. Limitations of Precision  
- Does **not consider false negatives**, which might be important in some cases.  
- A model can achieve high precision by predicting fewer positives, but this might **lower recall** (missing actual positive cases).  



## Recall in Classification  

### 1. Definition  
Recall is a performance metric used in classification models to measure how well the model identifies **actual positive instances**. It tells us how many of the **actual positive cases** were correctly predicted.  


### 2. Formula  
Recall = TP / (TP + FN)

Where:  
- **TP (True Positive):** Correctly predicted positive cases.  
- **FN (False Negative):** Positive cases incorrectly predicted as negative.  


### 3. Example  

**Scenario:** Spam Email Classification  
A model predicts whether an email is spam (1) or not spam (0). Given 100 emails, the confusion matrix is:  

| Predicted \ Actual | Spam (1) | Not Spam (0) |
|--------------------|---------|-------------|
| Spam (1)          | 40 (TP) | 10 (FP)     |
| Not Spam (0)      | 15 (FN) | 35 (TN)     |

**Calculating Recall:**  
Recall = TP / (TP + FN) = 40 / (40 + 15) = 40 / 55 = 0.727 (72.7%)
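
The recall formula translates to code in the same way. As before, this is a sketch; the handling of the "no actual positives" case is an assumption made here, not a universal rule.

```python
def recall(tp, fn):
    """Share of actual positives that the model correctly identified."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0  # assumption: treat "no actual positives" as recall 0
    return tp / actual_positive


# Counts from the recall confusion matrix above.
spam_recall = recall(tp=40, fn=15)
print(f"Recall: {spam_recall:.3f}")  # Recall: 0.727
```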


### 4. Interpretation  
- **High Recall (close to 1):** The model correctly identifies most actual positive cases (spam emails).  
- **Low Recall (close to 0):** The model misses many actual positive cases, meaning many spam emails are left undetected.  


### 5. Limitations of Recall  
- Does **not consider false positives**, which might be important in some cases.  
- A model can achieve high recall by predicting most instances as positive, but this might **lower precision** (increasing false positives).  


## F1-Score in Classification  

### 1. Definition  
F1-Score is a performance metric used in classification models that balances **Precision** and **Recall**. It is useful when both false positives and false negatives are important.  

F1-Score is the **harmonic mean** of Precision and Recall, ensuring that both are considered together.  


### 2. Formula  
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)

Where:  
- **Precision = TP / (TP + FP)**  
- **Recall = TP / (TP + FN)**  
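
As a worked example, the sketch below computes the F1-Score for the spam matrix from the Precision section (TP = 40, FP = 15, FN = 10). It also compares the harmonic mean with a plain average for a lopsided case; the values 1.0 and 0.1 are invented for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Counts from the spam example in the Precision section.
tp, fp, fn = 40, 15, 10
precision = tp / (tp + fp)  # 40 / 55
recall = tp / (tp + fn)     # 40 / 50
f1 = f1_score(precision, recall)
print(f"F1: {f1:.3f}")  # F1: 0.762

# Hypothetical lopsided model: perfect precision, very low recall.
lopsided_f1 = f1_score(1.0, 0.1)
arithmetic_mean = (1.0 + 0.1) / 2
print(f"F1: {lopsided_f1:.3f} vs arithmetic mean: {arithmetic_mean:.2f}")
```

The harmonic mean stays close to the smaller of the two values, so a model cannot earn a high F1-Score by excelling at precision while neglecting recall, or the reverse.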


## ROC Curve

The **ROC curve** (Receiver Operating Characteristic curve) is a plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) at different classification thresholds, showing the trade-off between the two.

<img src='images/tpr.png' width='400'>

### How It Works
- The model predicts probabilities for each class.
- By selecting different probability thresholds, we classify instances as positive or negative.
- For each threshold, we compute TPR and FPR and plot them on a graph.
- The x-axis represents FPR, and the y-axis represents TPR.
- The curve shows how well the model separates the positive and negative classes.
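
The steps above can be sketched in plain Python, using TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The labels, scores, and threshold list are hypothetical values chosen for illustration; a real ROC curve would typically use every distinct score as a threshold.

```python
# Hypothetical data: true labels and predicted probabilities of class 1.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.55, 0.4, 0.2]

positives = sum(actual)
negatives = len(actual) - positives


def roc_point(threshold):
    """Return (FPR, TPR) when scores >= threshold are labeled positive."""
    tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= threshold)
    fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= threshold)
    return fp / negatives, tp / positives


# Sweep from a strict threshold (nothing positive) to a lax one (everything positive).
thresholds = [1.0, 0.8, 0.65, 0.5, 0.3, 0.0]
roc_points = [roc_point(t) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, roc_points):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting `roc_points` with FPR on the x-axis and TPR on the y-axis gives the ROC curve, which always runs from (0, 0) to (1, 1).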

### Example Calculation:

<img src='images/ex.png' width='600'>

<img src='images/read_roc1.png' width='500'>

### Interpretation of ROC Curve
- **A perfect model:** The curve passes through the top-left corner (FPR = 0, TPR = 1), which corresponds to AUC = 1.
- **A random model:** The curve follows the diagonal line (AUC = 0.5), which represents random guessing.
- **Better models:** The further the curve lies above the diagonal, toward the top-left corner, the better the classification performance.


## AUC (Area Under the Curve)

The AUC (Area Under the ROC Curve) summarizes the ROC curve in a single number that measures how well the model separates the classes across all thresholds. It equals the probability that the model assigns a higher score to a randomly chosen positive instance than to a randomly chosen negative instance.
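
The ranking interpretation can be computed directly by comparing every positive score with every negative score. This is a sketch on hypothetical labels and scores; counting ties as half a win is a common convention and is assumed here.

```python
# Hypothetical data: true labels and predicted probabilities of class 1.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.55, 0.4, 0.2]

pos_scores = [s for a, s in zip(actual, scores) if a == 1]
neg_scores = [s for a, s in zip(actual, scores) if a == 0]

# Count positive/negative pairs where the positive is ranked higher.
wins = 0.0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # assumption: a tie counts as half a win

auc = wins / (len(pos_scores) * len(neg_scores))
print(f"AUC: {auc:.3f}")  # AUC: 0.889
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples, so it is meant for understanding rather than for large datasets.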

### How to Read AUC
- **Higher AUC (closer to 1):** The model distinguishes well between the classes.
- **AUC = 0.5:** The model is no better than random guessing.
- **AUC below 0.5 (toward 0):** The model ranks worse than random; its scores are systematically inverted, so it tends to favor the opposite class.

### Interpretation
<img src='images/AUC.png' width='700'>
