### UBC Extended Learning
#### Instructor: Socorro Dominguez
#### Module 07

Overview:

- [ ] Explain why accuracy is not always the best metric in ML.
- [ ] Explain the components of a confusion matrix.
- [ ] Define precision, recall, and F1-score.
- [ ] Identify whether there is class imbalance and whether you need to deal with it.
- [ ] Explain `class_weight` and use it to deal with data imbalance.
- [ ] Select an appropriate scoring metric for a regression problem.
- [ ] Interpret and communicate the meaning of different regression scoring metrics: MSE, RMSE, R2, and MAPE.
- [ ] Apply different scoring functions with `cross_validate`, `GridSearchCV`, and `RandomizedSearchCV`.

## Accuracy: Not Always the Best Metric
- Accuracy is a commonly used metric to evaluate model performance, but it may not always be the best choice.

- **Example:** Let's say we have a model that predicts whether an email is spam or not. Out of 1000 emails, 950 are not spam (negative) and 50 are spam (positive).
  - The model correctly classifies 920 non-spam emails and 10 spam emails. 
  - Accuracy = (920 + 10) / 1000 = 93%
- At first glance, 93% accuracy seems good. However, the model misses 40 of the 50 spam emails (false negatives), so most spam still reaches the inbox.
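The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch using only the counts from the example; the "always predict non-spam" baseline is an added assumption for comparison and is not part of the original example.

```python
# Counts taken from the spam example above.
n_non_spam, n_spam = 950, 50
correct_non_spam, correct_spam = 920, 10

total = n_non_spam + n_spam
model_accuracy = (correct_non_spam + correct_spam) / total

# Hypothetical baseline (assumption): label every email as non-spam.
# It gets all 950 non-spam emails right and all 50 spam emails wrong.
baseline_accuracy = n_non_spam / total

print(f"model accuracy:    {model_accuracy:.0%}")     # 93%
print(f"baseline accuracy: {baseline_accuracy:.0%}")  # 95%
```

A baseline that never flags spam scores higher on accuracy than the model, which is why accuracy alone can be misleading on imbalanced data.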

## Components of a Confusion Matrix
- The confusion matrix is a table used to evaluate the performance of a classification model.

- **Example:** Continuing from the previous example:
  
  |               | Predicted Negative | Predicted Positive |
  |---------------|--------------------|--------------------|
  | Actual Negative (Non-Spam) | 920 (True Negative) | 30 (False Positive) |
  | Actual Positive (Spam)     | 40 (False Negative) | 10 (True Positive)  |
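The same table can be reproduced with scikit-learn's `confusion_matrix`. The sketch below rebuilds label arrays that match the counts in the table; the encoding (0 = non-spam, 1 = spam) is an assumption made for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Assumed encoding: 0 = non-spam (negative), 1 = spam (positive).
# 950 actual non-spam emails followed by 50 actual spam emails.
y_true = np.array([0] * 950 + [1] * 50)

# Predictions arranged to match the table:
# 920 true negatives, 30 false positives, 40 false negatives, 10 true positives.
y_pred = np.array([0] * 920 + [1] * 30 + [0] * 40 + [1] * 10)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

For binary labels 0 and 1, `cm.ravel()` returns the cells in the order TN, FP, FN, TP, which matches the layout of the table above.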

## Precision, Recall, and F1-Score
- These metrics provide deeper insights into a model's performance, especially in imbalanced datasets.
- **Example:** Let's use the confusion matrix from the previous example to calculate precision, recall, and F1-score:
  
  - **Precision** = TP / (TP + FP) = 10 / (10 + 30) = 25%
    - Out of the emails predicted as spam, only 25% are actually spam.
  - **Recall** = TP / (TP + FN) = 10 / (10 + 40) = 20%
    - The model correctly identifies only 20% of the actual spam emails.
  - **F1-Score** = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.25 * 0.20) / (0.25 + 0.20)
    - F1-Score ≈ 22.22%
    - The F1-Score balances precision and recall, providing a single metric to assess model performance.
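These three values can also be computed with scikit-learn. This is a minimal sketch that assumes the same label encoding as before (0 = non-spam, 1 = spam) and rebuilds the arrays from the confusion-matrix counts.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Assumed encoding: 0 = non-spam, 1 = spam.
# Arrays rebuilt from the counts: TN=920, FP=30, FN=40, TP=10.
y_true = np.array([0] * 950 + [1] * 50)
y_pred = np.array([0] * 920 + [1] * 30 + [0] * 40 + [1] * 10)

# By default these functions score the positive class (label 1).
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(f"precision: {precision:.2%}")
print(f"recall:    {recall:.2%}")
print(f"f1-score:  {f1:.2%}")
```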

### Recall or Precision?

Consider the medical field, where there are cases in which we should prioritize recall:

- In critical medical conditions, where FN can have severe consequences or even be life-threatening.

- For diseases that are highly contagious (think of COVID-19), where early detection is crucial for containment and prevention of further spread.

There are also cases where precision could be more important:

- In situations where false positives have significant negative impacts, such as unnecessary treatments, interventions, or psychological distress for healthy individuals.

In most cases, you will want both to be high. You don't want to miss a cancer patient, but you also don't want to give radiotherapy to a healthy individual. The problem you are solving will tell you which measure to prioritize.

## Dealing with Class Imbalance
- Class imbalance occurs when one class dominates the dataset, leading to biased model performance.

- **Example**: Let's consider a medical diagnosis model for a rare disease. Out of 1000 patients, only 10 have the disease (positive class).
- Dealing with imbalance: Resampling, using different evaluation metrics, and using `class_weight`.
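A quick way to check for imbalance is to count how many samples fall in each class. The sketch below uses labels that match the rare-disease example; the encoding (0 = healthy, 1 = disease) and the "always predict the majority class" comparison are assumptions added for illustration.

```python
from collections import Counter

# Assumed encoding: 0 = healthy, 1 = has the disease.
# 1000 patients, of whom 10 have the disease.
y = [0] * 990 + [1] * 10

counts = Counter(y)
total = len(y)
for label, count in sorted(counts.items()):
    print(f"class {label}: {count} samples ({count / total:.1%})")

# Accuracy of a hypothetical model that always predicts the majority class.
majority_accuracy = max(counts.values()) / total
print(f"majority-class accuracy: {majority_accuracy:.1%}")
```

When a model that ignores the minority class entirely still reaches a very high accuracy, that is a sign the imbalance needs attention.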

### Class Weight to Handle Imbalance
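- `class_weight` is a parameter available on many scikit-learn classifiers that assigns higher weights to minority classes during model training, so that errors on those classes cost more.
- **Example**: In our medical diagnosis model, we can set `class_weight='balanced'`, which weights each class inversely to its frequency, to give more importance to the 10 positive cases during training.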
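The following sketch shows what `class_weight='balanced'` does on data shaped like the rare-disease example. The single synthetic feature, the random seed, and the choice of `LogisticRegression` are assumptions made for illustration, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Assumed encoding: 0 = healthy (990 patients), 1 = disease (10 patients).
y = np.array([0] * 990 + [1] * 10)

# 'balanced' weights follow n_samples / (n_classes * samples_in_class).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print({0: round(weights[0], 3), 1: round(weights[1], 3)})

# Synthetic feature (assumption): positives are shifted upward by 1.5 on average.
rng = np.random.default_rng(0)
X = rng.normal(loc=y * 1.5, scale=1.0).reshape(-1, 1)

# Fit the same model with and without class weighting.
plain = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

print("positives predicted (plain):   ", int(plain.predict(X).sum()))
print("positives predicted (weighted):", int(weighted.predict(X).sum()))
```

With these counts, the positive class receives a weight of 1000 / (2 * 10) = 50 and the negative class about 0.505, so each positive case counts roughly a hundred times more in the loss.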


## Scoring Metrics for Regression Problems
- Regression problems require different evaluation metrics than classification.
- **Mean Squared Error (MSE)**: Measures the average squared difference between predicted and actual values.
  - **Example**: Let's predict house prices, and we have the following actual and predicted prices:
    - Actual Prices: [200, 300, 400, 500]
    - Predicted Prices: [180, 280, 350, 480]
    - MSE = ((200-180)^2 + (300-280)^2 + (400-350)^2 + (500-480)^2) / 4 = (400 + 400 + 2500 + 400) / 4 = 925

- **Root Mean Squared Error (RMSE)**: The square root of MSE, providing errors in the original units.
  - **Example**: RMSE = √925 ≈ 30.41 (measured in the same unit as house prices).
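MSE and RMSE for the house-price example can be computed directly with NumPy. This is a minimal sketch that follows the definitions above; the variable names are illustrative.

```python
import numpy as np

# Actual and predicted house prices from the example.
actual = np.array([200, 300, 400, 500])
predicted = np.array([180, 280, 350, 480])

# MSE: mean of the squared differences.
squared_errors = (actual - predicted) ** 2
mse = squared_errors.mean()

# RMSE: square root of MSE, back in the units of the target.
rmse = np.sqrt(mse)

print("squared errors:", squared_errors)
print("MSE: ", mse)
print("RMSE:", round(float(rmse), 2))
```

Because the errors are squared before averaging, the single 50-unit miss contributes more to the MSE than the three 20-unit misses combined.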

## Interpretation of Regression Scoring Metrics
- **R-squared (R2)**: Indicates the proportion of variance in the target variable explained by the model.
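  - **Example**: R2 = 1 - (MSE of the model / MSE of the mean model), where the mean model always predicts the average actual price, (200+300+400+500)/4 = 350.
  - MSE of the model = (20^2 + 20^2 + 50^2 + 20^2) / 4 = 925
  - MSE of the mean model = ((200-350)^2 + (300-350)^2 + (400-350)^2 + (500-350)^2) / 4 = 12500
  - R2 = 1 - (925 / 12500) ≈ 0.93
  - An R2 value of 0.93 suggests that the model explains about 93% of the variance in house prices.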
  - **Note**: In scikit-learn, R2 can sometimes be negative when the model's performance is worse than a simple horizontal line (mean model). This indicates that the model is a poor fit for the data.
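The definition of R2 and the possibility of a negative value can both be checked with `r2_score`. In this sketch the `bad_predictions` array is a hypothetical set of predictions invented to trigger a negative score; it is not part of the original example.

```python
import numpy as np
from sklearn.metrics import r2_score

# Actual and predicted house prices from the example.
actual = np.array([200, 300, 400, 500])
predicted = np.array([180, 280, 350, 480])

# Manual R2: compare the model's MSE with the MSE of always predicting the mean.
mse_model = np.mean((actual - predicted) ** 2)
mse_mean = np.mean((actual - actual.mean()) ** 2)
r2_manual = 1 - mse_model / mse_mean

# scikit-learn's implementation should agree with the manual value.
r2_sklearn = r2_score(actual, predicted)

# Hypothetical predictions (assumption) that are worse than the mean model.
bad_predictions = np.array([500, 400, 300, 200])
r2_bad = r2_score(actual, bad_predictions)

print("R2 (manual): ", round(float(r2_manual), 3))
print("R2 (sklearn):", round(float(r2_sklearn), 3))
print("R2 (bad):    ", round(float(r2_bad), 3))
```

The reversed predictions have a larger squared error than the mean model, so the ratio exceeds 1 and R2 drops below zero.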

- **Mean Absolute Percentage Error (MAPE)**: Measures the average absolute difference between predicted and actual values, expressed as a percentage of the actual values.
  - **Example**: Let's consider the same house price predictions as before:
    - MAPE = ((|200-180|/200) + (|300-280|/300) + (|400-350|/400) + (|500-480|/500)) / 4 ≈ 0.0829 or 8.29%
    - This means that, on average, the model's predictions are off by about 8.29% of the actual values.
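MAPE can be computed by hand or with scikit-learn's `mean_absolute_percentage_error`. This sketch assumes a scikit-learn version that includes that function (it was added in 0.24).

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Actual and predicted house prices from the example.
actual = np.array([200, 300, 400, 500])
predicted = np.array([180, 280, 350, 480])

# Manual MAPE: mean of |error| / |actual|.
relative_errors = np.abs(actual - predicted) / np.abs(actual)
mape_manual = relative_errors.mean()

# scikit-learn returns a fraction (0.05 means 5%), not a percentage.
mape_sklearn = mean_absolute_percentage_error(actual, predicted)

print("relative errors:", relative_errors.round(4))
print(f"MAPE (manual):  {mape_manual:.2%}")
print(f"MAPE (sklearn): {mape_sklearn:.2%}")
```

Note that MAPE is undefined or unstable when actual values are zero or very close to zero, since each error is divided by the actual value.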

Remember, when choosing evaluation metrics, consider the specific characteristics of your dataset and the goals of your machine learning project.
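Once a metric is chosen, it can be passed to scikit-learn's model selection tools through the `scoring` argument. The sketch below is one possible setup under stated assumptions: the synthetic dataset from `make_classification`, the 90/10 class split, the `LogisticRegression` model, and the parameter grid are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate

# Synthetic imbalanced dataset (assumption): roughly 90% negatives, 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=0
)

model = LogisticRegression(max_iter=1000)

# cross_validate accepts a list of metric names and reports each one per fold.
cv_results = cross_validate(
    model, X, y, cv=5, scoring=["accuracy", "precision", "recall", "f1"]
)
for name in ["accuracy", "precision", "recall", "f1"]:
    print(f"{name:>9}: {cv_results['test_' + name].mean():.3f}")

# GridSearchCV uses the scoring metric to pick the best hyperparameters.
grid = GridSearchCV(
    model,
    param_grid={"class_weight": [None, "balanced"]},
    scoring="f1",
    cv=5,
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best f1:    ", round(grid.best_score_, 3))
```

`RandomizedSearchCV` takes the same `scoring` argument; it samples from `param_distributions` instead of trying every combination in `param_grid`. For regression problems, scorer names such as `"neg_mean_squared_error"`, `"neg_root_mean_squared_error"`, `"r2"`, and `"neg_mean_absolute_percentage_error"` can be used in the same way; the error metrics are negated so that higher is always better.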