# Evaluation Metrics

## Classification Evaluation

### Metrics

**Confusion Matrix**: a table that summarizes the performance of a classification model by counting its true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

**Accuracy**: the most basic metric for evaluating a classifier. It measures the proportion of correctly predicted instances among all instances.

$$ accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

Accuracy can be misleading when dealing with imbalanced classes. For example, on a dataset with a 70/30 class split, a dummy classifier using a most-frequent strategy would score 70% accuracy without learning anything about the minority class.
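
The 70/30 example can be checked with a few lines of plain Python. This is a minimal sketch: the `accuracy` helper and the toy labels are made up for illustration, and the baseline simply mimics a most-frequent dummy classifier.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical dataset: 70 negatives (0) and 30 positives (1).
y_true = [0] * 70 + [1] * 30

# A most-frequent baseline always predicts the majority class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_baseline = [majority_class] * len(y_true)

baseline_accuracy = accuracy(y_true, y_baseline)
print(baseline_accuracy)  # 0.7
```

The baseline never identifies a single positive instance, yet its accuracy looks respectable, which is why accuracy alone is a weak signal on imbalanced data.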

**Precision**: assesses the quality of positive predictions made by a classification model. It is the proportion of true positives among all positive predictions. This is valuable in contexts like medical diagnoses or spam detection, where we want to keep the number of false positive predictions low.

$$ precision = \frac{TP}{TP + FP} $$

**Recall**: assesses the ability to correctly identify all positive instances within a dataset. It is the proportion of true positives among all actual positives, and is also called sensitivity or the true positive rate. Again, this can be critical in medical diagnoses or security, where we want to ensure we are not missing true cases.

$$ recall = \frac{TP}{TP + FN} $$

**Specificity**: the true negative rate, i.e. the proportion of actual negatives that are correctly predicted as negative.

$$ specificity = \frac{TN}{TN + FP} $$

**F1-Score**: the harmonic mean of precision and recall, which combines both into a single metric. It is especially useful when there is class imbalance.

$$ f1 = 2 \cdot \frac{precision \cdot recall}{precision + recall} $$
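
To make the formulas above concrete, here is a small sketch that tallies the confusion-matrix counts and derives each metric from them. The helper name `confusion_counts` and the toy labels are assumptions for illustration, not part of any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * (precision * recall) / (precision + recall)

print(tp, tn, fp, fn)        # 3 5 1 1
print(f"{precision:.3f}")    # 0.750
print(f"{recall:.3f}")       # 0.750
print(f"{specificity:.3f}")  # 0.833
print(f"{f1:.3f}")           # 0.750
```

This sketch does not guard against empty denominators (for example, a model that makes no positive predictions), which a real implementation would need to handle.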

### Curves

**Receiver Operating Characteristic (ROC) Curve**: representation of the model's ability to distinguish between positive and negative classes by plotting the false positive rate (x), which equals 1 - specificity, against the recall, or true positive rate (y), as the classification threshold varies. The ideal curve would pass through the top left corner, where the false positive rate is 0.0 and the true positive rate is 1.0.

**Precision-Recall Curve**: representation of how well the model predicts the positive class. It plots the recall (x) against the precision (y) as the classification threshold varies. The ideal curve would reach the top right corner, where both precision and recall are 1.0.

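**Area Under Curve (AUC)**: provides a single scalar value that summarizes the model using the ROC curve or the precision-recall curve (PRC). For the ROC curve, an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the classes.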
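
As a rough illustration of how an ROC curve and its AUC can be obtained, the sketch below sweeps a threshold over predicted scores and integrates the resulting points with the trapezoidal rule. The function names `roc_points` and `auc`, and the toy scores, are hypothetical; libraries such as scikit-learn provide tested equivalents.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points
    obtained by lowering the threshold one distinct score at a time."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_tie:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a curve given as (x, y) points, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted scores for the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
roc_auc = auc(points)
print(points)   # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc)  # 0.75
```

The sketch assumes binary labels encoded as 0 and 1 with at least one instance of each class.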

## Regression Evaluation

### Metrics

**Mean Absolute Error (MAE)**: the average absolute difference between actual and predicted values. It is easy to interpret as it uses the same unit as the target variable, but it weights every error in proportion to its size regardless of direction, so large errors receive no extra penalty.

$$ MAE = \frac{1}{n}\Sigma|y_i-\hat{y_i}| $$

**Mean Squared Error (MSE)**: the average squared difference between actual and predicted values. It penalizes larger errors more strongly, making it sensitive to outliers. It is also more difficult to interpret because it is expressed in squared units of the target variable.

$$ MSE = \frac{1}{n}\Sigma(y_i-\hat{y_i})^2 $$

**Root Mean Squared Error (RMSE)**: the square root of the MSE, which maintains a strong penalty for larger errors while also improving interpretability by converting back to the original unit of the target variable.

$$ RMSE = \sqrt{\frac{1}{n}\Sigma(y_i-\hat{y_i})^2} $$
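
The three error metrics above differ only in how they aggregate the residuals, which a short sketch makes visible. The function names and the toy values are chosen for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the unit of the target variable."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical actual and predicted values.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(mae(y_true, y_pred))            # 0.5
print(mse(y_true, y_pred))            # 0.375
print(f"{rmse(y_true, y_pred):.3f}")  # 0.612
```

The single residual of size 1.0 contributes half of the total absolute error but two thirds of the total squared error, which is the outlier sensitivity described above.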

**R-Squared**: represents the proportion of variance in the target variable that is explained by the regression model. The total variation measures how much the data varies around its mean, and the unexplained variation is the part our model failed to capture. Thus, if we take one minus the ratio of unexplained variation to total variation, we are left with the proportion of variation our model captures. We can think of the value as how much better the model is than always guessing the mean.

$$ R^2=1 - \frac{Unexplained\;variation}{Total\;variation} = 1 - \frac{RSS}{SST} $$

where:
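- $ RSS = \Sigma(y_i-\hat{y_i})^2 $ is the residual sum of squares
- $ SST = \Sigma(y_i-\bar{y})^2 $ is the total sum of squares, with $ \bar{y} $ the mean of the actual values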
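
The "better than guessing the mean" reading of R-squared can be checked directly. In this sketch, the `r_squared` helper and the toy values are assumptions for illustration.

```python
def r_squared(y_true, y_pred):
    """1 - RSS / SST, where SST is measured around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - rss / sst


# Hypothetical actual and predicted values.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(f"{r_squared(y_true, y_pred):.3f}")  # 0.949

# A model that always predicts the mean explains none of the variance.
mean_y = sum(y_true) / len(y_true)
y_mean_only = [mean_y] * len(y_true)
print(r_squared(y_true, y_mean_only))  # 0.0
```

A model that does worse than the mean predictor would produce a negative value, since its RSS would exceed the SST.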


**Mean Absolute Percentage Error (MAPE)**: the average absolute prediction error expressed as a percentage of the actual values. While intuitive, it becomes unreliable when actual values are close to zero and is undefined when an actual value is exactly zero.

$$ MAPE = \frac{100}{n}\Sigma|\frac{y_i-\hat{y_i}}{y_i}| $$
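
The near-zero problem is easy to reproduce. The following sketch uses a hypothetical `mape` helper and made-up values, and deliberately does not guard against actual values of exactly zero.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Raises ZeroDivisionError if any actual value is exactly zero.
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100 * total / len(y_true)


# Every prediction is off by 10% of the actual value.
well_scaled = mape([100.0, 200.0, 50.0], [110.0, 180.0, 55.0])
print(f"{well_scaled:.1f}")  # 10.0

# An absolute error of 0.49 on an actual value of 0.01 dominates the score.
near_zero = mape([0.01], [0.5])
print(f"{near_zero:.1f}")  # 4900.0
```

An error that would be negligible on a larger target value produces a score of several thousand percent here, so MAPE is best reserved for targets that stay well away from zero.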