```{contents}
```

# Performance Metrics

## **AdaBoost Classification**

When using AdaBoost as a **classifier**, you evaluate how well the model separates classes. The most common metrics are:

### ✅ Accuracy

* Proportion of correctly classified samples.
* Good for **balanced datasets**, but misleading for **imbalanced classes**.
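
To see why accuracy can mislead on imbalanced classes, here is a minimal sketch with made-up labels (95 negatives, 5 positives) and a trivial "model" that always predicts the majority class. The data and variable names are illustrative assumptions, not part of any AdaBoost run.

```python
import numpy as np

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A trivial "model" that always predicts the majority class (0).
y_pred = np.zeros_like(y_true)

# Accuracy: fraction of samples whose predicted label matches the true label.
accuracy = np.mean(y_pred == y_true)

# Recall on the positive class: fraction of actual positives that were found.
recall_pos = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)

print(f"accuracy = {accuracy:.2f}")    # 0.95
print(f"recall   = {recall_pos:.2f}")  # 0.00
```

The accuracy looks high even though not a single positive sample is detected.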

### ✅ Precision, Recall, and F1-Score

* **Precision**: Among the predicted positives, how many are correct?
* **Recall (Sensitivity)**: Among the actual positives, how many are captured?
* **F1-score**: Harmonic mean of precision & recall → balances both.
* Useful when dealing with **imbalanced datasets**.
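
As a worked example, the three metrics can be computed directly from confusion-matrix counts. The counts below (40 true positives, 10 false positives, 20 false negatives) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical confusion-matrix counts for the positive class.
tp = 40  # predicted positive, actually positive
fp = 10  # predicted positive, actually negative
fn = 20  # predicted negative, actually positive

# Precision: among the predicted positives, how many are correct?
precision = tp / (tp + fp)

# Recall: among the actual positives, how many are captured?
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.3f}")  # 0.800
print(f"recall    = {recall:.3f}")     # 0.667
print(f"f1        = {f1:.3f}")         # 0.727
```

Because the harmonic mean is pulled toward the smaller value, F1 (0.727) sits closer to the weaker recall than a simple average (0.733) would.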

### ✅ ROC-AUC (Receiver Operating Characteristic – Area Under Curve)

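* The ROC curve plots True Positive Rate vs False Positive Rate across all decision thresholds; AUC is the area under that curve.
* **AUC close to 1** → strong classifier; **AUC ≈ 0.5** → no better than random ranking.
* Threshold-independent metric: it depends only on how the scores rank the samples.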
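
One way to read AUC is as the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. The sketch below checks this on four hypothetical scores by comparing every positive/negative pair, then compares the result with `roc_auc_score`. It also rescales the scores monotonically to illustrate that only the ranking matters.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and classifier scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Compare every positive score with every negative score.
pos = scores[y_true == 1]
neg = scores[y_true == 0]
diff = pos[:, None] - neg[None, :]

# A pair counts 1 if the positive ranks higher, 0.5 on a tie, 0 otherwise.
auc_pairwise = np.mean((diff > 0) + 0.5 * (diff == 0))

auc_sklearn = roc_auc_score(y_true, scores)

# Cubing is monotonic for positive scores, so the ranking (and AUC) is unchanged.
auc_rescaled = roc_auc_score(y_true, scores ** 3)

print(auc_pairwise, auc_sklearn, auc_rescaled)
```

Three of the four positive/negative pairs are ordered correctly, so all three values come out as 0.75.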

### ✅ Log Loss (Cross-Entropy Loss)

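* Measures how well the predicted **probabilities** match the true labels, not just whether the predicted class is right.
* Lower log loss = predicted probabilities closer to the true labels (it reflects both calibration and class separation).
* More informative than accuracy because it penalizes “overconfident wrong predictions.”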
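
The penalty on overconfident mistakes can be illustrated with a small hand computation of binary log loss, $-\frac{1}{n}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. The two probability vectors below are hypothetical: both misclassify the same sample, but one does so with far more confidence.

```python
import numpy as np

def binary_log_loss(y_true, p):
    """Mean negative log-likelihood of the true labels under probabilities p."""
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])

# Predicted P(class = 1). Both vectors get the last sample wrong at threshold 0.5.
p_cautious = np.array([0.8, 0.2, 0.7, 0.40])  # wrong, but only mildly
p_overconf = np.array([0.8, 0.2, 0.7, 0.01])  # wrong and very confident

acc_cautious = np.mean((p_cautious >= 0.5) == y_true)
acc_overconf = np.mean((p_overconf >= 0.5) == y_true)

loss_cautious = binary_log_loss(y_true, p_cautious)
loss_overconf = binary_log_loss(y_true, p_overconf)

print(f"accuracy: {acc_cautious:.2f} vs {acc_overconf:.2f}")   # 0.75 vs 0.75
print(f"log loss: {loss_cautious:.2f} vs {loss_overconf:.2f}")  # about 0.43 vs 1.35
```

Accuracy cannot tell the two apart, while log loss roughly triples for the overconfident one.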

---

## **AdaBoost Regression**

When AdaBoost is used with **regression trees**, you measure how well it predicts continuous values:

### Mean Squared Error (MSE)

$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

* Penalizes large errors more heavily.

### Root Mean Squared Error (RMSE)

$$
RMSE = \sqrt{MSE}
$$

* Same units as the target variable → more interpretable.

### Mean Absolute Error (MAE)

$$
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$

* Less sensitive to outliers than MSE.
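
To make the difference between MSE, RMSE, and MAE concrete, the sketch below compares two hypothetical sets of prediction errors with the same total absolute error: one spread evenly, one concentrated in a single large miss. The numbers are illustrative only.

```python
import numpy as np

# Two hypothetical error vectors (y_i - y_hat_i) with the same total absolute error.
errors_even = np.array([2.0, 2.0, 2.0, 2.0])   # error spread evenly
errors_spiky = np.array([0.0, 0.0, 0.0, 8.0])  # one large miss

def summarize(errors):
    """Return (MSE, RMSE, MAE) for a vector of prediction errors."""
    mse = float(np.mean(errors ** 2))
    rmse = float(np.sqrt(mse))
    mae = float(np.mean(np.abs(errors)))
    return mse, rmse, mae

mse_even, rmse_even, mae_even = summarize(errors_even)
mse_spiky, rmse_spiky, mae_spiky = summarize(errors_spiky)

print(f"even : MSE={mse_even:.1f} RMSE={rmse_even:.1f} MAE={mae_even:.1f}")     # 4.0, 2.0, 2.0
print(f"spiky: MSE={mse_spiky:.1f} RMSE={rmse_spiky:.1f} MAE={mae_spiky:.1f}")  # 16.0, 4.0, 2.0
```

MAE is identical for both, while MSE is four times larger when the error is concentrated in one sample. RMSE brings that back to the units of the target but still reflects the large miss.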

### R² Score (Coefficient of Determination)

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$

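* Measures proportion of variance explained by the model, where $SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ and $SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.
* $R^2 = 1$: perfect predictions.
* $R^2 = 0$: no better than mean prediction.
* $R^2 < 0$: worse than predicting the mean.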
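
The formula can be checked by hand on a tiny example. The sketch below uses four hypothetical target values and three sets of predictions: perfect, constant at the mean, and deliberately reversed. The helper name `r2_by_hand` is made up for illustration.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed directly from the definition."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # mean is 6, SS_tot is 20

r2_perfect = r2_by_hand(y_true, y_true.copy())
r2_mean = r2_by_hand(y_true, np.full_like(y_true, y_true.mean()))
r2_reversed = r2_by_hand(y_true, y_true[::-1].copy())

print(r2_perfect)   # 1.0
print(r2_mean)      # 0.0
print(r2_reversed)  # -3.0
```

The reversed predictions show that $R^2$ is not bounded below by zero: a model can do worse than always predicting the mean.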

### Median Absolute Error

* Median of absolute residuals.
* Very robust against **outliers** compared to MSE/RMSE.
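
A small sketch of that robustness, using hypothetical targets and two prediction vectors that differ only in the last value, where one contains a single wild miss:

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Hypothetical predictions: identical except for the last sample.
y_pred_clean = np.array([11.0, 22.0, 33.0, 44.0, 55.0])
y_pred_outlier = np.array([11.0, 22.0, 33.0, 44.0, 150.0])

def error_summary(y_true, y_pred):
    """Return (median absolute error, MAE, RMSE)."""
    abs_err = np.abs(y_true - y_pred)
    medae = float(np.median(abs_err))
    mae = float(np.mean(abs_err))
    rmse = float(np.sqrt(np.mean(abs_err ** 2)))
    return medae, mae, rmse

medae_clean, mae_clean, rmse_clean = error_summary(y_true, y_pred_clean)
medae_outlier, mae_outlier, rmse_outlier = error_summary(y_true, y_pred_outlier)

print(f"clean  : MedAE={medae_clean:.1f} MAE={mae_clean:.1f} RMSE={rmse_clean:.2f}")
print(f"outlier: MedAE={medae_outlier:.1f} MAE={mae_outlier:.1f} RMSE={rmse_outlier:.2f}")
```

The median absolute error stays at 3.0 in both cases, while MAE jumps from 3.0 to 22.0 and RMSE from about 3.3 to about 44.8.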

---

**Key Insights**

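* For **classification**, use **Accuracy** on balanced data, add **Precision/Recall/F1 and ROC-AUC** when classes are imbalanced, and use **Log Loss** when the predicted probabilities matter.
* For **regression**, rely on **MSE, RMSE, and MAE** for error magnitude, **R²** for explanatory power, and **Median Absolute Error** when outliers are present.
* Since AdaBoost can **overfit noisy datasets**, monitor several metrics on held-out data rather than relying on a single one.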
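
One possible way to watch for overfitting is to track a metric after every boosting round with scikit-learn's `staged_predict`. The sketch below is an assumption-laden illustration: the dataset is synthetic, `flip_y=0.1` injects label noise on purpose, and accuracy stands in for whichever metric matters in practice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data; flip_y randomly reassigns about 10% of labels to simulate noise.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Default base learner is a depth-1 decision tree (a stump).
model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting round.
train_acc = [accuracy_score(y_train, p) for p in model.staged_predict(X_train)]
test_acc = [accuracy_score(y_test, p) for p in model.staged_predict(X_test)]

best_round = int(np.argmax(test_acc)) + 1
print(f"rounds fitted:        {len(test_acc)}")
print(f"best test accuracy:   {max(test_acc):.3f} at round {best_round}")
print(f"final train accuracy: {train_acc[-1]:.3f}")
print(f"final test accuracy:  {test_acc[-1]:.3f}")
```

A widening gap between the training and test curves, or a test score that peaks well before the last round, suggests that fewer estimators or a smaller learning rate may be worth trying.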

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, log_loss,
    mean_squared_error, mean_absolute_error, r2_score, median_absolute_error
)
from sklearn.model_selection import train_test_split

# -------------------------
# PART 1: CLASSIFICATION
# -------------------------
# Create a synthetic, mildly imbalanced (60/40) binary classification dataset
X_cls, y_cls = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=2,
    n_classes=2, weights=[0.6, 0.4], random_state=42
)

# Split data (70% train, 30% test)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_cls, y_cls, test_size=0.3, random_state=42
)

# Train AdaBoost Classifier on decision stumps
# Note: the `estimator` keyword requires scikit-learn >= 1.2
# (older versions call it `base_estimator`).
ada_cls = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100, learning_rate=0.5, random_state=42
)
ada_cls.fit(Xc_train, yc_train)

# Predictions: hard labels and probability of the positive class
y_pred_cls = ada_cls.predict(Xc_test)
y_proba_cls = ada_cls.predict_proba(Xc_test)[:, 1]

# Classification metrics (ROC-AUC and log loss use probabilities, not labels)
cls_metrics = {
    "Accuracy": accuracy_score(yc_test, y_pred_cls),
    "Precision": precision_score(yc_test, y_pred_cls),
    "Recall": recall_score(yc_test, y_pred_cls),
    "F1-Score": f1_score(yc_test, y_pred_cls),
    "ROC-AUC": roc_auc_score(yc_test, y_proba_cls),
    "Log Loss": log_loss(yc_test, y_proba_cls)
}

# -------------------------
# PART 2: REGRESSION
# -------------------------
# Create synthetic regression dataset
X_reg, y_reg = make_regression(
    n_samples=500, n_features=10, noise=10.0, random_state=42
)

# Split data (70% train, 30% test)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42
)

# Train AdaBoost Regressor on shallow regression trees
ada_reg = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    n_estimators=100, learning_rate=0.5, random_state=42
)
ada_reg.fit(Xr_train, yr_train)

# Predictions
y_pred_reg = ada_reg.predict(Xr_test)

# Regression metrics (RMSE is derived from MSE so it is computed only once)
mse = mean_squared_error(yr_test, y_pred_reg)
reg_metrics = {
    "MSE": mse,
    "RMSE": np.sqrt(mse),
    "MAE": mean_absolute_error(yr_test, y_pred_reg),
    "R2 Score": r2_score(yr_test, y_pred_reg),
    "Median Absolute Error": median_absolute_error(yr_test, y_pred_reg)
}

# Collect each set of metrics in a one-row DataFrame for display
cls_df = pd.DataFrame([cls_metrics], index=["AdaBoost Classification"])
reg_df = pd.DataFrame([reg_metrics], index=["AdaBoost Regression"])
(cls_df, reg_df)
```

| | Accuracy | Precision | Recall | F1-Score | ROC-AUC | Log Loss |
|---|---|---|---|---|---|---|
| AdaBoost Classification | 0.86 | 0.907407 | 0.753846 | 0.823529 | 0.956018 | 0.499989 |

| | MSE | RMSE | MAE | R2 Score | Median Absolute Error |
|---|---|---|---|---|---|
| AdaBoost Regression | 5842.940439 | 76.439129 | 57.97347 | 0.713638 | 46.093553 |