# AdaBoost (Adaptive Boosting)

## Problem Type
**AdaBoost (Adaptive Boosting)** is primarily used for:
- **Classification** problems
- **Regression** problems (though less common, with variants like AdaBoost.R2)
- **Supervised** learning

### How AdaBoost Works
- **Boosting technique:**
  - Sequentially combines multiple weak learners (typically decision stumps) to form a strong learner.
  - Focuses on instances that were incorrectly predicted by previous models.
- **Weighted samples:**
  - In each iteration, AdaBoost adjusts the weights of misclassified instances, making them more influential in subsequent models.
  - Correctly classified instances receive reduced weights, diminishing their impact on the next model.
- **Model combination:**
  - The final model is a weighted sum of all weak learners, with more accurate learners having higher weights.
  - Primarily reduces bias by correcting the errors of earlier learners; in practice it often reduces variance as well, which improves generalization.
- **Iterative process:**
  - Continues adding weak learners until `n_estimators` is reached, or stops early when a learner fits the weighted training data perfectly or does no better than random guessing.
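
The steps above can be sketched in a few lines. This is a minimal illustration of discrete (binary) AdaBoost with labels recoded to -1/+1, not scikit-learn's internal implementation. The helper names `fit_adaboost_sketch` and `predict_adaboost_sketch` and the synthetic dataset are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost_sketch(X, y, n_rounds=20):
    """Discrete AdaBoost for labels in {-1, +1}. Returns (stumps, alphas)."""
    n_samples = len(y)
    weights = np.full(n_samples, 1.0 / n_samples)  # uniform starting weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        err = weights[pred != y].sum()  # weighted error; weights sum to 1
        if err >= 0.5:  # no better than chance: stop adding learners
            break
        err = max(err, 1e-10)  # guard against log(0) on a perfect fit
        alpha = 0.5 * np.log((1.0 - err) / err)  # accurate learners get larger alpha
        stumps.append(stump)
        alphas.append(alpha)
        if err <= 1e-10:  # perfect fit: nothing left to correct
            break
        # y * pred is +1 when correct and -1 when wrong, so mistakes are
        # multiplied by exp(alpha) > 1 and correct points by exp(-alpha) < 1.
        weights = weights * np.exp(-alpha * y * pred)
        weights /= weights.sum()
    return stumps, np.array(alphas)


def predict_adaboost_sketch(stumps, alphas, X):
    """Weighted vote: sign of the alpha-weighted sum of stump predictions."""
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.where(scores >= 0, 1, -1)


# Synthetic data, used only to exercise the sketch.
X_demo, y01 = make_classification(n_samples=300, n_features=8, random_state=0)
y_demo = np.where(y01 == 1, 1, -1)

stumps, alphas = fit_adaboost_sketch(X_demo, y_demo)
ensemble_acc = np.mean(predict_adaboost_sketch(stumps, alphas, X_demo) == y_demo)
print(f"rounds used: {len(stumps)}, training accuracy: {ensemble_acc:.3f}")
```

Each `alpha` plays the role of the learner weight in the final weighted sum, and the reweighting line is where misclassified instances gain influence for the next round.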

### Key Tuning Parameters
- **`n_estimators`:**
  - **Description:** Number of weak learners (e.g., decision stumps) to be added.
  - **Impact:** More estimators generally improve performance but can lead to overfitting if too high.
  - **Default:** `50`.
- **`learning_rate`:**
  - **Description:** Controls the contribution of each weak learner.
  - **Impact:** Smaller values shrink each learner's contribution, which usually makes the model more robust but requires more estimators, so it trades off against `n_estimators`; typically in the range of `0.01` to `1`.
  - **Default:** `1.0`.
- **`estimator`:**
  - **Description:** The weak learner algorithm used for boosting (e.g., decision tree with max depth = 1).
  - **Impact:** Changing the base estimator can affect the model’s flexibility and performance.
  - **Default:** `None`, which resolves to `DecisionTreeClassifier(max_depth=1)`.
- **`algorithm`:**
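  - **Description:** Specifies the boosting algorithm variant: `SAMME` updates weights from each weak learner's predicted class labels, while `SAMME.R` uses its predicted class probabilities.
  - **Impact:** `SAMME.R` typically converges faster but requires a base estimator that implements `predict_proba`; `SAMME` works with any classifier that supports sample weights.
  - **Default:** `SAMME.R` in older scikit-learn releases; recent releases deprecate `SAMME.R` and use `SAMME`.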
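
The interaction between `n_estimators` and `learning_rate` can be explored with `staged_score`, which reports the score after each boosting round from a single fit. The sketch below is illustrative only: the learning rates tried, the 25% validation split, and the variable names are assumptions, and the best settings will differ by dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_tune, y_tune = load_breast_cancer(return_X_y=True)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_tune, y_tune, test_size=0.25, random_state=0, stratify=y_tune
)

tuning_results = {}
for lr in (0.1, 0.5, 1.0):
    model = AdaBoostClassifier(n_estimators=100, learning_rate=lr, random_state=0)
    model.fit(X_fit, y_fit)
    # Validation accuracy after 1, 2, ..., n boosting rounds.
    staged = list(model.staged_score(X_val, y_val))
    best_n = max(range(len(staged)), key=staged.__getitem__) + 1
    tuning_results[lr] = (best_n, staged[best_n - 1])
    print(f"learning_rate={lr}: best n_estimators={best_n}, "
          f"validation accuracy={staged[best_n - 1]:.3f}")
```

For a more rigorous search, the same two parameters can be passed to a cross-validated search such as `GridSearchCV` instead of a single validation split.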

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Increases accuracy by combining weak learners         | Sensitive to noisy data and outliers                   |
| Simple and interpretable when using decision stumps   | May require many estimators to achieve high accuracy   |
| Effective on a wide range of classification tasks     | Can overfit if `n_estimators` is too large             |
| Can be used with different types of weak learners     | Performance can degrade with weak learners that are too complex |
| Does not require scaling of input data                | Slower to train and predict compared to simpler models |
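
The "does not require scaling" entry can be checked empirically. The sketch below trains the same stump-based model on raw and standardized features and compares test predictions. It assumes the default decision-stump base learner, and the variable names are illustrative. Because standardization preserves the ordering of values within each feature, the two models are expected to agree on all or nearly all test points.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr_raw, X_te_raw, y_tr_s, y_te_s = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler_demo = StandardScaler().fit(X_tr_raw)
X_tr_std = scaler_demo.transform(X_tr_raw)
X_te_std = scaler_demo.transform(X_te_raw)

pred_raw = (
    AdaBoostClassifier(n_estimators=50, random_state=42)
    .fit(X_tr_raw, y_tr_s)
    .predict(X_te_raw)
)
pred_std = (
    AdaBoostClassifier(n_estimators=50, random_state=42)
    .fit(X_tr_std, y_tr_s)
    .predict(X_te_std)
)

agreement = np.mean(pred_raw == pred_std)
print(f"Fraction of identical test predictions: {agreement:.3f}")
```

This holds for tree-based weak learners; a base estimator that depends on feature magnitudes may still benefit from scaling.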

### Evaluation Metrics
- **Accuracy (Classification):**
  - **Description:** Ratio of correct predictions to total predictions.
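  - **Good Value:** Higher is better; values above 0.85 are often considered good, but the bar depends on the task and the class balance.
  - **Bad Value:** Accuracy at or below the majority-class rate (0.5 for a balanced binary problem) means the model is no better than always predicting the most common class.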
- **Precision (Classification):**
  - **Description:** Proportion of positive identifications that were actually correct.
  - **Good Value:** Higher values indicate fewer false positives; crucial in imbalanced datasets.
  - **Bad Value:** Low values suggest many false positives.
- **Recall (Classification):**
  - **Description:** Proportion of actual positives that were correctly identified.
  - **Good Value:** Higher values indicate fewer false negatives; important in recall-sensitive applications.
  - **Bad Value:** Low values suggest many false negatives.
- **F1 Score (Classification):**
  - **Description:** Harmonic mean of Precision and Recall.
  - **Good Value:** Higher values indicate a good balance between Precision and Recall.
  - **Bad Value:** Low values suggest an imbalance, with either high false positives or false negatives.
- **Log Loss (Classification):**
  - **Description:** Scores predicted probabilities (values between 0 and 1) by how far they are from the true labels; confident wrong predictions are penalized most.
  - **Good Value:** Lower values indicate better model calibration and performance.
  - **Bad Value:** Higher values suggest poor probabilistic predictions.
- **AUC-ROC (Classification):**
  - **Description:** Measures the ability of the model to distinguish between classes, summarizing performance across all classification thresholds.
  - **Good Value:** Values closer to 1 indicate excellent separability.
  - **Bad Value:** Values closer to 0.5 suggest random guessing.
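
As a small worked example of the confusion-matrix metrics above, the sketch below computes accuracy, precision, recall, and F1 by hand and prints them next to scikit-learn's results. The ten labels are made up for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels {0, 1}, ravel() returns the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / total
precision = tp / (tp + fp)                  # how many predicted positives are real
recall = tp / (tp + fn)                     # how many real positives were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy:.2f}  sklearn={accuracy_score(y_true_demo, y_pred_demo):.2f}")
print(f"precision={precision:.2f} sklearn={precision_score(y_true_demo, y_pred_demo):.2f}")
print(f"recall={recall:.2f}    sklearn={recall_score(y_true_demo, y_pred_demo):.2f}")
print(f"f1={f1:.2f}        sklearn={f1_score(y_true_demo, y_pred_demo):.2f}")
```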



In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report, log_loss, roc_auc_score, roc_curve
import matplotlib.pyplot as plt

In [None]:
# Load the breast cancer dataset
data = datasets.load_breast_cancer()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the features (not required for tree-based weak learners; kept for illustration)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
# Initialize the AdaBoostClassifier with the specified parameters
adaboost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=1.0,
    algorithm='SAMME',
    random_state=42
)

# Train the model
adaboost.fit(X_train, y_train)

# Predict probabilities
y_pred_proba = adaboost.predict_proba(X_test)[:, 1]

# Predict class labels
y_pred = adaboost.predict(X_test)

In [None]:
# Evaluate the model using log loss
logloss = log_loss(y_test, y_pred_proba)
print(f'Log Loss: {logloss:.2f}')

# Evaluate the model using ROC-AUC
roc_auc = roc_auc_score(y_test, y_pred_proba)
print(f'ROC-AUC Score: {roc_auc:.2f}')

# Print classification report
print('Classification Report:')
print(classification_report(y_test, y_pred))

# Plot the ROC curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
plt.plot(fpr, tpr, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--')  # Diagonal line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc='lower right')
plt.show()
