# Boosting Algorithms: AdaBoost, GradientBoosting, XGBoost, and CatBoost

In this notebook, we demonstrate four popular boosting algorithms, **AdaBoost**, **GradientBoosting**, **XGBoost**, and **CatBoost**, on the **Covertype** dataset.

## 1. Loading and Preparing the Dataset

We will use the Covertype dataset:

*   **Description:** A multi-class classification dataset of forest cover types derived from cartographic data. With 54 features and seven classes, it is a more challenging task for classification algorithms than small toy datasets.

*   **Problem Type:** Multi-class classification

*   **Use Case:** Predict the forest cover type from cartographic data such as elevation, slope, and soil type.

*   **Dataset Size:** Large (581,012 samples in total; this notebook uses the first 30,000), so it can also be used for testing model scalability.


In [1]:
# Uncomment to install CatBoost if it is not already available
# %pip install catboost

In [2]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report



In [3]:
# Load the Covertype dataset; for simplicity, keep only the first 30,000 samples
covtype = fetch_covtype()
X = covtype.data[:30000]
y = covtype.target[:30000]


# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Show dataset dimensions
print(f"Training samples: {X_train.shape[0]}, Test samples: {X_test.shape[0]}")

Training samples: 21000, Test samples: 9000
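The cell above keeps the first 30,000 rows and splits them at random. As far as I know, `fetch_covtype` does not shuffle by default (`shuffle=False`), so the leading rows need not reflect the class mix of the full dataset. The sketch below shows one way to inspect class shares and to keep them equal across the train and test splits with `stratify`. It runs on synthetic stand-in arrays (`X_demo`, `y_demo`, and the helper `class_shares` are hypothetical names, not part of the notebook) so that it needs no download.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (X, y); in the notebook these would be the Covertype arrays.
rng = np.random.default_rng(42)
y_demo = rng.choice(np.arange(1, 8), size=3000,
                    p=[0.16, 0.47, 0.07, 0.07, 0.08, 0.07, 0.08])
X_demo = rng.normal(size=(3000, 5))


def class_shares(labels):
    """Return {class label: fraction of samples} for an array of labels."""
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): float(n / counts.sum()) for c, n in zip(classes, counts)}


# stratify=y_demo keeps each class's share (nearly) identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("full :", class_shares(y_demo))
print("train:", class_shares(y_tr))
print("test :", class_shares(y_te))
```

With the real data, the same two steps would apply to `X` and `y` directly; passing `shuffle=True, random_state=42` to `fetch_covtype` before slicing is another option for getting a more representative subset.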


In [4]:
# Initialize and train the AdaBoost classifier
ada_clf = AdaBoostClassifier(n_estimators=50, random_state=42)
ada_clf.fit(X_train, y_train)

# Make predictions
y_pred_ada = ada_clf.predict(X_test)

# Evaluate AdaBoost
print("AdaBoost Results:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_ada)}")
print(confusion_matrix(y_test, y_pred_ada))
print(classification_report(y_test, y_pred_ada))



AdaBoost Results:
Accuracy: 0.6261111111111111
[[ 291  782    0    0  120    2  205]
 [ 159 3745    0    0  215   31   66]
 [   0    0   14    0   47  593    0]
 [   0    0    0    0    0  654    0]
 [   0  337    0    0  386   31    0]
 [   0    0   11    0   62  563    0]
 [  48    0    0    0    2    0  636]]
              precision    recall  f1-score   support

           1       0.58      0.21      0.31      1400
           2       0.77      0.89      0.82      4216
           3       0.56      0.02      0.04       654
           4       0.00      0.00      0.00       654
           5       0.46      0.51      0.49       754
           6       0.30      0.89      0.45       636
           7       0.70      0.93      0.80       686

    accuracy                           0.63      9000
   macro avg       0.48      0.49      0.42      9000
weighted avg       0.61      0.63      0.57      9000
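To see where the numbers in the AdaBoost report come from, the sketch below recomputes accuracy, recall, and precision directly from the confusion matrix printed above. The matrix values are copied from that output; the handling of the never-predicted class (precision set to 0.0) mirrors what `classification_report` shows and is a choice made here for illustration.

```python
import numpy as np

# AdaBoost confusion matrix copied from the output above
# (rows = true classes 1..7, columns = predicted classes 1..7).
cm = np.array([
    [291,  782,  0, 0, 120,   2, 205],
    [159, 3745,  0, 0, 215,  31,  66],
    [  0,    0, 14, 0,  47, 593,   0],
    [  0,    0,  0, 0,   0, 654,   0],
    [  0,  337,  0, 0, 386,  31,   0],
    [  0,    0, 11, 0,  62, 563,   0],
    [ 48,    0,  0, 0,   2,   0, 636],
])

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: correct predictions over the true count (row sum).
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: correct predictions over the predicted count (column sum).
# It is undefined when a class is never predicted; 0.0 is reported in that case.
predicted = cm.sum(axis=0)
precision = np.divide(np.diag(cm), predicted,
                      out=np.zeros(len(cm)), where=predicted > 0)

print(f"accuracy = {accuracy:.4f}")
for label, p, r in zip(range(1, 8), precision, recall):
    print(f"class {label}: precision={p:.2f} recall={r:.2f}")
```

The fourth column of the matrix is all zeros: AdaBoost never predicts class 4 on this split and sends all 654 of its samples to class 6, which explains both the 0.00 row in the report and the low precision of class 6.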





In [5]:
# Initialize and train the Gradient Boosting classifier
gb_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_clf.fit(X_train, y_train)

# Make predictions
y_pred_gb = gb_clf.predict(X_test)

# Evaluate Gradient Boosting
print("Gradient Boosting Results:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_gb)}")
print(confusion_matrix(y_test, y_pred_gb))
print(classification_report(y_test, y_pred_gb))

Gradient Boosting Results:
Accuracy: 0.8493333333333334
[[ 928  374    0    0   22    3   73]
 [ 194 3889   15    0   94   20    4]
 [   0    0  454   42   24  134    0]
 [   0    0   24  620    0   10    0]
 [   7   82   13    0  645    7    0]
 [   0    8  125   20   22  461    0]
 [  38    1    0    0    0    0  647]]
              precision    recall  f1-score   support

           1       0.80      0.66      0.72      1400
           2       0.89      0.92      0.91      4216
           3       0.72      0.69      0.71       654
           4       0.91      0.95      0.93       654
           5       0.80      0.86      0.83       754
           6       0.73      0.72      0.73       636
           7       0.89      0.94      0.92       686

    accuracy                           0.85      9000
   macro avg       0.82      0.82      0.82      9000
weighted avg       0.85      0.85      0.85      9000



In [6]:
# Initialize and train the XGBoost classifier
xgb_clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
# XGBClassifier expects class labels that start at 0, while Covertype labels run 1..7,
# so shift both the training and test labels down by 1
y_train_xgb = y_train - 1
y_test_xgb = y_test - 1
xgb_clf.fit(X_train, y_train_xgb)

# Make predictions
y_pred_xgb = xgb_clf.predict(X_test)

# Evaluate XGBoost
print("XGBoost Results:")
print(f"Accuracy: {accuracy_score(y_test_xgb, y_pred_xgb)}")
print(confusion_matrix(y_test_xgb, y_pred_xgb))
print(classification_report(y_test_xgb, y_pred_xgb))

XGBoost Results:
Accuracy: 0.8154444444444444
[[ 779  477    0    0   34    1  109]
 [ 195 3848   11    0  128   25    9]
 [   0    0  421   61   23  149    0]
 [   0    0   25  619    0   10    0]
 [   4   82   18    0  633   17    0]
 [   0    0  184   34   29  389    0]
 [  36    0    0    0    0    0  650]]
              precision    recall  f1-score   support

           0       0.77      0.56      0.65      1400
           1       0.87      0.91      0.89      4216
           2       0.64      0.64      0.64       654
           3       0.87      0.95      0.90       654
           4       0.75      0.84      0.79       754
           5       0.66      0.61      0.63       636
           6       0.85      0.95      0.89       686

    accuracy                           0.82      9000
   macro avg       0.77      0.78      0.77      9000
weighted avg       0.81      0.82      0.81      9000
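The `y - 1` shift in the XGBoost cell works because the Covertype labels are the consecutive integers 1..7, but it also makes the report above list classes 0..6 instead of 1..7. A more general alternative, sketched here with small stand-in arrays (the `*_demo` arrays and `y_pred_enc` are hypothetical values, not notebook results), is scikit-learn's `LabelEncoder`, which can also map predictions back to the original labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Stand-in labels in the Covertype range 1..7 (hypothetical values for illustration).
y_train_demo = np.array([1, 2, 2, 7, 5, 3, 4, 6, 2, 1])
y_test_demo = np.array([2, 7, 1, 5])

# Fit on the training labels; every class must appear there at least once.
encoder = LabelEncoder()
y_train_enc = encoder.fit_transform(y_train_demo)  # classes 1..7 -> 0..6
y_test_enc = encoder.transform(y_test_demo)

# Pretend these came from xgb_clf.predict(...): they live in the encoded 0..6 space.
y_pred_enc = np.array([1, 6, 0, 3])

# Map predictions back so that reports show the original 1..7 class labels.
y_pred_original = encoder.inverse_transform(y_pred_enc)

print(encoder.classes_)
print(y_train_enc)
print(y_pred_original)
```

In the notebook, this would mean fitting on `encoder.fit_transform(y_train)` and evaluating `encoder.inverse_transform(xgb_clf.predict(X_test))` against the unmodified `y_test`, so all four reports use the same class labels.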



In [7]:
# Initialize and train the CatBoost classifier (suppressing verbose output with verbose=0)
catboost_clf = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=3, verbose=0, random_state=42)
catboost_clf.fit(X_train, y_train)

# Make predictions
y_pred_catboost = catboost_clf.predict(X_test)

# Evaluate CatBoost
print("CatBoost Results:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_catboost)}")
print(confusion_matrix(y_test, y_pred_catboost))
print(classification_report(y_test, y_pred_catboost))

CatBoost Results:
Accuracy: 0.7842222222222223
[[ 637  598    0    0   33    1  131]
 [ 162 3881   11    0  122   24   16]
 [   0    3  378   95   29  149    0]
 [   0    0   35  608    0   11    0]
 [   0  148    9    0  569   28    0]
 [   0    5  179   49   45  358    0]
 [  55    2    0    0    2    0  627]]
              precision    recall  f1-score   support

           1       0.75      0.46      0.57      1400
           2       0.84      0.92      0.88      4216
           3       0.62      0.58      0.60       654
           4       0.81      0.93      0.86       654
           5       0.71      0.75      0.73       754
           6       0.63      0.56      0.59       636
           7       0.81      0.91      0.86       686

    accuracy                           0.78      9000
   macro avg       0.74      0.73      0.73      9000
weighted avg       0.78      0.78      0.77      9000
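To compare the four models side by side, the short sketch below collects the test accuracies printed in the cells above (rounded to four decimals) and sorts them. These figures come from a single 70/30 split of the first 30,000 rows with the hyperparameters shown, so the ordering should be read as specific to this setup rather than as a general ranking of the libraries.

```python
# Test accuracies copied from the outputs above (rounded to four decimals).
accuracies = {
    "AdaBoost": 0.6261,
    "GradientBoosting": 0.8493,
    "XGBoost": 0.8154,
    "CatBoost": 0.7842,
}

# Sort from best to worst accuracy.
ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)

print(f"{'Model':<18}{'Accuracy':>10}")
for name, acc in ranked:
    print(f"{name:<18}{acc:>10.4f}")
```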

