# Boosting Algorithms: AdaBoost, GradientBoosting, XGBoost, and CatBoost

In this notebook, we demonstrate four popular boosting algorithms, **AdaBoost**, **GradientBoosting**, **XGBoost**, and **CatBoost**, on the **Covertype dataset**.

## 1. Loading and Preparing the Dataset

We will use the Covertype dataset:

*  **Description:** A multi-class classification dataset that describes forest cover types using cartographic data. It has 54 features and seven classes, which makes it a more challenging task for classification algorithms.

*  **Problem Type:** Multi-class classification

*   **Use Case:** Predict the forest cover type from cartographic data such as elevation, slope, and soil type.

*  **Dataset Size:** Large (581,012 samples in the full dataset), so it can be used for testing model scalability.
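
With seven classes, it is worth checking how the labels are distributed before training. The snippet below is a minimal sketch: `class_distribution` is a hypothetical helper name, and the toy label array merely stands in for `covtype.target`.

```python
import numpy as np

def class_distribution(y):
    """Return {label: fraction of samples} for a 1-D array of class labels."""
    labels, counts = np.unique(y, return_counts=True)
    total = counts.sum()
    return {int(label): float(count / total) for label, count in zip(labels, counts)}

# Toy labels standing in for covtype.target (an assumption for illustration)
toy_y = np.array([1, 1, 2, 2, 2, 2, 7, 3])
dist = class_distribution(toy_y)
for label, frac in dist.items():
    print(f"class {label}: {frac:.2%}")
```

On the real data, the same helper could be called as `class_distribution(y)` once the labels are loaded.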


In [2]:
# Uncomment to install the third-party libraries used below
# %pip install xgboost catboost

In [3]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report



In [None]:
# Load the Covertype dataset; for simplicity, we use only the first 30,000 samples
covtype = fetch_covtype()
X = covtype.data[:30000,:]
y = covtype.target[:30000]


# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Show dataset dimensions
print(f"Training samples: {X_train.shape[0]}, Test samples: {X_test.shape[0]}")
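
When class frequencies are uneven, a plain random split can leave rare classes under-represented in the test set. One option is the `stratify` argument of `train_test_split`, which keeps class proportions the same in both splits. This is a sketch on synthetic data, not the split used in this notebook; the array names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: 80 samples of class 0 and 20 of class 1
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 80 + [1] * 20)

# stratify=y_demo preserves the 80/20 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("Train class counts:", np.bincount(y_tr))
print("Test class counts:", np.bincount(y_te))
```

For the notebook's data, the equivalent change would be passing `stratify=y` to the existing call.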

In [None]:
# Initialize and train the AdaBoost classifier
ada_clf = AdaBoostClassifier(n_estimators=50, random_state=42)
ada_clf.fit(X_train, y_train)

# Make predictions
y_pred_ada = ada_clf.predict(X_test)

# Evaluate AdaBoost
print("AdaBoost Results:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_ada)}")
print(confusion_matrix(y_test, y_pred_ada))
print(classification_report(y_test, y_pred_ada))
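
Because AdaBoost builds its ensemble one weak learner at a time, it can be informative to see how test accuracy changes with each boosting round. scikit-learn exposes this through `staged_score`. The sketch below uses a small synthetic dataset from `make_classification` so that it runs quickly; the variable names and settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Small synthetic 3-class problem (a stand-in for the Covertype data)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

ada_demo = AdaBoostClassifier(n_estimators=30, random_state=42)
ada_demo.fit(X_tr, y_tr)

# staged_score yields the test accuracy after each boosting round
staged_acc = list(ada_demo.staged_score(X_te, y_te))
best_round = max(range(len(staged_acc)), key=staged_acc.__getitem__) + 1
print(f"Rounds fitted: {len(staged_acc)}")
print(f"Best accuracy {staged_acc[best_round - 1]:.4f} at round {best_round}")
```

The same call on `ada_clf` with `X_test` and `y_test` would show whether 50 estimators is more or fewer than this problem needs.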

In [None]:
# Initialize and train the Gradient Boosting classifier
gb_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_clf.fit(X_train, y_train)

# Make predictions
y_pred_gb = gb_clf.predict(X_test)

# Evaluate Gradient Boosting
print("Gradient Boosting Results:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_gb)}")
print(confusion_matrix(y_test, y_pred_gb))
print(classification_report(y_test, y_pred_gb))
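
Instead of fixing `n_estimators` in advance, `GradientBoostingClassifier` can stop adding trees once a held-out validation score stops improving. The sketch below shows this using the `n_iter_no_change` and `validation_fraction` parameters on synthetic data; the specific values are illustrative assumptions, not tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic 3-class problem (a stand-in for the Covertype data)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=42
)

gb_demo = GradientBoostingClassifier(
    n_estimators=200,         # upper bound on the number of boosting rounds
    learning_rate=0.1,
    max_depth=3,
    validation_fraction=0.2,  # share of the training data held out internally
    n_iter_no_change=5,       # stop after 5 rounds without validation improvement
    random_state=42,
)
gb_demo.fit(X_demo, y_demo)

# n_estimators_ is the number of rounds actually fitted
print(f"Boosting rounds used: {gb_demo.n_estimators_} of 200")
```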

In [None]:
# Initialize and train the XGBoost classifier
xgb_clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
# XGBoost expects class labels 0..K-1, but Covertype labels run from 1 to 7, so shift them down by 1
y_train_xgb = y_train - 1
y_test_xgb = y_test - 1
xgb_clf.fit(X_train, y_train_xgb)

# Make predictions
y_pred_xgb = xgb_clf.predict(X_test)

# Evaluate XGBoost
print("XGBoost Results:")
print(f"Accuracy: {accuracy_score(y_test_xgb, y_pred_xgb)}")
print(confusion_matrix(y_test_xgb, y_pred_xgb))
print(classification_report(y_test_xgb, y_pred_xgb))
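
Subtracting 1 works here because the labels are the consecutive integers 1 to 7. A more general option, sketched below, is scikit-learn's `LabelEncoder`, which maps whatever labels are present to 0..K-1 and can map predictions back to the original labels. The toy array is an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy labels with gaps (classes 4, 5 and 6 are absent)
y_raw = np.array([1, 7, 3, 3, 2, 7])

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_raw)           # consecutive labels starting at 0
y_restored = encoder.inverse_transform(y_encoded)  # back to the original labels

print("Original:", y_raw)
print("Encoded: ", y_encoded)
print("Restored:", y_restored)
```

In the notebook, the encoder would be fitted on `y_train`, applied to `y_test` with `transform`, and `inverse_transform` would convert XGBoost's predictions back to cover type labels.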

In [None]:
# Initialize and train the CatBoost classifier (suppressing verbose output with verbose=0)
catboost_clf = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=3, verbose=0, random_state=42)
catboost_clf.fit(X_train, y_train)

# Make predictions (CatBoost may return a column vector for multi-class tasks, so flatten it)
y_pred_catboost = catboost_clf.predict(X_test).ravel()

# Evaluate CatBoost
print("CatBoost Results:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_catboost)}")
print(confusion_matrix(y_test, y_pred_catboost))
print(classification_report(y_test, y_pred_catboost))
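
To compare the four models side by side, their scores can be collected into a single table. The sketch below assumes a hypothetical helper, `summarize_models`, which takes a `(y_true, y_pred)` pair per model because the XGBoost predictions use the shifted labels. It is shown on toy predictions rather than the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

def summarize_models(results):
    """Build a score table from {model name: (y_true, y_pred)}, sorted by accuracy."""
    rows = []
    for name, (y_true, y_pred) in results.items():
        y_pred = np.asarray(y_pred).ravel()  # tolerate column-vector predictions
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_true, y_pred),
            "macro_f1": f1_score(y_true, y_pred, average="macro"),
        })
    table = pd.DataFrame(rows).sort_values("accuracy", ascending=False)
    return table.reset_index(drop=True)

# Toy labels and predictions (assumptions for illustration only)
toy_true = np.array([1, 2, 3, 1, 2, 3])
toy_results = {
    "model_a": (toy_true, np.array([1, 2, 3, 1, 2, 1])),              # one mistake
    "model_b": (toy_true, np.array([[1], [2], [3], [1], [2], [3]])),  # column vector
}
summary = summarize_models(toy_results)
print(summary)
```

With the notebook's variables, the input would look like `{"AdaBoost": (y_test, y_pred_ada), "XGBoost": (y_test_xgb, y_pred_xgb), ...}`.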