In this notebook, we will cover applications of several boosting techniques, specifically Gradient Boosting regression and classification, along with XGBoost.


I will cover both in this tutorial, using the Iris dataset and the diabetes regression task.

# Imports

In [1]:
%matplotlib inline

import pandas as pd
import numpy as np

import matplotlib as mpl
import matplotlib.pyplot as plt

# Preprocessing

In [63]:
from sklearn.datasets import load_diabetes

dia = load_diabetes()

X, y = dia["data"], dia["target"]

In [64]:
from sklearn.model_selection import train_test_split

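# Hold out 10% of the samples for testing. No random_state is set,
# so the split (and the scores below) will vary between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)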

# Modeling

We compare Gradient Boosting (GB) regression with Random Forest (RF) regression on the same train/test split.

## Random Forest Regression

In [78]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rf = RandomForestRegressor(n_estimators=120, random_state=42)
rf.fit(X_train, y_train)
mse = mean_squared_error(y_test, rf.predict(X_test))
print("RF: ", mse)

RF:  2867.886986111111


## Gradient Boosting Regression

In [79]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

gb = GradientBoostingRegressor(n_estimators=120, random_state=42)
gb.fit(X_train, y_train)
mse = mean_squared_error(y_test, gb.predict(X_test))
print("GB: ", mse)

GB:  2777.3534422486705


## Conclusion

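On this split, GB regression achieves a lower test MSE than the random forest (about 2777 vs. 2868) with the same `n_estimators` and `random_state`. All other hyper-parameters are left at each model's defaults, which differ between the two, and the result comes from a single random split with a small test set, so this is suggestive rather than conclusive.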
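Because a single split can favour either model by chance, a cross-validated comparison gives a steadier picture. The following is a minimal sketch, not part of the original run: the 5-fold `KFold` setup, its seed, and the `cv_mse` dictionary are assumptions chosen for illustration, and the numbers it prints will not match the outputs above.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# Assumption: 5 shuffled folds with a fixed seed (the notebook itself uses one unseeded split).
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Same model settings as in the cells above.
models = {
    "RF": RandomForestRegressor(n_estimators=120, random_state=42),
    "GB": GradientBoostingRegressor(n_estimators=120, random_state=42),
}

cv_mse = {}
for name, model in models.items():
    # scikit-learn reports negated MSE so that higher is better; flip the sign back.
    scores = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[name] = scores
    print(f"{name}: mean MSE = {scores.mean():.1f} (std {scores.std():.1f})")
```

If the per-fold standard deviations are large relative to the gap between the two means, the difference seen on one split should not be read as a clear win for either model.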

# XGBoost

In [114]:
from xgboost import XGBRegressor

xgb = XGBRegressor(n_estimators=500, learning_rate=0.01, max_depth=3, random_state=42)
xgb.fit(X_train, y_train)
mse = mean_squared_error(y_test, xgb.predict(X_test))
print("XGB: ", mse)

XGB:  2686.2451192496683


## Conclusion

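With these settings, XGBoost achieves a lower test MSE (about 2686) than scikit-learn's `GradientBoostingRegressor` (about 2777) on this split. Note that the two models were not run with the same hyper-parameters (500 trees at a learning rate of 0.01 here, versus 120 trees at the default learning rate above), so the gap reflects the settings as well as the library.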


XGBoost is a widely used gradient boosting library and is often among the strongest models for ML tasks on tabular data.
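To bring the comparison with scikit-learn closer to like-for-like, one option is to refit `GradientBoostingRegressor` with the same `n_estimators`, `learning_rate`, and `max_depth` that were passed to `XGBRegressor`. This is a hedged sketch rather than a result from the original notebook: the seeded split and the names `gb_matched` and `mse_matched` are assumptions, and the two libraries still differ in other defaults (for example regularization), so the match is only partial.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Assumption: a seeded split for reproducibility, so scores differ from the outputs above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

# Mirror the three settings given to XGBRegressor in the XGBoost cell.
gb_matched = GradientBoostingRegressor(
    n_estimators=500, learning_rate=0.01, max_depth=3, random_state=42
)
gb_matched.fit(X_train, y_train)

mse_matched = mean_squared_error(y_test, gb_matched.predict(X_test))
print("GB (matched settings): ", mse_matched)
```

For a fair comparison, the XGBoost model would need to be refit on this same seeded split as well.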