# XGBoost

__Gradient Tree Boosting__, also called __Gradient Boosted Regression Trees__ (GBRT), builds an ensemble in a stagewise fashion: each new tree is trained on the residual errors of the ensemble built so far, and this is repeated for a fixed number of stages.
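
One way to write the stagewise update in symbols is sketched below. The notation ($F_m$, $h_m$, $r_i^{(m)}$, $\nu$) is introduced here for illustration and does not come from the original text.

$$
r_i^{(m)} = y_i - F_{m-1}(x_i), \qquad
h_m = \text{tree fit to } \{(x_i,\, r_i^{(m)})\}, \qquad
F_m(x) = F_{m-1}(x) + \nu\, h_m(x)
$$

Here $F_{m-1}$ is the ensemble after $m-1$ trees, $h_m$ is the newly fitted tree, and $\nu$ is a learning rate. With $\nu = 1$ and $F_0 = 0$, the final prediction is simply the sum of the individual trees. For squared-error loss the residuals $r_i^{(m)}$ are, up to a constant factor, the negative gradient of the loss with respect to the current predictions, which is where the "gradient" in the name comes from.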


In [1]:
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

In [2]:
# Create a dataset
from sklearn.model_selection import train_test_split

n = 200
np.random.seed(42)
X = np.random.rand(n, 1) - 0.5
y = 3*X[:, 0]**2 + 0.05 * np.random.randn(n)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Keep the original targets; y_train is overwritten with residuals in the loop below
y_train_copy = np.copy(y_train)

In [3]:
from sklearn.tree import DecisionTreeRegressor

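tree_reg = []

# Fit five shallow trees in sequence; each one is trained on the
# residuals left by the trees before it.
for i in range(5):
    reg = DecisionTreeRegressor(
        max_depth=2, random_state=42).fit(X_train, y_train)
    y_pred = reg.predict(X_train)
    y_train = y_train - y_pred  # residuals become the next tree's targets
    tree_reg.append(reg)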

In [4]:
# The ensemble prediction is the sum of the individual trees' predictions
y_pred = sum(t.predict(X_test) for t in tree_reg)

In [5]:
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"MSE: {mse:.6f}, MAE: {mae:.4f}")

MSE: 0.006385, MAE: 0.0663
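
The loop above subtracts each tree's full prediction from the residuals. A common variant scales each tree's contribution by a learning rate (shrinkage). The following is a minimal sketch of that idea, not a definitive implementation: the values `learning_rate = 0.5` and `n_trees = 10` and the names `residual`, `trees`, and `train_mse` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same synthetic dataset as above
n = 200
np.random.seed(42)
X = np.random.rand(n, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(n)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

learning_rate = 0.5  # assumed value, for illustration only
n_trees = 10         # assumed value, for illustration only

trees = []
train_mse = []
residual = y_train.copy()  # work on a copy so y_train stays intact

for _ in range(n_trees):
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_train, residual)
    # Remove only a fraction of what the tree explains
    residual = residual - learning_rate * tree.predict(X_train)
    trees.append(tree)
    train_mse.append(np.mean(residual ** 2))

# The same scaling is applied when combining the trees at prediction time
y_pred = learning_rate * sum(t.predict(X_test) for t in trees)
print(f"Test MSE: {mean_squared_error(y_test, y_pred):.6f}")
```

A smaller learning rate makes each stage correct less of the remaining error, so more trees are typically needed to reach the same training error.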


## GradientBoostingRegressor

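We can build an equivalent model with `GradientBoostingRegressor()` as follows. We set `learning_rate=1.0` because the manual loop subtracted each tree's full prediction from the residuals, and we fit on `y_train_copy` because `y_train` now holds residuals. The test errors match those of the manual ensemble.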

In [6]:
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=5,
    learning_rate=1.0,
    random_state=42)
gbrt.fit(X_train, y_train_copy)
y_pred = gbrt.predict(X_test)

In [7]:
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"MSE: {mse:.6f}, MAE: {mae:.4f}")

MSE: 0.006385, MAE: 0.0663
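
To see how the error evolves as trees are added, `GradientBoostingRegressor.staged_predict()` yields the ensemble's prediction after each stage. The sketch below rebuilds the same dataset so that it runs on its own; the names `stage_mse` and `final_mse` are introduced here for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic dataset as above
n = 200
np.random.seed(42)
X = np.random.rand(n, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(n)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gbrt = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=5,
    learning_rate=1.0,
    random_state=42)
gbrt.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees
stage_mse = [mean_squared_error(y_test, y_stage)
             for y_stage in gbrt.staged_predict(X_test)]
for k, m in enumerate(stage_mse, start=1):
    print(f"{k} tree(s): test MSE = {m:.6f}")

# The last stage corresponds to the full model's prediction
final_mse = mean_squared_error(y_test, gbrt.predict(X_test))
```

Inspecting the per-stage error this way is one option for choosing `n_estimators` without refitting the model for every candidate value.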


## XGBoost

XGBoost is a separate Python library that implements gradient boosting and is designed to be scalable, fast, and portable.