# XGBoost

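__Gradient Tree Boosting__ (also called __Gradient Boosted Regression Trees__, GBRT) builds an ensemble in a stagewise fashion. Each new tree is trained on the errors (residuals) of the ensemble built so far, and its predictions are added to that ensemble. This is repeated for a fixed number of rounds. For squared-error loss the residuals $y_i - \hat{y}_i$ are, up to a constant factor, the negative gradient of the loss with respect to the prediction, which is where the "gradient" in the name comes from.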

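 - Type: Boosting + Regularization
 - Objective at boosting round $t$, where $f_t$ is the tree added in that round and $l$ is a differentiable loss: $\mathcal{L}^{(t)} = \sum_i l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$
 - Regularization: $\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$, where $T$ is the number of leaves of the tree, $w_j$ is the weight (output value) of leaf $j$, and $\gamma, \lambda \ge 0$ control the strength of the penalty.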
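Minimizing a second-order Taylor approximation of this objective gives closed-form expressions used when growing each tree. With $g_i$ and $h_i$ the first and second derivatives of $l$ with respect to $\hat{y}_i^{(t-1)}$, and $G_j = \sum_{i \in I_j} g_i$, $H_j = \sum_{i \in I_j} h_i$ summed over the samples in leaf $j$, the optimal leaf weight is $w_j^* = -G_j / (H_j + \lambda)$, and splitting a leaf into left and right children has gain $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$. The sketch below evaluates these formulas with NumPy for squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, where $g_i = \hat{y}_i - y_i$ and $h_i = 1$. The helper names (`leaf_weight`, `leaf_score`, `split_gain`) and the toy data are hypothetical; this illustrates the formulas and is not XGBoost's actual implementation.

```python
import numpy as np


def leaf_weight(g, h, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -np.sum(g) / (np.sum(h) + lam)


def leaf_score(g, h, lam):
    """Structure-score term G^2 / (H + lambda) of one leaf (larger is better)."""
    return np.sum(g) ** 2 / (np.sum(h) + lam)


def split_gain(g, h, left_mask, lam, gamma):
    """Gain of splitting one leaf into left/right children, minus the per-leaf penalty gamma."""
    right_mask = ~left_mask
    return 0.5 * (
        leaf_score(g[left_mask], h[left_mask], lam)
        + leaf_score(g[right_mask], h[right_mask], lam)
        - leaf_score(g, h, lam)
    ) - gamma


# Squared-error loss l = 0.5 * (y - y_hat)^2  =>  g = y_hat - y, h = 1
y = np.array([1.0, 2.0, 10.0, 12.0])
y_hat = np.zeros_like(y)  # predictions from the previous rounds (hypothetical: all zero)
g = y_hat - y
h = np.ones_like(y)

lam, gamma = 1.0, 0.0
left = np.array([True, True, False, False])  # hypothetical split: first two samples go left

print(leaf_weight(g[left], h[left], lam))    # left leaf, about 1.0
print(leaf_weight(g[~left], h[~left], lam))  # right leaf, about 7.33
print(split_gain(g, h, left, lam, gamma))    # about 19.67
```

With `lam=0.0` the weights reduce to the plain means of the residuals in each leaf (1.5 and 11.0), which is what an ordinary regression tree fitted to the residuals outputs; $\lambda > 0$ shrinks them toward zero.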



```python
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np
```

```python
# Create a synthetic dataset: y = 3x^2 plus a little Gaussian noise
from sklearn.model_selection import train_test_split

n = 200
np.random.seed(42)
X = np.random.uniform(-200, 200, n)
y = 3*X**2 + 0.05 * np.random.randn(n)
X = X.reshape(-1, 1)
y = y.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)  # 150 train / 50 test
```

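```python
from sklearn.tree import DecisionTreeRegressor

tree_reg = []
learning_rate = 1  # no shrinkage: each tree's prediction is used in full
residual = y_train.copy().reshape(-1)  # 1-D targets for the trees

for _ in range(100):
    # Fit a shallow tree to what the current ensemble still gets wrong
    reg = DecisionTreeRegressor(max_depth=4, random_state=42)
    reg.fit(X_train, residual)
    # Remove the part of the residual this tree explains
    residual -= learning_rate * reg.predict(X_train)
    tree_reg.append(reg)
```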

```python
# The ensemble prediction is the (scaled) sum of all tree predictions
y_pred = sum(learning_rate * t.predict(X_test) for t in tree_reg)
```

```python
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"MSE: {mse:,.2f}, MAE: {mae:,.2f}")
```

```
MSE: 1,004,340.26, MAE: 719.80
```


## GradientBoostingRegressor

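We can build the same kind of ensemble with scikit-learn's `GradientBoostingRegressor()`. We set `learning_rate=1.0` because the manual loop above subtracts each tree's full prediction from the residuals, without shrinkage. Note that the tree depth and number of trees used here (`max_depth=10`, `n_estimators=5`) differ from the manual loop (`max_depth=4`, 100 trees), and that `GradientBoostingRegressor` starts from the mean of the targets rather than from zero, so the two models are not identical step by step. Both are flexible enough to fit the 150 training points almost exactly, which is the likely reason the reported test errors coincide.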

```python
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(
    max_depth=10,
    n_estimators=5,
    learning_rate=1.0,
    random_state=42)
gbrt.fit(X_train, y_train.reshape(-1))
y_pred = gbrt.predict(X_test)
```

```python
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"MSE: {mse:,.2f}, MAE: {mae:,.2f}")
```

```
MSE: 1,004,340.26, MAE: 719.80
```


## XGBoost

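XGBoost is a separate library (with a Python package, `xgboost`) designed to be scalable, fast, and portable. It implements gradient tree boosting with the regularized objective given at the top of this notebook.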