# XGBoost (Extreme Gradient Boosting)

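XGBoost belongs to the family of boosting algorithms and is built on the gradient boosting (GBM) framework: trees are added one at a time, and each new tree is fit to correct the errors of the ensemble built so far.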
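
To make the boosting idea concrete, here is a minimal sketch in plain Python, assuming squared-error loss and one-dimensional decision stumps as weak learners. The helper names (`fit_stump`, `predict_stump`, `boost`) and the toy data are made up for illustration; this is not how XGBoost is implemented internally, which adds regularisation, second-order gradients, and much more.

```python
# Toy gradient boosting for squared error with 1-D decision stumps.
# All names and data here are hypothetical, not part of the XGBoost API.

def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def boost(x, y, n_rounds, learning_rate):
    """Fit stumps one after another to the residuals of the current ensemble."""
    base = sum(y) / len(y)  # start from the mean of the target
    preds = [base] * len(y)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - pred.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
        stumps.append(stump)
        mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))
    return base, stumps, mse_history


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 6.2, 6.0]
base, stumps, mse_history = boost(x, y, n_rounds=20, learning_rate=0.3)
print([round(m, 3) for m in mse_history[:3]], round(mse_history[-1], 3))
```

The training error recorded in `mse_history` should never increase from one round to the next, because every stump is fit to whatever the previous rounds left unexplained.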

https://www.datacamp.com/community/tutorials/xgboost-in-python

https://towardsdatascience.com/a-beginners-guide-to-xgboost-87f5d4c30ed7

### Pros:

- It is typically fast compared with other ensemble methods: the core library is written in C++ and parallelizes tree construction.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
plt.style.use("default")

In [2]:
# load_boston was removed in scikit-learn 1.2; this cell needs an older release.
from sklearn.datasets import load_boston

In [3]:
boston = load_boston()

In [6]:
boston.keys()

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

In [7]:
boston.feature_names

array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')

In [11]:
print(boston.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu

In [4]:
df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
df.head(3)

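| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |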


In [5]:
df["PRICE"] = boston.target
df.head(3)

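| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |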


In [14]:
df.shape

(506, 14)

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, shuffle=True,
                                                    test_size=0.2, random_state=12)

In [8]:
import xgboost as xgb

Before **XGBoost**'s native API (`xgb.train`) can use our data, we need to convert it into the format XGBoost works with internally, called DMatrix. Converting a NumPy array to a DMatrix is a simple one-liner:

In [9]:
D_train = xgb.DMatrix(X_train, label=y_train)
D_test = xgb.DMatrix(X_test, label=y_test)

In [11]:
params = {"objective":"reg:squarederror",
          "max_depth": 5,
          "eta": 0.3}

steps = 20

model = xgb.train(params, D_train, steps)

#### `eta` parameter

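- `eta` is the learning rate (`learning_rate` in the scikit-learn wrapper): it scales down the contribution of each new tree before it is added to the ensemble.
- Small values are typical, commonly in the range 0.1 to 0.3; the smaller the value, the more boosting rounds are usually needed.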
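
The trade-off between `eta` and the number of boosting rounds can be seen in a deliberately oversimplified sketch. It assumes the weakest possible learner, one that predicts only the mean residual, and it ignores XGBoost's regularisation terms. The function name and the target mean of 22.5 are illustrative assumptions, not values taken from the dataset.

```python
def boosted_constant(target_mean, base_score, eta, n_rounds):
    """Boost a constant predictor: each round moves a fraction eta toward the target mean."""
    pred = base_score
    for _ in range(n_rounds):
        residual_mean = target_mean - pred  # what the ensemble still gets wrong on average
        pred += eta * residual_mean         # shrink the correction by eta
    return pred


# Hypothetical target mean; 0.5 is used as the starting prediction.
for eta in (0.01, 0.1, 0.3):
    pred = boosted_constant(target_mean=22.5, base_score=0.5, eta=eta, n_rounds=20)
    print(f"eta={eta}: prediction after 20 rounds = {pred:.2f}")
```

In this simplified setting the remaining gap shrinks by a factor of `(1 - eta)` per round, so a very small `eta` leaves the prediction far from the target unless the number of rounds is increased to match.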

In [12]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [13]:
y_preds = model.predict(D_test)

In [18]:
len(y_test), len(y_preds), y_test.shape, y_preds.shape

(102, 102, (102,), (102,))

In [20]:
mse = mean_squared_error(y_test, y_preds)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_preds)
r2 = r2_score(y_test, y_preds)

print(f"mse: {mse:0.2f}, rmse: {rmse:0.2f}, mae: {mae:0.2f}, r2: {r2:0.2f}")

mse: 10.58, rmse: 3.25, mae: 2.32, r2: 0.87
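
For reference, the four metrics printed above can be computed by hand. The sketch below uses plain Python and a made-up four-point example; `regression_metrics` is a hypothetical helper, written only to show the formulas behind the scikit-learn functions.

```python
import math


def regression_metrics(y_true, y_hat):
    """Return (mse, rmse, mae, r2) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_hat)]
    mse = sum(e ** 2 for e in errors) / n          # mean squared error
    rmse = math.sqrt(mse)                          # same units as the target
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                       # 1 means perfect, 0 means no better than the mean
    return mse, rmse, mae, r2


toy_true = [3.0, 5.0, 7.0, 9.0]
toy_hat = [2.5, 5.0, 8.0, 9.5]
toy_mse, toy_rmse, toy_mae, toy_r2 = regression_metrics(toy_true, toy_hat)
print(f"mse: {toy_mse:0.3f}, rmse: {toy_rmse:0.3f}, mae: {toy_mae:0.3f}, r2: {toy_r2:0.3f}")
# prints: mse: 0.375, rmse: 0.612, mae: 0.500, r2: 0.925
```

Note that R² becomes negative whenever a model does worse than always predicting the mean of `y_true`.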


## Another way: the scikit-learn API

In [26]:
xg_reg = xgb.XGBRegressor()

In [31]:
xg_reg.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

In [32]:
y_pred = xg_reg.predict(X_test)

In [24]:
len(y_test), len(y_pred), y_test.shape, y_pred.shape

(102, 102, (102,), (102,))

In [33]:
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"mse: {mse:0.2f}, rmse: {rmse:0.2f}, mae: {mae:0.2f}, r2: {r2:0.2f}")

mse: 10.23, rmse: 3.20, mae: 2.28, r2: 0.87


### K-Fold CV

#### DMatrices
Instead of NumPy arrays or pandas DataFrames, XGBoost's native API (including `xgb.cv`) works with DMatrix objects. A DMatrix can hold both the features and the target.

In [39]:
data_dmatrix = xgb.DMatrix(data=boston.data,label=boston.target)

In [45]:
params = {"objective":"reg:squarederror","colsample_bytree": 0.3,"learning_rate": 0.1,
          "max_depth": 5, "alpha": 10}

cv_results = xgb.cv(dtrain=data_dmatrix, params=params, nfold=3,
                    num_boost_round=50, early_stopping_rounds=10,
                    metrics=["rmse", "mae"], as_pandas=True, seed=123)
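
With `nfold=3`, the data is partitioned into three folds and each fold serves once as the test set while the other two are used for training. The partitioning idea can be sketched in plain Python. `kfold_indices` is a hypothetical helper, and its shuffling will not reproduce the exact folds that XGBoost draws for `seed=123`.

```python
import random


def kfold_indices(n_samples, n_folds, seed):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


splits = list(kfold_indices(n_samples=506, n_folds=3, seed=123))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

The per-round means and standard deviations reported by `xgb.cv` are then aggregated over these train/test pairs.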

In [46]:
cv_results.head(3)

| | train-rmse-mean | train-rmse-std | train-mae-mean | train-mae-std | test-rmse-mean | test-rmse-std | test-mae-mean | test-mae-std |
|---|---|---|---|---|---|---|---|---|
| 0 | 21.750757 | 0.036152 | 19.87447 | 0.099426 | 21.765523 | 0.02885 | 19.879341 | 0.170341 |
| 1 | 19.778532 | 0.077649 | 17.923921 | 0.089126 | 19.83076 | 0.03176 | 17.960867 | 0.126975 |
| 2 | 18.05281 | 0.118633 | 16.168603 | 0.079432 | 18.157336 | 0.116038 | 16.242643 | 0.098806 |


In [47]:
cv_results.tail(3)

| | train-rmse-mean | train-rmse-std | train-mae-mean | train-mae-std | test-rmse-mean | test-rmse-std | test-mae-mean | test-mae-std |
|---|---|---|---|---|---|---|---|---|
| 47 | 2.358588 | 0.108396 | 1.646851 | 0.082153 | 4.027098 | 0.375358 | 2.662418 | 0.200856 |
| 48 | 2.330911 | 0.103723 | 1.631041 | 0.080232 | 4.023613 | 0.377495 | 2.664285 | 0.200703 |
| 49 | 2.289405 | 0.100094 | 1.607475 | 0.078872 | 3.99692 | 0.39378 | 2.649759 | 0.20555 |
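
The `early_stopping_rounds=10` argument asks training to stop once the held-out metric has gone 10 rounds without improving. The rule can be sketched as follows; `best_round_with_early_stopping` and the score list are hypothetical, and the sketch does not claim to mirror XGBoost's internal bookkeeping.

```python
def best_round_with_early_stopping(scores, patience):
    """Return (last_round_run, best_round) for a metric where lower is better."""
    best_score = float("inf")
    best_round = -1
    for round_idx, score in enumerate(scores):
        if score < best_score:
            best_score, best_round = score, round_idx
        elif round_idx - best_round >= patience:
            return round_idx, best_round  # no improvement for `patience` rounds
    return len(scores) - 1, best_round    # ran through every round


# Hypothetical validation RMSE per boosting round.
toy_scores = [5.0, 4.2, 3.9, 3.95, 3.92, 3.91, 3.93, 4.0]
print(best_round_with_early_stopping(toy_scores, patience=3))   # prints (5, 2)
print(best_round_with_early_stopping(toy_scores, patience=10))  # prints (7, 2)
```

In the table above the test RMSE is still falling at index 49, which suggests the stopping rule did not cut the 50-round run short.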


### RandomizedSearchCV

In [34]:
from sklearn.model_selection import RandomizedSearchCV

In [37]:
estimator = xgb.XGBRegressor(n_jobs=-1)
param_grid = {"colsample_bytree": [i/10.0 for i in range(2,6)],
              "learning_rate": [0.01, 0.1],
              "max_depth": [3, 4, 5]}

reg_xgb = RandomizedSearchCV(estimator, param_grid, cv=5, n_iter=5)
reg_xgb.fit(X_train, y_train)

RandomizedSearchCV(cv=5,
                   estimator=XGBRegressor(base_score=None, booster=None,
                                          colsample_bylevel=None,
                                          colsample_bynode=None,
                                          colsample_bytree=None, gamma=None,
                                          gpu_id=None, importance_type='gain',
                                          interaction_constraints=None,
                                          learning_rate=None,
                                          max_delta_step=None, max_depth=None,
                                          min_child_weight=None, missing=nan,
                                          monotone_constraints=None,
                                          n_estimators=100, n_jobs=-1,
                                          num_parallel_tree=None,
                                          random_state=None, reg_alpha=None,
                                         

In [38]:
pd.DataFrame(reg_xgb.cv_results_)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_max_depth | param_learning_rate | param_colsample_bytree | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.088749 | 0.015122 | 0.003998 | 6.641573e-07 | 5 | 0.1 | 0.5 | `{'max_depth': 5, 'learning_rate': 0.1, 'colsam...` | 0.841314 | 0.840761 | 0.916839 | 0.890745 | 0.862982 | 0.870528 | 0.029496 | 2 |
| 1 | 0.067561 | 0.00102 | 0.003798 | 0.0003998056 | 5 | 0.01 | 0.3 | `{'max_depth': 5, 'learning_rate': 0.01, 'colsa...` | -0.101496 | -0.379535 | -0.285872 | -0.321162 | -0.236212 | -0.264855 | 0.094096 | 5 |
| 2 | 0.077156 | 0.015996 | 0.004397 | 0.001853443 | 3 | 0.1 | 0.4 | `{'max_depth': 3, 'learning_rate': 0.1, 'colsam...` | 0.888105 | 0.837129 | 0.897039 | 0.879981 | 0.856283 | 0.871707 | 0.021967 | 1 |
| 3 | 0.059965 | 0.000632 | 0.003198 | 0.0003996137 | 3 | 0.1 | 0.3 | `{'max_depth': 3, 'learning_rate': 0.1, 'colsam...` | 0.891771 | 0.825216 | 0.904531 | 0.837466 | 0.83323 | 0.858443 | 0.032908 | 3 |
| 4 | 0.077157 | 0.004305 | 0.003398 | 0.000489298 | 4 | 0.01 | 0.5 | `{'max_depth': 4, 'learning_rate': 0.01, 'colsa...` | -0.029799 | -0.300559 | -0.198544 | -0.192814 | -0.184677 | -0.181278 | 0.086737 | 4 |
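
Unlike an exhaustive grid search, `RandomizedSearchCV` evaluates only `n_iter` parameter combinations. When every parameter is given as a list, as here, scikit-learn samples combinations without replacement. The sketch below imitates that idea with the standard library only; `sample_candidates` is a hypothetical helper, and the combinations it picks will not match those in the table above.

```python
import itertools
import random

param_grid = {
    "colsample_bytree": [i / 10.0 for i in range(2, 6)],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 4, 5],
}


def sample_candidates(grid, n_iter, seed):
    """Enumerate the full grid, then draw n_iter distinct combinations from it."""
    keys = sorted(grid)
    all_combos = [dict(zip(keys, values))
                  for values in itertools.product(*(grid[k] for k in keys))]
    return all_combos, random.Random(seed).sample(all_combos, n_iter)


all_combos, candidates = sample_candidates(param_grid, n_iter=5, seed=0)
print(len(all_combos), "possible combinations;", len(candidates), "sampled")
# prints: 24 possible combinations; 5 sampled
```

With `cv=5`, sampling 5 of the 24 combinations means 25 cross-validation fits instead of the 120 an exhaustive search over the same grid would need.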
