<a href="https://colab.research.google.com/github/raj-vijay/ml/blob/master/04.Extreme%20Gradient%20Boost/08_Model_Tuning_on_Housing_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [None]:
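# load_boston was removed in scikit-learn 1.2; load the same data from its original source
raw_df = pd.read_csv("http://lib.stat.cmu.edu/datasets/boston", sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]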

In [None]:
# Create the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

**Untuned Model**

In [None]:
housing_dmatrix = xgb.DMatrix(data=X, label=y)
untuned_params = {"objective": "reg:squarederror"}
# num_boost_round is left at its xgb.cv() default of 10 rounds
untuned_cv_results_rmse = xgb.cv(dtrain=housing_dmatrix, params=untuned_params, nfold=4,
                                 metrics="rmse", as_pandas=True, seed=123)
# The last row holds the mean test RMSE after the final boosting round
print("Untuned rmse: %f" % untuned_cv_results_rmse["test-rmse-mean"].iloc[-1])

Untuned rmse: 3.432062


**Tuned Model**

In [None]:
housing_dmatrix = xgb.DMatrix(data=X, label=y)
tuned_params = {"objective": "reg:squarederror", "colsample_bytree": 0.3,
                "learning_rate": 0.1, "max_depth": 5}
tuned_cv_results_rmse = xgb.cv(dtrain=housing_dmatrix, params=tuned_params, nfold=4,
                               num_boost_round=200, metrics="rmse", as_pandas=True, seed=123)
# The last row holds the mean test RMSE after the final boosting round
print("Tuned rmse: %f" % tuned_cv_results_rmse["test-rmse-mean"].iloc[-1])

Tuned rmse: 3.431760


**Tuning the number of boosting rounds**

Let's start parameter tuning by examining how the number of boosting rounds (the number of trees built) affects the out-of-sample performance of an XGBoost model.

Call `xgb.cv()` inside a `for` loop, building one cross-validated model per `num_boost_round` value.

In [None]:
# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary for each tree: params 
params = {"objective":"reg:squarederror", "max_depth":3}

# Create list of number of boosting rounds
num_rounds = [5, 10, 15]

# Empty list to store final round rmse per XGBoost model
final_rmse_per_round = []

# Iterate over num_rounds and build one model per num_boost_round parameter
for curr_num_rounds in num_rounds:

    # Perform cross-validation: cv_results
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3, num_boost_round=curr_num_rounds, metrics="rmse", as_pandas=True, seed=123)
    
    # Append final round RMSE
    final_rmse_per_round.append(cv_results["test-rmse-mean"].iloc[-1])

# Print the resultant DataFrame
num_rounds_rmses = list(zip(num_rounds, final_rmse_per_round))
print(pd.DataFrame(num_rounds_rmses,columns=["num_boosting_rounds","rmse"]))

   num_boosting_rounds      rmse
0                    5  5.950357
1                   10  3.784688
2                   15  3.525731


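**Automated boosting round selection using early stopping**

Instead of looping over candidate values, `xgb.cv()` can pick the number of boosting rounds itself. With `early_stopping_rounds=10`, training stops once the mean test RMSE has gone 10 consecutive rounds without improving, so `num_boost_round=50` acts only as an upper bound.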

In [None]:
# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X,label=y)

# Create the parameter dictionary for each tree: params
params = {"objective":"reg:squarederror", "max_depth":4}

# Perform cross-validation with early stopping: cv_results
cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3, num_boost_round=50, early_stopping_rounds=10, metrics="rmse", as_pandas=True, seed=123)

# Print cv_results
print(cv_results)

    train-rmse-mean  train-rmse-std  test-rmse-mean  test-rmse-std
0         17.131356        0.020616       17.223397       0.067773
1         12.382814        0.025832       12.619156       0.132110
2          9.063983        0.037020        9.505188       0.117293
3          6.726435        0.034949        7.339914       0.120754
4          5.102424        0.045689        5.954197       0.137525
5          3.976571        0.049928        5.043778       0.198929
6          3.195956        0.061088        4.436179       0.214428
7          2.694113        0.066253        4.066125       0.295735
8          2.341284        0.050819        3.837729       0.326883
9          2.099043        0.052467        3.667519       0.352621
10         1.931933        0.042599        3.573348       0.360856
11         1.824035        0.047074        3.510915       0.370727
12         1.713532        0.058416        3.469601       0.372425
13         1.628078        0.058896        3.441818       0.39