<a href="https://colab.research.google.com/github/raj-vijay/ml/blob/master/04.Extreme%20Gradient%20Boost/11_GridSearchCV_XGBoost_Housing_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Grid search: review**
- Search exhaustively over a given set of hyperparameter values, training one model per combination
- Number of models = product of the number of distinct values for each hyperparameter
- Pick the final hyperparameter values that give the best cross-validated evaluation metric
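
The model count in the second bullet can be checked with a short sketch. It reuses the hyperparameter values from this notebook's grid search and relies only on the standard library rather than scikit-learn's own grid expansion; the name `cv_folds` is illustrative.

```python
from itertools import product
from math import prod

# Same hyperparameter values as the grid search in this notebook
param_grid = {
    "learning_rate": [0.01, 0.1, 0.5, 0.9],
    "n_estimators": [200],
    "subsample": [0.3, 0.5, 0.9],
}
cv_folds = 4  # illustrative name; matches the cv=4 argument used in this notebook

# Number of candidates = product of the number of values per hyperparameter
n_candidates = prod(len(values) for values in param_grid.values())

# Enumerate every combination explicitly
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]

# Each candidate is fitted once per cross-validation fold
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)  # 12 48
```

Adding one more value to any list multiplies the total, which is why exhaustive grids grow quickly.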

In [None]:
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

In [None]:
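# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)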

In [None]:
# Create the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

**Grid search**

In [None]:
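# The DMatrix is not used by GridSearchCV, which takes the arrays directly
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# 4 learning rates x 1 n_estimators x 3 subsample values = 12 candidates
gbm_param_grid = {
    'learning_rate': [0.01, 0.1, 0.5, 0.9],
    'n_estimators': [200],
    'subsample': [0.3, 0.5, 0.9],
}

gbm = xgb.XGBRegressor()

# The scorer is the negated MSE, so a higher score means a lower error
grid_mse = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid,
                        scoring='neg_mean_squared_error', cv=4, verbose=1)

grid_mse.fit(X, y)

print("Best parameters found: ", grid_mse.best_params_)
print("Lowest RMSE found: ", np.sqrt(np.abs(grid_mse.best_score_)))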

Fitting 4 folds for each of 12 candidates, totalling 48 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Best parameters found:  {'learning_rate': 0.1, 'n_estimators': 200, 'subsample': 0.9}
Lowest RMSE found:  4.367013801830744


[Parallel(n_jobs=1)]: Done  48 out of  48 | elapsed:    3.7s finished
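
The `neg_mean_squared_error` scorer reports the negated MSE so that larger is better, which is why the cell above takes the absolute value before the square root. A minimal sketch of that conversion follows; the per-fold values are made up for illustration and are not results from this notebook.

```python
from math import sqrt
from statistics import mean

# Hypothetical per-fold MSE values, invented for illustration only
fold_mse = [16.0, 25.0, 9.0, 36.0]

# Negate so that "higher is better"; the mean across folds plays the role of best_score_
neg_scores = [-m for m in fold_mse]
mean_neg_mse = mean(neg_scores)

# Undo the sign, then take the square root: this is the root of the mean MSE
rmse_from_mean = sqrt(abs(mean_neg_mse))

# Averaging the per-fold RMSEs instead gives a different number
mean_of_fold_rmse = mean(sqrt(m) for m in fold_mse)

print(rmse_from_mean, mean_of_fold_rmse)
```

The printed RMSE is therefore the square root of the fold-averaged MSE, not the average of per-fold RMSEs.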


**Random search**

**Random search: review**
- Create a (possibly infinite) range of values for each hyperparameter you want to search over
- Set the number of iterations you would like the random search to run
- During each iteration, randomly draw a value from the specified range for each hyperparameter, then train and evaluate a model with those values
- After reaching the maximum number of iterations, select the hyperparameter configuration with the best evaluated score
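
The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not how `RandomizedSearchCV` is implemented: `random_search` and `toy_score` are hypothetical names, the score is a made-up function standing in for a cross-validated error, and the sketch may draw the same combination more than once.

```python
import random

def random_search(param_ranges, score_fn, n_iter, seed=0):
    """Return (best_params, best_score), where a lower score is better."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently of the others
        params = {name: rng.choice(values) for name, values in param_ranges.items()}
        score = score_fn(params)
        # Keep the configuration with the best score seen so far
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# 0.05, 0.10, ..., 1.00: 20 candidate values per hyperparameter
steps = [round(0.05 * i, 2) for i in range(1, 21)]
param_ranges = {"learning_rate": steps, "subsample": steps}

def toy_score(params):
    # Hypothetical stand-in for a cross-validated error; no model is trained
    return (params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2

best_params, best_score = random_search(param_ranges, toy_score, n_iter=25)
print(best_params, best_score)
```

With 20 values for each of two hyperparameters there are 400 combinations, so 25 iterations evaluate only a small fraction of the space.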

In [None]:
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# 20 values each (0.05, 0.10, ..., 1.00) for learning_rate and subsample
gbm_param_grid = {
    'learning_rate': np.arange(0.05, 1.05, .05),
    'n_estimators': [200],
    'subsample': np.arange(0.05, 1.05, .05),
}
gbm = xgb.XGBRegressor()

# Evaluate 25 randomly drawn combinations; no random_state is set, so the draws vary between runs
randomized_mse = RandomizedSearchCV(estimator=gbm, param_distributions=gbm_param_grid,
                                    n_iter=25, scoring='neg_mean_squared_error', cv=4, verbose=1)
randomized_mse.fit(X, y)
print("Best parameters found: ", randomized_mse.best_params_)
print("Lowest RMSE found: ", np.sqrt(np.abs(randomized_mse.best_score_)))

Fitting 4 folds for each of 25 candidates, totalling 100 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Best parameters found:  {'subsample': 0.8, 'n_estimators': 200, 'learning_rate': 0.1}
Lowest RMSE found:  4.347858962378819


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    7.3s finished
