## Hyperparameters 

XGBoost has a lot of parameters you can adjust to control the fitting of your model. The parameters can have a big impact on model performance and run time. See the [XGBoost docs](https://xgboost.readthedocs.io/en/latest/parameter.html) for a full list.

Some of these parameters, like `booster`, are general parameters that select which type of booster is used (for example `gbtree` for trees, `gblinear` for a linear model, or `dart`). These general parameters determine which other parameters can be used downstream.

For this lab, and for learning to rank (LTR) in general, we focus on regression trees. For trees we can select the `objective` function to control what the model optimizes, and we can also control a variety of tree attributes, such as `max_depth` and `grow_policy`.

The goal of this lab is to show how hyperparameters impact model performance and how we can search for them effectively.

In [59]:
import xgboost as xgb
import pandas as pd

import ltr

from matplotlib.pylab import rcParams

In [2]:
df = pd.read_csv(_____csv_of_features_____)
feature_cols = [col for col in df if col.startswith('feature')] # https://stackoverflow.com/questions/27275236/pandas-best-way-to-select-all-columns-whose-names-start-with-x

features = df[feature_cols]
labels = df['grade']  # 1-D Series of relevance grades

# XGBoost's native data structure, used with xgb.train
dmx = xgb.DMatrix(features, label=labels)

### Test parameters

We still need to define which parameters we want to search over, similar to the fields in the `netfix` notebook. We do this in a Python dictionary that maps each parameter name to the list of values to try.

In [52]:
# example to get started
test_params = {
 'max_depth':[1,2,4,3],
 'grow_policy': ['lossguide', 'depthwise']
}
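A grid search expands this dictionary into every combination of values. One way to preview that expansion without fitting anything is sklearn's `ParameterGrid`: with 4 values of `max_depth` and 2 of `grow_policy` there are 4 × 2 = 8 candidates.

```python
from sklearn.model_selection import ParameterGrid

test_params = {
    'max_depth': [1, 2, 4, 3],
    'grow_policy': ['lossguide', 'depthwise'],
}

# Every combination GridSearchCV would evaluate
grid = ParameterGrid(test_params)
print(len(grid))  # 8
for candidate in grid:
    print(candidate)
```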


### Grid search again

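This time we are grid searching in Python, so we can take advantage of the `sklearn` (scikit-learn) library. XGBoost provides a scikit-learn compatible estimator, `XGBRegressor`, which works directly with sklearn's search tools.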

In [None]:
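# install scikit-learn into the notebook's environment if it isn't already installed
# (the PyPI package is `scikit-learn`; the old `sklearn` package name is deprecated)
%pip install scikit-learn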

In [None]:
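from sklearn.model_selection import GridSearchCV

# Try every combination in test_params, scoring each with cross-validation
model = GridSearchCV(estimator=xgb.XGBRegressor(), param_grid=test_params)
model.fit(features, labels)

# The combination with the best mean cross-validated score
print(model.best_params_)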

In [None]:
# expand the parameters you are checking
# eta, gamma, and lambda are good candidates to explore

test_params2 = {
 'max_depth':[1,2,4,3],
 'grow_policy': ['lossguide', 'depthwise'],
 ___.....____
}


In [58]:
# re-run the GridSearch across your new expanded test_params2
# Do the parameter estimates change?


### Combinatorial explosion

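Sometimes you will have too many parameter combinations for a grid search to be economical. The number of model fits is the product of the number of values for each parameter, multiplied by the number of cross-validation folds, so the search can take far too long to complete.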
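A quick count for a hypothetical larger grid makes the cost concrete. The final `+ 1` is the refit of the best candidate on the full data, which `GridSearchCV` does by default (`refit=True`); 5 folds is its default `cv`.

```python
from math import prod
from sklearn.model_selection import ParameterGrid

# Hypothetical larger grid: 5 * 2 * 4 * 4 * 4 = 640 combinations.
big_params = {
    'max_depth': [2, 3, 4, 6, 8],
    'grow_policy': ['lossguide', 'depthwise'],
    'learning_rate': [0.01, 0.05, 0.1, 0.3],
    'gamma': [0, 0.5, 1, 2],
    'reg_lambda': [0.1, 0.5, 1, 2],
}

n_candidates = len(ParameterGrid(big_params))
cv_folds = 5                          # GridSearchCV's default
n_fits = n_candidates * cv_folds + 1  # +1 for the final refit
print(n_candidates, n_fits)  # 640 3201
```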

If that's the case, sklearn has a sibling class called `RandomizedSearchCV`. Instead of trying every combination, it samples a fixed number of them (set by `n_iter`, 10 by default), so you control how many combinations are tried. The search is not exhaustive like a grid search, so you might not find the best combination.

In [None]:
from sklearn.model_selection import RandomizedSearchCV

# In IPython/Jupyter, `??` displays the docstring and source code
RandomizedSearchCV??

In [53]:
model = RandomizedSearchCV(estimator = xgb.XGBRegressor(), param_distributions=test_params2)
model.fit(features,labels)
print(model.best_params_)

{'max_depth': 3, 'grow_policy': 'lossguide', 'gamma': 2, 'base_score': 0}


In [55]:
# Increase the number of iterations (n_iter, 10 by default) and re-run the randomized search
# Do your parameter estimates change?



{'max_depth': 3, 'grow_policy': 'lossguide', 'gamma': 2, 'base_score': 0}


In [57]:
# Decrease the number of iterations to 3
# What happens in this case?
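To see what `n_iter` changes without fitting any models, sklearn's `ParameterSampler` (the sampler `RandomizedSearchCV` uses) can list the sampled candidates. When every value in the dictionary is a list, sampling is done without replacement, so asking for more iterations than the grid has combinations just returns the whole grid (with a `UserWarning`), while `n_iter=3` evaluates only 3 combinations. The dictionary below is a hypothetical stand-in for `test_params2`.

```python
import warnings
from sklearn.model_selection import ParameterSampler

# Hypothetical stand-in for test_params2: 4 * 2 * 3 = 24 combinations.
params = {
    'max_depth': [1, 2, 3, 4],
    'grow_policy': ['lossguide', 'depthwise'],
    'gamma': [0, 1, 2],
}

for n_iter in (3, 10, 50):
    with warnings.catch_warnings():
        # n_iter=50 exceeds the 24-combination grid and triggers a UserWarning
        warnings.simplefilter('ignore', UserWarning)
        candidates = list(ParameterSampler(params, n_iter=n_iter, random_state=0))
    print(n_iter, len(candidates))
# 3 3
# 10 10
# 50 24
```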

