
Question or discussion: how to find optimal parameters of xgb? #7

Open
lihang00 opened this issue May 31, 2016 · 5 comments

Comments

@lihang00

I usually do a grid search of the following parameter set:

param_grid = {
    "learning_rate": [0.1, 0.05, 0.01],
    "n_estimators": [100, 500, 1000],
    "max_depth": [4, 8, 12],
    "min_child_weight": [1, 5, 10],
    "subsample": [1, 0.8],
    "colsample_bytree": [1, 0.8]
}

Most of the time people can find a much better parameter set this way.
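
For reference, here is a minimal sketch of running that grid through scikit-learn's GridSearchCV; the toy dataset, AUC scoring, and 5-fold CV are assumptions for illustration:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Toy data standing in for the real problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {  # the grid from above
    "learning_rate": [0.1, 0.05, 0.01],
    "n_estimators": [100, 500, 1000],
    "max_depth": [4, 8, 12],
    "min_child_weight": [1, 5, 10],
    "subsample": [1, 0.8],
    "colsample_bytree": [1, 0.8]
}

# Note: 324 combinations x 5 folds, so this can take a while.
search = GridSearchCV(XGBClassifier(), param_grid, scoring="roc_auc", cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)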

@szilard
Owner

szilard commented May 31, 2016

Instead of varying nrounds (number of trees) I would use a larger number and early stopping. Excellent topic btw, thanks @lihang00.
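
A minimal sketch of that approach with the native xgboost API, assuming a binary classification problem and a held-out validation split (the parameter values are placeholders):

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

dtrain = xgb.DMatrix(X_tr, label=y_tr)
dval = xgb.DMatrix(X_val, label=y_val)

params = {"objective": "binary:logistic", "eta": 0.1,
          "max_depth": 6, "eval_metric": "auc"}

# Set num_boost_round generously and let early stopping pick the effective nrounds.
bst = xgb.train(params, dtrain, num_boost_round=5000,
                evals=[(dval, "val")], early_stopping_rounds=50)
print("best iteration:", bst.best_iteration)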

@lihang00
Author

lihang00 commented Jun 1, 2016

The early stopping param is not supported in the scikit-learn wrapper yet... maybe we can ask whether they want to support those params in the scikit-learn wrapper. :)

@lihang00
Author

lihang00 commented Jun 2, 2016

Parameters:

  • eta : step size shrinkage used in updates to prevent overfitting.
  • alpha / lambda : L1/L2 regularization terms on weights.
  • subsample : subsample ratio of the training instances.
  • colsample_bytree : subsample ratio of columns when constructing each tree.

All these parameters can prevent overfitting, so how do we choose them? When tuning, can we just tune one (or a few) of them?
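
For reference, a minimal sketch of where these knobs sit in a native xgboost params dict; the values shown are placeholders, not recommendations:

params = {
    "objective": "binary:logistic",
    "eta": 0.1,               # step size shrinkage
    "alpha": 0.0,             # L1 regularization on weights (default 0)
    "lambda": 1.0,            # L2 regularization on weights (default 1)
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.8,  # column subsampling per tree
}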

@lihang00
Author

lihang00 commented Jun 3, 2016

FYI
https://github.com/dmlc/xgboost/blob/master/doc/how_to/param_tuning.md

Most of the time the default parameters work pretty well.

@jacquespeeters

First you need to find a stable eta. By stable I mean that you get approximately the same results on your chosen metric if you re-run the code. It depends on your data; the same goes for the CV setup.
Usually 0.1 is fine.

Then run a sequential loop to find the best max_depth (usually independent of the other parameters).

Then run a grid search or sequential loops to find subsample and colsample.

Now you've got good hyper-parameters. Run your algo once more with a smaller eta (e.g. 0.01), depending on how much time you want to spend. Usually the smaller, the better.

You definitely won't get the best hyper-parameters, but you will get good ones, and more importantly in a decent time. You can also perform these steps with a smaller eta, which lets you add more randomness (subsample, colsample), but it will take more time. A sketch of this recipe follows.
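
A minimal sketch of this recipe using xgb.cv with early stopping; the toy dataset, AUC metric, and candidate grids are assumptions:

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

def cv_score(params):
    # Cross-validate with early stopping; return the final mean validation AUC.
    res = xgb.cv(params, dtrain, num_boost_round=2000, nfold=5,
                 metrics="auc", early_stopping_rounds=50, seed=42)
    return res["test-auc-mean"].iloc[-1]

base = {"objective": "binary:logistic", "eta": 0.1,
        "subsample": 1.0, "colsample_bytree": 1.0}

# Step 1: with eta fixed at 0.1, tune max_depth on its own.
base["max_depth"] = max(range(3, 11),
                        key=lambda d: cv_score({**base, "max_depth": d}))

# Step 2: grid over subsample / colsample_bytree with max_depth fixed.
grid = [(s, c) for s in (0.6, 0.8, 1.0) for c in (0.6, 0.8, 1.0)]
best_s, best_c = max(grid, key=lambda sc: cv_score(
    {**base, "subsample": sc[0], "colsample_bytree": sc[1]}))
base.update(subsample=best_s, colsample_bytree=best_c)

# Step 3: rerun with a smaller eta (needs more rounds) for the final model.
base["eta"] = 0.01
final = xgb.cv(base, dtrain, num_boost_round=20000, nfold=5,
               metrics="auc", early_stopping_rounds=100, seed=42)
print(final["test-auc-mean"].iloc[-1])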
