
Question or discussion: how to find optimal parameters of xgb? #7

Open
lihang00 opened this issue May 31, 2016 · 5 comments

Comments

@lihang00

I usually do a grid search of the following parameter set:

param_grid = {
    "learning_rate": [0.1, 0.05, 0.01],
    "n_estimators": [100, 500, 1000],
    "max_depth": [4, 8, 12],
    "min_child_weight": [1, 5, 10],
    "subsample": [1, 0.8],
    "colsample_bytree": [1, 0.8]
}

Most of the time people can find a much better parameter set this way.
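
For reference, here is a minimal sketch of running that grid through scikit-learn's GridSearchCV; the toy dataset, AUC scoring, and 5-fold CV are assumptions for illustration:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Toy data standing in for the real problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {  # the grid from above
    "learning_rate": [0.1, 0.05, 0.01],
    "n_estimators": [100, 500, 1000],
    "max_depth": [4, 8, 12],
    "min_child_weight": [1, 5, 10],
    "subsample": [1, 0.8],
    "colsample_bytree": [1, 0.8]
}

# Note: 324 combinations x 5 folds, so this can take a while.
search = GridSearchCV(XGBClassifier(), param_grid, scoring="roc_auc", cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)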

@szilard
Owner

szilard commented May 31, 2016

Instead of varying nrounds (number of trees) I would use a larger number and early stopping. Excellent topic btw, thanks @lihang00.
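
A minimal sketch of that approach with the native xgboost API, assuming a binary classification problem and a held-out validation split (the parameter values are placeholders):

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

dtrain = xgb.DMatrix(X_tr, label=y_tr)
dval = xgb.DMatrix(X_val, label=y_val)

params = {"objective": "binary:logistic", "eta": 0.1,
          "max_depth": 6, "eval_metric": "auc"}

# Set num_boost_round generously and let early stopping pick the effective nrounds.
bst = xgb.train(params, dtrain, num_boost_round=5000,
                evals=[(dval, "val")], early_stopping_rounds=50)
print("best iteration:", bst.best_iteration)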

@lihang00
Author

lihang00 commented Jun 1, 2016

The early stopping param is not supported in the scikit-learn wrapper yet... maybe we can ask whether they want to support those params in the scikit-learn wrapper. :)

@lihang00
Author

lihang00 commented Jun 2, 2016

Parameters:

  • eta : step size shrinkage used in updates to prevent overfitting.
  • alpha / lambda : L1/L2 regularization terms on weights.
  • subsample : subsample ratio of the training instances.
  • colsample_bytree : subsample ratio of columns when constructing each tree.

All these parameters can prevent overfitting, so how do we choose them? When tuning, can we just tune one (or a few) of them?
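
For reference, a minimal sketch of where these knobs sit in a native xgboost params dict; the values shown are placeholders, not recommendations:

params = {
    "objective": "binary:logistic",
    "eta": 0.1,               # step size shrinkage
    "alpha": 0.0,             # L1 regularization on weights (default 0)
    "lambda": 1.0,            # L2 regularization on weights (default 1)
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.8,  # column subsampling per tree
}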

@lihang00
Author

lihang00 commented Jun 3, 2016

FYI
https://github.com/dmlc/xgboost/blob/master/doc/how_to/param_tuning.md

Most of the time the default parameters work pretty well.

@jacquespeeters

First you need to find a stable eta. By stable I mean that you get approximately the same results on your chosen metric if you re-run the code. It depends on your data; the same goes for the CV setup.
Usually 0.1 is fine.

Then run a sequential loop to find the best max_depth (usually independent of the other parameters).

Then run a grid search or sequential loops to find subsample and colsample.

Now you've got good hyper-parameters. Run your algo once more with a smaller eta (e.g. 0.01), depending on how much time you want to spend. Usually the smaller, the better.

You definitely won't get the best hyper-parameters, but you will get good ones, and more importantly in a decent time. You can also perform these steps with a smaller eta, which lets you add more randomness (subsample, colsample), but it will take more time. A sketch of this recipe follows.
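
A minimal sketch of this recipe using xgb.cv with early stopping; the toy dataset, AUC metric, and candidate grids are assumptions:

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

def cv_score(params):
    # Cross-validate with early stopping; return the final mean validation AUC.
    res = xgb.cv(params, dtrain, num_boost_round=2000, nfold=5,
                 metrics="auc", early_stopping_rounds=50, seed=42)
    return res["test-auc-mean"].iloc[-1]

base = {"objective": "binary:logistic", "eta": 0.1,
        "subsample": 1.0, "colsample_bytree": 1.0}

# Step 1: with eta fixed at 0.1, tune max_depth on its own.
base["max_depth"] = max(range(3, 11),
                        key=lambda d: cv_score({**base, "max_depth": d}))

# Step 2: grid over subsample / colsample_bytree with max_depth fixed.
grid = [(s, c) for s in (0.6, 0.8, 1.0) for c in (0.6, 0.8, 1.0)]
best_s, best_c = max(grid, key=lambda sc: cv_score(
    {**base, "subsample": sc[0], "colsample_bytree": sc[1]}))
base.update(subsample=best_s, colsample_bytree=best_c)

# Step 3: rerun with a smaller eta (needs more rounds) for the final model.
base["eta"] = 0.01
final = xgb.cv(base, dtrain, num_boost_round=20000, nfold=5,
               metrics="auc", early_stopping_rounds=100, seed=42)
print(final["test-auc-mean"].iloc[-1])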
