API design choice: why not enable early stopping by default? #2270

Closed
NicolasHug opened this issue Jul 18, 2019 · 3 comments

@NicolasHug

In scikit-learn (scikit-learn/scikit-learn#14303) we're considering enabling early stopping by default.

We're curious why you chose not to enable it by default in LightGBM, given that early stopping is almost always useful in practice.

Thanks!

@jameslamb
Collaborator

Hi @NicolasHug, interesting question! I'm the newest of the LightGBM team members, so I don't have the historical context for that decision. I think it would be best to have @guolinke or @StrikerRUS comment.

One thing I do know is that we require you to explicitly pass in your validation set if you want to take advantage of early stopping. So it's possible that we don't have it enabled by default because we wanted to give users finer control over the dataset they validate on than just saying "x% of rows, randomly held out".
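
For readers unfamiliar with the mechanism: early stopping needs a metric computed on a validation set after each boosting round, which is why a validation set has to come from somewhere. The following is a minimal sketch of a patience-based stopping rule in plain Python. It is not LightGBM's implementation; the function name `best_iteration` and the parameter `stopping_rounds` are hypothetical names chosen for illustration, and lower loss is assumed to be better.

```python
def best_iteration(valid_losses, stopping_rounds):
    """Return (index, loss) of the best round seen before stopping.

    valid_losses: iterable of validation losses, one per boosting round.
    stopping_rounds: stop once this many consecutive rounds fail to
        improve on the best validation loss (hypothetical parameter name).
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(valid_losses):
        if loss < best_loss:
            # New best round: remember it and keep training.
            best_loss, best_iter = loss, i
        elif i - best_iter >= stopping_rounds:
            # No improvement for `stopping_rounds` rounds: stop early.
            break
    return best_iter, best_loss


# Validation losses computed on a set the user supplied explicitly.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
print(best_iteration(losses, stopping_rounds=3))
```

With these losses, training stops at round 5 and round 2 is kept, so the later value 0.5 is never evaluated. Which rows produce those losses is exactly the choice the comment above says is left to the user.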

@guolinke
Collaborator

I agree with @jameslamb.
As an ML modeling tool, we focus on training and prediction.
Enabling early stopping (ES) by default would require us to partition the data on our side, and I believe there are many partition methods for different tasks, so implementing them would be duplicated effort.
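
To illustrate why a single default partition may not fit every task, here is a minimal sketch in plain Python contrasting two hold-out strategies. Both helper names (`random_holdout`, `chronological_holdout`) are hypothetical and are not part of LightGBM or scikit-learn; the chronological variant assumes the rows are already sorted by time.

```python
import random


def random_holdout(rows, valid_fraction, seed=0):
    """Hold out a random fraction of rows (reasonable for i.i.d. data)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_valid = int(len(rows) * valid_fraction)
    valid_idx = set(indices[:n_valid])
    train = [r for i, r in enumerate(rows) if i not in valid_idx]
    valid = [r for i, r in enumerate(rows) if i in valid_idx]
    return train, valid


def chronological_holdout(rows, valid_fraction):
    """Hold out the most recent rows; assumes rows are sorted by time."""
    n_valid = int(len(rows) * valid_fraction)
    split = len(rows) - n_valid
    return rows[:split], rows[split:]


rows = list(range(10))
train, valid = chronological_holdout(rows, 0.2)
print(train, valid)
```

A random split on time-ordered data would let the model validate on rows older than some it trained on, which is one reason a task-agnostic default can be a poor fit.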

Another point is consistency with ML domain knowledge and other tools.
The concepts of training set, validation set, and test set are basic knowledge in the ML domain, and most tools distinguish them.
So I don't think converting a user-passed training set into a training set plus a validation set is a good idea.

@NicolasHug
Author

Thanks a lot for your answers!
