
Early stopping for XGBoost + Update Readme #63

Merged
merged 74 commits into ray-project:master
Sep 1, 2020

Conversation

inventormc
Collaborator

@inventormc inventormc commented Jul 31, 2020

This PR supports early stopping for XGBoost. We leverage the incremental learning capabilities of XGBoost:

Note that this does not necessarily improve performance; rather, it allows us to break the training process into multiple parts.

from xgboost import XGBClassifier
import sklearn.metrics

# x_tr, y_tr, x_te, y_te are assumed to be pre-split train/test data
clf = XGBClassifier(n_estimators=10, nthread=8)
base_model = None
for i in range(20):
    z = clf.fit(x_tr, y_tr, xgb_model=base_model)  # continue from the previous booster
    y_pr = z.predict(x_te)
    print(sklearn.metrics.mean_squared_error(y_te, y_pr))
    base_model = z.get_booster()  # hand the trained booster to the next fit call

resolves #58

@inventormc inventormc added the enhancement (New feature or request) and wip (Work in progress) labels Jul 31, 2020
@inventormc
Collaborator Author

microsoft/LightGBM#3057: init_model doesn't exist in the latest stable release yet.
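
For reference, a minimal sketch of what the LightGBM path could look like once init_model lands in a stable release (assuming LightGBM >= 3.0; the synthetic data is only for illustration):

import lightgbm as lgb
from sklearn.datasets import make_classification

x_tr, y_tr = make_classification(n_samples=200, random_state=0)

clf = lgb.LGBMClassifier(n_estimators=10)
booster = None
for i in range(20):
    # init_model resumes training from the previous booster,
    # mirroring xgb_model in the XGBoost snippet above
    clf.fit(x_tr, y_tr, init_model=booster)
    booster = clf.booster_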

Comment on lines 97 to 106
if self.is_lgbm:
    self.saved_models[i] = self.estimator[i].fit(
        X_train, y_train, init_model=self.saved_models[i])
elif self.is_xgb:
    self.estimator[i].fit(
        X_train, y_train, xgb_model=self.saved_models[i])
    self.saved_models[i] = self.estimator[i].get_booster()
else:
    self.estimator[i].partial_fit(X_train, y_train,
                                  np.unique(self.y))
Collaborator

What if you just put this in another trainable? Would it make sense there?

Collaborator Author

Why do we need to put it into another trainable?

Collaborator

I guess it's probably going to be unsustainable to have five different special cases for different libraries, but right now it's probably fine.
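
To make the trade-off concrete, here is a hypothetical sketch of the "separate trainable" alternative being discussed: one base class plus a small subclass per library, instead of if/elif branches in a single method (class and attribute names mirror the snippet above and are illustrative only):

import numpy as np

class _Trainable:
    # default path: sklearn-style incremental learning
    def _train_step(self, i, X_train, y_train):
        self.estimator[i].partial_fit(X_train, y_train, np.unique(self.y))

class _XGBoostTrainable(_Trainable):
    # XGBoost path: pass the previous booster via xgb_model
    def _train_step(self, i, X_train, y_train):
        self.estimator[i].fit(X_train, y_train, xgb_model=self.saved_models[i])
        self.saved_models[i] = self.estimator[i].get_booster()

class _LightGBMTrainable(_Trainable):
    # LightGBM path: pass the previous model via init_model
    def _train_step(self, i, X_train, y_train):
        self.saved_models[i] = self.estimator[i].fit(
            X_train, y_train, init_model=self.saved_models[i])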

@richardliaw richardliaw changed the title Early stopping for other estimators Early stopping for XGBoost Aug 31, 2020
Collaborator

@richardliaw richardliaw left a comment

OK this looks good to me. I've added a couple updates to make things clearer.

@@ -112,8 +118,14 @@ def _train(self):
                test,
                train_indices=train)
            if self._can_partial_fit():
                self.estimator_list[i].partial_fit(X_train, y_train,
                                                   np.unique(self.y))
            if is_xgboost_model(self.main_estimator):
Collaborator Author

Is this supposed to be under the case where we can partial_fit? I think xgboost doesn't have that method, so maybe this should be outside.
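
For context, the capability check presumably follows sklearn's convention that incremental estimators expose a partial_fit method; a quick sketch (the actual helper in this PR may differ):

from sklearn.linear_model import SGDClassifier
from xgboost import XGBClassifier

def can_partial_fit(estimator):
    # sklearn's incremental-learning convention: supported estimators
    # expose partial_fit; XGBoost's sklearn wrapper does not
    return callable(getattr(estimator, "partial_fit", None))

print(can_partial_fit(SGDClassifier()))  # True
print(can_partial_fit(XGBClassifier()))  # False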

Collaborator

OK nice, I wrote a couple tests here.

self.estimator = estimator

if not self._can_early_stop() and max_iters is not None:
Collaborator Author

I think there are actually more cases we need to be checking here. Here are all the cases:

  • User wants to early stop, and it can be done
    • need to make sure the user sets max_iters, so raise a warning if this isn't done
  • User wants to early stop, and it cannot be done
    • directly throw an error, regardless of what max_iters is
    • maybe it would be nice to remove the warning that would come up in the if not self._can_early_stop() and max_iters > 1: check, since there will be duplicate errors
  • User does not want to early stop, and it can be done
    • if max_iters isn't set, do nothing
    • if max_iters is set, raise a warning that it is ignored because user didn't enable early stop
  • User does not want to early stop, and it cannot be done
    • if max_iters isn't set, do nothing
    • if max_iters is set, raise a warning that it is ignored because user didn't enable early stop

Right now

Collaborator

@richardliaw richardliaw Sep 1, 2020

It is now reduced to this:

  • User wants to early stop, and it cannot be done
    • directly throw an error.
  • User wants to early stop, and it can be done, and max_iters is not set
    • raise a warning that it should be set.
  • regardless of whether it can be done, if user does not want to early stop and if max_iters is set:
    • raise a warning that it is ignored / set to 1.
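
A sketch of validation logic matching that reduced set of cases (argument names and messages are illustrative, not the PR's actual code):

import warnings

def validate_early_stopping_args(early_stopping, can_early_stop, max_iters):
    if early_stopping:
        if not can_early_stop:
            # wants to early stop, cannot be done: hard error
            raise ValueError("Early stopping is not supported for this estimator.")
        if max_iters is None:
            # can be done, but max_iters was not set: warn that it should be
            warnings.warn("early_stopping is enabled but max_iters is not set.")
        return max_iters
    if max_iters is not None and max_iters > 1:
        # early stopping disabled: max_iters is ignored / forced to 1
        warnings.warn("max_iters is ignored because early stopping is not enabled.")
    return 1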

@richardliaw richardliaw changed the title Early stopping for XGBoost Early stopping for XGBoost + Update Readme Sep 1, 2020
@richardliaw richardliaw merged commit 483b700 into ray-project:master Sep 1, 2020