Early stopping for XGBoost + Update Readme #63
Conversation
microsoft/LightGBM#3057
tune_sklearn/_trainable.py
Outdated
if self.is_lgbm:
    self.saved_models[i] = self.estimator[i].fit(
        X_train, y_train, init_model=self.saved_models[i])
elif self.is_xgb:
    self.estimator[i].fit(
        X_train, y_train, xgb_model=self.saved_models[i])
    self.saved_models[i] = self.estimator[i].get_booster()
else:
    self.estimator[i].partial_fit(X_train, y_train,
                                  np.unique(self.y))
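The else branch above relies on scikit-learn's partial_fit protocol; XGBoost and LightGBM lack that method, which is why their boosters are threaded back in via xgb_model / init_model instead. A minimal sketch of the partial_fit path, using scikit-learn's SGDClassifier purely for illustration (the data and chunking here are invented for the example):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # must be passed on the first partial_fit call

# Train incrementally, one chunk per "iteration", mirroring the else branch.
for chunk in range(4):
    sl = slice(chunk * 50, (chunk + 1) * 50)
    clf.partial_fit(X[sl], y[sl], classes=classes)

print(clf.score(X, y))
```

Each partial_fit call resumes from the current model state, which is what lets a Tune trainable report intermediate scores between calls.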
what if you just put this in another trainable? would it make sense there?
Why do we need to put it into another trainable?
i guess it's probably going to be unsustainable to have like 5 different special cases for different libraries, but right now it's probably fine
OK this looks good to me. I've added a couple updates to make things clearer.
tune_sklearn/_trainable.py
Outdated
@@ -112,8 +118,14 @@ def _train(self):
        test,
        train_indices=train)
    if self._can_partial_fit():
        self.estimator_list[i].partial_fit(X_train, y_train,
                                           np.unique(self.y))
    if is_xgboost_model(self.main_estimator):
Is this supposed to be under the case where we can partial_fit? I think xgboost doesn't have that method, so maybe this should be outside.
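The concern above can be made concrete: capability detection via hasattr won't catch XGBoost, because XGBClassifier has no partial_fit, so the XGBoost check has to live outside that branch. A hypothetical sketch (the helper and the fake estimator classes are illustrative, not the PR's actual code):

```python
def can_partial_fit(estimator):
    """True if the estimator supports scikit-learn-style incremental fitting."""
    return hasattr(estimator, "partial_fit")

class FakeSGD:
    # Stands in for e.g. sklearn's SGDClassifier, which has partial_fit.
    def partial_fit(self, X, y, classes=None):
        return self

class FakeXGB:
    # Stands in for xgboost.XGBClassifier: fit(xgb_model=...) but no partial_fit.
    def fit(self, X, y, xgb_model=None):
        return self

print(can_partial_fit(FakeSGD()))  # True
print(can_partial_fit(FakeXGB()))  # False: XGBoost needs its own branch
```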
OK nice, I wrote a couple tests here.
self.estimator = estimator

if not self._can_early_stop() and max_iters is not None:
I think there are actually more cases we need to be checking here. Here are all the cases:
- User wants to early stop, and it can be done
  - need to make sure the user sets max_iters, so raise a warning if this isn't done
- User wants to early stop, and it cannot be done
  - directly throw an error, regardless of what max_iters is
  - maybe it would be nice to remove the warning that would come up in the if not self._can_early_stop() and max_iters > 1: since there will be duplicate errors
- User does not want to early stop, and it can be done
  - if max_iters isn't set, do nothing
  - if max_iters is set, raise a warning that it is ignored because the user didn't enable early stopping
- User does not want to early stop, and it cannot be done
  - if max_iters isn't set, do nothing
  - if max_iters is set, raise a warning that it is ignored because the user didn't enable early stopping
Right now
It is now reduced to this:
- User wants to early stop, and it cannot be done
  - directly throw an error.
- User wants to early stop, and it can be done, and max_iters is not set
  - raise a warning that it should be set.
- Regardless of whether it can be done, if the user does not want to early stop and max_iters is set:
  - raise a warning that it is ignored / set to 1.
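The reduced rules above can be sketched as a small validation helper (a hypothetical illustration of the three cases, not the PR's actual code; the default of 10 is invented for the example):

```python
import warnings

def check_early_stopping_config(early_stopping, can_early_stop, max_iters):
    """Validate the (early_stopping, max_iters) combination.

    Returns the effective max_iters to use.
    """
    if early_stopping:
        if not can_early_stop:
            # Case 1: user wants to early stop but the estimator can't.
            raise ValueError("Estimator does not support early stopping.")
        if max_iters is None:
            # Case 2: early stopping enabled without max_iters: warn, pick a default.
            warnings.warn("early_stopping is enabled but max_iters is not set.")
            max_iters = 10  # illustrative default
        return max_iters
    # Case 3: no early stopping; a user-set max_iters is ignored / forced to 1.
    if max_iters is not None:
        warnings.warn("max_iters is ignored because early stopping is off.")
    return 1
```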
This PR adds early-stopping support for XGBoost by leveraging XGBoost's incremental learning capabilities: an existing booster is passed back into fit via the xgb_model argument so training resumes where it left off. Note that this may not necessarily improve performance, but it allows us to break the training process into multiple parts.
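Breaking boosting into parts looks roughly like the sketch below; here scikit-learn's GradientBoostingClassifier with warm_start=True stands in for XGBoost's xgb_model continuation (same idea: each fit call resumes from the previously fitted trees; the data is invented for the example):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# warm_start=True keeps the existing trees and only fits the new ones,
# so four fit() calls of 10 trees each end with a 40-stage ensemble.
clf = GradientBoostingClassifier(n_estimators=10, warm_start=True,
                                 random_state=0)
clf.fit(X, y)
for step in range(3):
    clf.n_estimators += 10   # grow the ensemble...
    clf.fit(X, y)            # ...resuming from the trees fitted so far

print(len(clf.estimators_))  # 40
```

Between chunks, a Tune trainable can score the partially trained model and let a scheduler early-stop unpromising configurations.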
resolves #58