
[python] Bug fix for first_metric_only on earlystopping. #2209

Merged

Conversation

@matsuken92 (Contributor)

  1. The order of the metric list is not fixed even when it is defined as a list, so the fix explicitly indicates which metric is used for early stopping.
  2. Due to the introduction of the eval_train_metric feature, when that feature is enabled the first metric reported is the training score, which is not appropriate for early stopping. The early-stopping check is therefore skipped in the loop until the specified validation metric appears (see the sketch below the description).

(This PR is the latest version of #2127; once this PR is merged, #2127 should be closed.)
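For illustration, here is a minimal sketch of the intended rule; the names (FIRST_METRIC, drives_early_stopping) are assumptions for exposition only, not the merged implementation:

# Only the first metric evaluated on a validation set may trigger early stopping.
FIRST_METRIC = 'l2'  # assumed: the first metric the user specified

def drives_early_stopping(dataset_name, metric_name, train_name='training'):
    if dataset_name == train_name:
        return False  # train scores (eval_train_metric) never drive early stopping
    return metric_name == FIRST_METRIC

# Evaluation results may arrive with train scores first and metrics in any order:
for dataset, metric in [('training', 'l2'), ('valid_0', 'l1'), ('valid_0', 'l2')]:
    print(dataset, metric, drives_early_stopping(dataset, metric))
# -> only ('valid_0', 'l2') is checked for early stopping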

@StrikerRUS (Collaborator) left a comment

@matsuken92 Thank you very much for continuing the work on this PR! Unfortunately, I haven't had a chance to take a closer look yet, but below are some comments I can leave after a brief glance.

[Resolved review comments (now outdated) on python-package/lightgbm/callback.py and tests/python_package_test/test_engine.py]
@StrikerRUS changed the title from "Bug fix for first_metric_only on earlystopping." to "[python] Bug fix for first_metric_only on earlystopping." on Jun 1, 2019
@StrikerRUS (Collaborator) commented Jun 1, 2019

@matsuken92 It seems that I've managed to relax num_boost_round to 25 (only for regression for now; UPD: for all checks). Do you mind if I push the necessary changes directly into your branch? If you are OK with it, please make sure that "Allow edits from maintainers" is checked: https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork.

@matsuken92 (Contributor, Author) commented Jun 2, 2019

@StrikerRUS

Do you mind if I push the necessary changes directly into your branch?

Yes, of course, please!

@StrikerRUS (Collaborator) commented Jun 2, 2019

@matsuken92

Yes, of course, please!

Done! Please take a look at the latest commit.

Also, it seems to me that something is wrong with the sklearn wrapper (first_metric_only=True appears to disable early stopping completely). Can you please check it?

@StrikerRUS (Collaborator)

@matsuken92

I hadn't noticed this behavior, so I'm going to investigate it. Please wait a while.

Below is an example of what I was describing.

import lightgbm as lgb

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X, y = load_boston(True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_valid = lgb.Dataset(X_test, y_test, reference=lgb_train)

# sklearn API: eval_metric='l1' is added on top of the default l2 metric
_ = lgb.LGBMRegressor(learning_rate=1).fit(X_train, y_train, eval_set=[(X_test, y_test)],
                                           eval_metric='l1', early_stopping_rounds=1)
# native API: equivalent setup with both metrics specified explicitly
_ = lgb.train({'objective': 'regression', 'learning_rate': 1, 'metric': ['l2', 'l1']},
              lgb_train, valid_sets=[lgb_valid], early_stopping_rounds=1)

Output for master (just for the record):

[1]	valid_0's l2: 24.0111	valid_0's l1: 3.27364
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923
[3]	valid_0's l2: 21.8776	valid_0's l1: 3.04044
Early stopping, best iteration is:
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923


[1]	valid_0's l2: 24.0111	valid_0's l1: 3.27364
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923
[3]	valid_0's l2: 21.8776	valid_0's l1: 3.04044
Early stopping, best iteration is:
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923

And here are the results for 9387197 (the latest commit in this PR). To reproduce, just add first_metric_only=True in the sklearn model constructor and 'first_metric_only': True in params for train (the modified calls are sketched below; the output follows):
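For concreteness, the modified calls would look like this (same setup as above; the sklearn wrapper forwards extra constructor keyword arguments into params):

# first_metric_only enabled in both APIs, everything else unchanged
_ = lgb.LGBMRegressor(learning_rate=1, first_metric_only=True).fit(
    X_train, y_train, eval_set=[(X_test, y_test)],
    eval_metric='l1', early_stopping_rounds=1)
_ = lgb.train({'objective': 'regression', 'learning_rate': 1,
               'metric': ['l2', 'l1'], 'first_metric_only': True},
              lgb_train, valid_sets=[lgb_valid], early_stopping_rounds=1)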

[1]	valid_0's l1: 3.27364	valid_0's l2: 24.0111
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l1: 3.03923	valid_0's l2: 21.9143
[3]	valid_0's l1: 3.04044	valid_0's l2: 21.8776
[4]	valid_0's l1: 2.82583	valid_0's l2: 19.9852
[5]	valid_0's l1: 2.74323	valid_0's l2: 18.2202
[6]	valid_0's l1: 2.69581	valid_0's l2: 17.676
[7]	valid_0's l1: 2.77143	valid_0's l2: 17.7967
[8]	valid_0's l1: 2.95588	valid_0's l2: 18.0791
[9]	valid_0's l1: 2.96642	valid_0's l2: 18.8399
[10]	valid_0's l1: 2.91047	valid_0's l2: 18.9361
[11]	valid_0's l1: 2.89317	valid_0's l2: 18.7892
[12]	valid_0's l1: 2.8452	valid_0's l2: 18.0382
[13]	valid_0's l1: 2.81333	valid_0's l2: 17.9244
[14]	valid_0's l1: 2.79163	valid_0's l2: 17.6823
[15]	valid_0's l1: 2.77444	valid_0's l2: 17.8778
[16]	valid_0's l1: 2.67583	valid_0's l2: 16.8548
[17]	valid_0's l1: 2.66395	valid_0's l2: 16.7515
[18]	valid_0's l1: 2.6448	valid_0's l2: 16.2761
[19]	valid_0's l1: 2.65737	valid_0's l2: 16.1425
[20]	valid_0's l1: 2.63126	valid_0's l2: 15.903
[21]	valid_0's l1: 2.67531	valid_0's l2: 16.2844
[22]	valid_0's l1: 2.66466	valid_0's l2: 16.0598
[23]	valid_0's l1: 2.67159	valid_0's l2: 15.8824
[24]	valid_0's l1: 2.66639	valid_0's l2: 16.2519
[25]	valid_0's l1: 2.67604	valid_0's l2: 16.5683
[26]	valid_0's l1: 2.62472	valid_0's l2: 16.1583
[27]	valid_0's l1: 2.64028	valid_0's l2: 16.1226
[28]	valid_0's l1: 2.64049	valid_0's l2: 16.0299
[29]	valid_0's l1: 2.63736	valid_0's l2: 15.8527
[30]	valid_0's l1: 2.65795	valid_0's l2: 15.9717
[31]	valid_0's l1: 2.65511	valid_0's l2: 16.0035
[32]	valid_0's l1: 2.69792	valid_0's l2: 16.2856
[33]	valid_0's l1: 2.69483	valid_0's l2: 16.2534
[34]	valid_0's l1: 2.70091	valid_0's l2: 16.3198
[35]	valid_0's l1: 2.70696	valid_0's l2: 16.303
[36]	valid_0's l1: 2.72469	valid_0's l2: 16.4889
[37]	valid_0's l1: 2.72439	valid_0's l2: 16.5529
[38]	valid_0's l1: 2.72801	valid_0's l2: 16.5713
[39]	valid_0's l1: 2.73169	valid_0's l2: 16.6174
[40]	valid_0's l1: 2.73144	valid_0's l2: 16.5682
[41]	valid_0's l1: 2.72428	valid_0's l2: 16.4991
[42]	valid_0's l1: 2.71914	valid_0's l2: 16.4974
[43]	valid_0's l1: 2.72347	valid_0's l2: 16.5122
[44]	valid_0's l1: 2.72092	valid_0's l2: 16.4841
[45]	valid_0's l1: 2.72379	valid_0's l2: 16.4423
[46]	valid_0's l1: 2.7121	valid_0's l2: 16.2953
[47]	valid_0's l1: 2.7094	valid_0's l2: 16.2658
[48]	valid_0's l1: 2.72105	valid_0's l2: 16.2631
[49]	valid_0's l1: 2.72685	valid_0's l2: 16.2346
[50]	valid_0's l1: 2.72393	valid_0's l2: 16.3498
[51]	valid_0's l1: 2.72445	valid_0's l2: 16.2387
[52]	valid_0's l1: 2.72538	valid_0's l2: 16.2534
[53]	valid_0's l1: 2.73256	valid_0's l2: 16.267
[54]	valid_0's l1: 2.73545	valid_0's l2: 16.3227
[55]	valid_0's l1: 2.73642	valid_0's l2: 16.3548
[56]	valid_0's l1: 2.73607	valid_0's l2: 16.31
[57]	valid_0's l1: 2.72831	valid_0's l2: 16.3087
[58]	valid_0's l1: 2.73048	valid_0's l2: 16.34
[59]	valid_0's l1: 2.72752	valid_0's l2: 16.3608
[60]	valid_0's l1: 2.73523	valid_0's l2: 16.4188
[61]	valid_0's l1: 2.7403	valid_0's l2: 16.4628
[62]	valid_0's l1: 2.73919	valid_0's l2: 16.426
[63]	valid_0's l1: 2.74364	valid_0's l2: 16.4654
[64]	valid_0's l1: 2.74519	valid_0's l2: 16.4937
[65]	valid_0's l1: 2.74492	valid_0's l2: 16.4566
[66]	valid_0's l1: 2.74756	valid_0's l2: 16.5328
[67]	valid_0's l1: 2.74524	valid_0's l2: 16.4903
[68]	valid_0's l1: 2.74352	valid_0's l2: 16.4523
[69]	valid_0's l1: 2.74337	valid_0's l2: 16.4446
[70]	valid_0's l1: 2.74791	valid_0's l2: 16.4916
[71]	valid_0's l1: 2.74684	valid_0's l2: 16.4851
[72]	valid_0's l1: 2.74972	valid_0's l2: 16.5216
[73]	valid_0's l1: 2.75258	valid_0's l2: 16.5458
[74]	valid_0's l1: 2.75049	valid_0's l2: 16.5218
[75]	valid_0's l1: 2.75005	valid_0's l2: 16.5376
[76]	valid_0's l1: 2.7454	valid_0's l2: 16.4674
[77]	valid_0's l1: 2.74347	valid_0's l2: 16.4555
[78]	valid_0's l1: 2.74529	valid_0's l2: 16.4622
[79]	valid_0's l1: 2.74857	valid_0's l2: 16.4924
[80]	valid_0's l1: 2.74406	valid_0's l2: 16.4474
[81]	valid_0's l1: 2.74543	valid_0's l2: 16.4718
[82]	valid_0's l1: 2.74534	valid_0's l2: 16.4898
[83]	valid_0's l1: 2.74702	valid_0's l2: 16.4914
[84]	valid_0's l1: 2.74529	valid_0's l2: 16.4553
[85]	valid_0's l1: 2.74377	valid_0's l2: 16.434
[86]	valid_0's l1: 2.74682	valid_0's l2: 16.4436
[87]	valid_0's l1: 2.75086	valid_0's l2: 16.4588
[88]	valid_0's l1: 2.75425	valid_0's l2: 16.514
[89]	valid_0's l1: 2.7563	valid_0's l2: 16.5461
[90]	valid_0's l1: 2.75635	valid_0's l2: 16.5385
[91]	valid_0's l1: 2.75892	valid_0's l2: 16.5633
[92]	valid_0's l1: 2.75495	valid_0's l2: 16.5573
[93]	valid_0's l1: 2.75868	valid_0's l2: 16.5822
[94]	valid_0's l1: 2.75643	valid_0's l2: 16.574
[95]	valid_0's l1: 2.75566	valid_0's l2: 16.5515
[96]	valid_0's l1: 2.758	valid_0's l2: 16.5612
[97]	valid_0's l1: 2.75754	valid_0's l2: 16.5546
[98]	valid_0's l1: 2.75908	valid_0's l2: 16.5781
[99]	valid_0's l1: 2.75996	valid_0's l2: 16.5889
[100]	valid_0's l1: 2.76181	valid_0's l2: 16.5993


[1]	valid_0's l2: 24.0111	valid_0's l1: 3.27364
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923
[3]	valid_0's l2: 21.8776	valid_0's l1: 3.04044
[4]	valid_0's l2: 19.9852	valid_0's l1: 2.82583
[5]	valid_0's l2: 18.2202	valid_0's l1: 2.74323
[6]	valid_0's l2: 17.676	valid_0's l1: 2.69581
[7]	valid_0's l2: 17.7967	valid_0's l1: 2.77143
Early stopping, best iteration is:
[6]	valid_0's l2: 17.676	valid_0's l1: 2.69581
Evaluating only: l2

@StrikerRUS (Collaborator) commented Jun 8, 2019

@matsuken92 Also, here is one more thing that should be updated for consistent behavior of first_metric_only:

params['metric'] = set(original_metric + eval_metric)
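Since set() is unordered in Python, this line makes the notion of a "first" metric nondeterministic. One order-preserving alternative, sketched with a hypothetical helper name and not necessarily the fix that was eventually merged:

def merge_metrics(original_metric, eval_metric):
    # De-duplicate while preserving first-seen order, so that the "first"
    # metric stays well defined for first_metric_only.
    merged = []
    for m in list(original_metric) + list(eval_metric):
        if m not in merged:
            merged.append(m)
    return merged

print(merge_metrics(['l2'], ['l1', 'l2']))  # ['l2', 'l1']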

@matsuken92 (Contributor, Author)

@StrikerRUS
Oh, that's right. I will check it.

@StrikerRUS force-pushed the bugfix/first_metric_only_train_metric branch from 5316ff4 to c3fbf6b on September 10, 2019 01:13
@StrikerRUS (Collaborator)

@matsuken92 Thanks a lot for your hard work! I pushed some commits in which I simplified the code a little bit, moved initialization to the init phase of the callback where possible, removed unused code, and improved test readability, with the aim of moving this PR forward to merging. Unfortunately, I had to remove your logic of pre-computing iterations from the tests, because it wastes CI time, and replace it with hardcoded values.

@StrikerRUS (Collaborator)

@matsuken92 The one (and the last one, I hope 😄) thing that needs fixing is that in the case of first_metric_only this if is never evaluated:

if env.iteration == env.end_iteration - 1:
    if verbose:
        print('Did not meet early stopping. Best iteration is:\n[%d]\t%s' % (
            best_iter[i] + 1, '\t'.join([_format_eval_result(x) for x in best_score_list[i]])))
    raise EarlyStopException(best_iter[i], best_score_list[i])

In other words, the message is not printed and EarlyStopException is not raised after the training process.
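A simplified, hypothetical sketch of the callback loop illustrates the control-flow issue (the names are assumptions for exposition): if every entry in the evaluation list hits continue, the final-iteration check is never reached.

def early_stopping_step(iteration, end_iteration, eval_results, first_metric):
    for dataset_name, metric_name, score in eval_results:
        if dataset_name == 'training':
            continue  # train scores are skipped...
        if metric_name != first_metric:
            continue  # ...and so are non-first metrics
        # early-stopping bookkeeping for the tracked metric would go here
        if iteration == end_iteration - 1:
            print('Did not meet early stopping.')  # unreachable if all entries continue

early_stopping_step(24, 25, [('training', 'l2', 15.4)], 'l2')  # prints nothing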

@StrikerRUS mentioned this pull request on Sep 10, 2019
@matsuken92 (Contributor, Author)

I checked it with the following settings (just changing early_stopping_rounds=30), and the message Did not meet early stopping. Best iteration is: was shown and EarlyStopException was raised. Maybe I didn't fully understand your issue, so could you explain it in more detail?

def metrics_combination_train_regression(valid_sets, metric_list, assumed_iteration,
                                         first_metric_only, feval=None):
    params = {
        'objective': 'regression',
        'learning_rate': 1.1,
        'num_leaves': 10,
        'metric': metric_list,
        'verbose': -1,
        'seed': 123
    }
    gbm = lgb.train(dict(params, first_metric_only=first_metric_only), lgb_train,
                    num_boost_round=25, valid_sets=valid_sets, feval=feval,
                    early_stopping_rounds=30, verbose_eval=1)
    self.assertEqual(assumed_iteration, gbm.best_iteration)

metrics_combination_train_regression(lgb_valid1, [], iter_valid1_l2, True)
[LightGBM] [Warning] Unknown parameter metric=
[1]	valid_0's l2: 29.8981
Training until validation scores don't improve for 30 rounds
[2]	valid_0's l2: 23.2273
[3]	valid_0's l2: 23.3257
[4]	valid_0's l2: 21.267
[5]	valid_0's l2: 20.7759
[6]	valid_0's l2: 19.1572
[7]	valid_0's l2: 17.3846
[8]	valid_0's l2: 16.7308
[9]	valid_0's l2: 16.1299
[10]	valid_0's l2: 17.9467
[11]	valid_0's l2: 18.9755
[12]	valid_0's l2: 17.2129
[13]	valid_0's l2: 15.9185
[14]	valid_0's l2: 15.4884
[15]	valid_0's l2: 15.6829
[16]	valid_0's l2: 16.1173
[17]	valid_0's l2: 15.513
[18]	valid_0's l2: 16.5064
[19]	valid_0's l2: 16.5645
[20]	valid_0's l2: 16.1815
[21]	valid_0's l2: 15.6797
[22]	valid_0's l2: 15.4121
[23]	valid_0's l2: 15.7455
[24]	valid_0's l2: 15.7099
[25]	valid_0's l2: 16.1358
Did not meet early stopping. Best iteration is:
[22]	valid_0's l2: 15.4121
Evaluated only: l2

@StrikerRUS (Collaborator) commented Sep 10, 2019

@matsuken92

Maybe I didn't fully understand your issue, so could you explain it in more detail?

I'm sorry, I was not clear enough! That if statement if env.iteration == env.end_iteration - 1: is not reachable when, on every iteration of the for loop, one of the if statements above is triggered and the continue instruction is executed. For example, this issue can be observed with the train data used as the evaluation set.

import lightgbm as lgb

from sklearn.datasets import load_boston

X, y = load_boston(True)
lgb_train = lgb.Dataset(X, y)

# the training data itself is used as the only evaluation set
lgb.train({'objective': 'regression', 'verbose': 10},
          lgb_train, valid_sets=[lgb_train], early_stopping_rounds=1, verbose_eval=True)

I'm not sure, but it seems to me that this is the only case in which it is possible: first_metric is computed at the first iteration of the for loop, and the metrics list cannot change during training, so there is a guarantee that at least once the if statement first_metric_only and first_metric[0] != eval_name_splitted[-1]: will not trigger continue (a tiny sketch of this follows).
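A tiny sketch of that guarantee, with hypothetical names for exposition only: first_metric is captured on the first pass through the evaluation list and never changes afterwards, so the matching entry passes the metric guard at least once per iteration, unless every entry is skipped earlier as in the repro above.

first_metric = []

def passes_metric_guard(eval_name_splitted):
    if not first_metric:
        first_metric.append(eval_name_splitted[-1])  # computed once, on the first entry
    return first_metric[0] == eval_name_splitted[-1]

print(passes_metric_guard(['valid_0', 'l2']))  # True: this entry defines first_metric
print(passes_metric_guard(['valid_0', 'l1']))  # False: would trigger continue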

@matsuken92 (Contributor, Author)

@StrikerRUS Thanks, I got it; now I can reproduce the behavior. I will investigate it.

@StrikerRUS (Collaborator) left a comment

@matsuken92 Thank you very much for fixing the bug with the final iteration! And of course many thanks for all of your work!

I'm really excited to approve this PR finally! 🎉

@StrikerRUS (Collaborator)

@guolinke @chivee @henry0312 Can you please review this?

@guolinke (Collaborator) left a comment

LGTM

@guolinke (Collaborator)

@StrikerRUS is this ready?

@StrikerRUS (Collaborator)

@guolinke

is this ready?

Yep! As no one else wants to leave a review, I'm merging.

@StrikerRUS merged commit 8475439 into microsoft:master on Sep 15, 2019
@matsuken92 (Contributor, Author)

@StrikerRUS I'm happy that this PR has been merged! Thank you for all of your support!!!

@StrikerRUS (Collaborator)

@matsuken92 Awesome work and great patience from your side! Thanks! I hope you'll find some time and we'll see a PR for #2105 in the future, as you've already dug deep into the Python code.
