
[python] Bug fix for first_metric_only on earlystopping. #2209

Merged

Conversation

@matsuken92 (Contributor)

  1. The order of the metric list is not fixed even when it is defined as a list, so the fix explicitly indicates which metric is used for early stopping.
  2. Due to the introduction of the eval_train_metric feature, when that feature is enabled the first metric reported is the training score, which is not appropriate for early stopping. The early-stopping check is therefore skipped in the loop until the specified validation metric appears (see the sketch below the description).

(This PR is the latest version of #2127; once this PR is merged, #2127 should be closed.)
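For illustration, here is a minimal sketch of the intended rule; the names (FIRST_METRIC, drives_early_stopping) are assumptions for exposition only, not the merged implementation:

# Only the first metric evaluated on a validation set may trigger early stopping.
FIRST_METRIC = 'l2'  # assumed: the first metric the user specified

def drives_early_stopping(dataset_name, metric_name, train_name='training'):
    if dataset_name == train_name:
        return False  # train scores (eval_train_metric) never drive early stopping
    return metric_name == FIRST_METRIC

# Evaluation results may arrive with train scores first and metrics in any order:
for dataset, metric in [('training', 'l2'), ('valid_0', 'l1'), ('valid_0', 'l2')]:
    print(dataset, metric, drives_early_stopping(dataset, metric))
# -> only ('valid_0', 'l2') is checked for early stopping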

@StrikerRUS (Collaborator) left a comment

@matsuken92 Thank you very much for continuing the work on this PR! Unfortunately, I haven't had a chance to take a closer look yet, but below are some comments I can leave after a brief glance.

[Resolved review comments (now outdated) on python-package/lightgbm/callback.py and tests/python_package_test/test_engine.py]
@StrikerRUS changed the title from "Bug fix for first_metric_only on earlystopping." to "[python] Bug fix for first_metric_only on earlystopping." on Jun 1, 2019
@StrikerRUS (Collaborator) commented Jun 1, 2019

@matsuken92 It seems that I've managed to relax num_boost_round to 25 (only for regression for now; UPD: for all checks). Do you mind if I push the necessary changes directly into your branch? If you are OK with it, please make sure that "Allow edits from maintainers" is checked: https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork.

@matsuken92 (Contributor, Author) commented Jun 2, 2019

@StrikerRUS

Do you mind if I push the necessary changes directly into your branch?

Yes, of course, please!

@StrikerRUS (Collaborator) commented Jun 2, 2019

@matsuken92

Yes, of course, please!

Done! Please take a look at the latest commit.

Also, it seems to me that something is wrong with the sklearn wrapper (first_metric_only=True appears to disable early stopping completely). Can you please check it?

@StrikerRUS (Collaborator)

@matsuken92

I hadn't noticed this behavior, so I'm going to investigate it. Please wait a while.

Below is an example of what I was describing.

import lightgbm as lgb

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X, y = load_boston(True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_valid = lgb.Dataset(X_test, y_test, reference=lgb_train)

# sklearn API: eval_metric='l1' is added on top of the default l2 metric
_ = lgb.LGBMRegressor(learning_rate=1).fit(X_train, y_train, eval_set=[(X_test, y_test)],
                                           eval_metric='l1', early_stopping_rounds=1)
# native API: equivalent setup with both metrics specified explicitly
_ = lgb.train({'objective': 'regression', 'learning_rate': 1, 'metric': ['l2', 'l1']},
              lgb_train, valid_sets=[lgb_valid], early_stopping_rounds=1)

Output for master (just for the record):

[1]	valid_0's l2: 24.0111	valid_0's l1: 3.27364
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923
[3]	valid_0's l2: 21.8776	valid_0's l1: 3.04044
Early stopping, best iteration is:
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923


[1]	valid_0's l2: 24.0111	valid_0's l1: 3.27364
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923
[3]	valid_0's l2: 21.8776	valid_0's l1: 3.04044
Early stopping, best iteration is:
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923

And here are the results for 9387197 (the latest commit in this PR). To reproduce, just add first_metric_only=True in the sklearn model constructor and 'first_metric_only': True in params for train (the modified calls are sketched below; the output follows):
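For concreteness, the modified calls would look like this (same setup as above; the sklearn wrapper forwards extra constructor keyword arguments into params):

# first_metric_only enabled in both APIs, everything else unchanged
_ = lgb.LGBMRegressor(learning_rate=1, first_metric_only=True).fit(
    X_train, y_train, eval_set=[(X_test, y_test)],
    eval_metric='l1', early_stopping_rounds=1)
_ = lgb.train({'objective': 'regression', 'learning_rate': 1,
               'metric': ['l2', 'l1'], 'first_metric_only': True},
              lgb_train, valid_sets=[lgb_valid], early_stopping_rounds=1)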

[1]	valid_0's l1: 3.27364	valid_0's l2: 24.0111
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l1: 3.03923	valid_0's l2: 21.9143
[3]	valid_0's l1: 3.04044	valid_0's l2: 21.8776
[4]	valid_0's l1: 2.82583	valid_0's l2: 19.9852
[5]	valid_0's l1: 2.74323	valid_0's l2: 18.2202
[6]	valid_0's l1: 2.69581	valid_0's l2: 17.676
[7]	valid_0's l1: 2.77143	valid_0's l2: 17.7967
[8]	valid_0's l1: 2.95588	valid_0's l2: 18.0791
[9]	valid_0's l1: 2.96642	valid_0's l2: 18.8399
[10]	valid_0's l1: 2.91047	valid_0's l2: 18.9361
[11]	valid_0's l1: 2.89317	valid_0's l2: 18.7892
[12]	valid_0's l1: 2.8452	valid_0's l2: 18.0382
[13]	valid_0's l1: 2.81333	valid_0's l2: 17.9244
[14]	valid_0's l1: 2.79163	valid_0's l2: 17.6823
[15]	valid_0's l1: 2.77444	valid_0's l2: 17.8778
[16]	valid_0's l1: 2.67583	valid_0's l2: 16.8548
[17]	valid_0's l1: 2.66395	valid_0's l2: 16.7515
[18]	valid_0's l1: 2.6448	valid_0's l2: 16.2761
[19]	valid_0's l1: 2.65737	valid_0's l2: 16.1425
[20]	valid_0's l1: 2.63126	valid_0's l2: 15.903
[21]	valid_0's l1: 2.67531	valid_0's l2: 16.2844
[22]	valid_0's l1: 2.66466	valid_0's l2: 16.0598
[23]	valid_0's l1: 2.67159	valid_0's l2: 15.8824
[24]	valid_0's l1: 2.66639	valid_0's l2: 16.2519
[25]	valid_0's l1: 2.67604	valid_0's l2: 16.5683
[26]	valid_0's l1: 2.62472	valid_0's l2: 16.1583
[27]	valid_0's l1: 2.64028	valid_0's l2: 16.1226
[28]	valid_0's l1: 2.64049	valid_0's l2: 16.0299
[29]	valid_0's l1: 2.63736	valid_0's l2: 15.8527
[30]	valid_0's l1: 2.65795	valid_0's l2: 15.9717
[31]	valid_0's l1: 2.65511	valid_0's l2: 16.0035
[32]	valid_0's l1: 2.69792	valid_0's l2: 16.2856
[33]	valid_0's l1: 2.69483	valid_0's l2: 16.2534
[34]	valid_0's l1: 2.70091	valid_0's l2: 16.3198
[35]	valid_0's l1: 2.70696	valid_0's l2: 16.303
[36]	valid_0's l1: 2.72469	valid_0's l2: 16.4889
[37]	valid_0's l1: 2.72439	valid_0's l2: 16.5529
[38]	valid_0's l1: 2.72801	valid_0's l2: 16.5713
[39]	valid_0's l1: 2.73169	valid_0's l2: 16.6174
[40]	valid_0's l1: 2.73144	valid_0's l2: 16.5682
[41]	valid_0's l1: 2.72428	valid_0's l2: 16.4991
[42]	valid_0's l1: 2.71914	valid_0's l2: 16.4974
[43]	valid_0's l1: 2.72347	valid_0's l2: 16.5122
[44]	valid_0's l1: 2.72092	valid_0's l2: 16.4841
[45]	valid_0's l1: 2.72379	valid_0's l2: 16.4423
[46]	valid_0's l1: 2.7121	valid_0's l2: 16.2953
[47]	valid_0's l1: 2.7094	valid_0's l2: 16.2658
[48]	valid_0's l1: 2.72105	valid_0's l2: 16.2631
[49]	valid_0's l1: 2.72685	valid_0's l2: 16.2346
[50]	valid_0's l1: 2.72393	valid_0's l2: 16.3498
[51]	valid_0's l1: 2.72445	valid_0's l2: 16.2387
[52]	valid_0's l1: 2.72538	valid_0's l2: 16.2534
[53]	valid_0's l1: 2.73256	valid_0's l2: 16.267
[54]	valid_0's l1: 2.73545	valid_0's l2: 16.3227
[55]	valid_0's l1: 2.73642	valid_0's l2: 16.3548
[56]	valid_0's l1: 2.73607	valid_0's l2: 16.31
[57]	valid_0's l1: 2.72831	valid_0's l2: 16.3087
[58]	valid_0's l1: 2.73048	valid_0's l2: 16.34
[59]	valid_0's l1: 2.72752	valid_0's l2: 16.3608
[60]	valid_0's l1: 2.73523	valid_0's l2: 16.4188
[61]	valid_0's l1: 2.7403	valid_0's l2: 16.4628
[62]	valid_0's l1: 2.73919	valid_0's l2: 16.426
[63]	valid_0's l1: 2.74364	valid_0's l2: 16.4654
[64]	valid_0's l1: 2.74519	valid_0's l2: 16.4937
[65]	valid_0's l1: 2.74492	valid_0's l2: 16.4566
[66]	valid_0's l1: 2.74756	valid_0's l2: 16.5328
[67]	valid_0's l1: 2.74524	valid_0's l2: 16.4903
[68]	valid_0's l1: 2.74352	valid_0's l2: 16.4523
[69]	valid_0's l1: 2.74337	valid_0's l2: 16.4446
[70]	valid_0's l1: 2.74791	valid_0's l2: 16.4916
[71]	valid_0's l1: 2.74684	valid_0's l2: 16.4851
[72]	valid_0's l1: 2.74972	valid_0's l2: 16.5216
[73]	valid_0's l1: 2.75258	valid_0's l2: 16.5458
[74]	valid_0's l1: 2.75049	valid_0's l2: 16.5218
[75]	valid_0's l1: 2.75005	valid_0's l2: 16.5376
[76]	valid_0's l1: 2.7454	valid_0's l2: 16.4674
[77]	valid_0's l1: 2.74347	valid_0's l2: 16.4555
[78]	valid_0's l1: 2.74529	valid_0's l2: 16.4622
[79]	valid_0's l1: 2.74857	valid_0's l2: 16.4924
[80]	valid_0's l1: 2.74406	valid_0's l2: 16.4474
[81]	valid_0's l1: 2.74543	valid_0's l2: 16.4718
[82]	valid_0's l1: 2.74534	valid_0's l2: 16.4898
[83]	valid_0's l1: 2.74702	valid_0's l2: 16.4914
[84]	valid_0's l1: 2.74529	valid_0's l2: 16.4553
[85]	valid_0's l1: 2.74377	valid_0's l2: 16.434
[86]	valid_0's l1: 2.74682	valid_0's l2: 16.4436
[87]	valid_0's l1: 2.75086	valid_0's l2: 16.4588
[88]	valid_0's l1: 2.75425	valid_0's l2: 16.514
[89]	valid_0's l1: 2.7563	valid_0's l2: 16.5461
[90]	valid_0's l1: 2.75635	valid_0's l2: 16.5385
[91]	valid_0's l1: 2.75892	valid_0's l2: 16.5633
[92]	valid_0's l1: 2.75495	valid_0's l2: 16.5573
[93]	valid_0's l1: 2.75868	valid_0's l2: 16.5822
[94]	valid_0's l1: 2.75643	valid_0's l2: 16.574
[95]	valid_0's l1: 2.75566	valid_0's l2: 16.5515
[96]	valid_0's l1: 2.758	valid_0's l2: 16.5612
[97]	valid_0's l1: 2.75754	valid_0's l2: 16.5546
[98]	valid_0's l1: 2.75908	valid_0's l2: 16.5781
[99]	valid_0's l1: 2.75996	valid_0's l2: 16.5889
[100]	valid_0's l1: 2.76181	valid_0's l2: 16.5993


[1]	valid_0's l2: 24.0111	valid_0's l1: 3.27364
Training until validation scores don't improve for 1 rounds.
[2]	valid_0's l2: 21.9143	valid_0's l1: 3.03923
[3]	valid_0's l2: 21.8776	valid_0's l1: 3.04044
[4]	valid_0's l2: 19.9852	valid_0's l1: 2.82583
[5]	valid_0's l2: 18.2202	valid_0's l1: 2.74323
[6]	valid_0's l2: 17.676	valid_0's l1: 2.69581
[7]	valid_0's l2: 17.7967	valid_0's l1: 2.77143
Early stopping, best iteration is:
[6]	valid_0's l2: 17.676	valid_0's l1: 2.69581
Evaluating only: l2

@StrikerRUS (Collaborator) commented Jun 8, 2019

@matsuken92 Also, here is one more thing that should be updated for consistent behavior of first_metric_only:

params['metric'] = set(original_metric + eval_metric)
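Since set() is unordered in Python, this line makes the notion of a "first" metric nondeterministic. One order-preserving alternative, sketched with a hypothetical helper name and not necessarily the fix that was eventually merged:

def merge_metrics(original_metric, eval_metric):
    # De-duplicate while preserving first-seen order, so that the "first"
    # metric stays well defined for first_metric_only.
    merged = []
    for m in list(original_metric) + list(eval_metric):
        if m not in merged:
            merged.append(m)
    return merged

print(merge_metrics(['l2'], ['l1', 'l2']))  # ['l2', 'l1']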

@matsuken92 (Contributor, Author)

@StrikerRUS
Oh, that's right. I will check it.

@StrikerRUS force-pushed the bugfix/first_metric_only_train_metric branch from 5316ff4 to c3fbf6b on September 10, 2019 01:13
@StrikerRUS (Collaborator)

@matsuken92 Thanks a lot for your hard work! I pushed some commits in which I simplified the code a little bit, moved initialization to the init phase of the callback where possible, removed unused code, and improved test readability, with the aim of moving this PR forward to merging. Unfortunately, I had to remove your logic of pre-computing iterations from the tests, because it wastes CI time, and replace it with hardcoded values.

@StrikerRUS (Collaborator)

@matsuken92 The one (and the last one, I hope 😄) thing that needs fixing is that in the case of first_metric_only this if is never evaluated:

if env.iteration == env.end_iteration - 1:
    if verbose:
        print('Did not meet early stopping. Best iteration is:\n[%d]\t%s' % (
            best_iter[i] + 1, '\t'.join([_format_eval_result(x) for x in best_score_list[i]])))
    raise EarlyStopException(best_iter[i], best_score_list[i])

In other words, the message is not printed and EarlyStopException is not raised after the training process.
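A simplified, hypothetical sketch of the callback loop illustrates the control-flow issue (the names are assumptions for exposition): if every entry in the evaluation list hits continue, the final-iteration check is never reached.

def early_stopping_step(iteration, end_iteration, eval_results, first_metric):
    for dataset_name, metric_name, score in eval_results:
        if dataset_name == 'training':
            continue  # train scores are skipped...
        if metric_name != first_metric:
            continue  # ...and so are non-first metrics
        # early-stopping bookkeeping for the tracked metric would go here
        if iteration == end_iteration - 1:
            print('Did not meet early stopping.')  # unreachable if all entries continue

early_stopping_step(24, 25, [('training', 'l2', 15.4)], 'l2')  # prints nothing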

@StrikerRUS mentioned this pull request on Sep 10, 2019
@matsuken92 (Contributor, Author)

I checked it with the following settings (just changing early_stopping_rounds=30), and the message Did not meet early stopping. Best iteration is: was shown and EarlyStopException was raised. Maybe I didn't fully understand your issue, so could you explain it in more detail?

def metrics_combination_train_regression(valid_sets, metric_list, assumed_iteration,
                                         first_metric_only, feval=None):
    params = {
        'objective': 'regression',
        'learning_rate': 1.1,
        'num_leaves': 10,
        'metric': metric_list,
        'verbose': -1,
        'seed': 123
    }
    gbm = lgb.train(dict(params, first_metric_only=first_metric_only), lgb_train,
                    num_boost_round=25, valid_sets=valid_sets, feval=feval,
                    early_stopping_rounds=30, verbose_eval=1)
    self.assertEqual(assumed_iteration, gbm.best_iteration)

metrics_combination_train_regression(lgb_valid1, [], iter_valid1_l2, True)
[LightGBM] [Warning] Unknown parameter metric=
[1]	valid_0's l2: 29.8981
Training until validation scores don't improve for 30 rounds
[2]	valid_0's l2: 23.2273
[3]	valid_0's l2: 23.3257
[4]	valid_0's l2: 21.267
[5]	valid_0's l2: 20.7759
[6]	valid_0's l2: 19.1572
[7]	valid_0's l2: 17.3846
[8]	valid_0's l2: 16.7308
[9]	valid_0's l2: 16.1299
[10]	valid_0's l2: 17.9467
[11]	valid_0's l2: 18.9755
[12]	valid_0's l2: 17.2129
[13]	valid_0's l2: 15.9185
[14]	valid_0's l2: 15.4884
[15]	valid_0's l2: 15.6829
[16]	valid_0's l2: 16.1173
[17]	valid_0's l2: 15.513
[18]	valid_0's l2: 16.5064
[19]	valid_0's l2: 16.5645
[20]	valid_0's l2: 16.1815
[21]	valid_0's l2: 15.6797
[22]	valid_0's l2: 15.4121
[23]	valid_0's l2: 15.7455
[24]	valid_0's l2: 15.7099
[25]	valid_0's l2: 16.1358
Did not meet early stopping. Best iteration is:
[22]	valid_0's l2: 15.4121
Evaluated only: l2

@StrikerRUS (Collaborator) commented Sep 10, 2019

@matsuken92

Maybe I didn't fully understand your issue, so could you explain it in more detail?

I'm sorry, I was not clear enough! That if statement if env.iteration == env.end_iteration - 1: is not reachable when, on every iteration of the for loop, one of the if statements above is triggered and the continue instruction is executed. For example, this issue can be observed with the train data used as the evaluation set.

import lightgbm as lgb

from sklearn.datasets import load_boston

X, y = load_boston(True)
lgb_train = lgb.Dataset(X, y)

# the training data itself is used as the only evaluation set
lgb.train({'objective': 'regression', 'verbose': 10},
          lgb_train, valid_sets=[lgb_train], early_stopping_rounds=1, verbose_eval=True)

I'm not sure, but it seems to me that this is the only case in which it is possible: first_metric is computed at the first iteration of the for loop, and the metrics list cannot change during training, so there is a guarantee that at least once the if statement first_metric_only and first_metric[0] != eval_name_splitted[-1]: will not trigger continue (a tiny sketch of this follows).
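A tiny sketch of that guarantee, with hypothetical names for exposition only: first_metric is captured on the first pass through the evaluation list and never changes afterwards, so the matching entry passes the metric guard at least once per iteration, unless every entry is skipped earlier as in the repro above.

first_metric = []

def passes_metric_guard(eval_name_splitted):
    if not first_metric:
        first_metric.append(eval_name_splitted[-1])  # computed once, on the first entry
    return first_metric[0] == eval_name_splitted[-1]

print(passes_metric_guard(['valid_0', 'l2']))  # True: this entry defines first_metric
print(passes_metric_guard(['valid_0', 'l1']))  # False: would trigger continue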

@matsuken92 (Contributor, Author)

@StrikerRUS Thanks, I got it; now I can reproduce the behavior. I will investigate it.

@StrikerRUS (Collaborator) left a comment

@matsuken92 Thank you very much for fixing the bug with the final iteration! And of course many thanks for all of your work!

I'm really excited to approve this PR finally! 🎉

@StrikerRUS (Collaborator)

@guolinke @chivee @henry0312 Can you please review this?

@guolinke (Collaborator) left a comment

LGTM

@guolinke (Collaborator)

@StrikerRUS is this ready?

@StrikerRUS (Collaborator)

@guolinke

is this ready?

Yep! As no one else wants to leave a review, I'm merging.

@StrikerRUS merged commit 8475439 into microsoft:master on Sep 15, 2019
@matsuken92 (Contributor, Author)

@StrikerRUS I'm happy that this PR has been merged! Thank you for all of your support!!!

@StrikerRUS (Collaborator)

@matsuken92 Awesome work and great patience from your side! Thanks! I hope you'll find some time and we'll see a PR for #2105 in the future, as you've already dug deep into the Python code.
