[python-package] Use update() finish condition on booster loop #5193

Closed
wants to merge 1 commit

Conversation

@Samsagax (Author) commented May 2, 2022

Currently the boosting loop in the Python package does not stop when the booster reports that it is finished. This patch captures the return value of the booster.update() call and, if the booster is finished, breaks the loop at the end of the current iteration.
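
A minimal sketch of the idea, with simplified names (the real loop lives in `lightgbm/engine.py` and also handles callbacks and evaluation, which this sketch omits):

```python
def train_loop(booster, num_boost_round):
    """Sketch of the proposed change (simplified from lightgbm/engine.py)."""
    for i in range(num_boost_round):
        # Booster.update() returns True when the C++ side reports the booster
        # is finished, e.g. when no leaf meets the split requirements.
        is_finished = booster.update()
        if is_finished:
            break  # proposed: stop instead of spinning out identical warnings
```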

@Samsagax changed the title "Use update() finish condition on booster loop" → "[python-package] Use update() finish condition on booster loop" May 2, 2022
@StrikerRUS (Collaborator)

Hey @Samsagax, thanks for your interest in LightGBM!

is_finished indicates only that the current iteration has finished successfully. Your patch makes the booster stop after the first iteration, which is wrong behavior.

@Samsagax (Author) commented May 3, 2022

Strange. It works for me with this change. If I don't apply it, then when a booster trains and finds no splittable features, I get a spam of warnings and a constant output metric.

The warning says: "Stopped training because there are no more leaves that meet the split requirements"

Looking through the code, I found I'm hitting this line:

Log::Warning("Stopped training because there are no more leaves that meet the split requirements");

If I understood correctly, when that branch is hit the function returns true to the caller, i.e. when training stops. Following the Python code, I see the C++ function TrainOneIter() is called by the booster.update() Python function, and that result is passed to the loop. Maybe it is specific to the GBDT booster, anyway.
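
A simplified sketch of that call chain as I read it (assuming a `booster` object mid-training):

```python
# Booster.update() -> LGBM_BoosterUpdateOneIter (C API) -> GBDT::TrainOneIter()
# update() returns True when the C++ booster reports the iteration "finished",
# which includes the "no more leaves that meet the split requirements" case.
is_finished = booster.update()
```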

Thanks for taking the time to review the patch.

@StrikerRUS (Collaborator)

> It works for me with this change.

This may mean that, due to incorrect parameters, your training actually builds only one tree, and that's why there's no difference in your case.

This mechanism shouldn't act as early stopping. That warning should help users understand that something is wrong in their setup. Refer to #4178 for more details.
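
For contrast, a minimal sketch of the supported early-stopping mechanism, using toy data and assuming a lightgbm version with the `lgb.early_stopping` callback:

```python
import numpy as np
import lightgbm as lgb

# Toy data just to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=500)
train_set = lgb.Dataset(X[:400], label=y[:400])
valid_set = lgb.Dataset(X[400:], label=y[400:], reference=train_set)

# The supported way to stop early: monitor a validation metric with the
# early-stopping callback, rather than reacting to a "no more splits" warning.
booster = lgb.train(
    {"objective": "regression", "verbosity": -1},
    train_set,
    num_boost_round=1000,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=10)],
)
```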

@Samsagax deleted the fix-booster-update branch May 3, 2022 21:44
@jameslamb (Collaborator)

I just want to add one more comment, for those finding this PR from searching that warning.

It's related to #5051 (comment).

Another reason this shouldn't be used as an early stopping mechanism is that "can't find any more splits" in one iteration doesn't mean LightGBM won't be able to find any in the next iteration. For example, if you're using feature_fraction to randomly choose a subset of features at each iteration, then it's possible that even if no informative splits are found in iteration i, some could be found in iteration i + 1.
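
To make that concrete, a hypothetical configuration (values chosen only for illustration):

```python
# With feature_fraction < 1.0, LightGBM samples a different random subset of
# features at each iteration, so an iteration that finds no informative split
# says nothing definitive about the iterations that follow it.
params = {
    "objective": "binary",
    "feature_fraction": 0.5,  # each iteration sees a random 50% of features
    "seed": 42,               # makes the per-iteration sampling reproducible
}
```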

@Samsagax (Author) commented May 4, 2022

> [...] if you're using feature_fraction to randomly choose a subset of features at each iteration, then it's possible that even if no informative splits are found in iteration i, some could be found in iteration i + 1.

Thanks for the clarification; I have experienced that situation myself (with feature_fraction < 1). I agree training should continue, and this patch is not the desired behaviour. I was just replicating the behaviour of the Train() method of the CLI version here:

void GBDT::Train(int snapshot_freq, const std::string& model_output_path) {

I still see a discrepancy in how the CLI version works, since its implementation of the GBDT Train() method does stop at the first tree that can't be split:

is_finished = TrainOneIter(nullptr, nullptr);

Maybe it should not use the return value of TrainOneIter() as the is_finished condition, to bring both implementations in line?

@jameslamb (Collaborator)

> Maybe it should not use the return value of TrainOneIter() as the is_finished condition, to bring both implementations in line?

I think that's exactly the conclusion of the discussion in #5051, yep!

If you'd like to attempt a pull request to implement that change in behavior, please comment on #5051 and see if @shiyu1994 or @guolinke will agree to help review and answer questions.

Thanks for your continued interest in LightGBM!

Samsagax added a commit to Samsagax/LightGBM that referenced this pull request Feb 4, 2023
As per the discussion in microsoft#5051 and microsoft#5193, the python package does not stop training if a single-leaf tree (stump) is found, and relies on early stopping methods to stop training. This commit removes the finish condition on training based on the result of `TrainOneIter()` and sets the `is_finished` flag on early stopping alone.
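
A simplified sketch of the loop shape that commit describes (`early_stop_check` is a hypothetical stand-in for the early-stopping callback machinery):

```python
def train_loop_after_change(booster, num_boost_round, early_stop_check):
    """Sketch: update()'s result no longer terminates training."""
    for i in range(num_boost_round):
        booster.update()  # result of TrainOneIter() intentionally ignored
        # Only the early-stopping machinery decides when training is finished.
        if early_stop_check(booster, i):
            break
```
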
@github-actions (bot)

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023