[ML] Stop early (based on non-decreasing validation error) when adding trees to the forest #875

Merged: 9 commits merged into elastic:master on Dec 9, 2019

Conversation

@tveasey (Contributor) commented on Dec 2, 2019:

We can relatively cheaply (around a 2% overhead) compute predictions for the test rows at the same time as we compute them for the training rows. This means we can cheaply track the validation error as we add additional trees to the forest during training.

The validation error curve is fairly predictable: it decreases quickly (typically exponentially) at the start, hits a minimum, and then often increases slightly as the model starts to overfit the training data. This change introduces a very simple exit condition which caps the relative runtime cost we pay to avoid stopping too soon. Specifically, we add at least f * "maximum number of trees" trees after reaching the forest with the lowest validation error, and at the end of training we resize the forest to the size which minimises validation error. We set f to 0.05, i.e. 5% of the total cost of training the forest.
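As a rough illustration, the exit condition amounts to something like the sketch below. This is illustrative only, not the ml-cpp implementation; Tree, trainNextTree and validationError are placeholder names.

    #include <algorithm>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // Placeholder type and helpers: the real ml-cpp classes and functions differ.
    struct Tree {};
    Tree trainNextTree(const std::vector<Tree>& forest);     // fit the next tree to the training rows
    double validationError(const std::vector<Tree>& forest); // forest error on the held-out test rows

    std::vector<Tree> trainWithEarlyStopping(std::size_t maximumNumberTrees, double f = 0.05) {
        std::vector<Tree> forest;
        double bestError{std::numeric_limits<double>::max()};
        std::size_t bestSize{0};

        // Keep adding at least f * maximumNumberTrees trees beyond the lowest
        // validation error forest before giving up.
        std::size_t patience{std::max<std::size_t>(
            1, static_cast<std::size_t>(f * static_cast<double>(maximumNumberTrees)))};

        for (std::size_t i = 0; i < maximumNumberTrees; ++i) {
            forest.push_back(trainNextTree(forest));
            // Test-row predictions are maintained alongside the training-row
            // predictions, so this evaluation is cheap (a few percent overhead).
            double error{validationError(forest)};
            if (error < bestError) {
                bestError = error;
                bestSize = forest.size();
            } else if (forest.size() - bestSize >= patience) {
                break;
            }
        }

        // Resize the forest to the size which minimised validation error.
        forest.resize(bestSize);
        return forest;
    }

In other words, we spend at most roughly a fraction f of the maximum training cost confirming that the validation error has genuinely stopped decreasing.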

I don't think this is the final version of early stopping. Once we have a compute budget, or some way for the user to express the trade-off between runtime and accuracy they'd be happy with, we can be cleverer about when we stop. However, I've tested this on 15 benchmark data sets: it often slightly improves the quality of results (QoR), and I've seen large (6x) drops in runtime on some data sets. I think it therefore represents a clear improvement over our current strategy.

@tveasey (Contributor, Author) commented on Dec 9, 2019:

retest

@valeriy42 (Contributor) left a comment:
LGTM, although I also think it's not the final version. I have only a minor comment and a question about the computeEta call.

In lib/api/CDataFrameTrainBoostedTreeClassifierRunner.cc:

        computeMaximumNumberTrees(m_TreeImpl->m_Eta)};
    double eta{m_TreeImpl->m_EtaOverride != boost::none
                   ? *m_TreeImpl->m_EtaOverride
                   : computeEta(frame.numberColumns())};
@valeriy42 (Contributor) commented on the snippet above:

Why is it not sufficient to have computeEta within initializeHyperparameters? Here it seems an unexpected place for the call.

@tveasey (Contributor, Author) replied:
There is an order problem: we need to set up progress monitoring before starting initializeHyperparameters. However, this needs to anticipate the correct value for eta so we monitor progress "correctly". In fact, there is still a problem because eta gets set to different values in the hyperparameter optimisation loop, but this is at least better than what we had before. (I want to re-evaluate our strategy for eta in a following PR, at which point I'll also fully fix progress monitoring.)
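To make the ordering concrete, a rough sketch of the constraint is given below; the function names here are hypothetical stand-ins, not the actual ml-cpp API.

    #include <cstddef>
    #include <optional>

    // Hypothetical stand-ins for the real factory/runner methods.
    double computeEta(std::size_t numberColumns);
    std::size_t computeMaximumNumberTrees(double eta);
    void initializeProgressMonitoring(std::size_t expectedMaximumNumberTrees);
    void initializeHyperparameters();

    void setUpTraining(const std::optional<double>& etaOverride, std::size_t numberColumns) {
        // Resolve eta up front: the progress monitor's estimate of total work
        // depends on the anticipated maximum number of trees, which depends on eta.
        double eta{etaOverride ? *etaOverride : computeEta(numberColumns)};
        initializeProgressMonitoring(computeMaximumNumberTrees(eta));

        // Only then initialize the hyperparameters. Note that eta can still
        // change inside the hyperparameter optimisation loop, so the progress
        // estimate remains approximate for now.
        initializeHyperparameters();
    }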

tveasey merged commit ca0f75f into elastic:master on Dec 9, 2019, and deleted the early-stopping branch on Dec 9, 2019 at 13:36.
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request on Dec 9, 2019.

tveasey added a commit that referenced this pull request on Dec 9, 2019: "…adding trees to the regression/classification forest" (#880), a backport of #875.