
[ML] Stop cross-validation early if the parameters have high predicted test loss #915

Merged: 13 commits merged into elastic:master from early-stopping-cv on Jan 10, 2020

Conversation

tveasey (Contributor) commented Dec 19, 2019

This makes two changes:

  1. It sets the number of folds to use based on the amount of data available: we ensure each training fold has sufficient data per feature.
  2. During hyperparameter optimisation, we use a linear model to predict the cross-validation loss on the remaining folds and stop early if the predicted test loss has a low chance of being less than the best values found so far (see the sketch below).
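
To make point 2 concrete, here is a minimal standalone sketch of the idea, not the actual CBoostedTreeImpl.cc logic: fit a simple least-squares line to the per-fold losses seen so far, extrapolate to the remaining folds, and stop if even an optimistic prediction cannot beat the best mean loss found so far. The function name, the fixed margin, and using the fold index as the regressor are illustrative assumptions.

// Sketch only: predict the mean cross-validation loss from the folds
// evaluated so far and decide whether to stop early.
#include <cmath>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

bool shouldStopCrossValidation(const std::vector<double>& foldLosses,
                               std::size_t totalFolds,
                               double bestMeanLoss,
                               double margin = 3.0) {
    std::size_t n{foldLosses.size()};
    if (n < 2 || n >= totalFolds) {
        return false; // need two points to fit a line, and folds left to skip
    }

    // Least-squares fit of loss = a + b * foldIndex.
    double meanX{(static_cast<double>(n) - 1.0) / 2.0};
    double meanY{std::accumulate(foldLosses.begin(), foldLosses.end(), 0.0) /
                 static_cast<double>(n)};
    double sxy{0.0};
    double sxx{0.0};
    for (std::size_t i = 0; i < n; ++i) {
        double dx{static_cast<double>(i) - meanX};
        sxy += dx * (foldLosses[i] - meanY);
        sxx += dx * dx;
    }
    double b{sxy / sxx};
    double a{meanY - b * meanX};

    // Residual spread around the fit, used as a crude prediction error.
    double residualVariance{0.0};
    for (std::size_t i = 0; i < n; ++i) {
        double r{foldLosses[i] - (a + b * static_cast<double>(i))};
        residualVariance += r * r / static_cast<double>(n);
    }

    // Sum the observed losses plus predicted losses for the remaining folds.
    double total{std::accumulate(foldLosses.begin(), foldLosses.end(), 0.0)};
    for (std::size_t i = n; i < totalFolds; ++i) {
        total += a + b * static_cast<double>(i);
    }
    double predictedMean{total / static_cast<double>(totalFolds)};

    // Stop if even an optimistic prediction is worse than the best so far.
    return predictedMean - margin * std::sqrt(residualVariance) > bestMeanLoss;
}

int main() {
    // Suppose the best candidate so far achieved a mean loss of 1.0 and the
    // current candidate's first three folds are clearly worse.
    std::vector<double> losses{1.8, 1.9, 2.0};
    std::cout << std::boolalpha
              << shouldStopCrossValidation(losses, 5, 1.0) << '\n'; // true
}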

valeriy42 (Contributor) left a comment:

Looks really good. Since there is some very sophisticated code here, I've included some comments to make it easier to understand.

Review threads on lib/maths/CBoostedTreeFactory.cc (3) and lib/maths/CBoostedTreeImpl.cc (4), all resolved.
tveasey (Contributor, Author) commented Jan 9, 2020

Thanks for the review @valeriy42. I think I've now addressed all your comments.

valeriy42 (Contributor) left a comment:

LGTM. Good job explaining the complex algorithmic bits.

@@ -46,7 +46,10 @@ const double MIN_DOWNSAMPLE_LINE_SEARCH_RANGE{2.0};
const double MAX_DOWNSAMPLE_LINE_SEARCH_RANGE{144.0};
const double MIN_DOWNSAMPLE_FACTOR_SCALE{0.3};
const double MAX_DOWNSAMPLE_FACTOR_SCALE{3.0};
const std::size_t MAX_NUMBER_FOLDS{5};
// This isn't a hard limit but we increase the number of default training folds
Contributor comment: I think something is wrong with this sentence.

Comment on lines +268 to +272
// So how does the following work: we'd like "c * f * # rows" training rows.
// For k folds we'll have "(1 - 1 / k) * # rows" training rows. So we want
// to find the smallest integer k s.t. c * f * # rows <= (1 - 1 / k) * # rows.
// This gives k = ceil(1 / (1 - c * f)). However, we also upper bound this
// by MAX_NUMBER_FOLDS.
Contributor comment: This is a very nice explanation!
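
The derivation quoted above translates directly into code. Here is a minimal standalone sketch, where c * f stands for the desired fraction of rows used for training as in the quoted comment; the names and the guard are illustrative assumptions, not the actual CBoostedTreeFactory.cc implementation.

// Sketch only: smallest k with (1 - 1/k) >= c * f, capped at MAX_NUMBER_FOLDS.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>

const std::size_t MAX_NUMBER_FOLDS{5};

std::size_t numberFolds(double c, double f) {
    // If c * f is so large that more than MAX_NUMBER_FOLDS would be needed
    // (or the formula below would divide by ~0), just use the cap.
    if (c * f >= 1.0 - 1.0 / static_cast<double>(MAX_NUMBER_FOLDS)) {
        return MAX_NUMBER_FOLDS;
    }
    // k = ceil(1 / (1 - c * f)), with at least 2 folds for cross-validation.
    return std::max(static_cast<std::size_t>(std::ceil(1.0 / (1.0 - c * f))),
                    std::size_t{2});
}

int main() {
    // For example, c * f = 0.75 requires k = ceil(1 / 0.25) = 4 folds.
    std::cout << numberFolds(0.75, 1.0) << '\n'; // prints 4
}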

tveasey merged commit c2c436c into elastic:master on Jan 10, 2020
tveasey deleted the early-stopping-cv branch on January 10, 2020 at 20:37
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Jan 13, 2020
tveasey added a commit that referenced this pull request Jan 13, 2020