[ML] Stratified cross validation for regression #784

tveasey · 2019-10-25T18:23:38Z

This implements stratified fractional cross-validation for regression, i.e. it independently and uniformly randomly samples from each decile of the target variable. It also controls the proportion of samples per fold more tightly for classification: each fold now gets the nearest integer to "total training examples" / "number of folds".
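The sampling described above can be illustrated with a minimal C++ sketch. It assumes the target is split into equal-count buckets by rank (10 buckets gives deciles) and that each bucket is shuffled and truncated independently. The function name `stratifiedSample` and its signature are hypothetical and are not the ml-cpp API. Tied target values may also be split across adjacent buckets here, which the real implementation may handle differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch (not the ml-cpp API): stratified fractional sampling for
// regression. Rows are split into equal-count buckets by the rank of their
// target value (numberBuckets = 10 gives deciles) and each bucket is sampled
// independently and uniformly at random.
std::vector<std::size_t> stratifiedSample(const std::vector<double>& targets,
                                          double fraction,
                                          std::size_t numberBuckets,
                                          std::mt19937& rng) {
    assert(numberBuckets > 0 && fraction >= 0.0 && fraction <= 1.0);
    std::size_t n{targets.size()};

    // Sort row indices by target so a row's rank determines its quantile.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t lhs, std::size_t rhs) { return targets[lhs] < targets[rhs]; });

    // Bucket b holds the rows with ranks in [b * n / B, (b + 1) * n / B).
    // Tied targets may straddle a bucket boundary.
    std::vector<std::vector<std::size_t>> buckets(numberBuckets);
    for (std::size_t rank = 0; rank < n; ++rank) {
        buckets[rank * numberBuckets / n].push_back(order[rank]);
    }

    // Shuffle each bucket and keep round(fraction * bucket size) of its rows.
    std::vector<std::size_t> sample;
    for (auto& bucket : buckets) {
        std::shuffle(bucket.begin(), bucket.end(), rng);
        auto size = static_cast<std::ptrdiff_t>(
            std::lround(fraction * static_cast<double>(bucket.size())));
        sample.insert(sample.end(), bucket.begin(), bucket.begin() + size);
    }
    return sample;
}
```

For example, with 100 rows whose targets are 0, 1, ..., 99, a fraction of 0.2 and 10 buckets, every decile contributes exactly 2 rows to the sample, whatever the random seed.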

valeriy42

LGTM. Just a single minor comment about the variable naming.

Since stratified sampling is not as common for regression as it is for classification, it would be great if you could provide a switch to activate it. It would be fine to do this in a follow-up PR.

valeriy42 · 2019-10-28T14:01:38Z

lib/maths/CDataFrameUtils.cc

+        }
+    };
+
+    TDoubleVec weights;


The variable name weights is confusing here. Could you rename it to something more self-explanatory, e.g. decileCounts?

OK, good suggestion. I renamed it in this commit. (Note that I don't use "decile" in the name because I pass in the number of buckets, so that stratification can optionally be disabled.)
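The renamed variable holds a count of rows per target bucket. One way such counts could be turned into per-bucket sample sizes, while keeping the overall sample size at the nearest integer to the requested total, is the largest remainder method, sketched below in C++. This is an assumption for illustration: `allocateSampleSizes` and its signature are hypothetical, and nothing in this thread says lib/maths/CDataFrameUtils.cc uses this scheme. Passing a single bucket reduces it to unstratified sampling, which mirrors the note that the number of buckets is passed in so stratification can be disabled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: turn per-bucket row counts into per-bucket sample sizes
// whose total is the nearest integer to fraction * (total rows). Passing a
// single bucket reduces this to plain, unstratified sampling.
std::vector<std::size_t> allocateSampleSizes(const std::vector<std::size_t>& bucketCounts,
                                             double fraction) {
    assert(!bucketCounts.empty() && fraction >= 0.0 && fraction <= 1.0);
    std::size_t total{std::accumulate(bucketCounts.begin(), bucketCounts.end(), std::size_t{0})};
    auto target = static_cast<std::size_t>(std::lround(fraction * static_cast<double>(total)));

    // Start from the floor of each bucket's ideal share and record the remainders.
    std::vector<std::size_t> sizes(bucketCounts.size());
    std::vector<double> remainders(bucketCounts.size());
    std::size_t allocated{0};
    for (std::size_t i = 0; i < bucketCounts.size(); ++i) {
        double ideal{fraction * static_cast<double>(bucketCounts[i])};
        sizes[i] = static_cast<std::size_t>(std::floor(ideal));
        remainders[i] = ideal - std::floor(ideal);
        allocated += sizes[i];
    }

    // Give the remaining samples to the buckets with the largest remainders
    // (largest remainder method) until the total reaches the target.
    std::vector<std::size_t> order(bucketCounts.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t lhs, std::size_t rhs) { return remainders[lhs] > remainders[rhs]; });
    for (std::size_t k = 0; allocated < target && k < order.size(); ++k) {
        ++sizes[order[k]];
        ++allocated;
    }
    return sizes;
}
```

For instance, three buckets of 3 rows sampled at a fraction of 0.5 give sizes {2, 2, 1}: the total is 5, the nearest integer to 4.5 under std::lround, rather than the 3 that per-bucket truncation would give.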

droberts195 · 2019-10-28T22:34:27Z

lib/maths/unittest/CDataFrameUtilsTest.cc

+            }
+            LOG_DEBUG(<< "variance in test set target percentile = "
+                      << maths::CBasicStatistics::variance(testTargetDecileMoments));
+            BOOST_REQUIRE(maths::CBasicStatistics::variance(testTargetDecileMoments) < 0.02);


It's worth noting that BOOST_TEST_REQUIRE here would print the values on either side of the < in the event of failure, whereas BOOST_REQUIRE won't.

It seems that BOOST_REQUIRE(condition) is equivalent to BOOST_TEST_REQUIRE((condition)). I wasn't aware of this until now. BOOST_REQUIRE is the cleaner choice for complex conditions where diagnostics cannot be printed, but for this line BOOST_TEST_REQUIRE could print the improved diagnostics.

I changed this in #788


tveasey added 5 commits October 23, 2019 11:48

Stratified cross validation for regression

d485a2d

Merge branch 'master' into stratified-cv-for-regression

3c70835

Comments bits and pieces

d1a8922

Merge branch 'master' into stratified-cv-for-regression

8cc6607

Fix test

abb2623

tveasey added >enhancement review v8.0.0 :ml/DataFrameAnalysis v7.6.0 labels Oct 25, 2019

tveasey requested a review from valeriy42 October 25, 2019 18:23

Docs

35e4986

valeriy42 approved these changes Oct 28, 2019


tveasey added 3 commits October 28, 2019 18:14

Add an option to disable stratified cross-validation for regression

289f909

Better variable names

7c23e98

Use consistent naming

6b9701d

tveasey merged commit 4cde1d7 into elastic:master Oct 28, 2019

tveasey deleted the stratified-cv-for-regression branch October 28, 2019 20:57

droberts195 reviewed Oct 28, 2019


tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Oct 29, 2019

[ML] Stratified cross validation for regression (elastic#784)

52bfb8f

tveasey mentioned this pull request Oct 29, 2019

[7.6][ML] Stratified cross validation for regression #792

Merged

tveasey added a commit that referenced this pull request Oct 29, 2019

[7.6][ML] Stratified cross validation for regression (#792)

4dd26ac

Backport #784.
