[ML] Add r_squared eval metric to regression #44248

benwtrent · 2019-07-11T22:22:56Z

This adds the RSquared metric to Regression evaluations. RSquared (also called the coefficient of determination) is a well-known and widely used evaluation metric for regression models.

This was straightforward to implement using a sum aggregation with a script (for the residual sum of squares) together with the variance from the extended_stats aggregation.

I initially thought I could use the sum_of_squares result from the extended_stats aggregation, but that value is literally the sum of the squares of each of the values. So, for values of [1, 2, 3], the sum_of_squares would be 1 + 4 + 9 = 14. This is not to be confused with the total sum of squares (the sum of squared deviations from the mean), which is what we actually need.
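To make the distinction concrete, here is a minimal sketch in plain Java (no Elasticsearch classes). The class and method names are made up for illustration and are not part of this PR.

```java
import java.util.stream.DoubleStream;

// Hypothetical helper, for illustration only.
final class SumOfSquaresDemo {

    // Raw sum of squares: each value squared, then summed.
    // This is what extended_stats reports as sum_of_squares.
    static double sumOfSquares(double[] values) {
        return DoubleStream.of(values).map(v -> v * v).sum();
    }

    // Total sum of squares: squared deviations from the mean, summed.
    // This is the SStot term needed for R-Squared.
    static double totalSumOfSquares(double[] values) {
        double mean = DoubleStream.of(values).average().orElse(0.0);
        return DoubleStream.of(values).map(v -> (v - mean) * (v - mean)).sum();
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3};
        System.out.println(sumOfSquares(values));      // 14.0
        System.out.println(totalSumOfSquares(values)); // 2.0
    }
}
```

For [1, 2, 3] the mean is 2, so the total sum of squares is 1 + 0 + 1 = 2, not 14.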

tveasey

Looks good, Ben. I have one minor comment on the normalisation of the variance, but otherwise LGTM.

tveasey · 2019-07-12T08:28:39Z

.../src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java

+        ExtendedStats extendedStats = aggs.get(ExtendedStatsAggregationBuilder.NAME + "_actual");
+        return residualSumOfSquares == null || extendedStats == null || extendedStats.getCount() == 0 ?
+            null :
+            new Result(1 - (residualSumOfSquares.value() / (extendedStats.getVariance() * extendedStats.getCount())));


One question: does extendedStats use the maximum likelihood or the unbiased estimate of variance? The multiplier would be either extendedStats.getCount() or (extendedStats.getCount() - 1) depending on the answer. I'm guessing this would have been flushed out by the tests, but the difference is obviously small for a large doc count. Either way, it is probably worth a comment here to say which one the aggregation uses.

Looking at InternalExtendedStats, this does use the maximum likelihood form, so multiplying by count is the right thing to do. So it may just be worth adding a comment that the definition of the sample variance used by getVariance() is the sum of squared residuals about the mean divided by count.

@tveasey I will add the comment. FWIW, I have tested the calculation on various datasets to make sure it matches what other tools (namely scikit) calculate for the r_squared metric.
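As a rough sketch of the arithmetic discussed in this thread, the snippet below recovers SStot as population variance times count and then computes 1 - SSres/SStot. It is plain Java with hypothetical names and illustrative input values, not the aggregation-based code in RSquared.java, and it does not handle the empty or zero-variance cases.

```java
// Hypothetical helper, for illustration only.
final class RSquaredSketch {

    // Population (maximum likelihood) variance:
    // sum of squared deviations from the mean, divided by count (not count - 1).
    static double populationVariance(double[] values) {
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double sumSquaredDeviations = 0;
        for (double v : values) {
            sumSquaredDeviations += (v - mean) * (v - mean);
        }
        return sumSquaredDeviations / values.length;
    }

    // R-Squared = 1 - SSres / SStot, with SStot = variance * count.
    static double rSquared(double[] actual, double[] predicted) {
        double residualSumOfSquares = 0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            residualSumOfSquares += diff * diff;
        }
        double totalSumOfSquares = populationVariance(actual) * actual.length;
        return 1 - residualSumOfSquares / totalSumOfSquares;
    }

    public static void main(String[] args) {
        double[] actual = {3, -0.5, 2, 7};
        double[] predicted = {2.5, 0.0, 2, 8};
        // SSres = 1.5 and SStot = 29.1875, so the result is roughly 0.9486.
        System.out.println(rSquared(actual, predicted));
    }
}
```

Had the variance been the unbiased estimate, the multiplier would need to be count - 1 to recover the same SStot.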

.../src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java

…ml/dataframe/evaluation/regression/RSquared.java Co-Authored-By: David Kyle <david.kyle@elastic.co>

tveasey

LGTM

benwtrent · 2019-07-12T15:13:01Z

@elasticmachine update branch

davidkyle

LGTM

davidkyle · 2019-07-15T09:33:38Z

...rc/main/java/org/elasticsearch/client/ml/dataframe/evaluation/regression/RSquaredMetric.java

+/**
+ * Calculates R-Squared between two known numerical fields.
+ *
+ * equation: mse = 1 - SSres/SStot


Suggested change

- * equation: mse = 1 - SSres/SStot
+ * equation: R-Squared = 1 - SSres/SStot

davidkyle · 2019-07-15T10:02:23Z

...lasticsearch/client/ml/dataframe/evaluation/softclassification/BinarySoftClassification.java

@@ -88,7 +94,10 @@ public BinarySoftClassification(String actualField, String predictedProbabilityF
                                    @Nullable List<EvaluationMetric> metrics) {
        this.actualField = Objects.requireNonNull(actualField);
        this.predictedProbabilityField = Objects.requireNonNull(predictedProbabilityField);
-        this.metrics = Objects.requireNonNull(metrics);
+        if (metrics != null) {


What does it mean if there are no Evaluation Metrics? Does it make any sense to allow this?

Either way, this is an improvement on the code that tagged metrics as nullable and then called requireNonNull(metrics) 3 lines later.

Oh, I see now. The server side has a set of default metrics that are used if this parameter is null.

yeah, null => use default value. This follows the implementation pattern we have elsewhere in the HLRC.
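The "null means use the server-side defaults" convention can be sketched roughly as below. This is not the actual BinarySoftClassification constructor (the diff above is cut off after the null check); the class, the String-typed metrics, and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical client-side request object, for illustration only.
final class EvaluationSketch {
    private final String actualField;
    // null means "no metrics specified; let the server apply its defaults".
    private final List<String> metrics;

    EvaluationSketch(String actualField, List<String> metrics) {
        // Required parameters are still validated eagerly.
        this.actualField = Objects.requireNonNull(actualField);
        // Optional parameter: keep null as-is, otherwise take a defensive copy.
        this.metrics = metrics == null
            ? null
            : Collections.unmodifiableList(new ArrayList<>(metrics));
    }

    boolean usesServerDefaults() {
        return metrics == null;
    }

    List<String> getMetrics() {
        return metrics;
    }

    String getActualField() {
        return actualField;
    }
}
```

With this shape, `new EvaluationSketch("actual", null)` is a valid request, and the nullable annotation on the parameter matches what the constructor actually accepts.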

...c/test/java/org/elasticsearch/client/ml/dataframe/evaluation/regression/RegressionTests.java

...csearch/client/ml/dataframe/evaluation/softclassification/BinarySoftClassificationTests.java

davidkyle · 2019-07-15T10:19:15Z

.../src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java

+    private static final String PAINLESS_TEMPLATE = "def diff = doc[''{0}''].value - doc[''{1}''].value;return diff * diff;";
+    private static final String SS_RES = "residual_sum_of_squares";
+
+    private static String buildScript(Object...args) {


nit: why make this a varargs parameter when we know the template takes 2 replacements?

This allows the call down to MessageFormat#format to be made without warnings or errors and without explicit casting. Using String var1, String var2 requires casting to get around warnings, and using Object var1, Object var2 requires manually constructing an array so that it matches the appropriate method signature.
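For reference, a minimal sketch of how the varargs are forwarded. The template string is copied from the diff above; the body of buildScript is an assumption (only its signature appears in the diff), and the field names "actual" and "predicted" are placeholders.

```java
import java.text.MessageFormat;

// Hypothetical standalone demo, for illustration only.
final class ScriptTemplateDemo {
    // In a MessageFormat pattern, '' is an escaped single quote,
    // so ''{0}'' renders as the first argument wrapped in single quotes.
    private static final String PAINLESS_TEMPLATE =
        "def diff = doc[''{0}''].value - doc[''{1}''].value;return diff * diff;";

    // Object... matches MessageFormat.format(String, Object...),
    // so the arguments can be passed straight through.
    static String buildScript(Object... args) {
        return MessageFormat.format(PAINLESS_TEMPLATE, args);
    }

    public static void main(String[] args) {
        // prints: def diff = doc['actual'].value - doc['predicted'].value;return diff * diff;
        System.out.println(buildScript("actual", "predicted"));
    }
}
```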

…benwtrent/elasticsearch into feature/ml-add-r_squared-metric-to-eval

benwtrent · 2019-07-15T17:34:58Z

run elasticsearch-ci/2
run elasticsearch-ci/packaging-sample

* [ML] Add r_squared eval metric to regression
* fixing tests and binarysoftclassification class
* Update RSquared.java
* Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java Co-Authored-By: David Kyle <david.kyle@elastic.co>
* removing unnecessary debug test

[ML] Add r_squared eval metric to regression

ea89212

benwtrent added >enhancement :ml Machine learning v8.0.0 v7.4.0 labels Jul 11, 2019

tveasey reviewed Jul 12, 2019

benwtrent and others added 2 commits July 12, 2019 07:44

fixing tests and binarysoftclassification class

1651974

Update RSquared.java

335a9e1

davidkyle reviewed Jul 12, 2019

.../src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java (outdated)

Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/…

d707de8

…ml/dataframe/evaluation/regression/RSquared.java Co-Authored-By: David Kyle <david.kyle@elastic.co>

tveasey approved these changes Jul 12, 2019

Merge branch 'master' into feature/ml-add-r_squared-metric-to-eval

d2b6adf

benwtrent requested a review from davidkyle July 12, 2019 17:59

davidkyle approved these changes Jul 15, 2019

benwtrent added 2 commits July 15, 2019 07:20

removing unnecessary debug test

cfcddce

Merge branch 'feature/ml-add-r_squared-metric-to-eval' of github.com:…

372a949

…benwtrent/elasticsearch into feature/ml-add-r_squared-metric-to-eval

benwtrent merged commit b4e16b6 into elastic:master Jul 15, 2019

benwtrent deleted the feature/ml-add-r_squared-metric-to-eval branch July 15, 2019 19:00

benwtrent mentioned this pull request Jul 15, 2019

[7.x] [ML] Add r_squared eval metric to regression (#44248) #44378

Merged

codebrain mentioned this pull request Oct 14, 2019

7.4 meta ticket elastic/elasticsearch-net#4133

Closed

56 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
