Do not copy mapping from dependent variable to prediction field in regression analysis #51227

przemekwitek · 2020-01-20T14:51:55Z

Currently, for regression analysis, the mapping is copied from the dependent variable to the prediction field.
When the dependent variable has a discrete type (e.g. integer, long), the prediction field is indexed as a discrete type as well, which drops the fractional part of each prediction and worsens the evaluation metrics (MSE, R^2).
This PR addresses that by mapping the prediction field dynamically (as float).

Closes https://github.com/elastic/machine-learning-qa/issues/661
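The effect described above can be illustrated with a small, self-contained sketch. It is not code from this PR: the class and method names are hypothetical, the data is made up for illustration, and it assumes that storing a prediction in an integer-typed field simply drops the fractional part.

```java
// Hypothetical illustration only; not part of the Elasticsearch code base.
final class TruncationErrorDemo {

    /** Mean squared error between actual values and predictions. */
    static double mse(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    /** Simulates storing each prediction in an integer-typed field (assumption: the fraction is dropped). */
    static double[] storeAsInteger(double[] predicted) {
        double[] stored = new double[predicted.length];
        for (int i = 0; i < predicted.length; i++) {
            stored[i] = (long) predicted[i]; // the cast truncates toward zero
        }
        return stored;
    }

    public static void main(String[] args) {
        double[] actual = {10, 20, 30};          // integer-valued dependent variable
        double[] predicted = {10.4, 19.6, 30.4}; // made-up model output
        System.out.println("MSE, predictions kept as-is:  " + mse(actual, predicted));
        System.out.println("MSE, predictions as integers: " + mse(actual, storeAsInteger(predicted)));
    }
}
```

With this made-up data the second MSE is larger than the first, because 19.6 is stored as 19 and the squared error for that document grows from 0.16 to 1.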

elasticmachine · 2020-01-20T14:51:58Z

Pinging @elastic/ml-core (:ml)

benwtrent

Looks good to me :D.

I wonder if we should force the mapping for regression to always be double instead of it being float? I am not sure if the precision loss is a concern or not.

droberts195 · 2020-01-21T12:31:08Z

It's a good point about using double instead of float. float only stores about 7 significant decimal digits, so if the dependent variable contains integers greater than 10 million then using float will lose more accuracy than sticking with integer. (integer will lose the fractions from the predictions, but for a number between 10 and 100 million float would make the units and fraction unreliable.) double would be a reasonable compromise as it has about 15 digits of accuracy. long has 19, but with double we'd only suffer the problem at the (unusual) extreme sizes, whereas integers over 10 million are not that uncommon.
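The precision argument above can be checked with a short sketch. The class and helper names are hypothetical; only standard Java primitive conversions are used. A float has a 24-bit significand, so above 2^23 (about 8.4 million) it cannot hold fractions, and above 2^24 (about 16.8 million) it cannot hold every integer.

```java
// Hypothetical illustration only; not part of the Elasticsearch code base.
final class FloatPrecisionDemo {

    /** Converts an integer value to float and back, as a float-mapped field would store it. */
    static long roundTripFloat(long value) {
        return (long) (float) value;
    }

    /** Converts an integer value to double and back, as a double-mapped field would store it. */
    static long roundTripDouble(long value) {
        return (long) (double) value;
    }

    /** Stores a fractional prediction as float and widens it back to double for comparison. */
    static double storeAsFloat(double prediction) {
        return (double) (float) prediction;
    }

    public static void main(String[] args) {
        long value = 16_777_217L; // 2^24 + 1, an integer just over 16 million
        System.out.println("float round trip:  " + roundTripFloat(value));  // prints 16777216
        System.out.println("double round trip: " + roundTripDouble(value)); // prints 16777217
        // Between 2^23 and 2^24 neighbouring floats are exactly 1 apart, so the fraction is lost.
        System.out.println("12345678.9 stored as float: " + storeAsFloat(12_345_678.9));
    }
}
```

Here the float round trip changes the units digit while the double round trip is exact, which matches the case for preferring double made in the comment.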

przemekwitek · 2020-01-21T15:42:27Z

> I wonder if we should force the mapping for regression to always be double instead of it being float? I am not sure if the precision loss is a concern or not.

Forcing double required larger changes in the logic that calculates mappings, because Regression now imposes a fixed mapping while Classification still copies the mapping from the dependent variable.
You might want to take another look, @benwtrent and @droberts195.
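The split described above, with a fixed mapping for regression and a copied mapping for classification, could look roughly like the following. This is a minimal sketch under stated assumptions and not the code merged in this PR: the `Analysis` interface, the method name `explicitlyMappedFields`, and the constructor parameters are all hypothetical, and mappings are modelled as plain nested maps.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch; the interface, method and class shapes are assumptions for illustration.
final class ExplicitMappingSketch {

    interface Analysis {
        /** Returns destination-index mappings that must be set explicitly, keyed by full field path. */
        Map<String, Object> explicitlyMappedFields(Map<String, Object> sourceProperties, String resultsField);
    }

    /** Regression: the prediction field always gets a fixed "double" mapping. */
    static final class Regression implements Analysis {
        private final String predictionField;

        Regression(String predictionField) {
            this.predictionField = predictionField;
        }

        @Override
        public Map<String, Object> explicitlyMappedFields(Map<String, Object> sourceProperties, String resultsField) {
            return Collections.singletonMap(
                resultsField + "." + predictionField,
                Collections.singletonMap("type", "double"));
        }
    }

    /** Classification: the prediction field reuses the dependent variable's source mapping. */
    static final class Classification implements Analysis {
        private final String dependentVariable;
        private final String predictionField;

        Classification(String dependentVariable, String predictionField) {
            this.dependentVariable = dependentVariable;
            this.predictionField = predictionField;
        }

        @Override
        public Map<String, Object> explicitlyMappedFields(Map<String, Object> sourceProperties, String resultsField) {
            Object dependentVariableMapping = sourceProperties.get(dependentVariable);
            if (dependentVariableMapping == null) {
                return Collections.emptyMap(); // nothing to copy; leave the field to dynamic mapping
            }
            return Collections.singletonMap(resultsField + "." + predictionField, dependentVariableMapping);
        }
    }
}
```

In this sketch, a source field mapped as integer yields a double-mapped prediction field for regression and an integer-mapped one for classification.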

benwtrent

I think this looks good.

There are a bunch of failing tests due to the change :).

benwtrent · 2020-01-21T18:07:22Z

...core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/DataFrameAnalysis.java

     *
     * @param resultsFieldName name of the results field under which all the results are stored
-     * @return {@link Map} containing fields for which the mappings should be copied from source index to destination index
+     * @return {@link List} containing fields for which the mappings should be handled explicitly


Suggested change

-     * @return {@link List} containing fields for which the mappings should be handled explicitly
+     * @return {@link Map} containing fields for which the mappings should be handled explicitly


przemekwitek added >bug WIP :ml Machine learning v8.0.0 v7.6.0 v7.7.0 labels Jan 20, 2020

przemekwitek force-pushed the fix_regression_mapping_bug branch from a7fe7f6 to 12a80bf Compare January 20, 2020 15:06

przemekwitek removed the WIP label Jan 20, 2020

przemekwitek marked this pull request as ready for review January 20, 2020 15:06

benwtrent approved these changes Jan 21, 2020


benwtrent self-assigned this Jan 21, 2020

benwtrent self-requested a review January 21, 2020 18:00

benwtrent approved these changes Jan 21, 2020


przemekwitek force-pushed the fix_regression_mapping_bug branch from b35b069 to d92811a Compare January 21, 2020 18:21

przemekwitek added 7 commits January 22, 2020 08:29

Do not copy mapping from dependent variable to prediction field in re…

ddd18c4

…gression analysis

Fix unit tests

9403767

Relax the assertion so that it expects at least one "feature_importan…

5cb5dbc

…ce.field" field in the results

Apply review comments

5bba73a

Restore Regression's explicitly mapped field in tests

e12b1a7

Fix integration tests

9581356

Apply review comment

90b3986

przemekwitek force-pushed the fix_regression_mapping_bug branch from d92811a to 90b3986 Compare January 22, 2020 07:29

przemekwitek merged commit 927b14e into elastic:master Jan 22, 2020

przemekwitek deleted the fix_regression_mapping_bug branch January 22, 2020 08:26

This was referenced Jan 22, 2020

[7.x] Do not copy mapping from dependent variable to prediction field in regression analysis (#51227) #51288

Merged

[7.6] Do not copy mapping from dependent variable to prediction field in regression analysis (#51227) #51289

Merged

przemekwitek added a commit to przemekwitek/elasticsearch that referenced this pull request Jan 22, 2020

Do not copy mapping from dependent variable to prediction field in re…

503404c

…gression analysis (elastic#51227)

przemekwitek added a commit that referenced this pull request Jan 22, 2020

[7.x] Do not copy mapping from dependent variable to prediction field…

bfcfcde

… in regression analysis (#51227) (#51288)

przemekwitek added a commit that referenced this pull request Jan 22, 2020

[7.6] Do not copy mapping from dependent variable to prediction field…

83ffe96

… in regression analysis (#51227) (#51289)

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
