Emit predicted category using an appropriate JSON type. #877

przemekwitek · 2019-12-05T09:09:27Z

Currently, classification analysis allows a dependent variable of integer or boolean type, but in the results the prediction field is always emitted as a JSON string (so true becomes "true", 1 becomes "1", etc.).

A solution to that problem is to pass the desired prediction_field_type from Java to C++ and make C++ emit a bool, int or string JSON field depending on the prediction_field_type passed.
This PR implements the C++ part of this solution.
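
For illustration only, here is a minimal sketch of the idea described above. It is not the code in this PR: the names `PredictionFieldType`, `appendJsonString` and `categoryToJson` are hypothetical, the output is built as a plain `std::string` rather than through the real JSON writer, and falling back to a JSON string when the category text does not match the requested type is an assumption.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>

// Hypothetical names throughout; these are not the identifiers used in ml-cpp.
enum class PredictionFieldType { String, Int, Bool };

// Append s to out as a JSON string literal, escaping quotes, backslashes
// and ASCII control characters.
void appendJsonString(const std::string& s, std::string& out) {
    static const char* const HEX = "0123456789abcdef";
    out += '"';
    for (unsigned char c : s) {
        if (c == '"') {
            out += "\\\"";
        } else if (c == '\\') {
            out += "\\\\";
        } else if (c < 0x20) {
            out += "\\u00";
            out += HEX[c >> 4];
            out += HEX[c & 0xF];
        } else {
            out += static_cast<char>(c);
        }
    }
    out += '"';
}

// Render a predicted category as a JSON value of the requested type.
// Assumption: if the text does not match the type, emit a JSON string.
std::string categoryToJson(const std::string& category, PredictionFieldType type) {
    switch (type) {
    case PredictionFieldType::Bool:
        if (category == "true" || category == "false") {
            return category; // bare JSON literal, no quotes
        }
        break;
    case PredictionFieldType::Int: {
        std::int64_t value = 0;
        const char* first = category.data();
        const char* last = first + category.size();
        auto [ptr, ec] = std::from_chars(first, last, value);
        if (ec == std::errc() && ptr == last) {
            return std::to_string(value); // bare JSON number
        }
        break;
    }
    case PredictionFieldType::String:
        break;
    }
    std::string out;
    appendJsonString(category, out);
    return out;
}

// Usage example: the same category text yields different JSON types.
void exampleUsage() {
    assert(categoryToJson("true", PredictionFieldType::Bool) == "true");
    assert(categoryToJson("1", PredictionFieldType::Int) == "1");
    assert(categoryToJson("cat", PredictionFieldType::String) == "\"cat\"");
}
```

With `PredictionFieldType::String` the sketch reproduces the behaviour described above as current (every category quoted), so the type passed from Java is the only thing that changes the output.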

Relates elastic/elasticsearch#49796

droberts195

If you'd rather use standard library functions to do the string to number conversions instead of my core::CStringUtils suggestions then that's fine, but please make sure that exceptions won't fail the entire analysis, that all possible values a 64 bit signed integer can hold are covered on all platforms, and that we log when there are unexpected conversion errors.

lib/api/CDataFrameTrainBoostedTreeClassifierRunner.cc

tveasey

LGTM (modulo adhering to naming conventions).

lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc

przemekwitek

If you'd rather use standard library functions to do the string to number conversions instead of my core::CStringUtils suggestions then that's fine, but please make sure that exceptions won't fail the entire analysis, that all possible values a 64 bit signed integer can hold are covered on all platforms, and that we log when there are unexpected conversion errors.

You might think that problem 1 can be solved by using stol instead of stoi, but this is not the case, because on Windows a Java long is 64 bits but a C++ long is 32 bits. The C++ type that reliably corresponds to Java's long is int64_t.

Sounds like you've gone through these kinds of issues before. Thanks for sharing. I'll gladly use the library method for doing the conversions.

Additionally, I've renamed dependent_variable_type to prediction_field_type as that's how the field is really used in the code.
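
As a reference for the conversion requirements quoted above (no exceptions, full 64 bit signed range on every platform, logging on unexpected input), here is a minimal standard-library sketch. It is not the core::CStringUtils method adopted in this PR: `parseInt64` is a hypothetical helper, and writing to `std::cerr` stands in for whatever logging the real code uses. It relies on C++17 `std::from_chars` with `std::int64_t`, which avoids both the exceptions thrown by `std::stoi`/`std::stol` and the 32 bit `long` on Windows.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: parse the whole of text as a 64 bit signed integer.
// Never throws; logs and returns std::nullopt on any failure.
std::optional<std::int64_t> parseInt64(const std::string& text) {
    std::int64_t value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const std::from_chars_result result = std::from_chars(first, last, value);
    if (result.ec == std::errc::result_out_of_range) {
        std::cerr << "Value out of 64 bit signed range: '" << text << "'\n";
        return std::nullopt;
    }
    if (result.ec != std::errc() || result.ptr != last) {
        // Covers empty input, non-numeric input and trailing characters.
        std::cerr << "Not an integer: '" << text << "'\n";
        return std::nullopt;
    }
    return value;
}

// Usage example: both extremes of the range parse, overflow is rejected.
void exampleParseInt64() {
    assert(parseInt64("9223372036854775807") == INT64_MAX);
    assert(parseInt64("-9223372036854775808") == INT64_MIN);
    assert(!parseInt64("9223372036854775808").has_value());
}
```

Returning `std::optional` leaves the decision about what to do on failure to the caller, so a single bad value need not abort the whole analysis.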

lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc

lib/api/CDataFrameTrainBoostedTreeClassifierRunner.cc

droberts195

LGTM

przemekwitek · 2019-12-05T13:56:58Z

run elasticsearch-ci

droberts195 · 2019-12-05T15:20:36Z

@przemekwitek to get the tests to pass you need to run clang-format:

A format error has been detected within the following files:
lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc

przemekwitek · 2019-12-06T07:30:17Z

A format error has been detected

Thanks for the hint. I'm wondering if it was possible to make this message stand out more in the console output. I was looking for the failure reason yesterday, before your hint, but couldn't find it.

droberts195 · 2019-12-06T10:10:57Z

I'm wondering if it was possible to make this message stand out more in the console output.

It gets printed from here:

ml-cpp/dev-tools/check-style.sh

Lines 68 to 72 in 881e5d0

    
    if [ -n "${WRONG_FORMAT_FILES}" ] ; then
        echo "A format error has been detected within the following files:"
        printf "%s\n" "${WRONG_FORMAT_FILES[@]}"
        RC=4
    else

You are welcome to open a PR to change that so that it stands out more. If you want to use something like escape sequences to change colours, you can use the PR build to check that Jenkins interprets them correctly: make a PR that messes up the formatting in a source file and modifies check-style.sh, check the PR build log, and iterate on check-style.sh. When it's working as desired, correct the formatting of the chosen source file, leaving just the change to check-style.sh in the PR.

przemekwitek added >enhancement WIP :ml v8.0.0 v7.6.0 labels Dec 5, 2019

Emit predicted category using an appropriate JSON type.

1eb3b44

przemekwitek force-pushed the prediction_field_type branch from 0825066 to 1eb3b44 Compare December 5, 2019 09:42

droberts195 reviewed Dec 5, 2019

lib/api/CDataFrameTrainBoostedTreeClassifierRunner.cc

tveasey approved these changes Dec 5, 2019

lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc

przemekwitek removed the WIP label Dec 5, 2019

przemekwitek marked this pull request as ready for review December 5, 2019 10:32

przemekwitek added 2 commits December 5, 2019 11:59

Apply review comments

8acd4c3

Rename dependent_variable_type to prediction_field_type as that's how the field is really used in C++ code

2d085dc

przemekwitek commented Dec 5, 2019

przemekwitek added 3 commits December 5, 2019 13:33

Make m_PredictionFieldType field of an enum type

becdf58

Update changelog

1a09305

Apply review comment

8f409e2

droberts195 approved these changes Dec 5, 2019

przemekwitek mentioned this pull request Dec 5, 2019

Pass prediction_field_type to C++ analytics process elastic/elasticsearch#49861

Merged

Apply clang-format

d983afb

przemekwitek merged commit 881e5d0 into elastic:master Dec 6, 2019

przemekwitek deleted the prediction_field_type branch December 6, 2019 10:05

przemekwitek mentioned this pull request Dec 6, 2019

[7.x] Emit predicted category using an appropriate JSON type. (#877) #878

Merged

przemekwitek added a commit to przemekwitek/ml-cpp that referenced this pull request Dec 6, 2019

Emit predicted category using an appropriate JSON type. (elastic#877)

5dc608d

przemekwitek added a commit that referenced this pull request Dec 6, 2019

Emit predicted category using an appropriate JSON type. (#877) (#878)

742a711

alvarezmelissa87 mentioned this pull request Dec 10, 2019

[ML] DF Analytics: create classification jobs results view elastic/kibana#52584

Merged

4 tasks

alvarezmelissa87 mentioned this pull request Dec 11, 2019

[ML] Meta - Classification UI elastic/kibana#51310

Closed

22 tasks

alvarezmelissa87 mentioned this pull request Dec 19, 2019

[ML] DF Analytics Classification: ensure confusion matrix can be fetched elastic/kibana#53629

Merged

2 tasks
