[ML] take training_percent into account when estimating memory #1111

benwtrent · 2020-04-01T17:49:28Z

This change adjusts the memory estimation by taking the `training_percent` parameter into account.

It is an optional parameter that defaults to 100.0. The Java side will write out the parameter as it is input by the user.
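
As a rough illustration of why the parameter matters for the estimate, here is a minimal sketch. It assumes the estimate depends on the number of rows actually used for training. `numberTrainingRows` is a hypothetical helper and is not taken from the ml-cpp code; only the default of `100.0` comes from the description above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (an assumption, not the ml-cpp implementation):
// the number of rows used for training when only trainingPercent of the
// data frame is trained on. The default of 100.0 mirrors the optional
// parameter described above.
std::size_t numberTrainingRows(std::size_t totalRows, double trainingPercent = 100.0) {
    assert(trainingPercent > 0.0 && trainingPercent <= 100.0);
    return static_cast<std::size_t>(
        std::ceil(static_cast<double>(totalRows) * trainingPercent / 100.0));
}
```

With this helper, omitting the parameter keeps every row, while a smaller percentage reduces the number of rows a memory estimate would need to account for.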

tveasey

Overall looks good Ben! Just a couple of points for consistency with repo style and I suggest asserting that increasing test percentage does decrease estimated memory usage.

tveasey · 2020-04-02T08:34:41Z

docs/CHANGELOG.asciidoc

+
+=== Enhancements
+
+* Take `training_percent` into account when estimating memory usage. (See {ml-pull}1111[1111].)


Maybe worth mentioning analysis types it applies to

Suggested change

* Take `training_percent` into account when estimating memory usage. (See {ml-pull}1111[1111].)

* Take `training_percent` into account when estimating memory usage for classification and regression. (See {ml-pull}1111[#1111].)

tveasey · 2020-04-02T08:36:51Z

include/api/CDataFrameTrainBoostedTreeRunner.h


    std::string m_DependentVariableFieldName;
    std::string m_PredictionFieldName;
+    double m_trainingPercent;


nit: consistency and naming conventions

Suggested change

double m_trainingPercent;

double m_TrainingPercent;

tveasey · 2020-04-02T08:39:02Z

lib/maths/unittest/CBoostedTreeTest.cc

+
+    for (std::size_t test = 0; test < 3; ++test) {
+        TDoubleVecVec x(cols - 1);
+        std::size_t num_test_rows{((test + 1) * 100)};


nit: can we stick with camel case

Suggested change

std::size_t num_test_rows{((test + 1) * 100)};

std::size_t numTestRows{((test + 1) * 100)};

tveasey · 2020-04-02T08:42:27Z

lib/maths/unittest/CBoostedTreeTest.cc

+        LOG_DEBUG(<< "estimated memory usage = " << estimatedMemory);
+        LOG_DEBUG(<< "high water mark = " << instrumentation.maxMemoryUsage());
+
+        BOOST_TEST_REQUIRE(instrumentation.maxMemoryUsage() < estimatedMemory);


👍 also let's assert that estimated memory decreases for each test since you're increasing test percentage.
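
One way to express the requested check is sketched below, assuming the estimates are collected in the order the tests run. `strictlyDecreasing` is a hypothetical helper for illustration. In the actual unit test the same comparison could be made inline with `BOOST_TEST_REQUIRE` against the previous iteration's estimate.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (illustration only): returns true if every estimate
// is strictly smaller than the one before it.
bool strictlyDecreasing(const std::vector<std::int64_t>& estimates) {
    // Start from the largest representable value so the first estimate
    // always compares as smaller.
    std::int64_t previous{std::numeric_limits<std::int64_t>::max()};
    for (auto estimate : estimates) {
        if (estimate >= previous) {
            return false;
        }
        previous = estimate;
    }
    return true;
}
```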

tveasey

One minor point, but this looks like it is ready to go once you've resolved the merge conflict.

tveasey · 2020-04-02T11:55:07Z

lib/maths/unittest/CBoostedTreeTest.cc

+    std::size_t rows{1000};
+    std::size_t cols{6};
+    std::size_t capacity{600};
+    std::int64_t previousEstimatedMemory{LLONG_MAX};


nit: `std::numeric_limits<std::int64_t>::max()` is the C++ way of doing this

Suggested change

std::int64_t previousEstimatedMemory{LLONG_MAX};

std::int64_t previousEstimatedMemory{std::numeric_limits<std::int64_t>::max()};
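
A short sketch of the reasoning behind this nit: `LLONG_MAX` is a macro from `<climits>` that describes `long long`, whereas `std::numeric_limits` is keyed on the type of the variable itself. On many 64-bit Linux toolchains `std::int64_t` is an alias for `long`, not `long long`, so tying the limit to the type avoids relying on the two happening to have the same range. `largest` is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>

// Hypothetical helper (illustration only): the largest value of T,
// derived from the type instead of a type-specific macro.
template<typename T>
constexpr T largest() {
    return std::numeric_limits<T>::max();
}

// LLONG_MAX is by definition the maximum of long long.
static_assert(std::numeric_limits<long long>::max() == LLONG_MAX,
              "LLONG_MAX describes long long");
```

If the variable's type changes later, `largest<T>()` follows it automatically, while a macro such as `LLONG_MAX` would have to be updated by hand.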

#1116) This change adjusts the memory estimation by taking the parameter `training_percent` into account. It is an optional parameter that defaults to `100.0`. The Java side will write out the parameter as it is input by the user.

This adds the `training_percent` parameter to the analytics process for classification and regression. This parameter is then used to give more accurate memory estimations. See the native-side PR: elastic/ml-cpp#1111

Add training_percent parameter and use it for memory estimation

94de7e5

benwtrent added >enhancement :ml v8.0.0 v7.8.0 labels Apr 1, 2020

adding change log entry

95f9179

benwtrent mentioned this pull request Apr 1, 2020

[ML] add training_percent to analytics process params elastic/elasticsearch#54605

Merged

tveasey reviewed Apr 2, 2020

addressing pr comments

83f1e94

benwtrent requested a review from tveasey April 2, 2020 11:38

tveasey approved these changes Apr 2, 2020

benwtrent added 2 commits April 2, 2020 08:22

addressing pr comments

393c344

Merge remote-tracking branch 'upstream/master' into feature/ml-dfa-ad…

3894840

…d-training-percent-param

benwtrent merged commit 45da194 into elastic:master Apr 2, 2020

benwtrent deleted the feature/ml-dfa-add-training-percent-param branch April 2, 2020 14:09

benwtrent mentioned this pull request Apr 2, 2020

[7.x] [ML] take training_percent into account when estimating memory (#1111) #1116

Merged

benwtrent mentioned this pull request Apr 2, 2020

[7.x][ML] add training_percent to analytics process params (#54605) elastic/elasticsearch#54678

Merged

droberts195 mentioned this pull request Apr 7, 2020

[ML] Memory estimates way too high for very simple analyses #1106

Closed
