Skip to content

Conversation

benwtrent
Copy link
Member

This change adjust the memory estimation by taking the parameter training_percent into account.

It is an optional parameter that defaults to 100.0. The Java side will write out the parameter as it is input by the user.

Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good Ben! Just a couple of points for consistency with repo style and I suggest asserting that increasing test percentage does decrease estimated memory usage.


=== Enhancements

* Take `training_percent` into account when estimating memory usage. (See {ml-pull}1111[1111].)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth mentioning analysis types it applies to

Suggested change
* Take `training_percent` into account when estimating memory usage. (See {ml-pull}1111[1111].)
* Take `training_percent` into account when estimating memory usage for classification and regression. (See {ml-pull}1111[#1111].)


std::string m_DependentVariableFieldName;
std::string m_PredictionFieldName;
double m_trainingPercent;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistency and naming conventions

Suggested change
double m_trainingPercent;
double m_TrainingPercent;


for (std::size_t test = 0; test < 3; ++test) {
TDoubleVecVec x(cols - 1);
std::size_t num_test_rows{((test + 1) * 100)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we stick with camel case

Suggested change
std::size_t num_test_rows{((test + 1) * 100)};
std::size_t numTestRows{((test + 1) * 100)};

LOG_DEBUG(<< "estimated memory usage = " << estimatedMemory);
LOG_DEBUG(<< "high water mark = " << instrumentation.maxMemoryUsage());

BOOST_TEST_REQUIRE(instrumentation.maxMemoryUsage() < estimatedMemory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 also let's assert that estimated memory decreases for each test since you're increasing test percentage.

@benwtrent benwtrent requested a review from tveasey April 2, 2020 11:38
Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor point, but this looks like it is ready to go once you've resolved the merge conflict.

std::size_t rows{1000};
std::size_t cols{6};
std::size_t capacity{600};
std::int64_t previousEstimatedMemory{LLONG_MAX};
Copy link
Contributor

@tveasey tveasey Apr 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is the C++ way of doing this

Suggested change
std::int64_t previousEstimatedMemory{LLONG_MAX};
std::int64_t previousEstimatedMemory{std::numeric_limits<std::int64_t>::max()};

@benwtrent benwtrent merged commit 45da194 into elastic:master Apr 2, 2020
@benwtrent benwtrent deleted the feature/ml-dfa-add-training-percent-param branch April 2, 2020 14:09
benwtrent added a commit that referenced this pull request Apr 2, 2020
#1116)

This change adjust the memory estimation by taking the parameter `training_percent` into account. 

It is an optional parameter that defaults to `100.0`. The Java side will write out the parameter as it is input by the user.
benwtrent added a commit to elastic/elasticsearch that referenced this pull request Apr 2, 2020
This adds training_percent parameter to the analytics process for Classification and Regression. This parameter is then used to give more accurate memory estimations.

See native side pr: elastic/ml-cpp#1111
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Apr 2, 2020
This adds training_percent parameter to the analytics process for Classification and Regression. This parameter is then used to give more accurate memory estimations.

See native side pr: elastic/ml-cpp#1111
benwtrent added a commit to elastic/elasticsearch that referenced this pull request Apr 2, 2020
This adds training_percent parameter to the analytics process for Classification and Regression. This parameter is then used to give more accurate memory estimations.

See native side pr: elastic/ml-cpp#1111
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants