[ML] take training_percent into account when estimating memory #1111
Conversation
Overall looks good Ben! Just a couple of points for consistency with repo style, and I suggest asserting that increasing the test percentage does decrease the estimated memory usage.
docs/CHANGELOG.asciidoc (outdated)

=== Enhancements

* Take `training_percent` into account when estimating memory usage. (See {ml-pull}1111[1111].)
Maybe worth mentioning the analysis types it applies to.
Suggested change:
- * Take `training_percent` into account when estimating memory usage. (See {ml-pull}1111[1111].)
+ * Take `training_percent` into account when estimating memory usage for classification and regression. (See {ml-pull}1111[#1111].)
std::string m_DependentVariableFieldName;
std::string m_PredictionFieldName;
double m_trainingPercent;
nit: consistency and naming conventions
Suggested change:
- double m_trainingPercent;
+ double m_TrainingPercent;
for (std::size_t test = 0; test < 3; ++test) {
    TDoubleVecVec x(cols - 1);
    std::size_t num_test_rows{((test + 1) * 100)};
nit: can we stick with camel case?
Suggested change:
- std::size_t num_test_rows{((test + 1) * 100)};
+ std::size_t numTestRows{((test + 1) * 100)};
LOG_DEBUG(<< "estimated memory usage = " << estimatedMemory);
LOG_DEBUG(<< "high water mark = " << instrumentation.maxMemoryUsage());

BOOST_TEST_REQUIRE(instrumentation.maxMemoryUsage() < estimatedMemory);
👍 Also, let's assert that the estimated memory decreases for each test since you're increasing the test percentage.
One minor point, but this looks like it is ready to go once you've resolved the merge conflict.
std::size_t rows{1000};
std::size_t cols{6};
std::size_t capacity{600};
std::int64_t previousEstimatedMemory{LLONG_MAX};
nit: `std::numeric_limits` is the C++ way of doing this.
Suggested change:
- std::int64_t previousEstimatedMemory{LLONG_MAX};
+ std::int64_t previousEstimatedMemory{std::numeric_limits<std::int64_t>::max()};
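Putting these suggestions together, here is a minimal self-contained sketch of the kind of assertion being asked for, assuming a toy `estimateMemoryUsage` stand-in and a plain `assert` instead of `BOOST_TEST_REQUIRE`; none of the names below come from the actual test code.

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <limits>

    // Toy estimator used only for this sketch: memory scales with the number
    // of training rows. It stands in for the real analysis runner's estimator.
    std::int64_t estimateMemoryUsage(std::size_t trainingRows) {
        return static_cast<std::int64_t>(trainingRows) * 1024;
    }

    int main() {
        std::size_t rows{1000};
        std::int64_t previousEstimatedMemory{std::numeric_limits<std::int64_t>::max()};

        for (std::size_t test = 0; test < 3; ++test) {
            std::size_t numTestRows{(test + 1) * 100};
            std::int64_t estimatedMemory{estimateMemoryUsage(rows - numTestRows)};

            // More test rows means fewer training rows, so the estimate must shrink.
            assert(estimatedMemory < previousEstimatedMemory);
            previousEstimatedMemory = estimatedMemory;
        }
        return 0;
    }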
This adds the `training_percent` parameter to the analytics process for Classification and Regression. This parameter is then used to give more accurate memory estimations. See the native-side PR: elastic/ml-cpp#1111
This change adjusts the memory estimation by taking the parameter `training_percent` into account. It is an optional parameter that defaults to `100.0`. The Java side will write out the parameter as it is input by the user.
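For illustration only, here is a minimal sketch of the idea, assuming the estimator simply scales the row count by the training fraction; the names and constants are made up for this sketch and are not the actual ml-cpp implementation.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    // Hypothetical per-row training cost, invented for this sketch only.
    constexpr std::int64_t BYTES_PER_TRAINING_ROW{4096};

    // Sketch of the idea: only the fraction of rows actually used for training
    // contributes the per-row training cost, so a lower training_percent gives
    // a lower memory estimate.
    std::int64_t estimateMemoryUsage(std::size_t totalRows, double trainingPercent) {
        auto trainingRows = static_cast<std::int64_t>(
            static_cast<double>(totalRows) * trainingPercent / 100.0);
        return trainingRows * BYTES_PER_TRAINING_ROW;
    }

    int main() {
        // training_percent is optional and defaults to 100.0 (every row trains).
        std::cout << estimateMemoryUsage(1000, 100.0) << " bytes at 100%\n";
        std::cout << estimateMemoryUsage(1000, 70.0) << " bytes at 70%\n";
        return 0;
    }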