Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Improve autodetect logic for persistence #437

Merged

Conversation

edsavage
Copy link
Contributor

@edsavage edsavage commented Mar 8, 2019

Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes #393

Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
@droberts195 droberts195 removed the v6.7.1 label Mar 8, 2019
if (m_NumRecordsHandled == 0) {
LOG_DEBUG(<< "Zero records were handled - will not attempt to persist "
<< description << ".");
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of a categorizer chained to an anomaly detector this might not be correct - it's possible that time has been advanced on the anomaly detector.

So this method needs a // Pass on the request in case we're chained step like some of the other methods in this class have. If the chained processor's isPersistenceNeeded() returns true then this method should return true, otherwise it should make its own decision using the code that's there now.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I saw one nit, but I'm happy to merge as-is if you want to move onto something else.

@@ -336,6 +336,11 @@ bool CFieldDataTyper::persistState(core::CDataAdder& persister) {
}

bool CFieldDataTyper::isPersistenceNeeded(const std::string& description) const {
// Pass on the request in case we're chained
if (m_OutputHandler.isPersistenceNeeded(description) == true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: although we prefer == false to !, I think == true is unnecessary for methods that clearly return a bool

@edsavage edsavage merged commit 7c13681 into elastic:master Mar 18, 2019
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

closes elastic#393
edsavage added a commit that referenced this pull request Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

Backport #437
edsavage added a commit that referenced this pull request Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

Backports #437
edsavage added a commit that referenced this pull request Mar 18, 2019
Changed the logic surrounding persistence of both state and quantiles on
graceful shutdown so that persistence only occurs if and only if at
least one input record has been processed or time has been advanced.

Backports #437
@edsavage edsavage deleted the autodetect_quantile_state_graceful_close branch March 18, 2019 16:57
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in elastic#437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Fixes elastic#394
droberts195 added a commit that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in #437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Fixes #394
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in elastic#437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Backport of elastic#512
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in elastic#437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Backport of elastic#512
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in elastic#437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Backport of elastic#512
droberts195 added a commit that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in #437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Backport of #512
droberts195 added a commit that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in #437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Backport of #512
droberts195 added a commit that referenced this pull request Jun 27, 2019
Similar to the fix for state and quantiles in #437,
if no input is received and time is not advanced then
there is no need to write model size stats when the
autodetect process exits. Doing this can actually
cause a problem for a job that has never ever seen
any input, as the unnecessary model size stats were
written with a negative timestamp. This change also
adds an extra defensive check to prevent that ever
happening, although the only situation when it is
thought to be possible should be prevented by the
first change.

Backport of #512
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Flawed logic for when autodetect writes state/quantiles on graceful close
2 participants