[ML] Fix for "No counts available" error message #2414

Merged
merged 5 commits into from
Nov 1, 2022

Conversation

edsavage

When a look-back job is restarted as a real-time job, error messages similar to the following are occasionally seen:

[CBucketGatherer.cc@464] No counts available at 1585698300, current bucket = [1585699200,1585700100)

Investigation shows that these errors occur when an incomplete initial bucket for a partition was persisted at the close of the look-back job. When the job is re-opened as real-time, the start time of the affected anomaly detector is still before that of its associated bucket gatherer, which triggers the error.
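
To make the mismatch concrete, here is a minimal sketch using the timestamps from the log line above. `TTime`, `SBucket` and `precedesBucket` are hypothetical names introduced for illustration only, not the ml-cpp API, and the sketch assumes a simplified gatherer that holds counts only from its current bucket onwards (no latency window).

```cpp
#include <cstdint>

// Hypothetical stand-in for core_t::TTime (seconds since the epoch).
using TTime = std::int64_t;

// Hypothetical model of the gatherer's current bucket: [s_Start, s_Start + s_Length).
struct SBucket {
    TTime s_Start;
    TTime s_Length;

    TTime end() const { return s_Start + s_Length; }
    bool contains(TTime time) const { return time >= s_Start && time < end(); }
};

// True if counts are requested for a time that lies before the gatherer's
// current bucket. In this simplified model that is the condition under which
// "No counts available" would be reported.
bool precedesBucket(const SBucket& bucket, TTime time) {
    return time < bucket.s_Start;
}
```

With `SBucket{1585699200, 900}` (the interval in the log spans 900 seconds), the requested time 1585698300 is exactly 900 seconds before the bucket start, so `precedesBucket` returns true: the detector is asking about the bucket immediately before the first one the gatherer knows about.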

This PR adds a check to determine whether an incomplete initial bucket has been restored from persisted state and, if so, skips any attempt to output results for that bucket. This is essentially what happened before this change, except that the error is no longer logged.
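
The shape of such a check can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `SRestoredState`, `restoredIncompleteInitialBucket` and `shouldOutputResults` are hypothetical names, and the real check lives in `CAnomalyJob` and is driven by `m_InitialLastFinalisedBucketEndTime`.

```cpp
#include <cstdint>

// Hypothetical stand-in for core_t::TTime (seconds since the epoch).
using TTime = std::int64_t;

// Hypothetical summary of what was restored from persisted state.
struct SRestoredState {
    TTime s_DetectorStartTime;   // start time of the anomaly detector
    TTime s_GathererBucketStart; // start of the bucket gatherer's current bucket
};

// Assumption: an incomplete initial bucket was restored if the detector
// starts before the first bucket the gatherer has counts for.
bool restoredIncompleteInitialBucket(const SRestoredState& state) {
    return state.s_DetectorStartTime < state.s_GathererBucketStart;
}

// Skip output for buckets that precede the gatherer's data when an
// incomplete initial bucket was restored; otherwise output as usual.
bool shouldOutputResults(const SRestoredState& state, TTime bucketStart) {
    if (restoredIncompleteInitialBucket(state) &&
        bucketStart < state.s_GathererBucketStart) {
        return false; // nothing to report, and no error is logged
    }
    return true;
}
```

For the values in the log, `shouldOutputResults({1585698300, 1585699200}, 1585698300)` returns false, so the bucket is skipped quietly, while the bucket starting at 1585699200 is output as normal.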

Fixes #2411

@@ -496,8 +504,14 @@ class API_EXPORT CAnomalyJob : public CDataProcessor {
//! Flag indicating whether or not time has been advanced.
bool m_TimeAdvanced{false};

//! The initial value of the end time of the last bucket
//! out of latency window we've seen
core_t::TTime m_InitialLastFinalisedBucketEndTime{0};

Please expand the comment to say how this works for jobs that ran successfully for many buckets before being persisted by a version earlier than 8.6.

As far as I can see, in this scenario this variable stays at 0 forever. Is that right? I think that’s OK because we don’t need the functionality for a job that ran successfully for many buckets. But even if I am right, it’s really important to document that this variable cannot be assumed to be non-zero and, because of this, must not be used for any other purpose.

Or if I am wrong about it staying zero forever for previously successful jobs, how does it work?

Expanding on documentation of m_InitialLastFinalisedBucketEndTime.

@tveasey tveasey left a comment

LGTM

@edsavage edsavage merged commit baff94f into elastic:main Nov 1, 2022
@edsavage edsavage deleted the no_counts_available branch November 1, 2022 14:49
Successfully merging this pull request may close these issues.

[ML] Error message 'No counts available' seen when re-opening lookback job as realtime