[ML] Fix for "No counts available" error message #2414
Conversation
When restarting a look-back job as a real-time one, error messages similar to '[CBucketGatherer.cc@464] No counts available at 1585698300, current bucket = [1585699200,1585700100)' are occasionally seen. Investigation reveals that these error messages stem from a situation where an incomplete initial bucket for a partition was persisted at the close of the look-back job. When the job is re-opened as real-time, the start time of the anomaly detector in question is still before that of the associated bucket gatherer, triggering the error. This PR adds a check to determine whether an incomplete initial bucket has been restored from persisted state and, if so, skips any attempt to output results for that bucket - which is essentially what would have happened prior to this change, except without logging the error. Fixes elastic#2411
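For concreteness, here is a minimal C++ sketch of the kind of guard the PR describes. The class and method names are invented for illustration; this is not the actual elastic/ml-cpp code:

#include <cstdint>

using TTime = std::int64_t; // stands in for core_t::TTime

class CAnomalyJobSketch {
public:
    //! Called once after state is restored from persistence, with the
    //! end time of the last finalised bucket recorded in that state.
    void acceptRestoredEndTime(TTime lastFinalisedBucketEndTime) {
        m_InitialLastFinalisedBucketEndTime = lastFinalisedBucketEndTime;
    }

    //! Returns true if results for the bucket ending at bucketEndTime
    //! should be skipped, because it corresponds to the incomplete
    //! initial bucket persisted at the close of the look-back job.
    bool shouldSkipResults(TTime bucketEndTime) const {
        return m_InitialLastFinalisedBucketEndTime != 0 &&
               bucketEndTime <= m_InitialLastFinalisedBucketEndTime;
    }

private:
    //! Zero means "nothing restored", e.g. when the state was persisted
    //! by a version that did not record this value.
    TTime m_InitialLastFinalisedBucketEndTime{0};
};

Note that with this sketch shouldSkipResults() returns false whenever the restored end time is 0, which lines up with the review discussion below about the variable staying 0 for state persisted by older versions.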
@@ -496,8 +504,14 @@ class API_EXPORT CAnomalyJob : public CDataProcessor {
    //! Flag indicating whether or not time has been advanced.
    bool m_TimeAdvanced{false};

    //! The initial value of the end time of the last bucket
    //! out of latency window we've seen
    core_t::TTime m_InitialLastFinalisedBucketEndTime{0};
Please expand the comment to say how this works for jobs that ran successfully for many buckets before being persisted by a version earlier than 8.6.
As far as I can see, in this scenario this variable stays at 0 forever. Is that right? I think that's OK, because we don't need the functionality for a job that ran successfully for many buckets. But even if I am right, it's really important to document that this variable cannot be assumed to be non-zero and must not be used for any other purpose because of this.
Or, if I am wrong about it staying zero forever for previously successful jobs, how does it work?
Expanding on documentation of m_InitialLastFinalisedBucketEndTime.
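For illustration, an expanded doc comment addressing the review might read along these lines (hypothetical wording based on the reviewer's points, not necessarily the merged text):

    //! The initial value of the end time of the last bucket out of
    //! latency window we've seen, as restored from persisted state.
    //! Note: this stays 0 for state that did not record it (persisted
    //! by versions earlier than 8.6), so it cannot be assumed to be
    //! non-zero and must not be used for any purpose other than
    //! detecting an incomplete restored initial bucket.
    core_t::TTime m_InitialLastFinalisedBucketEndTime{0};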
LGTM