[ML] Fix for 'No statistics' error message #2410

Merged (2 commits) on Sep 30, 2022

@edsavage (Contributor) commented Sep 30, 2022

An anomaly detector job ignores records that fall in the initial, incomplete bucket. However, a model is still created, with a start time of -1, in the expectation that more records will arrive in subsequent buckets. If no more records are received, the model continues to exist with a start time of -1.

Suppose a job is first run as a lookback job, still has such models with a start time of -1 when it closes, and is later reopened as a real-time job. Any attempt to sample one of these models then fails with the 'No statistics at ...' error message, because the model's start time (-1) falls outside the current bucket.

This change always updates the model's current bucket stats start time to that of the current record, even if the record data is not to be added to the associated data gatherer.
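A minimal C++ sketch of the failure mode and the fix, under simplifying assumptions. Everything in it is hypothetical: `SCurrentBucketStats`, `CModelSketch`, `processRecord`, `sample` and the `applyFix` switch are illustrative names, not the ml-cpp API. The sketch also assumes the bucket start is the record time rounded down to a multiple of the bucket length. It only shows why a start time of -1 makes sampling fail, and how updating the start time for every record, whether or not it is added to the gatherer, avoids that.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical, simplified stand-in for a model's current bucket statistics.
struct SCurrentBucketStats {
    std::int64_t s_StartTime{-1}; // -1 means no bucket start has been recorded yet
};

// Hypothetical model; not the real ml-cpp class.
class CModelSketch {
public:
    CModelSketch(std::int64_t bucketLength, bool applyFix)
        : m_BucketLength{bucketLength}, m_ApplyFix{applyFix} {
        assert(bucketLength > 0);
    }

    // Called for every record routed to this model. 'addToGatherer' is false
    // for records whose data is not added to the data gatherer.
    void processRecord(std::int64_t time, bool addToGatherer) {
        // Without the fix, the start time only moves when the record is added.
        // With the fix, it is updated for every record.
        if (addToGatherer || m_ApplyFix) {
            m_Stats.s_StartTime = this->bucketStart(time);
        }
    }

    // Sampling succeeds only if 'time' lies in [start, start + bucketLength).
    bool sample(std::int64_t time) const {
        std::int64_t start{m_Stats.s_StartTime};
        if (time < start || time >= start + m_BucketLength) {
            std::cerr << "No statistics at " << time << ", current bucket = [" << start
                      << "," << start + m_BucketLength << ")\n";
            return false;
        }
        return true;
    }

    std::int64_t startTime() const { return m_Stats.s_StartTime; }

private:
    // Assumes non-negative times: round down to a multiple of the bucket length.
    std::int64_t bucketStart(std::int64_t time) const {
        return time - time % m_BucketLength;
    }

    std::int64_t m_BucketLength;
    bool m_ApplyFix;
    SCurrentBucketStats m_Stats;
};
```

For example, with a 900-second bucket, a record at time 5000 that is not added to the gatherer leaves the unfixed model's current bucket at [-1, 899), so sampling at 5000 fails. With the fix, the bucket becomes [4500, 5400) and sampling at 5000 succeeds.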

@droberts195 (Contributor) commented:

> This change always updates the model's start time to that of the current record, even if the record data is not to be added to the associated data gatherer.

Shouldn't it be the model's current bucket stats start time?

@edsavage (Contributor Author) commented:

Yes, I'll update the description.

@droberts195 (Contributor) left a comment:

LGTM

Since Tom's off, I'll approve this so we can get it merged and monitor for any unforeseen side effects over the longest possible period before release. Please ask Tom to review once he's back; if he has any suggestions, they can go into a follow-up PR.

@edsavage merged commit 2b0056c into elastic:main on Sep 30, 2022

@davidkyle (Member) commented:

I am seeing this error in serverless logs:

[gallery_sum_bytes] [autodetect/179] [CIndividualModel.cc@136] No statistics at 1719197550 for sum bytes partition=clientip, current bucket = [-1,899), partitionFieldValue = 175.144.21.153, personName = | repeated [5]

Looks very similar.
