Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Fix sparse data edge cases for periodicity testing #28

Merged
merged 2 commits into from Mar 28, 2018
Merged

[ML] Fix sparse data edge cases for periodicity testing #28

merged 2 commits into from Mar 28, 2018

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented Mar 27, 2018

Description
This fixes all remaining errors reported in #20. These were down to some more untrapped calls on inputs to the functions, varianceAtPercentile and autocorrelationAtPercentile, which generate the errors. Digging into the root cause, they were all down to very sparse data over the window we maintain to test for periodicity. This in turn showed up the need to, in addition to handling the error cases, lower bound the count of buckets with periodic repeats when testing for periodic partitions.

Effects
This change avoids some false positives in the periodicity test and so can change results. This only affects sparse data modelling, the effects are small and the chance of related false positives in the test is low. For example, on the "gallery" data set I get very slightly different job level anomalies:
screen shot 2018-03-27 at 19 34 19
The top 25 anomalies are the same in both jobs, albeit the orders are slightly different.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left a suggestion.

// We need to observe a minimum number of repeated values to test with
// an acceptable false positive rate.
if (this->periodicallyRepeatedBuckets(buckets, period)
< static_cast<double>(period) * ACCURATE_TEST_POPULATED_FRACTION / 3.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It starts to feel like the whole id condition could be a function. It'd be nice to encapsulate that check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, I think that is better.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tveasey tveasey merged commit 09f401c into elastic:master Mar 28, 2018
tveasey added a commit that referenced this pull request Mar 28, 2018
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the 
window we maintain to test for periodicity. This showed up the need to lower bound the count of 
buckets with periodic repeats when testing for periodic partitions.
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the 
window we maintain to test for periodicity. This showed up the need to lower bound the count of 
buckets with periodic repeats when testing for periodic partitions.
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the 
window we maintain to test for periodicity. This showed up the need to lower bound the count of 
buckets with periodic repeats when testing for periodic partitions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants