New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Fix sparse data edge cases for periodicity testing #28
[ML] Fix sparse data edge cases for periodicity testing #28
Conversation
…rrelations when testing for periodicity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Left a suggestion.
// We need to observe a minimum number of repeated values to test with | ||
// an acceptable false positive rate. | ||
if (this->periodicallyRepeatedBuckets(buckets, period) | ||
< static_cast<double>(period) * ACCURATE_TEST_POPULATED_FRACTION / 3.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It starts to feel like the whole id
condition could be a function. It'd be nice to encapsulate that check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, I think that is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the window we maintain to test for periodicity. This showed up the need to lower bound the count of buckets with periodic repeats when testing for periodic partitions.
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the window we maintain to test for periodicity. This showed up the need to lower bound the count of buckets with periodic repeats when testing for periodic partitions.
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the window we maintain to test for periodicity. This showed up the need to lower bound the count of buckets with periodic repeats when testing for periodic partitions.
Description
This fixes all remaining errors reported in #20. These were down to some more untrapped calls on inputs to the functions,
varianceAtPercentile
andautocorrelationAtPercentile
, which generate the errors. Digging into the root cause, they were all down to very sparse data over the window we maintain to test for periodicity. This in turn showed up the need to, in addition to handling the error cases, lower bound the count of buckets with periodic repeats when testing for periodic partitions.Effects
This change avoids some false positives in the periodicity test and so can change results. This only affects sparse data modelling, the effects are small and the chance of related false positives in the test is low. For example, on the "gallery" data set I get very slightly different job level anomalies:
The top 25 anomalies are the same in both jobs, albeit the orders are slightly different.