ELE-1668: Docs #1169

Merged
dapollak merged 10 commits into master from ele-1668-improve-accuracy-by-combining-z-score-with-another-mechanism-docs on Oct 2, 2023

Conversation

@dapollak
Contributor

No description provided.

@linear

linear Bot commented Sep 19, 2023

ELE-1668 Improve accuracy by combining z-score with another mechanism

Describe the problem
Relying on the z-score alone to detect anomalies can give misleading results, depending on the training set. Here are some examples:

  • When the training set is too static, the stdev is very small, so even a slight change is flagged as an anomaly even though it probably isn't one. For example, if the row count in a table is [100000, 99999, 100000, 100001, 100000, 99998], a row count of 99995 has a z-score of roughly -4.5 and creates a false positive.
  • When the training set is too dynamic, we can miss important drops in volume. For example, with row counts of [500, 1000, 1500, 200, 2000], a row count of 0 has a z-score of -1.4 and is not considered an anomaly.
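
The two examples above can be reproduced with a few lines of Python. This is only a sketch: it assumes the z-score uses the mean and the sample standard deviation of the training set, which may differ from how the package computes it, and the helper name `z_score` is hypothetical.

```python
from statistics import mean, stdev


def z_score(training, value):
    """Z-score of `value` relative to `training`.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return (value - mean(training)) / stdev(training)


# Near-constant history: a tiny stdev inflates the z-score of a small change.
static = [100000, 99999, 100000, 100001, 100000, 99998]
# Highly variable history: a large stdev hides a drop to zero.
dynamic = [500, 1000, 1500, 200, 2000]

print(round(z_score(static, 99995), 2))  # -4.52
print(round(z_score(dynamic, 0), 2))     # -1.42
```

Under that assumption, a change of five rows out of 100,000 scores far beyond a typical sensitivity, while a complete loss of data in the second table stays well inside it.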

Describe the solution
There are several ways to improve accuracy, most of which combine the z-score with a second method and alert only if both methods show an anomaly.

We discussed several methods and concluded that using percentiles from a baseline is a good choice because it is intuitive for our users to understand, and we can set the defaults according to what we believe while still letting users modify them if needed.

Applying a percentile threshold will involve:

  1. Calculate the baseline that will count as 100% (the average value in the training set?)
  2. Add threshold params such as spike_percentile_threshold: 150 and drop_percentile_threshold: 20, and figure out the right defaults
  3. Only fail the anomaly test if the z-score exceeds the sensitivity AND the percentile is above the spike threshold or below the drop threshold, respectively
  4. Update the docs
  5. Update the UI to show the thresholds in the report
  6. Update the alert content, the test description in the alert and report, etc.
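
Steps 1 to 3 could be sketched roughly as follows. This is an illustration and not the merged implementation: the function name `is_anomalous`, the use of the training-set mean as the 100% baseline, the sample standard deviation, and the sensitivity default of 3 are all assumptions. The threshold parameter names and the values 150 and 20 are the ones proposed in the issue.

```python
from statistics import mean, stdev


def is_anomalous(training, value, sensitivity=3.0,
                 spike_percentile_threshold=150.0,
                 drop_percentile_threshold=20.0):
    """Fail only when BOTH the z-score and the percentage of baseline agree."""
    baseline = mean(training)   # assumed baseline: the training average (100%)
    spread = stdev(training)    # assumed: sample standard deviation
    if baseline == 0 or spread == 0:
        return False            # both checks are undefined; the sketch skips this case
    z = (value - baseline) / spread
    percent_of_baseline = value / baseline * 100
    if z > sensitivity:
        return percent_of_baseline > spike_percentile_threshold
    if z < -sensitivity:
        return percent_of_baseline < drop_percentile_threshold
    return False


static = [100000, 99999, 100000, 100001, 100000, 99998]
print(is_anomalous(static, 99995))  # False: still about 100% of the baseline
print(is_anomalous(static, 10000))  # True: about 10% of the baseline
```

With these assumed defaults, the false positive from the static example disappears while a real drop is still reported. Because the two checks are combined with AND, a value whose z-score stays within the sensitivity, such as the 0-row case in the dynamic example, is still not flagged by this rule alone.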

Describe the tests of the solution
Recreate the datasets from the examples above, create tests that fail without the thresholds, and make sure they no longer fail once the thresholds are added.

Describe the documentation
We might want to add the examples above to the docs to explain why using a threshold is necessary.

Additional context
graph from here

slack thread here

@github-actions
Contributor

👋 @dapollak
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

Comment thread docs/guides/anomaly-detection-configuration/ignore_small_changes.mdx Outdated
Comment thread docs/guides/anomaly-detection-configuration/fail_on_zero.mdx Outdated
Comment thread docs/guides/anomaly-detection-configuration/fail_on_zero.mdx
Comment thread docs/guides/anomaly-detection-configuration/fail_on_zero.mdx
@dapollak dapollak requested a review from ellakz September 26, 2023 09:43
@dapollak dapollak enabled auto-merge October 2, 2023 13:22
@dapollak dapollak disabled auto-merge October 2, 2023 13:23
@dapollak dapollak enabled auto-merge October 2, 2023 13:28
@dapollak dapollak merged commit 108e592 into master Oct 2, 2023
@dapollak dapollak deleted the ele-1668-improve-accuracy-by-combining-z-score-with-another-mechanism-docs branch October 2, 2023 13:30
3 participants