Contributor

@tveasey tveasey commented May 16, 2022

We've had multiple issues reported in the past where we report high anomaly scores even though the actual is the same as the typical and both are equal to zero. This has been exacerbated in the past by instabilities in the modelling when we stop receiving data for a partition, which is one edge case in which this could occur.

In all problematic cases, the underlying reason is that if we know the data are non-negative we truncate the prediction when we report the typical value. However, up until now we haven't truncated when computing probabilities.

This change switches to always truncating our prediction. (This is a halfway house to a more rigorous approach which would condition the predicted distribution to be non-negative.) This should be good enough to ensure we never create non-zero anomaly scores for cases where actual equals typical equals zero, except when the result is a multi-bucket anomaly.
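
To illustrate the mismatch, here is a minimal sketch, not the actual ml-cpp code. It assumes a normal predictive distribution and a two-sided tail probability, and every name and number in it is hypothetical. It shows how truncating the prediction only for the typical value can give a tiny probability when actual and typical are both zero, and how using the truncated prediction in both places removes the problem.

```python
import math


def two_sided_tail_probability(actual: float, prediction: float, sigma: float) -> float:
    """P(|X - prediction| >= |actual - prediction|) for X ~ Normal(prediction, sigma^2)."""
    z = abs(actual - prediction) / sigma
    return math.erfc(z / math.sqrt(2.0))


def truncate(prediction: float, non_negative: bool) -> float:
    """Clamp the prediction at zero when the data are known to be non-negative."""
    return max(prediction, 0.0) if non_negative else prediction


raw_prediction = -5.0  # hypothetical model output that has drifted below zero
sigma = 1.0            # hypothetical residual standard deviation
actual = 0.0

# The reported typical value has always been truncated.
typical = truncate(raw_prediction, non_negative=True)

# Before: the probability is computed from the untruncated prediction,
# so actual == typical == 0 still looks extremely unlikely.
p_before = two_sided_tail_probability(actual, raw_prediction, sigma)

# After: the same truncated prediction is used for the probability.
p_after = two_sided_tail_probability(actual, typical, sigma)

print(typical, p_before, p_after)
```

In this sketch `p_before` is very small while `p_after` is exactly one, so a score derived from the probability would no longer flag the bucket.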

Contributor

@droberts195 droberts195 left a comment

LGTM

The overall approach looks good. I don't know the code well enough to say whether the truncation has been applied everywhere where it needs to be. Probably the biggest chance of a mistake in this PR is a line that hasn't been changed rather than one that has, but at least that would be no worse than now.

@tveasey tveasey merged commit d51b461 into elastic:main May 19, 2022
@tveasey tveasey deleted the non-negative branch May 19, 2022 12:51
tveasey added a commit that referenced this pull request May 24, 2022
…og error (#2281)

#2270 exposed an edge case which caused us to start generating error messages in one of our QA tests. We simply
need to exit early in this case.
tveasey added a commit that referenced this pull request Jul 27, 2023
…nomalies (#2270)

We've had multiple issues reported in the past where we report high anomaly scores even though the actual is the same as the
typical and both are equal to zero. This has been exacerbated in the past by instabilities in the modelling when we stop
receiving data for a partition, which is one edge case in which this could occur.

In all problematic cases, the underlying reason is that if we know the data are non-negative we truncate the prediction when
we report the typical value. However, up until now we haven't truncated when computing probabilities.

This change switches to always truncating our prediction. (This is a halfway house to a more rigorous approach which
would condition the predicted distribution to be non-negative.) This should be good enough to ensure we never create
non-zero anomaly scores for cases where actual equals typical equals zero, except when the result is a multi-bucket anomaly.