[ML] Address the root cause for "actual equals typical equals zero" anomalies #2270
Conversation
LGTM
The overall approach looks good. I don't know the code well enough to say whether the truncation has been applied everywhere it needs to be. Probably the biggest chance of a mistake in this PR is a line that hasn't been changed rather than one that has, but at least that would be no worse than now.
…lly calling the wrong implementation
We've had multiple issues reported in the past where we report high anomaly scores when the actual is the same as the typical and both are equal to zero. This has been exacerbated because we've also had some instabilities in the modelling when we stop receiving data for a partition, which was an edge case in which this could occur.
In all problematic cases, the underlying reason is that if we know the data are non-negative we truncate the prediction when we report the typical value. However, up until now we haven't truncated when computing probabilities.
This change switches to always truncating our prediction. (This is a halfway house to a more rigorous approach which would condition the predicted distribution to be non-negative.) This should be good enough to ensure we never create non-zero anomaly scores for cases where actual equals typical equals zero, except when the result is a multi-bucket anomaly.
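To make the idea concrete, here is a minimal sketch (with hypothetical names, not the actual ml-cpp API) of applying the same truncation both when reporting the typical value and when computing the probability. It assumes a simple Gaussian error model purely for illustration; the real code uses the model's predictive distribution.

```cpp
#include <algorithm>
#include <cmath>

// If the data are known to be non-negative, truncate the model's prediction
// at zero. Using this consistently for both the reported typical value and
// the probability calculation means actual == typical == 0 cannot score as
// anomalous.
double truncatedPrediction(double prediction, bool dataAreNonNegative) {
    return dataAreNonNegative ? std::max(prediction, 0.0) : prediction;
}

// Illustrative two-sided tail probability using the truncated prediction.
double anomalyProbability(double actual, double prediction, double sd,
                          bool dataAreNonNegative) {
    double typical = truncatedPrediction(prediction, dataAreNonNegative);
    if (dataAreNonNegative && actual == 0.0 && typical == 0.0) {
        // A zero actual against a zero typical is not surprising.
        return 1.0;
    }
    double z = std::abs(actual - typical) / std::max(sd, 1e-8);
    return std::erfc(z / std::sqrt(2.0));
}
```

Previously, the untruncated (possibly negative) prediction was used when computing the probability, so an actual of zero could look several standard deviations away from a typical that was also reported as zero.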