[ML] Improve initialisation of the residual model after detecting new decomposition components #218
Looks good! Left a few minor comments.
scale(core_t::TTime time, double variance, double confidence, bool smooth = true) const;

//! Get the values in a recent time window if they are available.
virtual TTimeDoublePrVec windowValues(core_t::TTime time, bool forced = false) const;
Could you document what forcing means here?
On reflection, I think a slight refactor makes this easier to understand. I factored out the logic that tests whether components might be added, so this now always returns the decomposition window values. See this commit.
lib/maths/CTimeSeriesModel.cc
Outdated
const std::string IS_NON_NEGATIVE_6_3_TAG{"b"};
const std::string IS_FORECASTABLE_6_3_TAG{"c"};
const std::string RNG_6_3_TAG{"d"};
//const std::string RNG_6_3_TAG{"d"};
Do we usually just comment out removed tags? It might be better to add a comment stating the version in which the tag was removed.
lib/maths/CTimeSeriesModel.cc
Outdated
seed = CChecksum::calculate(seed, m_CurrentChangeInterval);
seed = CChecksum::calculate(seed, m_ChangeDetector);
seed = CChecksum::calculate(seed, m_RecentSamples);
seed = CChecksum::calculate(seed, m_AnomalyModel);
This looks like it's being calculated twice now in this function.
@dimitris-athanasiou, I addressed your comments in my last commit. Can you take another look?
LGTM. The refactoring did make things clearer. I left one more comment asking for an explanation of a hard-coded number, but it's good to go.
bool CTimeSeriesDecompositionDetail::CPeriodicityTest::shouldTest(ETest test,
core_t::TTime time) const {
// We need to test more frequently than we compress because it
// only happens each 336 buckets and would significantly delay
Where is this 336 coming from? It would be nice to explain that too in the comment.
Added an explanation in this commit.
… decomposition components (elastic#218)
Currently we use a small random sample of historical values to initialise the prediction residual model after we've detected new components of the time series decomposition. This sample is not large enough to be reliably representative of the true variation we should expect, and it can occasionally lead to spurious anomalies immediately after a component is detected. This has become more important as a result of the increased sensitivity related to #181.
We actually have a better sample available in the window of values we use to perform the decomposition (albeit aggregated at a different time scale than the bucketing interval). This change switches to using these values to reinitialise the residual model.