Skip to content

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented Sep 27, 2018

Currently we use a small random sample of historical values to initialise the prediction residual model after we've detected new components of the time series decomposition. This sample is not large enough to be reliably representative of true variation we should expect and can occasionally lead to spurious anomalies immediately after a component is detected. As a result of the increased sensitivity related to #181 this has become more important.

We actually have a better sample available in the window of values we use to perform decomposition (albeit aggregated at a different time scale than the bucketing interval). This change switches to use these values instead to reinitialise the residual model.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left a few minor comments.

scale(core_t::TTime time, double variance, double confidence, bool smooth = true) const;

//! Get the values in a recent time window if they are available.
virtual TTimeDoublePrVec windowValues(core_t::TTime time, bool forced = false) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you document what forcing means here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reflection I think a slight refactor makes this easier to understand. I factored out the logic to test to see if the components might be added so this now always returns the decomposition window values. See this commit.

const std::string IS_NON_NEGATIVE_6_3_TAG{"b"};
const std::string IS_FORECASTABLE_6_3_TAG{"c"};
const std::string RNG_6_3_TAG{"d"};
//const std::string RNG_6_3_TAG{"d"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we usually just comment out removed tags? Might be better to include a message that includes the version of removal.

seed = CChecksum::calculate(seed, m_CurrentChangeInterval);
seed = CChecksum::calculate(seed, m_ChangeDetector);
seed = CChecksum::calculate(seed, m_RecentSamples);
seed = CChecksum::calculate(seed, m_AnomalyModel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's being calculated twice now in this function

@tveasey
Copy link
Contributor Author

tveasey commented Sep 27, 2018

@dimitris-athanasiou, I addressed your comments in my last commit. Can you take another look?

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM The refactoring made things clearer indeed. Left another comment for explaining a hard number but it's good to go.

bool CTimeSeriesDecompositionDetail::CPeriodicityTest::shouldTest(ETest test,
core_t::TTime time) const {
// We need to test more frequently than we compress because it
// only happens each 336 buckets and would significantly delay
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this 336 coming from? It would be nice to explain that too in the comment.

Copy link
Contributor Author

@tveasey tveasey Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an explanation in this commit.

@tveasey tveasey force-pushed the enhancement/new-component-initialisation branch 2 times, most recently from 25ee047 to 5cc6d6c Compare September 28, 2018 08:50
@tveasey tveasey force-pushed the enhancement/new-component-initialisation branch from 5cc6d6c to 7bc3e87 Compare September 28, 2018 09:32
@tveasey tveasey merged commit 3f7eb0c into elastic:master Sep 28, 2018
@tveasey tveasey changed the title [ML] Improve initialisation of the residual model after detecting new decomposition component [ML] Improve initialisation of the residual model after detecting new decomposition components Sep 28, 2018
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Sep 28, 2018
tveasey added a commit that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants