[ML] Improve adaption of the modelling of cyclic components to very localised features #134
Conversation
…based on large error statistics
```cpp
CAdaptiveBucketing::CAdaptiveBucketing(double decayRate, double minimumBucketLength)
    : m_DecayRate{std::max(decayRate, MINIMUM_DECAY_RATE)},
      m_MinimumBucketLength{minimumBucketLength} {
}
```
some of the members are not initialized, is that ok?
I think they are, no? The only non-class types are the literals `m_TargetSize`, `m_LastLargeErrorBucket` and `m_LastLargeErrorPeriod`, which all have default member initialisers. Everything else will be default constructed.
```cpp
                                CTools::safeCdfComplement(binomial, m_LargeErrorCounts[i - 1])};
        m_LargeErrorCountSignificances.add({oneMinusCdf, i - 1});
    } catch (const std::exception& e) {
        LOG_ERROR(<< "Failed to calculate splitting significance: " << e.what());
```
in case we hit this, does it make sense to continue the for-loop?
It's somewhat moot, i.e. this really shouldn't happen, but I think it is ok to continue: we'll find the highest significance for which there was no error.
```diff
         count += CBasicStatistics::count(value);
     }
-    count /= (endpoints[m] - endpoints[0]);
+    count /= (oldEndpoints[m] - oldEndpoints[0]);
```
any possibility this can become 0?
In practice, no: this is the full period being modelled, which in this case is always 1 day in seconds. We also rely on the fact that we force the individual endpoints to be at least the data bucketing interval apart.
Arguably, we could make all this code safe, i.e. if things went wrong we'd end up in 0/0 situations which we'd want to treat as one. On balance, though, I'd prefer to keep the code simpler and simply assume this invariant holds.
LGTM
…ocalised features (elastic#134)
This change primarily targets fixing an edge case where we can fail to learn periodic spikes and so generate repeated anomalies for predictable behaviour. Whilst investigating the resulting changes in the unit tests I made two further minor enhancements: 1) an improvement to the trend component confidence interval calculation for forecasting, and 2) an improvement to avoid a source of undesirable step changes in our predictions. I decided not to factor these out into separate commits because they are small and in a related area. Finally, this change also pushes up common interface from the derived classes to `maths::CAdaptiveBucketing`, since there was enough that I think it makes sense to inherit publicly from this class instead.

By way of a little extra background on the motivating problem and fix: we currently use a family of regression models, parameterised by offset in the period, to model cyclic components (calendar and seasonal). We optimise the "placement" of these models by minimising an estimate of the mean error introduced by interpolating them. Over time this learns an effective placement based on significant features, such as spikes, steps, etc. However, at short bucket lengths there is an edge case due to the interaction with the approach we use to robustify our predictions: we can effectively miss important features and as a result generate repeated false positives. For example:
![screen shot 2018-06-28 at 15 33 34](https://user-images.githubusercontent.com/7591487/42042697-ac1eb088-7aec-11e8-971b-a6cee1e2a668.png)
The main part of this change is to maintain additional information to detect this problem (by looking for a statistically significant non-uniform distribution of large errors in the period) and to add additional models where necessary. With this change, the result on the same data set is:
![screen shot 2018-06-28 at 15 32 56](https://user-images.githubusercontent.com/7591487/42042723-b6ac8f16-7aec-11e8-998f-ad63cb069a83.png)