
support standard deviation in reduce functions #808

Closed
ktsaou opened this issue Aug 21, 2016 · 11 comments
Labels
feature request New features

Comments

@ktsaou
Member

ktsaou commented Aug 21, 2016

https://en.wikipedia.org/wiki/Standard_deviation

This would be useful in health monitoring alarms, to get more robust values than a plain average when spikes need to be eliminated.
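
For example, a hypothetical health.d entry, assuming the lookup line gains a stddev method (chart, dimension, period and threshold below are made up):

 alarm: cpu_user_stddev
    on: system.cpu
lookup: stddev -5m unaligned of user
 every: 10s
  warn: $this > 20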

@paulfantom
Contributor

paulfantom commented Aug 25, 2016

Apart from standard deviation, I would be overjoyed if we could somehow include Holt-Winters anomaly detection. This would be the first monitoring stack I know of that includes prediction based on historical trends by default (I know it is doable in Bosun or Riemann, but not as a default function).

@ktsaou
Member Author

ktsaou commented Aug 26, 2016

@paulfantom this algorithm seems pretty simple, but I am not sure how to use it. Have you used it in the past?

@paulfantom
Contributor

Unfortunately no, but I am learning how to.

@ktsaou
Member Author

ktsaou commented Nov 5, 2016

I did some research on the Holt-Winters function. I also found a few issues (pierre/holt-winters#1) in the repo you mentioned, which I fixed with a PR (pierre/holt-winters#2).

So, holt-winters is like double exponential smoothing, but with seasonality (i.e. triple exponential smoothing). Since we don't know anything about the seasonality of our data, holt-winters won't help us further.

I collected / developed several statistical functions (including double exponential smoothing = holt-winters without seasonality).

Here they are:

#include <math.h>   // isnan, isinf, powl, sqrtl, NAN
#include <stdio.h>  // fprintf

// arithmetic mean of the series, ignoring NaN/Inf values
long double average(long double *series, size_t entries) {
    size_t i, count = 0;
    long double sum = 0;

    for(i = 0; i < entries ; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;
        count++;

        sum += value;
    }

    if(count == 0) return NAN;
    long double avg = sum / (long double)count;

    fprintf(stderr, "average % 12.7Lf\n", avg);
    return avg;
}

// simple moving average over a sliding window of 'period' samples,
// ignoring NaN/Inf values; returns the average of the last full window
long double moving_average(long double *series, size_t entries, size_t period) {
    if(period == 0) return NAN;

    size_t i, count = 0;
    long double sum = 0, avg = 0;
    long double p[period]; // circular buffer holding the last 'period' values

    for(i = 0; i < entries; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;

        if(count < period) {
            // window not full yet: accumulate until 'period' values are in
            sum += value;
            avg = (count == period - 1) ? sum / (long double)period : 0;
        }
        else {
            // slide the window: drop the oldest value, add the new one
            sum = sum - p[count % period] + value;
            avg = sum / (long double)period;
        }

        p[count % period] = value;

        count++;
        fprintf(stderr, " > i = % 3zu, current = % 12.7Lf, movavg = % 12.7Lf\n", i, value, avg);
    }

    fprintf(stderr, "moving average (period = %zu) % 12.7Lf\n", period, avg);
    return avg;
}

// population standard deviation in two passes, ignoring NaN/Inf values
long double standard_deviation(long double *series, size_t entries) {
    size_t i, count = 0;
    long double sum = 0;

    // first pass: the average
    for(i = 0; i < entries ; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;
        count++;

        sum += value;
    }
    if(count == 0) return NAN;
    long double average = sum / (long double)count;

    // second pass: the variance, as the mean squared deviation from the average
    for(i = 0, count = 0, sum = 0; i < entries ; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;
        count++;

        sum += powl(value - average, 2); // powl, to stay in long double
    }
    long double variance = sum / (long double)count;

    long double stddev = sqrtl(variance); // sqrtl, to stay in long double
    fprintf(stderr, "standard deviation % 12.7Lf\n", stddev);
    return stddev;
}
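
As an aside, the two passes above can be fused into a single pass with Welford's method, which also behaves better numerically on large values - a sketch:

// one-pass population standard deviation (Welford's method),
// ignoring NaN/Inf values like the functions above
long double standard_deviation_welford(long double *series, size_t entries) {
    size_t count = 0;
    long double mean = 0, m2 = 0;

    for(size_t i = 0; i < entries; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;
        count++;

        long double delta = value - mean;
        mean += delta / (long double)count;
        m2 += delta * (value - mean); // running sum of squared deviations
    }

    if(count == 0) return NAN;
    return sqrtl(m2 / (long double)count); // population stddev, same as above
}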

// exponentially weighted average: the weight of each value decays
// geometrically with age; alpha close to 1.0 favors recent values
long double single_exponential_smoothing(long double *series, size_t entries, long double alpha) {
    size_t i, count = 0;
    long double level = 0, sum = 0;

    if(entries == 0) return NAN;

    if(isnan(alpha))
        alpha = 2.0 / (long double)entries; // avoids the integer division of 1.0 / (entries / 2)

    for(i = 0; i < entries ; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;
        count++;

        sum += value;

        long double last_level = level;
        level = alpha * value + (1.0 - alpha) * last_level;
        fprintf(stderr, " > i = % 3zu, current = % 12.7Lf, 1expavg = % 12.7Lf, avg = % 12.7Lf\n", i, value, level, sum/count);
    }

    fprintf(stderr, "single exponential average (alpha = % 12.7Lf) % 12.7Lf\n", alpha, level);
    return level;
}

// double exponential smoothing (holt-winters without seasonality): tracks a
// smoothed level and a smoothed trend, and forecasts their sum
// http://grisha.org/blog/2016/02/16/triple-exponential-smoothing-forecasting-part-ii/
long double double_exponential_smoothing(long double *series, size_t entries, long double alpha, long double beta) {
    if(entries == 0) return NAN;

    size_t i, count = 0;
    long double level = series[0], trend, sum, forecast = series[0];

    if(isnan(alpha))
        alpha = 0.01;

    if(isnan(beta))
        beta = 0.9;

    // initial trend: the slope between the first two values
    if(entries > 1)
        trend = series[1] - series[0];
    else
        trend = 0;

    sum = series[0];

    for(i = 1; i < entries ; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;
        count++;

        sum += value;

        long double last_level = level;

        level = alpha * value + (1.0 - alpha) * (level + trend);
        trend = beta * (level - last_level) + (1.0 - beta) * trend;
        forecast = level + trend;

        fprintf(stderr, " > i = % 3zu, current = % 12.7Lf, forecast = (% 12.7Lf + % 12.7Lf) = % 12.7Lf, avg = % 12.7Lf\n", i, value, level, trend, forecast, sum/(long double)(count+1));
    }

    fprintf(stderr, "double exponential average (alpha = % 12.7Lf, beta = % 12.7Lf) % 12.7Lf, forecast % 12.7Lf\n", alpha, beta, level, forecast);
    return forecast;
}

In single exponential smoothing, double exponential smoothing and holt-winters, alpha is the importance of the recent values: 0.0 = not important at all (i.e. no exponential smoothing), to 1.0 = only the most recent value is important. It is used to favor recent values over older ones, and it is a decimal number between 0.0 and 1.0.

In double exponential smoothing and holt-winters, beta is the importance of the trend: 0.0 = not important at all (i.e. no double exponential smoothing), to 1.0 = the trend is most important. It is also a decimal number between 0.0 and 1.0.
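
For completeness, a sketch of the third step - triple exponential smoothing (holt-winters with additive seasonality), in the same style as the functions above. The season length slen must be known in advance, which is exactly the open problem here; gamma is the importance of the seasonal component, again between 0.0 and 1.0:

// triple exponential smoothing (holt-winters, additive seasonality) - a sketch
long double triple_exponential_smoothing(long double *series, size_t entries,
        size_t slen, long double alpha, long double beta, long double gamma) {
    if(slen == 0 || entries < 2 * slen) return NAN; // need at least 2 full seasons

    size_t i;
    long double seasonal[slen];

    // initial trend: the average slope between the first two seasons
    long double trend = 0;
    for(i = 0; i < slen; i++)
        trend += (series[i + slen] - series[i]) / (long double)slen;
    trend /= (long double)slen;

    // initial seasonal components: deviation from the first season's average
    // (a simplification - the grisha.org article averages over all seasons)
    long double season_avg = 0;
    for(i = 0; i < slen; i++) season_avg += series[i];
    season_avg /= (long double)slen;
    for(i = 0; i < slen; i++) seasonal[i] = series[i] - season_avg;

    long double level = series[0], forecast = series[0];

    for(i = 1; i < entries; i++) {
        long double value = series[i];
        if(isnan(value) || isinf(value)) continue;

        long double last_level = level;
        long double s = seasonal[i % slen];

        level = alpha * (value - s) + (1.0 - alpha) * (level + trend);
        trend = beta * (level - last_level) + (1.0 - beta) * trend;
        seasonal[i % slen] = gamma * (value - level) + (1.0 - gamma) * s;

        forecast = level + trend + seasonal[(i + 1) % slen];
    }

    return forecast;
}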

@ktsaou
Member Author

ktsaou commented Nov 6, 2016

I also found FANN, a neural network library.

I am still learning this stuff, so I opened an issue there to find out if we can use neural networks in netdata for alarms: libfann/fann#83

@ktsaou
Member Author

ktsaou commented Nov 6, 2016

hm... I also found this: https://github.com/rubygarage/holtwinters/blob/master/index.js
At first it seemed that it could detect the seasonality for holt-winters (although by brute force).

On closer look, it only detects proper values for alpha and beta - it does not detect seasonality.
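
That brute force is simple enough to replicate here. A sketch of fitting alpha and beta by minimizing the one-step-ahead squared forecast error (the 0.01 grid step is arbitrary, and it assumes entries >= 2 with no NaN gaps):

// grid-search the (alpha, beta) pair that minimizes the one-step-ahead
// squared error of double exponential smoothing - the same idea as the
// javascript above
void fit_alpha_beta(long double *series, size_t entries,
                    long double *best_alpha, long double *best_beta) {
    long double best_sse = INFINITY;

    for(long double a = 0.01; a < 1.0; a += 0.01) {
        for(long double b = 0.01; b < 1.0; b += 0.01) {
            long double level = series[0], trend = series[1] - series[0], sse = 0;

            for(size_t i = 1; i < entries; i++) {
                long double err = series[i] - (level + trend); // forecast error
                sse += err * err;

                long double last_level = level;
                level = a * series[i] + (1.0 - a) * (level + trend);
                trend = b * (level - last_level) + (1.0 - b) * trend;
            }

            if(sse < best_sse) {
                best_sse = sse;
                *best_alpha = a;
                *best_beta = b;
            }
        }
    }
}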

@ghost

ghost commented Mar 22, 2017

How about combining anomaly detection with taking a snapshot (#309) to a long-term store? This significantly reduces the long-term storage requirements without sacrificing the value of the detail around when 'something happened'. It also spreads the work of anomaly detection across the monitored pool, so there is no huge MI platform in the middle...

@ktsaou
Member Author

ktsaou commented Mar 23, 2017

@PhlashGBG I am not sure I get it. Could you please explain it a bit more?

@ghost

ghost commented Mar 23, 2017

Heh ok :) As a possible 'enterprise user' of a monitoring tool like netdata, I would likely want to centralise storage and apply machine intelligence (MI) to detect / trace issues across my estate of thousands of real/virtual machines, much as monitoring services like NewRelic and cloud providers such as Azure and AWS can already do. This sort of works (see below) when sampling at 5min+ intervals, but will fail horribly at 1sec sample intervals due to the volume of data involved.

Why would I want 1sec sampling fed to MI? Because many incidents are ephemeral, and 5min+ sampling completely misses what actually happened; more detail gives me much more ability to diagnose and fix.

My suggestion is thus to limit the amount of data being sent to storage / MI by filtering out the boring stuff using local anomaly detection (simple MI) on each server, which also scales much better than central MI. When anomalies are detected, netdata can send a snapshot of monitoring data around the anomaly (before, during and after) to storage, much like a hardware logic analyser or storage oscilloscope captures the events leading up to and around a trigger point. This gives the MI (or the humans) the detail required to diagnose and fix, without overloading the centralised storage / MI with yards of boring junk.
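
In code terms, the trigger idea might look something like this (a minimal sketch; all names are hypothetical, and it assumes the ring has been filled at least once):

// keep the last PRE samples in a ring buffer; when an anomaly triggers,
// ship the buffered history plus the next POST samples to the store -
// like a storage oscilloscope capturing around a trigger point
#define PRE  60
#define POST 60

void send_to_store(long double value); // hypothetical long-term store sink

typedef struct {
    long double ring[PRE];  // pre-trigger history
    size_t head;            // next write position in the ring
    size_t post_remaining;  // > 0 while still shipping post-trigger samples
} capture_t;

void capture_sample(capture_t *c, long double value, int anomaly) {
    if(anomaly && c->post_remaining == 0) {
        // trigger: flush the history leading up to the anomaly, oldest first
        for(size_t i = 0; i < PRE; i++)
            send_to_store(c->ring[(c->head + i) % PRE]);
        c->post_remaining = POST;
    }

    if(c->post_remaining > 0) {
        send_to_store(value); // still inside the capture window
        c->post_remaining--;
    }

    c->ring[c->head] = value;
    c->head = (c->head + 1) % PRE;
}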

@ktsaou
Member Author

ktsaou commented Mar 23, 2017

ok, I see. Nice idea. Although I am not sure how MI will work with non-regular data. MI is supposed to find issues in data that follow the same principles; if it suddenly gets so much more detail, I am not sure what the outcome will be. I am not a data scientist though, so I don't know.

Keep in mind, you can use multiple netdata instances (even running on the same host) to broadcast the same metrics to different backends at different detail. For example:

  1. netdata 1: receives all metrics from all hosts, maintains a small db, archives metrics to time-series database A with 5 second resolution (A has a retention of 1 month) and sends all metrics to netdata 2.

  2. netdata 2: receives all metrics from netdata 1, maintains a small db, archives all metrics to time-series database B with 1 minute resolution (B has a retention of 3 months) and sends all metrics to netdata 3.

  3. netdata 3: receives all metrics from netdata 2, maintains a large database (a few days) with memory mode map (swap like) and archives all metrics to time-series database C with 5 minute resolution (C has a retention of 1 year).

So, in this setup:

i. you have a large high resolution database maintained by netdata 3
ii. you have 3 time-series databases, each with a different resolution and retention policy.
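
As a sketch, each hop in this chain is just streaming plus archiving configuration - something like this (hostnames, the API key and the graphite backend below are made-up examples):

# netdata 1 - stream.conf: send all metrics to netdata 2
[stream]
    enabled = yes
    destination = netdata2.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555

# netdata 1 - netdata.conf: archive to time-series database A at 5s resolution
[backend]
    enabled = yes
    type = graphite
    destination = tsdb-a.example.com
    update every = 5

netdata 2 and netdata 3 would repeat the same pattern with their own destination and update every values.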

@paulfantom
Contributor

I think this was already implemented. Closing.

@paulfantom paulfantom added feature request New features and removed module/core labels Nov 18, 2018