Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature idea: moving windows for aggregation #7806

Closed
bakura10 opened this issue Sep 20, 2014 · 1 comment
Closed

Feature idea: moving windows for aggregation #7806

bakura10 opened this issue Sep 20, 2014 · 1 comment

Comments

@bakura10
Copy link

Hi everyone,

This issue is a follow-up from this thread (https://groups.google.com/forum/#!msg/elasticsearch/BSWVafBefp8/tiBEomy7OAgJ).

Moving window is a very handy analytics feature that can be found in Oracle and PostgreSQL, and it would be awesome to have something similar in Elasticsearch. The idea is to define a "window" that is moving (typically inside a specified time range) to perform various analytics operations. The implementation in Oracle is a bit complex and cover more uses that what I need, so I'd suggest you have a look at how it works there too :).

Moving windows allow to answer that kind of questions: for each day inside a timeframe, give me the sum amount for the last 30 days, starting from the given day. For instance, with following data:

[
    {
        "date": "2014-01-01",
        "amount": 10
    },
    {
        "date": "2014-01-01",
        "amount": 15
    },
    {
        "date": "2014-01-02",
        "amount": 5
    },  
    {
        "date": "2014-01-03",
        "amount": 25
    },
    {
        "date": "2014-01-05",
        "amount": 8
    }
] 

If we want the sum for the last 3 days at each point, the result would be:

2014-01-01: 10+15 = 25
2014-01-02: 10+15+5 = 30
2014-01-03: 10+15+5+25 = 55
2014-01-04: 5+25+0 = 30
2014-01-05: 5+25+0+8 = 38

As you can see, some interesting optimizations could be done, so that Elasticsearch does not need to recompute the sum at each window, but rather reuse the result of the previous window.

Currently, the only way to do that in Elasticsearch is either to define one bucket for each couple (so "now-60d" to "now-30d", "now-59d" to "now-29"... and so on), but this is very inefficient. Or asking for more data and manually create the rolling sum/avg/... in client side.

I don't have any idea about what the syntax could be:

{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "my_date": {
                        "from": "now-60d",
                        "to": "now"
                    }
                }
            }
        }
    },

    "aggs": {
        "by_date": {
            "date_histogram": {
                "field": "my_date",
                "interval": "day"
            },

            "window": {
                "period": "-30d",
                "min": "now-30d",

                "aggs": {
                    "my_sum": {
                        "sum": "amount"
                    }
                }
            }
        }
    }
}

The idea would be to first filter data so that we have enough data to cover the "-30d" in the window. The window would then define a period (if positive, it would be "all the records after", or if negative "all the records before"). The "min" would allow to limit results (because we want 30 data points with each sum being the result of last 30 days BEFORE the given day, that's why we need to filter "now-60d" but we only want for the last 30 days).

The syntax here only cover date, while Oracle/PostgreSQL window functions can only be used for operations like "return sum of last 30 records".

Let me know if you need any more details about the use case :).

Thanks!

@clintongormley
Copy link

Hi @bakura10

While this is not exactly the same as #4404, it is related. There are plans afoot for such a feature, and I'm using #4404 as the "main" issue here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants