
Add "integral" function to InfluxQL #8194

Merged
jsternberg merged 2 commits into master from js-integral-function on Mar 30, 2017

Conversation

@jsternberg (Contributor) commented Mar 23, 2017

The integral function is an aggregator that returns the area under the curve. It also accepts an optional second argument, a duration that determines the unit of the returned values. The default duration is 1s (similar to derivative()).

The area under the curve can also be grouped into buckets, but integral acts slightly differently than other aggregates. First, integral does not support FILL() and will ignore any FILL() clause on the query. Second, integral automatically interpolates the area under the curve using a point in the next interval, if one exists. So if you group every 20 seconds and record metrics every 10 seconds, the point at 20 seconds is used to find the area under the curve between 10s and 20s. If you record a point every 15 seconds and group every 20 seconds, the point at 30 seconds is used to interpolate the area under the curve between 15 and 20 seconds, and the point at 15 seconds is used to interpolate the area under the curve between 20 and 30 seconds.
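The bucketed trapezoidal integration with boundary interpolation described above can be sketched in a few lines of Python. This is an illustration of the math only, not InfluxDB's actual Go implementation; the function name, signature, and the `unit` handling are invented for the sketch:

```python
# Hypothetical sketch of bucketed trapezoidal integration with linear
# interpolation at bucket boundaries (not InfluxDB's real code).

def integral_by_bucket(points, bucket, unit=1.0):
    """points: sorted (time_sec, value) pairs; returns {bucket_start: area}."""
    areas = {}
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        # Split each segment at every bucket boundary it crosses, so each
        # slice of area is credited to the bucket it falls in.
        start, vstart = t0, v0
        while start < t1:
            boundary = (start // bucket + 1) * bucket
            end = min(boundary, t1)
            # Linearly interpolate the value at the split point.
            vend = v0 + (v1 - v0) * (end - t0) / (t1 - t0)
            area = (vstart + vend) / 2 * (end - start)
            key = int(start // bucket) * bucket
            areas[key] = areas.get(key, 0.0) + area / unit
            start, vstart = end, vend
    return areas
```

The division by `unit` mirrors my reading of the optional second argument (areas reported per `unit` seconds); with the default `unit=1.0` the result is value·seconds.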

Unlike derivative(), you cannot use a function inside of integral(). If you wish to perform a query like that, subqueries are the easiest way.

If there are multiple points at the same time, this is treated as a vertical line. Vertical lines do not add anything to the area under the curve, but they do change the line: the next point is calculated from the last point at that timestamp, rather than the duplicates being ignored entirely. This differs from the traditional behavior of simply ignoring duplicate points in a stream.

Example queries:

SELECT integral(value) FROM cpu
SELECT integral(value, 1m) FROM cpu
SELECT integral(value) FROM cpu GROUP BY time(20s)
SELECT integral(value, 1m) FROM cpu GROUP BY time(20s)
SELECT integral(mean) FROM (SELECT mean(value) FROM cpu GROUP BY time(10s)) GROUP BY time(1m)

@jsternberg (Contributor, Author):

There's a small lingering issue I'm encountering with the interpolation. Since integral acts so differently from other aggregates, there's a question I have: what happens when it is used with fill(), and should that even be possible?

My original idea was just to read in points as a stream and perform interpolation that way. But what happens when a fill specification is included? Imagine you have data equally spaced every 10 seconds and you call integral grouped every 20 seconds. The interpolation feature lets this learn where the next point is to complete the line going into the next interval. But if the next point skips 1 minute into the future, should it interpolate between those two points, or should it cut off the area calculation at the last time before that point?

A specific example:

cpu,value=1 0s
cpu,value=2 10s
cpu,value=3 20s
cpu,value=4 30s
cpu,value=5 50s

> SELECT integral(value) FROM cpu GROUP BY time(20s)

For the first bucket, 0s to 20s, I think this is pretty simple. You would find the area between 0s and 10s and then the area between 10s and 20s. But the point at 20s isn't technically in the first bucket; it's the first point of the next bucket. If that first point in the next bucket were at 21s and the fill is null, none, or some number, what should happen here? The same issue comes up later in the series because 40s is missing and it's the beginning of an interval.
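For concreteness, here is how the per-bucket areas work out for the sample series above under pure linear interpolation (a quick plain-Python check of the arithmetic, not the engine code):

```python
# Per-bucket trapezoid areas for the sample series, assuming pure linear
# interpolation between points (illustrative only).
def trapezoid(t0, v0, t1, v1):
    return (v0 + v1) / 2 * (t1 - t0)

# Bucket 0s-20s: the 20s point closes the line even though it belongs
# to the next bucket.
bucket1 = trapezoid(0, 1, 10, 2) + trapezoid(10, 2, 20, 3)

# Bucket 20s-40s: the 30s-40s slice uses a value interpolated from the
# 50s point, since 40s itself is missing.
v40 = 4 + (5 - 4) * (40 - 30) / (50 - 30)
bucket2 = trapezoid(20, 3, 30, 4) + trapezoid(30, 4, 40, v40)

# Bucket 40s-60s: only the 40s-50s slice exists.
bucket3 = trapezoid(40, v40, 50, 5)

print(bucket1, bucket2, bucket3)  # 40.0 77.5 47.5
```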

@pauldix any thoughts on this? Personally, I don't think integral is complete without some form of interpolation handling the area between different intervals, so I would like to hash out how this should work.

@jsternberg (Contributor, Author):

Note: my current favored plan is just to say fill() doesn't work with integral and tell people to use subqueries with an aggregate if they really need that functionality.

@pauldix (Member) commented Mar 24, 2017

+1 for making fill not work with integral. We should validate this at query parse time and return an error. This is another one of those cases for a stream/functional oriented query language ;)

@jsternberg (Contributor, Author):

The potential issue with returning an error, though, is: should this type of query be allowed?

SELECT mean(value), integral(value) FROM cpu GROUP BY time(20s) FILL(0)

Since we allow multiple aggregates in one query, that FILL() would refer to MEAN().

@pauldix (Member) commented Mar 24, 2017

Hmmm yeah. Maybe with multiple aggregates it would just apply to the ones that work while leaving integral alone.

@jsternberg force-pushed the js-integral-function branch 2 times, most recently from e4cf04e to 0cf6b72, on March 24, 2017 at 15:16
@jsternberg (Contributor, Author):

We don't seem to throw any kind of error when FILL() is used in a situation where it doesn't do anything so I think we should just document it and plan in the future to improve query parsing. Integral is already going to be a very weird function.

@pauldix (Member) commented Mar 24, 2017 via email

@jsternberg (Contributor, Author):

No, I mean just ignore the FILL() function and let it be used. We don't seem to have any verification to see if the FILL() function is used properly anyway. We likely need to start thinking of a plan for a v2 query parser that prevents these PHP-style things, but the current query parser's philosophy is mostly to ignore things that don't make sense silently.

So this would be valid, but also useless:

SELECT integral(value) FROM cpu GROUP BY time(1m) FILL(0)

@pauldix (Member) commented Mar 24, 2017 via email

@Sineos commented Mar 25, 2017

While playing with Grafana to graph impulse meters (e.g. an S0 meter) or consumption values, I noticed that Grafana plots the graph in a way that is not suitable for such a use case:

[image: comparison of a stepped ("test left") graph and a linear graph]

  • At time 2 the impulses give a power value of 10
  • We assume between time 1 and time 2 we had: Work = Delta(t) x 10
  • The "test left" graph is wrong here, since for the same time it would give a Work of 0
  • As of today, Grafana plots "test left" for stepped graphs

This is exactly the use case for which I'm eagerly awaiting the integral implementation. I was wondering whether this is only a graphical problem or whether it would affect this upcoming feature as well.

@jsternberg (Contributor, Author):

@Sineos I'm not sure I understand your point, but I'm going to give it a guess. Is your point that the integral emits the wrong timestamp and that affects the final graph? I think we're currently emitting the later timestamp rather than the earlier timestamp for the area so I would imagine you run into the same problem. Am I understanding what you're saying correctly?

@Sineos commented Mar 28, 2017

@jsternberg
Yes, this is my concern currently.
Let me give you an example: let's measure power consumption in watts.

At t1 = 00:00:00 --> 100 Watts
At t2 = 01:00:00 --> 500 Watts
Delta(Power) = 500 - 100 = 400
Delta(Time) = t2 - t1 = 1h

Energy = 400W x 1h = 400 Wh

So, a typical use case for an integral.

Now the example as a Graph:
[image: graph of the example; blue = linear interpolation, black = stepped]

The blue graph would show the correct result, whereas the black graph would give an Energy value of 100 Wh (the respective areas under the graph).

This would be true for all consumption-based calculations, and also for rate-based calculations if the derivative function follows the same logic.

I guess the idea for a consumption calculation is that I can only look in the past. So if we measure 500 Watts at t2, it is safe to assume that this happened in the time between t1 and t2. So if we choose Delta(t) small enough, the calculation will be pretty accurate.

The same logic applies the other way round: If we measure network traffic and our counter shows 100MB at t1 and 400MB at t2, then 300MB of traffic have been generated between t1 and t2. Given the time and the traffic we can calculate the network bandwidth that has caused the traffic. The result in words would be: From t1 onward we had a rate of X MB/s that would eventually lead to an increase of 300MB at t2.
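The difference between the two graphs in the power example can be reproduced numerically. This is an illustrative comparison in plain Python (the variable names and step convention are mine, not Grafana's or InfluxDB's):

```python
# Two ways to integrate the example series: 100 W at t=0 h, 500 W at t=1 h.
t = [0.0, 1.0]      # hours
w = [100.0, 500.0]  # watts

# Left-step ("test left"): hold each value until the next sample arrives.
left_step = sum(w[i] * (t[i + 1] - t[i]) for i in range(len(t) - 1))

# Linear interpolation (trapezoid): average adjacent values over the interval.
trapezoid = sum((w[i] + w[i + 1]) / 2 * (t[i + 1] - t[i])
                for i in range(len(t) - 1))

print(left_step)  # 100.0 Wh -> the stepped (black) graph
print(trapezoid)  # 300.0 Wh -> the linear (blue) graph
```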

@rbetts added the review label and removed the in progress label on Mar 30, 2017
@dgnorton (Contributor):

I tried creating an ad hoc test for @Sineos's example above and came across what appears to be an inconsistency in the timestamps in the output. E.g., given the following data:

> select * from pwr
name: pwr
time                 watts
----                 -----
1970-01-01T00:00:00Z 100
1970-01-01T01:00:00Z 500
1970-01-01T02:00:00Z 100

It returns the following:

> SELECT integral(watts,1h) FROM pwr WHERE time >= 0 AND time <= 7200000000000 GROUP BY time(1h)
name: pwr
time                 integral
----                 --------
1970-01-01T00:00:00Z 300
1970-01-01T02:00:00Z 300

Note that the timestamp for the first bucket is at the beginning of the first bucket and the timestamp for the second bucket is at the end of the third bucket.

@Tomcat-Engineering (Contributor):

@Sineos in your energy example the algorithm will give an integral of 300Wh (as per @dgnorton's comment).

You can think of this as the average power multiplied by the time period, or as linear interpolation or as the trapezium rule - they are all the same thing and give the same answer!
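As a quick arithmetic check that these views coincide, using the numbers from the example above (100 W rising to 500 W over one hour):

```python
# Average-power-times-duration and the trapezium rule are the same formula,
# just rearranged; values are from the example above.
p1, p2, hours = 100.0, 500.0, 1.0

mean_power_times_time = ((p1 + p2) / 2) * hours  # average power x duration
trapezium_rule = (p1 + p2) * hours / 2           # trapezium rule

print(mean_power_times_time, trapezium_rule)  # 300.0 300.0
```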

@jsternberg (Contributor, Author):

I fixed the bug that caused the wrong output @dgnorton found. I also added an additional condition: if the last point is at the very start of a new interval (so there is no area yet), no point will be emitted for the last interval, even though other aggregates would emit one. This is solely due to the unique nature of integral.

The time output for a bucket is the start of the interval, to match the behavior of other aggregates. So the area between 0:00 and 1:00 will have a time of 0:00 when the query is ascending. It'll be the opposite when descending.

@dgnorton (Contributor):

@pauldix any thoughts on whether timestamps in the output should be from the beginning or end of each bucket?

@pauldix (Member) commented Mar 30, 2017

@dgnorton I think it makes sense to match the behavior of the other ones like @jsternberg did

@Sineos commented Mar 30, 2017

@Tomcat-Engineering
Thanks for the clarification. Appreciated.

@jsternberg merged commit a221e32 into master on Mar 30, 2017
@jsternberg deleted the js-integral-function branch on March 30, 2017 at 23:24
@jsternberg removed the review label on Mar 30, 2017
@inselbuch:
you gots a typo fella

SELECT integral(value, 1m) FROM cpu GROU PBY time(20s)

@jsternberg (Contributor, Author) commented May 1, 2017

Fixed the typo for anyone who encounters this from a search engine. Unfortunately, the commit message will be there for all time :(
