
Proposal – Time Interpolation #2428

Open
sanderson opened this issue Jan 24, 2020 · 34 comments

@sanderson
Contributor

I've talked with three users just this week who want/need to be able to generate points at given intervals if no data exists – time interpolation. I've been thinking through possible implementations and have the following proposals:

Proposal 1 - Create an interpolate() function

|> interpolate(every: 10s)

This function would essentially "window" the data by the specified duration and check to see if a point exists in each window. If the window is empty, it inserts a row with a _time value calculated from the right bound of the window. All columns that are in the group key inherit the key values. All columns that are not in the group key are set to null.

You could then use fill() to replace the null values. I don't think we could fill as part of the interpolate() function because we don't necessarily know what or how many columns will be null in the output, so it's best left to the user to fill the columns as they need to.
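In use, the proposed function might compose with fill() like this. This is only a sketch: interpolate() here is the proposed function, not an existing one, and the bucket and filter values are illustrative.

```flux
from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu" and r["_field"] == "usage")
  // Proposed: insert a row with a null _value for every empty 10s window
  |> interpolate(every: 10s)
  // Then replace the inserted nulls however needed, e.g. with the previous value
  |> fill(column: "_value", usePrevious: true)
```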

Proposal 2 - Add interpolation param(s) to fill()

fill(
  column: "_value",
  value: "value",
  usePrevious: false,
  interpolate: true,
  interpolateEvery: 10s
)

This solution would add interpolate and interpolateEvery parameters to the fill() function. As in proposal 1, it would "window" the data by the specified interpolateEvery duration and check to see if a point exists in each window. If the window is empty, it inserts a row with a _time value calculated from the right bound of the window and fills the specified column with the specified fill value. All columns that are in the group key inherit the key values. All columns that are not the specified fill column and are not in the group key are dropped from the table.

The downside of this approach is that the user may want to preserve columns that would get dropped in this process.

@invernizzie

Is there an update on this proposal? This is a much-needed feature when joining data, since there's no "outer" join method either.

@KoenWindey

Highly recommended function! Particularly for setups where you only record data 'on change' to preserve bandwidth. That leaves big gaps in the data by design, which then impacts many mathematical operations if you can't recreate the 'missing' data on the fly.

Seems people at Timescale understand this very well:
https://docs.timescale.com/latest/tutorials/tutorial-howto-visualize-missing-data-grafana

Cheers
K

@ggear

ggear commented Sep 26, 2020

+1, this would be a great feature and quite necessary

@giter

giter commented Oct 5, 2020

+1, quite necessary

@ryo0ka

ryo0ka commented Oct 20, 2020

+1 I'm surprised this is not an option.

@pbegg

pbegg commented Oct 21, 2020

+1

@tedblad

tedblad commented Nov 3, 2020

+1

@fredmef

fredmef commented Nov 12, 2020

+1, this would be a major feature and necessary

@collinmcnulty

collinmcnulty commented Dec 3, 2020

I think this is a critical feature. I have written up a small but challenging test case. Note that in this case the interpolation interval (how frequently result points are spaced out) and the interpolation radius (how far a raw point can be away from the result point while still being considered) are the same at 5 minutes.

| Time | Raw Data |
| ---- | -------- |
| 1:59 | 6        |
| 2:01 | 5        |
| 2:02 | 1        |
| 2:07 | 3        |
| 2:21 | 4        |
| 2:25 | 2        |
| 2:29 | 0.7      |

Interpolate data with 5 minute interpolation interval, 5 minute interpolation radius, from 2:00 to 2:35:

| Time | 5 Min Interpolated Value | Explanation |
| ---- | ------------------------ | ----------- |
| 2:00 | 5.5 | Interpolated b/w 1:59 and 2:01; note a point outside the query interval is considered |
| 2:05 | 2.2 | Interpolated b/w 2:02 and 2:07 |
| 2:10 | 3   | Only one point within interpolation radius (2:07) |
| 2:15 | nan | No points within interpolation radius |
| 2:20 | 4   | Only one point within interpolation radius (2:21) |
| 2:25 | 2   | Exact point available |
| 2:30 | 0.7 | Only one point within interpolation radius |
| 2:35 | nan | No points within interpolation radius |

@igabe79

igabe79 commented Dec 16, 2020

+1 much needed here as well!

@konstk1

konstk1 commented Dec 28, 2020

+1

@tillsc

tillsc commented Jan 3, 2021

I did a small workaround with a union/pivot (a sort of "outer join") of two queries: https://gist.github.com/tillsc/2d71f3e89409049833e75ed22689ad40

Maybe this helps someone in the meantime... Use the "staircase" plotting mode (not the "Step" plotting mode of the Influx UI itself!).

@nathanielc
Contributor

We have added a first implementation of an interpolation function here https://github.com/influxdata/flux/blob/master/stdlib/interpolate/interpolate.flux It does simple linear interpolation on a regular time interval. We have plans to implement other types of interpolation as well.

For now that is a link to the source code with a comment that has some docs. Public docs are in progress.
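A minimal usage sketch based on that source file; the bucket and filter values here are illustrative:

```flux
import "interpolate"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "power" and r["_field"] == "value")
  // Insert rows at regular 1m intervals, linearly
  // interpolated between the existing points
  |> interpolate.linear(every: 1m)
```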

@empz

empz commented May 10, 2021

I'm also very interested in this. Both TimescaleDB and Azure Data Explorer (which is not even an actual time-series DB) can do this.
It's extremely useful for systems that do delta-logging (storing values only on change to save bandwidth and storage costs).

Besides linear interpolation, I'd like to propose having the option to do backward and forward fill interpolation.

Looking forward to this being available on InfluxDB Cloud.

@414501408

This took me a lot of time, but I eventually managed to solve it. Some ideas to share with you:

Step 1:
Query the last value in your database before your time range and store it in a hashmap, like this:

from(bucket:"dcs_data")
  |> range(start: -2000000h, stop: 1621526400)
  |> filter(fn: (r) => r["_measurement"] == "dcs_real")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["point"] == "TLLGHG1B051002402PSV" or r["point"] == " TLLGHG1B0101036010FQ")
  |> group(columns: ["point"])
  |> last()

In my example, I want to get the last value of 2 points before the stop timestamp (which is the start time of the query in step 2).

Step 2:
Write your query with range(start: ..., stop: ...):

from(bucket:"dcs_data")
  |> range(start: 1621526400, stop: 1622533567)
  |> filter(fn: (r) => r["_measurement"] == "dcs_real")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["point"] == "TLLGHG1B051002402PSV" or r["point"] == " TLLGHG1B0101036010FQ")
  |> window(every: 60000s, createEmpty: true)
  |> mean()
  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)
  |> fill(column: "_value", usePrevious: true)
  |> group(columns: ["point"])
  |> sort(columns: ["_time"], desc: false)
  |> limit(n: 99999999)

You may find that the fill option (usePrevious: true) doesn't work here, because of a caveat mentioned in the docs:

[screenshot of the fill() documentation note]

It's terrible! But you can take the data you got from step 1 and write some code to fill it yourself.

By the way: I usually got nothing back when I used range(start: xxx, stop: xxx) and there was no data in that time range (not every time, but too often). Although I added createEmpty: true in my window() call, I still got nothing.

I think this may be a new bug I found.

After a lot of tests, I found another way to solve it. I use

|> timeShift(duration: xxxx)

instead of stop in range.

So I rewrote the code in step 2 as:

from(bucket:"dcs_data")
  |> range(start: 1621526400)
  |> filter(fn: (r) => r["_measurement"] == "dcs_real")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["point"] == "TLLGHG1B051002402PSV" or r["point"] == " TLLGHG1B0101036010FQ")
  |> timeShift(duration: -1036800s)
  |> window(every: 60000s, createEmpty: true)
  |> mean()
  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)
  |> fill(column: "_value", usePrevious: true)
  |> group(columns: ["point"])
  |> sort(columns: ["_time"], desc: false)
  |> limit(n: 99999999)

Everything seems to work for me.

@pseudocoder

Unless I'm mistaken, the previous comment doesn't interpolate?

Repeating the same value across successive time windows is straightforward using fill(), but then you get a step function, not interpolation, which is what this topic is about.
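The distinction, sketched in Flux (`data` stands for any filtered input stream, and the durations are illustrative): fill() repeats the last known value into each empty window, producing a staircase, while interpolate.linear() places inserted points on the line between their neighbors.

```flux
import "interpolate"

// Step function: each empty 10s window repeats the previous value
step = data
  |> aggregateWindow(every: 10s, fn: last, createEmpty: true)
  |> fill(column: "_value", usePrevious: true)

// True interpolation: inserted rows fall on the line between neighboring points
linear = data
  |> interpolate.linear(every: 10s)
```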

@Scrumplex

I use this:

from(bucket: "hass")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "W")
  |> filter(fn: (r) => r["entity_id"] == "smart_power_plug")
  |> filter(fn: (r) => r["_field"] == "value")
// let's pretend we are interpolating
  |> aggregateWindow(every: 1m, fn: last, createEmpty: true)
// fill nulls generated by aggregateWindow with previous
  |> fill(column: "_value", usePrevious: true)
  |> aggregateWindow(every: v.windowPeriod, fn: mean)
  |> yield(name: "mean")

It's slow with large queries AND inaccurate with smaller queries. So the worst of both worlds.

@sanderson
Contributor Author

@Scrumplex since this issue was created, the interpolate package was introduced, which provides interpolate.linear() for linear interpolation.

@Scrumplex

@Scrumplex since this issue was created, the interpolate package was introduced, which provides interpolate.linear() for linear interpolation.

Sadly that's not what I am looking for. I would rather have it interpolate nulls and deal with it with fill instead.

@vicent-dev

vicent-dev commented Dec 14, 2021

Is this issue still open, or can the main problem be fixed using interpolate.linear()? I don't see how I could fill nulls with that function because, as far as I can tell, interpolate.linear() generates more rows in the measurement instead of filling null values.
I can't see a way to combine those functions to achieve the desired result.

Thanks.

@fidovo

fidovo commented Feb 17, 2022

finally we need this +1

@TicoM1

TicoM1 commented Mar 16, 2022

Too bad the original proposal wasn't considered. Finding the correct timestamps is the issue, not filling the new rows with values. So I'd have loved to see interpolate.null() alongside interpolate.linear(), especially as the fill() documentation explicitly serves that use case on a silver platter already.

If I'm not mistaken, even if I'm only missing data points due to "drop data points with non-changing values", I still have to fix my irregularly timed data to a regular interval (which I have to decide on manually) via aggregation with group by time(interval) and then fill(previous).

There's a good chance I'm missing something here, but I expected this to be an everyday use case that deserves an easy one- or two-liner in a temp database.

@srittau

srittau commented Apr 30, 2022

A null interpolation would be very welcome when working with Grafana time series graphs. Currently, when using the solid line style, Grafana always connects the data points. The only way to suppress this is to use null data points, so interpolate.null() would be ideal for filling in these missing data points.
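Until something like interpolate.null() exists, one workaround is to window with createEmpty: true and simply skip the fill, since empty windows then yield rows with a null _value that Grafana renders as gaps. The bucket and filter values in this sketch are illustrative:

```flux
from(bucket: "example-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "sensor" and r["_field"] == "value")
  // Empty 1m windows produce rows with a null _value; leaving them
  // unfilled breaks the line in Grafana's time series panel
  |> aggregateWindow(every: 1m, fn: last, createEmpty: true)
```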

@M0LTE

M0LTE commented Aug 11, 2022

Yet another backwards step in Flux vs InfluxQL

@Bostwickenator

I too would really like an interpolate.null() concept

@andrus801

+1 for "interpolate.null()"

@DenisMir

DenisMir commented Feb 12, 2023

Is there still no solution for that one here?

The linear function is not enough.

In the end we get graphs like this:

[screenshot of a Grafana panel with a gap in the plotted line]

The first available data point is outside the time window, so linear interpolation doesn't help at all here.

This was no problem with InfluxDB1. WTF.

@Danoxv

Danoxv commented Feb 18, 2023

@DenisMir, how did you fill in the blank spaces between the values?

@gabrielmocan

Any progress on that matter?

@twosdai

twosdai commented Apr 26, 2023

Just wanted to post my workaround in case others find it useful. I use InfluxDB as a backend DB wrapped by an API. If you're looking to use Influx directly, this method probably isn't a good idea, since it requires an application client calling Influx and then returning that data to a different process. In my case I'm using the NodeJS client library.

In-code solution step
Basically, if the response from your query doesn't contain any data, you'd need to do something like the following in your program afterwards:

// Basic client setup example
const client = new InfluxDB({ url: 'myInfluxURL', token: 'myToken', timeout: 60000 });
const query = ` some really cool query with aggregationIntervals `; // This would be your query
const queryApi = client.getQueryApi('myOrg');
const data = await queryApi.collectRows<any>(query); // collectRows returns a Promise

// Actual workaround logic
if (data.length === 0) {
    const start = new Date(startTime); // The startTime of the query
    const end = new Date(endTime); // The endTime of the query
    const interval = end.getTime() - start.getTime(); // The time in ms between them
    // Convert the chosen aggregation interval to ms. I used a static mapping since my
    // aggregation intervals were well known and only a subset of all possible ones.
    const intervalInMilliseconds = aggregationIntervalInMS[argumentAgg];
    const numberOfIntervals = Math.ceil(interval / intervalInMilliseconds); // Get the number of intervals
    const aggregatedResults = [];
    for (let i = 0; i < numberOfIntervals; i++) {
        // You may want to change this result to include any relevant
        // metadata you want assigned to each interval.
        aggregatedResults.push({
            value: '0',
            startTime: new Date(start.getTime() + i * intervalInMilliseconds).toISOString(),
            endTime: new Date(start.getTime() + (i + 1) * intervalInMilliseconds).toISOString(),
        });
    }
    return aggregatedResults;
}

The aggregationIntervalInMS mapping I have is simply the following:

{
    hour: 3600000,
    day: 86400000,
    month: 2592000000,
}

I think in specific use cases this can help fill in data if you're looking to wrap influx.

@SindreBrurberg

SindreBrurberg commented Jan 31, 2024

If it is of interest to anyone, this is the closest I managed to get with an update-on-change type data stream. The example uses data from Home Assistant.

from(bucket: "home_assistant")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "%")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "sensor")
  |> filter(fn: (r) => r["entity_id"] == "meter_plus_8fe5_humidity")
  |> aggregateWindow(every: 10s, fn: mean, createEmpty: true)
  |> fill(usePrevious: true)
  |> filter(fn: (r) => r._value > 0)
  |> yield(name: "mean")

It still can't fix the missing data before the time range, but it does a good job of representing the inactivity/staircase of data that doesn't change linearly.


github-actions bot commented Apr 1, 2024

This issue has had no recent activity and will be closed soon.

@srittau

srittau commented Apr 1, 2024

This is still an unsolved problem, closing it would be premature.

@Zincr0

Zincr0 commented May 17, 2024

Four years and counting. Commenting here so the bot doesn't close this (still relevant) issue.
