
fill(previous) should find most recent value, even if outside query time range #6878

Open
beckettsean opened this issue Jun 20, 2016 · 120 comments


@beckettsean
Contributor

Feature Request

Proposal: [Description of the feature]

When executing fill(previous) the query should always have a value for previous, even if there is no point with that field in the query time range.

Current behavior: [What currently happens]

> select * from fp
name: fp
--------
time            value
2016-06-20T16:09:13Z    10
2016-06-20T16:19:13Z    100

> select max(value) from fp where time > now() - 20m group by time(5m) fill(previous)
name: fp
--------
time            max
2016-06-20T16:05:00Z    10
2016-06-20T16:10:00Z    10
2016-06-20T16:15:00Z    100
2016-06-20T16:20:00Z    100
2016-06-20T16:25:00Z    100

> select max(value) from fp where time > now() - 18m group by time(5m) fill(previous)
name: fp
--------
time            max
2016-06-20T16:10:00Z    
2016-06-20T16:15:00Z    100
2016-06-20T16:20:00Z    100
2016-06-20T16:25:00Z    100

Note the null value for the 16:10-16:15 bucket, despite there being a point at 16:09 with a value.

Desired behavior: [What you would like to happen]

> select * from fp
name: fp
--------
time            value
2016-06-20T16:09:13Z    10
2016-06-20T16:19:13Z    100

> select max(value) from fp where time > now() - 20m group by time(5m) fill(previous)
name: fp
--------
time            max
2016-06-20T16:05:00Z    10
2016-06-20T16:10:00Z    10
2016-06-20T16:15:00Z    100
2016-06-20T16:20:00Z    100
2016-06-20T16:25:00Z    100

> select max(value) from fp where time > now() - 18m group by time(5m) fill(previous)
name: fp
--------
time            max
2016-06-20T16:10:00Z    10
2016-06-20T16:15:00Z    100
2016-06-20T16:20:00Z    100
2016-06-20T16:25:00Z    100

Use case: [Why is this important (helps with prioritizing requests)]

Currently customers have to know when the last value was recorded in order to make sure that point is included in the time range. For irregular series that's a significant burden. If the system can always find the most recent value regardless of the lower time bound, then many state change queries become useful.

@jwheeler-gs

Are there any good work-arounds for this right now? I'm collecting sparse data and trying to graph it using fill(previous), which results in the first several values being null because the previous value falls outside the desired query time range.

The only thing I can think to do right now is to execute a second query to get last(field) with the time ending at the start time of the above query, then use that result to fill in the null values.
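For example, a minimal sketch of that two-query approach against the fp measurement from the issue description (the 18m bound is illustrative):

> select last(value) from fp where time <= now() - 18m
> select max(value) from fp where time > now() - 18m group by time(5m) fill(previous)

The first query returns the most recent point at or before the start of the range; the client then substitutes its value for the leading nulls in the second result.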

@beckettsean
Contributor Author

@jwheeler-gs that's the best workaround for now.

@jsternberg
Contributor

I have a crude solution to this in the branch for #5943. The proper solution to this requires #5943 to be implemented, but the crude solution will be good enough for 1.0 if I can't get the proper solution working in time.

@jsternberg jsternberg self-assigned this Jun 27, 2016
@beckettsean
Contributor Author

What happens if fill(previous) is used but there was no data in any interval? Presumably we could pull the tag set for the previous point and use that? Specifically wondering about #6967

@jsternberg
Contributor

I... have no idea. I will have to get back to you on that.

@jwheeler-gs

If there was no previous data, wouldn't it make sense to just return null until the first value is encountered? That would be the same as the current behavior but only in the case where there are no previous values.

@jsternberg
Contributor

@jwheeler-gs yes, that is the current behavior and would be for this too. The issue is what happens when there is no data in that interval at all. Right now, fill will not fill a series that doesn't exist in the interval. @beckettsean was asking what would happen if fill(previous) was used and there was no data in the interval, but there was data in the past at some point for that series.

@jwheeler-gs

Aaah, I get it now. That's actually going to be a possible case for my use as well. I'd expect (hope for) fill to return that previous value for the entire interval. But then again, I'd also expect it to not return any data past the current point in time.

I'm using fill to provide data directly to a chart which can be adjusted to show any time window, past, present, or even future (where future data is filled in using prediction data from another dataset). What I really want is to see the previous value up to some specified point (the present timestamp) and then no data past that. This is probably a bit much to ask of influx and is outside the scope of what it really needs to provide. That can still be handled easily enough on the receiving end by nulling out all values past the current timestamp.

@retorquere

Would it be possible to add fill(previous) for non-grouped queries? Something like

select GPS.latitude, GPS.longitude, Weather.temperature from car where time >=... and time <=... fill(previous)

Weather.temperature is written to the DB sparsely, and the reader must assume that as long as no new temperature is reported, the previous value holds.

I know I can do

select GPS.latitude, GPS.longitude, Weather.temperature from car where time >=... and time <=... group by time(1s) fill(previous)

but that slows down the request considerably, and I get loads of empty lines for times in the time range where there are no measurements.

@beckettsean
Contributor Author

@retorquere the issue with having fill(previous) with no GROUP BY time() clause is that the system doesn't know when to fill. Should it return one point per nanosecond? Per second? A group by interval is needed to create a regular time series, so that the "missing" points are clear.
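For instance, with the fp data above, the GROUP BY interval is what defines the cadence of the filled rows:

> select max(value) from fp where time > now() - 20m group by time(1m) fill(previous)

emits one filled row per minute, whereas the time(5m) version above emits one per 5 minutes; with no GROUP BY time() there is no cadence at which to emit the "missing" points.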

@retorquere

@beckettsean I will defer to your superior knowledge on the matter of course, but conceptually, I'd figure it would return exactly the same points as with a regular non-grouped select, just with the nulls filled in by the value in the column in one of the rows already selected.

@beckettsean
Contributor Author

@retorquere InfluxDB does not store nulls. There are no nulls returned in a non-grouped query.

@retorquere

If I submit this however:

curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=mydb" --data-urlencode "q=select GPS.latitude, Weather.temperature from car where time >= '2015-10-13T14:16:13Z' limit 10"

I get this (where I'd love for there to be a way to have those nulls be replaced by 5):

{
    "results": [
        {
            "series": [
                {
                    "name": "car",
                    "columns": [
                        "time",
                        "GPS.latitude",
                        "Weather.temperature"
                    ],
                    "values": [
                        [
                            "2015-10-13T14:16:14Z",
                            51.9893696,
                            5
                        ],
                        [
                            "2015-10-13T14:16:15Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:16Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:17Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:18Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:19Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:20Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:21Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:22Z",
                            51.9893696,
                            null
                        ],
                        [
                            "2015-10-13T14:16:23Z",
                            51.9893696,
                            5
                        ]
                    ]
                }
            ]
        }
    ]
}

@beckettsean
Contributor Author

That's an interesting use case, where one field is more densely populated than another. It might make sense to have fill(previous) in that case. Can you open a feature request (https://github.com/influxdata/influxdb/issues/new) describing that use case?


@richardeaxon

+1 for this feature.

@njbuch

njbuch commented Feb 11, 2017

+1 for this feature!

@titilambert

Yes please! This would be really useful!

@jeremyleijssen

Any updates on this? Or does anyone know a workaround?

I currently need this function.

@yellowpattern

+1 for this feature. As a matter of fact, I thought "fill(previous)" and "fill(linear)" would do the job already.

@Cjkeenan

Cjkeenan commented Sep 8, 2021

> Switched to Prometheus; here I don't have the problem of missing graphs if the value stays the same over a long period of time (with Home Assistant and Grafana).

Wait, so does Prometheus have this issue or not? This particular feature is critical for my use case of monitoring smart home equipment, as values are only uploaded to the DB on changes and not at a regular interval. Also, is there any way to import/copy an InfluxDB into a Prometheus DB?

I also like how the official FAQ touches on this issue, even referencing this issue thread, but just leaves it at that with no update.
https://docs.influxdata.com/influxdb/v1.8/troubleshooting/frequently-asked-questions/#why-does-fillprevious-return-empty-results

@KlausHans

I am not sure, to be honest. Maybe the Prometheus connector of Home Assistant is different from the Influx connector and writes the same value regularly. It did work in this combination (HA + Prometheus) so I did not investigate further.

@vdwpsmt

vdwpsmt commented Sep 10, 2021

related issue: influxdata/flux#702

@Kuzj

Kuzj commented Nov 14, 2021

Add the last value to the request; an example that works in Grafana:

SELECT mean("value") FROM
  (SELECT mean("value") AS value FROM "db"."measurement"
   WHERE $timeFilter AND ("entity" = 'entity_id')
   GROUP BY time($__interval)),
  (SELECT last("value") AS value FROM "db"."measurement"
   WHERE ("entity" = 'entity_id'))
GROUP BY time($__interval) fill(previous)

@OptrixAU

> Add the last value to the request; an example that works in Grafana […]

@Kuzj - your query only works if you're querying the latest data, since it doesn't have the time filter in the second query. If I were querying some time in the past, it would give me the time frame I'm asking for plus the most recent value - not something I want.

I.e., I ask for data from the 1st of April to the 23rd of April, but I'm running the query in December - I end up getting not only the data I asked for, but also one point of data for today.

The only way you can make your query work is to put the upper time threshold in the second query.
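For example, a hedged sketch of that fix applied to the query above (the literal upper bound is hypothetical, standing in for the end of the requested range):

SELECT mean("value") FROM
  (SELECT mean("value") AS value FROM "db"."measurement"
   WHERE $timeFilter AND ("entity" = 'entity_id')
   GROUP BY time($__interval)),
  (SELECT last("value") AS value FROM "db"."measurement"
   WHERE ("entity" = 'entity_id') AND time <= '2021-04-23T00:00:00Z')
GROUP BY time($__interval) fill(previous)

With the bound in place, the seeding subquery can no longer pull a point from after the requested window.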

And then it's still very inefficient, since you're running two queries where one really should be enough.

It's simply a case of returning data from the time index before and the time index after your search range, and is a core feature of most time-series historian products out there in the commercial space that support compressed or sparse data (eDNA, Wonderware Historian, OSI PI, IP21 etc.).

@OptrixAU

@Kuzj - And a minor note - with the GROUP BY statement you've got there, there is a distinct chance that your 2nd query will be out of phase with your 1st, resulting in that last timestamp not being regularly spaced compared to the timestamps from the outer query.

I.e., the samples in your time range begin at 10:45:22, while the samples in your entire history (which needs to be scanned for that 2nd result to work) begin at 15:53:48, meaning that your last sample will be out of step with the others.

If the system natively supported dealing with sparse data, it could help avoid this by resampling/back-filling/forward-filling the sparse data before the GROUP BY is performed.

@silviuchingaru

A solution that works for me (at least in Grafana State timelines) with intervals ending on past dates, not only now():
Add 2 queries, one for the last known value (ignore the *2 calculation in the screenshots; I use it to display 2 colors. My graph in fact has 4 queries, but 2 are enough to fix this):

A:
SELECT last("value") FROM /^[A-z]+_HeatingRequest$/ WHERE time <= '${__from:date}'
B:
SELECT last("value") FROM /^[A-z]+_HeatingNeed$/ WHERE $timeFilter GROUP BY time($__interval) fill(none)

Query A selects the last known value before the selected interval, or the first one in it (I think it would be better with < instead of <=).
Query B is the regular query for values in the time interval.

And add the following transformations:
1: Group by, selecting "Group by" for all returned fields
2: Series to rows
3: Prepare time series
[screenshots]

@dmitriypo1989

> A solution that works for me (at least in Grafana State timelines) with intervals ending on past dates, not only now() […]

Hi! If possible, could you explain what should be changed in your example to adapt it to another project? Also, the code in your example is different from the code in your picture.

@sauermouth

@fiftyz Kudos for your work-around!

I have a setpoint of an air-conditioning unit that produces sparse data.

I tried your workaround using a normal time series graph:
[screenshots]

It kind of works. The graph draws okay, but the tooltip shows the wrong data.
[screenshot]

It looks like the tooltip data is in reverse order, while the graph shows it right. The data in the query explorer is also in descending order:
[screenshot]

I tried putting a 'Sort by' transformation on the 'Time' column, but that breaks the workaround.
Can anyone else find the last piece of the puzzle?

It's really unfortunate the community has had to deal with this issue for nearly 6 years!!

@matejsp

matejsp commented Jan 19, 2022

It is really sad that this issue is still not resolved.

In my previous company, in 2021, we migrated from InfluxDB to VictoriaMetrics.
InfluxDB had constant issues with timeouts when ingesting large amounts of metrics. VictoriaMetrics was around 7x faster without an error, using the same protocol. The time to import business data from our relational DB to the time-series DB went from 10 hours to 1.5 hours.

And now in my new company we also use InfluxDB, which is having constant stability problems. Our sysops team prepared a proof of concept with VictoriaMetrics. It supports the InfluxDB protocol and free clustering!

@jacobhallgren

+1 please make a fix for this

@garcipat

garcipat commented Mar 9, 2022

I can't believe this issue has been open for about 5 years. At the least, providing a query that joins first, last, and the data with the query language of InfluxDB 2 (the one with the piping) would be nice.

@darrepac

darrepac commented May 1, 2022

Still no news? It would be good to have fill(previous) working as expected...

@silviuchingaru

Actually there is an easy workaround for this, or it can even be marked as "works as designed", in InfluxDB 2.4 at least, using Flux as the query language.
The logic is:

  1. query for the last non-empty record of each table stream before v.timeRangeStart, to have a previous value to use in fill;
  2. then make the actual query with aggregateWindow from that point to v.timeRangeStop;
  3. filter only values from v.timeRangeStart to v.timeRangeStop.

I'll put a screenshot of my code in Grafana with each line commented (I used the InfluxDB editor because Grafana's does not format the comments, but the code was actually copied from Grafana):
[screenshot]
And the code for copy + paste 😉:

// v.windowPeriod is a variable referring to the current optimized window period (currently: $interval)
// Import date library of Flux to have access to date.sub()
import "date"
// Set the measurement variable to query data for. I use REGEX for this query, but it could also be just a string like:
// measurement = "Warehouse_Light1"; take note of the double quotes (").
measurement = /^Warehouse_Light[1-4]$/
// Query for last record before v.timeRangeStart, to ensure we will have a value to fill on our graph interval.
last_record = from(bucket: v.bucket)
  // I set the range to one week (1w), but if your sensor reports more frequently,
  // for example at least once a day, you could subtract only one day (d: 1d),
  // one minute (d: 1m) or one hour (d: 1h), and so on.
  // For performance reasons, this value should be as short as possible but long enough
  // to have one record for each sensor (they have different streams, see below).
  |> range(start: date.sub(d: 1w, from: v.timeRangeStart), stop: v.timeRangeStart)
  // Filter results by the measurement variable set above. Take care with the logic operator: if you
  // use REGEX it should be like below (=~), but if you use a string in the measurement variable
  // above, it should be ==.
  |> filter(fn: (r) => r["_measurement"] =~ measurement)
  // We want only last record.
  |> last()
  // The next two lines (map() and sort()) are needed only if the query returns multiple streams, like
  // in my case where I query multiple sensors (Warehouse_Light1, Warehouse_Light2, ...) using
  // REGEX, because InfluxDB returns one table stream per measurement.
  // Using map() here I actually "flatten" the results into a single table stream, but if your
  // query has only one table stream (you don't use REGEX or filter by multiple sensors), there is no need for this.
  // It will also work with these two lines, but I definitely recommend removing them
  // for performance reasons if a single table stream is returned. There is no need to
  // map this data in memory if only one table stream is already returned.
  |> map(fn: (r) => ({_time: r._time, measurement: r._measurement}))
  // Sort results by time (sort() default order is ASCending) so we will have series like:
  // _time                  measurement
  // 2022-10-06 00:00:00    Warehouse_Light6 - we are interested only in this row (see below)
  // 2022-10-06 00:01:00    Warehouse_Light2
  // ...
  // 2022-10-06 00:01:34    Warehouse_Light1
  |> sort(columns: ["_time"])
  // Return only the first record, the one that reported furthest in the past, to make sure
  // we include previous data for all sensors in the aggregateWindow().
  |> findRecord(fn: (key) => true, idx: 0)

// This is the actual query, with data aggregated as requested by the graph, starting from
// the time of the last record instead of v.timeRangeStart; we put it into the last_to_end variable
// to filter it by the graph requirements once more.
last_to_end = from(bucket: v.bucket)
  // Set the range of the query from last record instead of graph start to have a value for
  // usePrevious in fill().
  |> range(start: last_record._time, stop: v.timeRangeStop)
  // Should be the same as in last_record; take care with == (string) vs =~ (REGEX)
  |> filter(fn: (r) => r["_measurement"] =~ measurement)
  // Aggregate the data as required by the graph and its period
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: true)
  // Fill all NULL periods created above by using createEmpty: true in aggregateWindow()
  // with previous values.
  |> fill(usePrevious: true)

// Finally, return the data above to the graph, filtered by the graph's v.timeRangeStart and v.timeRangeStop,
// rather than the range above, which has data from the sensor's last read.
last_to_end
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)

Happy graphing 😉!

@OptrixAU

OptrixAU commented Oct 6, 2022

@fiftyz interesting idea, but A) That is a very complex query, and there should be a MUCH simpler method for such an obvious use case, and B) I often query for large numbers of channels, and I've found that performance for queries like this scales HORRIBLY.

@Kortenbach

@fiftyz First of all, thank you for this elaborate post with excellent comments!
However... your post illustrates exactly why this functionality should be built into InfluxDB. It's such a basic thing, but not trivial to get working perfectly.
The user shouldn't be bothered with the quirks of sparse data storage. They should be able to make a (visible!) graph even if the first datapoint is a week before and the next a week after the graph period. (Yes, not only before but also after. That would make your query even more complex.)

@silviuchingaru

> @fiftyz interesting idea, but A) That is a very complex query […]

@OptrixAU
A) Hmmm... yes, it is complex; it took me some time to put it all together, but maybe that was my limited experience with Flux. Now that it is documented, it should be easy for everyone.
B) I'm also querying multiple devices at once (multiple table streams), and because we reduce the query by using last(), it should not be a problem. It should be tested more by the community here...

> @fiftyz First of all, thank you for this elaborate post with excellent comments! […]

@Kortenbach
It would be great if this were included in core (probably by adding it to the fill() logic, maybe with a parameter), but until then, I don't like truncated graphs for intervals with no start data 😉.

I don't understand why you would need it after the graph stop. Can you give a screenshot with a graph example? Maybe I can help...

@Kortenbach

@fiftyz Sorry, no screenshot, but I'll try to explain.

First the easy one (you just need the record before the start of the graph).
Suppose I have a machine that's on or off. At every status change a message is sent via MQTT. That message becomes a log entry.
Now, the user makes a graph of the energy use of his factory. The graph includes the status of the machine (on or off).
The graph shows the status of the machine on Friday; however, the machine turned on on Wednesday and turned off a week later, so InfluxDB would have to look at the status that was logged on Wednesday to get the correct machine status. The graph will then assume the Wednesday status is the Friday status. Case closed.

Now suppose that it's not an on/off status but a capacity indicator (0..100%). The capacity changes a few percent per minute and is logged every 5 minutes. If the graph ends one minute before the last log, then the last 4 minutes of the graph will show no line. If Influx looked ahead, it would find the log that's 1 minute after the graph ends and use interpolation to estimate the capacity at the end of the graph (see the sketch after this comment).

I must admit that case B is a lot less annoying, but if you're going to address the problem then I think it should be taken into consideration.
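A hedged Flux sketch of how that look-ahead could be approximated today (the bucket, measurement, and field names are hypothetical, the 1d padding is a guess at how far away the next log can be, and date.add assumes a reasonably recent Flux):

import "date"
import "interpolate"

from(bucket: "Factory")
  // Read past the end of the graph so the first point after the window is available.
  |> range(start: v.timeRangeStart, stop: date.add(d: 1d, to: v.timeRangeStop))
  |> filter(fn: (r) => r["_measurement"] == "machine" and r["_field"] == "capacity")
  // Linearly interpolate a regular series between the sparse logs.
  |> interpolate.linear(every: 1m)
  // Trim back to the requested graph window.
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)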

@Kortenbach

Kortenbach commented Oct 6, 2022

Oh, here is a suggestion taken from the flux issue "Add previousN and nextN to range #702":

import "date"

prefix = from(bucket: "Home")
  |> range(start: date.sub(d: 24h, from: v.timeRangeStart), stop: v.timeRangeStart)
  |> filter(fn: (r) => r["_measurement"] == "Zigbee")
  |> filter(fn: (r) => r["_field"] == "Temperature")
  |> last()
  |> map(fn: (r) => ({r with _time: v.timeRangeStart}))
data_range = from(bucket: "Home")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Zigbee")
  |> filter(fn: (r) => r["_field"] == "Temperature") 

union(tables: [prefix, data_range])   
  |> group(columns:["Device", "EndPoint","_field","_measurement"])

@OptrixAU

OptrixAU commented Oct 6, 2022

@Kortenbach I believe the forward-loading is going to be significantly less common. There may be some fringe cases, but you can't really be confident about when the change actually occurred, so interpolating into the future might be misleading - particularly over longer time frames.

A lot of devices (like LORA sensors) will send an update on an exception or major change, so appreciable changes would be captured.

For me, I'd rather show the value that I KNOW it was at the last sample time rather than the value it possibly had. If you honestly need the higher level of detail, better to actually sample at that detail rather than extrapolate.

It's probably worth noting that industrial historians like OSI PI understand the expected sample time from each measurement channel, so they can tell when interpolation is desirable - so when you've got a sensor that samples every 10 seconds it will interpolate samples that are ~10 seconds apart, but won't interpolate samples that are 20 seconds apart. But Influx doesn't have this kind of facility.

@Kortenbach

@OptrixAU I agree. The previous sample is much more important than the one after.

@silviuchingaru

> Oh, here is a suggestion taken from the flux issue "Add previousN and nextN to range #702" […]

@Kortenbach, with this solution, if you have the following time series:

_time                  _measurement    _value
2022-10-05 23:59:00    Warehouse_Temp  23 - with the prefix query, you select this
2022-10-06 00:00:00    Warehouse_Temp  21 - and you map() it with the _time of this
2022-10-06 00:01:00    Warehouse_Temp  19
...
2022-10-06 00:01:34    Warehouse_Temp  17

Assuming you want to graph the time between 2022-10-06 00:00:00 and 2022-10-06 01:00:00: with this approach, the prefix query selects the record with time 2022-10-05 23:59:00 and value 23 and sets its time with map() to 2022-10-06 00:00:00, the start of the graph, while the main query selects from the record at 2022-10-06 00:00:00 with value 21, and you union them together. So now you have two records with time 2022-10-06 00:00:00, like this:

_time                  _measurement    _value
2022-10-06 00:00:00    Warehouse_Temp  23 - this should not be here
2022-10-06 00:00:00    Warehouse_Temp  21 - this should be here
2022-10-06 00:01:00    Warehouse_Temp  19
...
2022-10-06 00:01:34    Warehouse_Temp  17

You will have two records with the same time at the beginning of the graph, which is definitely inaccurate.
Also, by using map(), I think (I did not test it) that you will flatten queries with multiple measurements (with multiple table streams) into one, so queries with REGEX, for example, won't work if they return multiple measurements rather than a single measurement.

@Kortenbach

Thanks for pointing that out!

@garcipat

garcipat commented Nov 2, 2022

Is there any progress here? I'm dealing with workarounds that don't really work as I need.

fields = ["field1"]
prev = from(bucket:"MyBucket")
|> range(start: 2022-10-31T12:57:43Z, stop: 2022-11-02T12:57:43Z)
|> last()
|> duplicate(column: "_stop", as: "_time")
values = from(bucket:"MyBucket")
|> range(start: 2022-11-02T12:57:43Z, stop: 2022-11-02T13:57:43Z)
|> aggregateWindow(every: 1s, fn: last, createEmpty: true, offset: 1ns)
|> fill(usePrevious: true)
union(tables: [prev, values])
|> filter(fn: (r) => r["_measurement"] == "measurement")
|> filter(fn: (r) => contains(value: r._field, set: fields))
|> drop(columns: ["_start", "_stop"])

I need to have one value every second; that's why I added the aggregate there. But the fill previous doesn't take the datapoint from the first query into account. Moving the aggregate after the union leads to extreme query times (I don't even know if it works, it takes so long).

@silviuchingaru

Did you look at your prev data in a table or in the explorer? Where did you get the column _stop from?! It does not exist in your prev query.
Also, you are querying last() from all measurements, which is definitely not what you want.
You could use map() to change the time of the prev record with something like:

prev = from(bucket:"MyBucket")
  |> range(start: 2022-10-31T12:57:43Z, stop: 2022-11-02T12:57:43Z)
  // Filter your prev data like in values with something like:
  |> filter(fn: (r) => r["_measurement"] == "measurement")
  |> last()
  // Rewrite the time of last() to the time of the beginning of the data
  |> map(fn: (r) => ({r with _time: 2022-11-02T12:57:43Z}))
values = from(bucket:"MyBucket")
  |> range(start: 2022-11-02T12:57:43Z, stop: 2022-11-02T13:57:43Z)
  // No measurement filter here?!? What are you trying to graph, the whole bucket?! It should be something like:
  |> filter(fn: (r) => r["_measurement"] == "measurement")
  |> aggregateWindow(every: 1s, fn: last, createEmpty: true, offset: 1ns)
  |> fill(usePrevious: true)
union(tables: [prev, values])
  // Why do you need these 3 lines below?! You already have the filtered data
  //|> filter(fn: (r) => r["_measurement"] == "measurement")
  //|> filter(fn: (r) => contains(value: r._field, set: fields))
  //|> drop(columns: ["_start", "_stop"])

Or you could start filling from the previous data, without a union, as I mentioned above:

prev = from(bucket:"MyBucket")
  |> range(start: 2022-10-31T12:57:43Z, stop: 2022-11-02T12:57:43Z)
  // Filter your prev data like in values with something like:
  |> filter(fn: (r) => r["_measurement"] == "measurement")
  |> last()
  // Extract the single record so that prev._time in range() below refers to the
  // actual time of the last point before the range (prev would otherwise be a
  // table stream, not a record).
  |> findRecord(fn: (key) => true, idx: 0)
last_to_end = from(bucket: v.bucket)
  // Set the range of the query from last record instead of graph start to have a value for
  // usePrevious in fill().
  |> range(start: prev._time, stop: v.timeRangeStop)
  // Should be the same as in prev; take care with == (string) vs =~ (REGEX)
  |> filter(fn: (r) => r["_measurement"] == "measurement")
  // Aggregate the data as required by the graph and its period
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: true)
  // Fill all NULL periods created above by using createEmpty: true in aggregateWindow()
  // with previous values.
  |> fill(usePrevious: true)

// Finally, return the data above to the graph, filtered by the graph's v.timeRangeStart and v.timeRangeStop,
// rather than the range above, which has data from the sensor's last read.
last_to_end
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)

schwabix-1311 added a commit to schwabix-1311/aquaPI that referenced this issue May 3, 2023
Although Influx allows irregular time series, downsampling does not really
work: if you query a time span it may omit a measurement from the result
if no data point was encountered in this span.
Likewise, if no measurement occurred in the earliest grouped-by
interval(s), empty fields will be returned.
There is a frustrated discussion in
influxdata/influxdb#6878 and no progress in ~6 years.

Will evaluate QuestDB in a side branch.
schwabix-1311 added a commit to schwabix-1311/aquaPI that referenced this issue May 12, 2023
schwabix-1311 added a commit to schwabix-1311/aquaPI that referenced this issue May 30, 2023
schwabix-1311 added a commit to schwabix-1311/aquaPI that referenced this issue May 30, 2023