
Should be able to force recalculation of continuous query for given time interval #211

Closed
pauldix opened this issue Jan 28, 2014 · 30 comments

Comments

@pauldix
Member

pauldix commented Jan 28, 2014

If users have a continuous query running and they fill in data from a previous interval, they should be able to trigger recalculation of a continuous query for a given interval of time.

Maybe something like this:

replay :query_id where time > '2013-11-01' and time < '2013-12-01'
@jvshahid jvshahid modified the milestones: 0.8.0, 0.7.0 May 2, 2014
jvshahid pushed a commit that referenced this issue Aug 12, 2014
fix: wait for all goroutines to finish before Stop
@jvshahid jvshahid modified the milestones: 0.8.0, Next release Aug 25, 2014
@tdunning

Manual triggering is nice, but shouldn't this happen automagically?

@tdunning

Does manual triggering also imply by symmetry that there should be a way to pause the processing of particular (or all) continuous queries?

@jvshahid
Contributor

The intention of this issue is to recalculate the output of the continuous query if old data has changed; it has nothing to do with pausing the continuous query.

@tdunning

Yes. I understood that. But it is often useful to introduce complementary features at the same time.

Continuous queries and similar mechanisms typically fail for two reasons. One is that data is delayed. The other is due to some sort of overload or failure condition outside of the query computation itself. The first case is best handled by having a proper trigger mechanism for continuous queries so that a query re-runs automatically if new data is inserted into a previously completed window. The second condition may require a manual trigger if the results were somehow incorrect due to the failure mode, but it is also common that you need to remove live loads before correcting the problem. Similarly, when trying to recover from a situation, it is very nice to be able to get the system to sit still while you are working to repair it. Having continuous queries fire while you are working can make a proper fix very difficult.

Thus, a pause is a very reasonable complement to a manually forced recomputation.

@jvshahid jvshahid removed this from the Next release milestone Oct 9, 2014
@pauldix pauldix removed this from the Next release milestone Oct 9, 2014
@jvshahid jvshahid added engine and removed bug labels Oct 9, 2014
@ghost

ghost commented Nov 5, 2014

I'd be very interested in this functionality being implemented. I'd like to use influxdb in a project where users import data in bulk and we then run analytics on the data. Do you have a rough estimate of when anyone will work on this?
It would be awesome if this could happen automagically after an insert, but I can see how that might create huge overhead for inserts. Another convenient option might be giving a hint during insert that there are continuous queries that need to be re-run (thus automatically inferring the interval).

PS: Thanks for the great work, playing around with influxdb has been a pleasure and I'd be extremely happy to use it productively :)

@kerush

kerush commented Mar 31, 2015

Guys, I'm using 0.8.8 and I think I'm seeing something similar to what's described here.
Say now is time T and I'm loading data that is slightly delayed (say T-2m) and I want to roll it up every second. I have the continuous query running, I see it triggering:

[2015/03/31 17:01:53 BST] INFO Start Query: db: core, u: root, q: select min(latency) as pct0,percentile(latency,50) as pct50,percentile(latency,75) as pct75,percentile(latency,90) as pct90,percentile(latency,99) as pct99,max(latency) as pct100 from "myapp" where (time < 1427817713000000000) AND (time > 1427817712000000000) group by time(1s)

If you convert those timestamps into human-readable form, you see that it is trying to roll up 3/31/2015 17:01:52 to 17:01:53 BST. That is basically [T-1s, T], but the last data point I loaded is at T-120s, so no rollups. Never, ever.

At this point I am thinking the only workaround is to drop the continuous query myself and try to do some backfilling overnight, but it's ugly.

How did you guys get around the problem?
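To make the window arithmetic above concrete, here is a small Python sketch (not part of InfluxDB) that converts the nanosecond epoch timestamps from the log line into human-readable UTC times:

```python
from datetime import datetime, timezone

NS_PER_S = 1_000_000_000

def ns_to_utc(ns: int) -> str:
    """Render an InfluxDB nanosecond epoch timestamp as a UTC string."""
    return datetime.fromtimestamp(ns // NS_PER_S, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Window bounds from the continuous-query log line above:
print(ns_to_utc(1427817712000000000))  # 2015-03-31 16:01:52 (17:01:52 BST)
print(ns_to_utc(1427817713000000000))  # 2015-03-31 16:01:53 (17:01:53 BST)
# The CQ only ever examines the latest 1s window [T-1s, T]; points
# written at T-120s fall outside it and are never rolled up.
```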

@kerush

kerush commented Mar 31, 2015

I haven't looked at the implementation, but I think the main issue here is that continuous queries run on a schedule rather than being triggered by the arrival of new data, and this leads to invalid results. Essentially, the output of a continuous query should be considered valid only if there's a newer value outside the time window in the original series.
Practically, when new data is loaded we should check whether the new point closes (expires) any window defined by a continuous query attached to the series; only then can we execute the query, store the result, and advance the window expiry time.
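The rule described above can be sketched in a few lines of Python (hypothetical names and structure, not InfluxDB's actual implementation): a window becomes safe to compute only once a newer point arrives past its end.

```python
NS_PER_S = 1_000_000_000

def window_start(ts_ns: int, interval_ns: int) -> int:
    """Start of the CQ window containing the point at ts_ns."""
    return ts_ns - ts_ns % interval_ns

def windows_ready(new_point_ns: int, interval_ns: int, next_open_ns: int) -> list:
    """A window's result is valid only once a newer point lands past its
    end. Given the start of the oldest still-open window (next_open_ns,
    aligned to interval_ns), return the windows the new point closes."""
    ready = []
    w = next_open_ns
    while w + interval_ns <= window_start(new_point_ns, interval_ns):
        ready.append(w)
        w += interval_ns
    return ready

# A point at time T closes every 1s window strictly before the one containing T:
print(windows_ready(1427817593_500000000, NS_PER_S, 1427817591_000000000))
# -> [1427817591000000000, 1427817592000000000]
```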

@toddboom
Contributor

@kerush You're right on this, and we've done a bit of rewriting of continuous queries in v0.9.0. There will be a configurable lag on running continuous queries, and we'll probably implement some sort of automatic, time-based retriggering in addition to manual retriggering.

@kerush

kerush commented Mar 31, 2015

Thanks @toddboom, that's good news. I'll be waiting for 0.9 to be released then.

I think continuous queries are really the killer feature of influxdb over hybrid solutions like cassandra+spark. For this reason, their scheduling really needs to be event-based rather than clock-based, both for performance and consistency reasons. I hope you're going down that path.

Thanks again.

@beckettsean beckettsean added this to the Next Point Release milestone Apr 8, 2015
@beckettsean beckettsean removed the idea label Apr 8, 2015
@vvakar

vvakar commented Apr 29, 2015

+1
There's bound to be some lag between when data is collected and when it is loaded. If continuous queries don't take that into consideration, a portion of the data will be unaccounted for, requiring the query to be recreated. Looking forward to 0.9.0!

Thanks for all the great work so far!

@beckettsean beckettsean modified the milestones: Longer term, Next Point Release Aug 6, 2015
@jbothe

jbothe commented Aug 7, 2015

+1

@mobarre

mobarre commented Aug 7, 2015

I agree with a lot of what has been proposed in here. Given the original feature description, and taking into account all the other things mentioned, I'd say this is a must-have.

@comcomservices

+1, queries like "SELECT mean(value) INTO feeds_mean_1h FROM feeds GROUP BY time(1h), *" should work too!

@dstreppa

dstreppa commented Sep 2, 2015

+1

@humcguire

+1

@DanielMorsing
Contributor

Right now, CQ statements aren't validated when they're created, so you can create an invalid query. Obviously, for backfill, you need a valid query, but should I add this restriction for all CQs as well? It should be easy to do since I'm adding a loop into the tsdb anyway.

The only reason I can see not to is that someone might want to create a CQ that will become valid in the future, but that's a weird edge case, and validating CQs eliminates so many annoyances.

@tdunning

If you take an example from another domain, JDBC validates prepared queries even though they aren't yet being executed. The same argument that they might be valid later applies, and nobody thinks it is worth allowing temporarily invalid queries.

With JDBC, the time between preparation and execution is typically smaller than with continuous queries, but not necessarily all that short. For a long-running server, it could be weeks.


@pbooth

pbooth commented Oct 12, 2015

Given that this functionality isn't yet available but appears to be recognized as important, what hacks/workarounds are possible for creating summary rollups from influxdb series?

I had been thinking of either:

  1. using a script to periodically issue a query and convert the results into a LineProtocol file that would be uploaded with curl to a different Influx instance
  2. using a script to periodically issue a query and write the results to a whisper DB

Are there any other approaches that people are using?

@ivanscattergood

As I am loading the data from a Java process, I actually use the https://github.com/influxdb/influxdb-java library to generate LineProtocol and backfill the data into the measurement I am using for the continuous query.

@beckettsean
Contributor

#4454 will be a strong mitigation feature for this need

@ryanjin

ryanjin commented Nov 8, 2015

mark

@hoomanv

hoomanv commented Dec 8, 2015

As I understand it, if I write data in batches every 10 seconds and there is a CQ that rolls up over 1-minute intervals, that CQ will possibly miss a few seconds' worth of data (10 seconds in the worst case) at the end of every minute, right?
I suggest a configurable delayed-execution strategy for CQs, so that in practice we allow more data to arrive to fill in the last gaps.

@pauldix pauldix modified the milestones: 0.10.0, 0.9.5 Dec 8, 2015
@beckettsean
Contributor

@hoomanv what you describe is not quite accurate. See https://github.com/influxdb/docs.influxdata.com/blob/extended_cqconfig_options/content/influxdb/v0.9/query_language/continuous_queries_config.md for the work in progress doc that describes the CQ config settings.

If you have the default CQ settings and a 10s CQ, then three queries will run every two minutes, each grabbing 10s worth of points. You will get 30s of good downsampled data and miss the other 90s of each 120s.

In order to actually capture all the data, you need to lower the compute-no-more-than to something like 30s, or raise the recompute-previous-n to something like 12.
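For reference, a sketch of how those two knobs might look in the 0.9-era config file, following the suggestion above (the section header, key names, and defaults are assumptions; check your version's sample config):

```toml
[continuous_queries]
  # Re-run the N most recent windows each time the CQ fires, so that
  # late-arriving points are still picked up (~12 suggested above).
  recompute-previous-n = 12
  # How often a given interval may be recomputed (~30s suggested above).
  compute-no-more-than = "30s"
```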

@hoomanv

hoomanv commented Dec 15, 2015

Thanks @beckettsean, I didn't know about the upcoming CQ configs.

@jwilder jwilder removed this from the 0.10.0 milestone Feb 1, 2016
@zp-markusp

Hi @beckettsean,

Is this still on your roadmap? If so, what's the timeline? We're in a POC phase comparing influxdb and elasticsearch, and the continuous queries feature would be one argument for influxdb if it were feature complete, meaning backfilling were also possible.

@beckettsean
Contributor

@zp-markusp the new CQ syntax allows you to define the look-back interval for each CQ individually. In addition, the INTO keyword, documented on that same page, allows for ad hoc backfill.

There's no mechanism for triggering a backfill based on out-of-order points; the backfill is either always on (CQ) or manually triggered (INTO).
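A manual backfill with INTO looks roughly like this (the measurement, field, and time range are illustrative, echoing examples from earlier in the thread):

```sql
-- Recompute 1h rollups by hand for the interval from the original request
SELECT mean(value) INTO feeds_mean_1h FROM feeds
WHERE time >= '2013-11-01' AND time < '2013-12-01'
GROUP BY time(1h), *
```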

@beckettsean
Contributor

Closing this since the INTO keyword addresses the need. A particular CQ cannot be triggered, but any valid query in a CQ can be run with the INTO keyword to accomplish the same end result.
