Should be able to force recalculation of continuous query for given time interval #211
Manual triggering is nice, but shouldn't this happen automagically?
Does manual triggering also imply, by symmetry, that there should be a way to pause the processing of particular (or all) continuous queries?
The intention of this issue is to recalculate the output of the continuous query if old data has changed. It has nothing to do with pausing the continuous query.
Yes, I understood that. But it is often useful to introduce complementary features at the same time. Continuous queries and similar mechanisms typically fail for two reasons. One is that data is delayed. The other is some sort of overload or failure condition outside of the query computation itself. The first case is best handled by having a proper trigger mechanism for continuous queries, so that a query re-runs automatically if new data is inserted into a previously completed window. The second condition may require a manual trigger if the results were somehow incorrect due to the failure mode, but it is also common that you need to remove live loads before correcting the problem. Similarly, when trying to recover from a situation, it is very nice to be able to get the system to sit still while you are working to repair it. Having continuous queries fire while you are working can make a proper fix very difficult. Thus, a pause is a very reasonable complement to a manually forced recomputation.
I'd be very interested in this functionality being implemented. I'd like to use influxdb in a project where users import data in bulk and we then run analytics on the data. Do you have a rough estimate of when anyone will work on this? PS: Thanks for the great work, playing around with influxdb has been a pleasure and I'd be extremely happy to use it productively :)
Guys, I'm using 0.8.8 and I think I'm seeing something similar to what's described here.
[2015/03/31 17:01:53 BST] INFO Start Query: db: core, u: root, q: select min(latency) as pct0,percentile(latency,50) as pct50,percentile(latency,75) as pct75,percentile(latency,90) as pct90,percentile(latency,99) as pct99,max(latency) as pct100 from "myapp" where (time < 1427817713000000000) AND (time > 1427817712000000000) group by time(1s)
If you convert the timestamps to human-readable form, you see that it is trying to roll up 3/31/2015 5:01:52 PM to 5:01:53 PM BST. That is basically [T-1s, T], but the last data point I loaded is at T-120s, so there are no rollups. Never, ever. At this point I am thinking the only workaround is to drop the continuous query myself and try to do some backfilling overnight, but it's ugly. How did you guys get around the problem?
I haven't looked at the implementation, but I think the main issue here is that continuous queries run on a schedule rather than being triggered by the loading of new data, and this leads to invalid results. Essentially, the output of a continuous query should be considered valid only if there's a newer value outside the time window in the original series.
@kerush You're right on this, and we've done a bit of rewriting of continuous queries in v0.9.0. There will be a configurable lag on running continuous queries, and we'll probably implement some sort of automatic, time-based retriggering in addition to manual retriggering.
Thanks @toddboom, that's good news. I'll be waiting for 0.9 to be released then. I think continuous queries are really the killer feature of influxdb over hybrid solutions like cassandra+spark. For this reason, their scheduling really needs to be event-based rather than clock-based, for both performance and consistency reasons. I hope you're going down that path. Thanks again.
+1 Thanks for all the great work so far!
+1
I agree with a lot of what has been proposed in here. As for the original feature description, and taking into account all the other things mentioned, I'd say this is a must-have.
+1, queries like "SELECT mean(value) INTO feeds_mean_1h FROM feeds GROUP BY time(1h), *" should work too!
+1
+1
Right now, CQ creation doesn't validate the statement, so you can create an invalid query. Obviously, for backfill, you need to have a valid query, but should I add this restriction for all CQs as well? It should be easy to do since I'm adding a loop into the tsdb anyway. The only reason I can see not to is that someone might want to create a CQ that will become valid in the future, but that's a weird edge case, and validating CQs eliminates so many annoyances.
If you take an example from another domain, JDBC validates prepared queries. With JDBC, the time between preparation is typically smaller than with
On Wed, Sep 23, 2015 at 2:46 AM, Daniel Morsing notifications@github.com
Given that this functionality isn't yet available, but appears to be recognized as important, what hacks/workarounds are possible to create summary rollups from influxdb series? I had been thinking of either:
Are there any other approaches that people are using?
As I am loading the data from a Java process, I actually use the https://github.com/influxdb/influxdb-java library to generate line protocol and backfill the data into the measurement I am using for the continuous query.
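The backfill approach above relies on writing historical points in InfluxDB's line protocol (`measurement,tag=value field=value timestamp`). The comment uses influxdb-java; the sketch below is not that library, only a hand-rolled Go illustration of the same text format. It assumes a single float field, sorts tags by key, and escapes commas, equals signs, and spaces with a backslash. `linePoint` and the `host`/`dc` tags are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Tag keys, tag values, and field keys escape commas, equals signs, and spaces;
// measurement names escape commas and spaces.
var keyEscaper = strings.NewReplacer(",", `\,`, "=", `\=`, " ", `\ `)
var measurementEscaper = strings.NewReplacer(",", `\,`, " ", `\ `)

// linePoint formats one point with a single float field and a nanosecond
// timestamp, so a historical point lands in the window it belongs to.
func linePoint(measurement string, tags map[string]string, field string, value float64, tsNanos int64) string {
	var b strings.Builder
	b.WriteString(measurementEscaper.Replace(measurement))

	// Emit tags sorted by key.
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		b.WriteString("," + keyEscaper.Replace(k) + "=" + keyEscaper.Replace(tags[k]))
	}

	// A number without an "i" suffix is written as a float field.
	b.WriteString(" " + keyEscaper.Replace(field) + "=" + strconv.FormatFloat(value, 'f', -1, 64))
	b.WriteString(" " + strconv.FormatInt(tsNanos, 10))
	return b.String()
}

func main() {
	fmt.Println(linePoint("myapp", map[string]string{"host": "web 1"}, "latency", 12.5, 1427817712000000000))
	// myapp,host=web\ 1 latency=12.5 1427817712000000000
}
```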
#4454 will be a strong mitigation feature for this need
mark
As I understand it, if I write the data in batches every 10 seconds and there is a CQ that rolls up by 1-minute intervals, that CQ will possibly miss a few seconds' worth of data (10 seconds in the worst case) that falls at the end of every minute, right?
@hoomanv what you describe is not quite accurate. See https://github.com/influxdb/docs.influxdata.com/blob/extended_cqconfig_options/content/influxdb/v0.9/query_language/continuous_queries_config.md for the work-in-progress doc that describes the CQ config settings. If you have the default CQ settings and a 10s CQ, then three queries will run every two minutes, each grabbing 10s worth of points. You will get 30s of good downsampled data and miss the other 90s of each 120s. In order to actually capture all the data, you need to lower the
Thanks @beckettsean, I didn't know about the upcoming CQ configs.
Hi @beckettsean, is this still on your roadmap? If yes, what's the timeline for it? We are in the POC phase with influxdb and elasticsearch, and the continuous queries feature would be one argument for influxdb if it were feature complete, meaning backfilling would also be possible.
@zp-markusp the new CQ syntax allows you to define the look-back interval for each CQ individually. In addition, the There's no mechanism for triggering a backfill based on out-of-order points; the backfill is either always on (CQ) or manually triggered (INTO).
Closing this since the
If users have a continuous query running and they fill in data from a previous interval, they should be able to trigger recalculation of a continuous query for a given interval of time.
Maybe something like this: