
[feature request] auto choose retention policies based on timestamp when querying #2625

Closed
liyichao opened this issue May 21, 2015 · 22 comments

Comments

@liyichao

Currently, SELECT uses the default retention policy when none is specified. It would be better if SELECT automatically chose the retention policy for a series based on the queried timestamps when the statement does not specify one. This would simplify dashboard tools, and it is what Graphite already does.

Having to switch retention policies whenever we want older data is tedious, because it means editing the dashboard definition.

When we select from a series, we do not care which retention policies it has; we just want the data points.

@liyichao changed the title from "[feature request] auto select retention policies when select" to "[feature request] auto choose retention policies when select" on May 21, 2015
@liyichao changed the title from "[feature request] auto choose retention policies when select" to "[feature request] auto choose retention policies based on timestamp when select" on May 21, 2015
@beckettsean
Contributor

A time range alone is not sufficient to identify which retention policy is desired. There is nothing to prevent two series with identical measurement name and tag sets from existing in separate retention policies with overlapping time ranges. Therefore it is not possible for the system to know which series is intended if the retention policy is not provided.

A workaround for now is to keep all data for a given dashboard in the same retention policy, although that does require maintaining multiple dashboards.
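
For illustration, a minimal sketch of that ambiguity, assuming the influxdb Python client and hypothetical retention policies named one_week and one_month on a database called mydb:

from influxdb import InfluxDBClient

# Hypothetical setup: the same measurement and tag set exist in two retention
# policies whose time ranges overlap.
client = InfluxDBClient(database="mydb")
week = client.query('SELECT * FROM "one_week"."cpu" WHERE time > now() - 1h')
month = client.query('SELECT * FROM "one_month"."cpu" WHERE time > now() - 1h')
# Both results can legitimately contain different points for the same
# timestamps, so a bare FROM "cpu" cannot be resolved from the time range alone.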

@beckettsean added this to the Longer term milestone on Sep 17, 2015
@beckettsean changed the title from "[feature request] auto choose retention policies based on timestamp when select" to "[feature request] auto choose retention policies based on timestamp when querying" on Oct 28, 2015
@daviesalex
Contributor

@beckettsean, from our point of view this is a fairly important feature, and I don't think your workaround really works. Let me give an example of the problem we have: we capture some data every second, say IO blocks out (which is in Telegraf). You need 1-second granularity for some kinds of troubleshooting, but in most cases it would be crazy to graph 1-second data.

Let's imagine a common case: a dashboard showing all metrics per server. It might default to showing the last hour (3,600 data points per server); a day is 86,400 points and a month is more than 2.5 million. The last hour, and perhaps the last day, will just work, but nobody in their right mind would keep metrics at 1-second granularity over a year and then graph them. While InfluxDB can downsample them at query time, it would have to pull a crazy number of points from disk for that query (and, in the real world, a graph would likely show more than one server; we also plan to store some data at a delta of a few microseconds). We also have a basic disk space problem: we are already capturing many hundreds of GB of 1s and 10s metrics per day.

The sane pattern is to keep 1-second data for 24 hours, 1-minute data for a week, 5-minute data for a month, and hourly data for a year (or something similar). This is how just about every other system (Graphite, Ganglia, etc.) handles it. We can sort of do this with a continuous query in InfluxDB, copying the downsampled data to a new database (although we have to delete the 1-second data manually). But now we have a Grafana problem: a single graph can query either the downsampled or the original data, not both. When a user looking at a 1-hour graph (1-second granularity) zooms out to see the last month, we would have to switch the database, which Grafana does not support.
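
A minimal sketch of that ladder using standard retention policies and continuous queries in a single database, assuming the influxdb Python client; the database, policy, and interval names are illustrative, not anything this issue defines:

from influxdb import InfluxDBClient

client = InfluxDBClient(database="telegraf")

# Raw 1-second data is kept for 24 hours in the default policy; rollups get
# coarser resolution and longer retention.
client.query('CREATE RETENTION POLICY "raw_1s" ON "telegraf" DURATION 24h REPLICATION 1 DEFAULT')
client.query('CREATE RETENTION POLICY "rp_1m" ON "telegraf" DURATION 7d REPLICATION 1')
client.query('CREATE RETENTION POLICY "rp_5m" ON "telegraf" DURATION 30d REPLICATION 1')
client.query('CREATE RETENTION POLICY "rp_1h" ON "telegraf" DURATION 365d REPLICATION 1')

# One continuous query per rollup level copies downsampled points into the
# matching retention policy (repeat with time(5m) and time(1h) for the others).
client.query(
    'CREATE CONTINUOUS QUERY "cq_1m" ON "telegraf" BEGIN '
    'SELECT mean(value) AS value INTO "telegraf"."rp_1m".:MEASUREMENT '
    'FROM "telegraf"."raw_1s"./.*/ GROUP BY time(1m), * END')

This keeps everything in one database and lets the raw data expire automatically, but it does not address the Grafana problem above: the dashboard still has to choose which retention policy to query.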

There are three ways to approach this:

  1. Teach Grafana about this concept (preferably somehow auto-learning that downsampled data exists in this other place, although more realistically defining that in Grafana)
  2. Provide a single "view" inside InfluxDB that merges the various levels of data
  3. Provide a way to down-sample data in InfluxDB after a certain period of time, sort of like a continuous down-sampling job.

My personal preference would be (3), but I suspect that's not an architectural starter (although if you would be willing to accept it as an option, we might be able to find somebody to work on it and send you a PR). That leaves us with (1) or (2). This ticket strikes me as asking for (1). Do you think it is best to attack this via (1), or to track an issue more like (2) for the InfluxDB project?

cc @sebito91, @wrigtim

@PaulKuiper

A different way of solving this is to build a proxy between Grafana and InfluxDB (we need one anyway to check user access). At the proxy, parse out the GROUP BY interval, measurement name, and aggregate of the incoming request, and apply a rule to change the measurement name (prepend a retention policy or a custom string that fits your data structure). Send that query to InfluxDB instead of the original. I think this is the most practical solution at the moment.
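
A rough sketch of that rewrite rule, with illustrative rollup suffixes and thresholds (a real proxy would also encode the aggregate in the name and handle quoting edge cases):

import re

UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# (maximum GROUP BY bucket in seconds, measurement suffix to query)
ROLLUPS = [
    (60, ""),               # fine-grained buckets: query the raw data
    (3600, ".1m"),          # up to 1h buckets: use the 1-minute rollup
    (86400, ".1h"),         # up to 1d buckets: use the hourly rollup
    (float("inf"), ".1d"),  # anything coarser: use the daily rollup
]

def rewrite(query):
    m = re.search(r"group by time\((\d+)([smhd])\)", query, re.IGNORECASE)
    if not m:
        return query  # no time bucket; forward the query unchanged
    interval = int(m.group(1)) * UNITS[m.group(2)]
    suffix = next(s for limit, s in ROLLUPS if interval <= limit)
    # FROM "metric" -> FROM "metric.1h", leaving the rest of the query alone
    return re.sub(r'(from ")([^"]+)(")',
                  lambda g: g.group(1) + g.group(2) + suffix + g.group(3),
                  query, count=1, flags=re.IGNORECASE)

For example, rewrite('select max(value) from "metric" group by time(12h)') would return the same query with FROM "metric.1h".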

@daviesalex
Contributor

@PaulKuiper, funnily enough that's exactly how we plan to achieve this (we also have the ACL problem).

Have you already worked on this? We may build this and open source it... or use somebody else's if it's already out there.

@PaulKuiper

Attached is a Python file (in .txt format, otherwise I could not upload it) which you can use as a simple proxy between Grafana and InfluxDB.

It can greatly increase zoom speed. It assumes that the following continuous queries are present for the measurement called "metric":

metric.1s.max
metric.1m.max
metric.1h.max
metric.1d.max
metric.1h.mean
......

Point your "data source" to port 3004 (or whatever you choose) instead of port 8086 in Grafana.
The proxy will now change your query transparently by choosing a different measurement when zooming out.
select max(value) from "metric" where time > x1 and time < x2 group by time(12h)
becomes:
select max(value) from "metric.1h.max" where time > x1 and time < x2 group by time(12h)

poxy.txt

@huhongbo

+1

1 similar comment
@toni-moreno

+1

@exeral

exeral commented Feb 17, 2016

+1 !!
Any feedback on @PaulKuiper's workaround?

@PaulKuiper

I'll update it for InfluxDB 0.10 sometime this month.

@adrianlzt

+1

1 similar comment
@thbourlove
Contributor

👍

Lupul added a commit to Lupul/influxdb-grafana-rp-proxy that referenced this issue Feb 27, 2016
@Lupul

Lupul commented Feb 27, 2016

I've done some work on @PaulKuiper's proxy to make it work with 0.10 and put it here:
https://github.com/Lupul/influxdb-grafana-rp-proxy

@adrianlzt

With version 0.10 it's normal to have several values (fields) in each measurement.
In the proxy README the CQs handle only a single value (a field called value).

CREATE CONTINUOUS QUERY graphite_cq_10sec  ON graphite BEGIN SELECT mean(value) as value INTO graphite."10sec".:MEASUREMENT  FROM graphite."default"./.*/ GROUP BY time(10s), * END

Any ideas on how to handle measurements that have several values?

I was thinking of some batch processing with Kapacitor that obtains all the values for each measurement and creates the appropriate CQs.
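
A rough sketch of that idea using plain SHOW queries instead of Kapacitor, assuming the influxdb Python client; the database, target RP, and interval names are illustrative:

from influxdb import InfluxDBClient

client = InfluxDBClient(database="graphite")

def generate_cqs(db="graphite", target_rp="10sec", interval="10s"):
    """Emit one CQ per measurement that downsamples every field with mean()."""
    statements = []
    for m in client.query("SHOW MEASUREMENTS").get_points():
        name = m["name"]
        fields = client.query('SHOW FIELD KEYS FROM "%s"' % name)
        # The result column is called "fieldKey" in recent releases.
        selects = ", ".join('mean("%s") AS "%s"' % (f["fieldKey"], f["fieldKey"])
                            for f in fields.get_points())
        statements.append(
            'CREATE CONTINUOUS QUERY "cq_%s_%s" ON "%s" BEGIN '
            'SELECT %s INTO "%s"."%s"."%s" FROM "%s" '
            'GROUP BY time(%s), * END'
            % (name, target_rp, db, selects, db, target_rp, name, name, interval))
    return statements

for stmt in generate_cqs():
    client.query(stmt)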

@adrianlzt

I have made a small script to autogenerate RPs and CQs: https://gist.github.com/f4b6f5c8f6c2a51c3f60

@beckettsean
Contributor

@adrianlzt in CQs each tag or field must be explicitly named. It is possible to use SELECT * to return all columns in an ad hoc query, but not in a CQ, because a CQ requires an aggregation function.

So, to downsample multiple fields, the CQ would look something like this:

CREATE CONTINUOUS QUERY graphite_cq_10sec  
ON graphite BEGIN 
SELECT mean(value) as value, last(value) as last, mean(value_23) as value_23, top(field19) as top
INTO graphite."10sec".:MEASUREMENT  
FROM graphite."default"./.*/ 
GROUP BY time(10s), * 
END

The GROUP BY * clause means that each downsampled value would be stored in a series with the same tag set as the original series. So, while the tags aren't explicitly queried, they will still be part of the downsampled series. Without the GROUP BY * clause above, all tags would be lost during downsampling. It is possible to name explicit tags in the GROUP BY, and then only those tags would be preserved.

@adrianlzt

Will this be fixed in upcoming versions? It is hard to maintain downsampling across multiple databases with lots of series that have multiple values.


@beckettsean
Contributor

Follow #5750, which is the relevant issue.


@nelg

nelg commented Mar 20, 2016

+1 for this issue. Without good data downsampling/rollup, moving from Whisper is likely to present problems for Graphite-style graphs over long time periods. Ideally there needs to be a way to set a default rollup and retention policy for all metrics of a specific type.

@TomGudman

+1

Coming from the old RRD world, this is obviously a big shift in mentality. I fully agree with #2625 (comment), option 3.

I have never used Graphite, but I have read about it several times, and I liked how you can set different retention policies per metric if desired; otherwise you get the default downsampling automatically.

In my beginner's opinion, the InfluxDB approach is awkward and hard to maintain.

I believe users want a time series database that is efficient, fast, and low-maintenance. Managing RPs and CQs with complex InfluxQL queries isn't obvious...

My comments may be irrelevant; I am still learning and reading the mailing list and GitHub issues to figure out how to set up downsampling. At this point I am starting to wonder why I waited so long for InfluxDB instead of just using Dixon's Graphite.

Still, InfluxDB is quick to spin up and play with in the tutorial, but it gets more complicated when you really want to do something with it.

@jsternberg
Contributor

Looking through old issues, I found this one. It seems related to #6910.

@daviesalex
Contributor

Agreed. I think this one can basically be closed as a duplicate of #6910.

@jsternberg
Contributor

I'm going to close this in favor of #6910. If a new issue gets created for this, it will be mentioned in that issue.
