Automatically choose retention policy based on time range #4262

Open
dennisjac opened this issue Mar 5, 2016 · 27 comments

@dennisjac

commented Mar 5, 2016

Hi,
I'm currently looking at how to make Grafana use the aggregated retention policies in InfluxDB.
Issue #3943 apparently adds a way to set the retention policy statically for a query, but this only works if you have a handful of queries. Once you have more dashboards this is not really a viable option.

What I'd rather like to propose is the ability to add a table in the settings that defines a retention policy based on time ranges. For example you would define the values like this:

1 month => retention_one_value_per_day
1 week => retention_one_value_per_hour
1 day => retention_one_value_per_10s

All queries in all dashboards would then use the retention policy closest to the selected time range as default (which could still be overridden on a per-query basis by explicitly selecting a retention policy there?). If the user doesn't define this table then the default retention policy would always be used.

With this approach most people would be able to set up this table once, and then all dashboards would automatically use the right retention policy for their data, which I think is what 99% of people expect. For anyone who doesn't define the table, Grafana would behave just as before.
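
A minimal sketch of the proposed lookup, assuming "closest" means the smallest absolute difference between the configured range and the selected range, both in seconds. The `RangePolicy` shape and the `closestPolicy` function are hypothetical names for illustration, not part of Grafana.

```typescript
// Hypothetical shape for one row of the proposed settings table.
interface RangePolicy {
  rangeSeconds: number; // time range the policy is meant for
  policy: string;       // InfluxDB retention policy name
}

// The example table from the proposal, expressed in seconds
// (assumption: 1 month = 30 days).
const exampleTable: RangePolicy[] = [
  { rangeSeconds: 30 * 86400, policy: "retention_one_value_per_day" },
  { rangeSeconds: 7 * 86400, policy: "retention_one_value_per_hour" },
  { rangeSeconds: 86400, policy: "retention_one_value_per_10s" },
];

// Returns the policy whose configured range is closest to the selected range.
// An empty table yields undefined, meaning "use the default retention policy".
function closestPolicy(table: RangePolicy[], selectedSeconds: number): string | undefined {
  let best: RangePolicy | undefined;
  for (const row of table) {
    if (
      best === undefined ||
      Math.abs(row.rangeSeconds - selectedSeconds) < Math.abs(best.rangeSeconds - selectedSeconds)
    ) {
      best = row;
    }
  }
  return best === undefined ? undefined : best.policy;
}
```

With this table, a 6-hour range maps to retention_one_value_per_10s and a 5-day range maps to retention_one_value_per_hour. Other definitions of "closest" (for example, the next larger configured range) would be equally plausible.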

@dennisjac

Author

commented May 10, 2016

So I started looking into this and did some prototyping for how this could work. I'm currently not focusing on the UI bits as that is dependent on how/where exactly this will be implemented.
I added the following code right now in the influxdb/datasource.ts query() function:

    var rangediff = options.range.to.diff(options.range.from) / 1000;
    options.targets.forEach(function(entry){
        console.log(entry.policy);
        //if (entry.policy === '_auto_') {
        if (rangediff >= 10800) {
            entry.policy = 'agg10m';
        } else {
            entry.policy = 'default';
        }
        //}
    });

This works and forces all queries to use the 'agg10m' policy if the selected time range is >= 3h and 'default' if it's not.
One thing I tried to do (as can be seen above) is to limit this to queries where the policy is set to '_auto_'. This would give the flexibility to have queries with an explicit policy work as normal, with others opting in.
The problem I have right now is that the 'options' object is modified permanently, so the '_auto_' policy setting gets overwritten and I'm not sure how to get around this.
Any ideas how best to get around this problem?
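
One possible way around the overwriting problem, sketched under simplifying assumptions: build a new options object with copied targets instead of assigning to the caller's targets. The `Target` and `QueryOptions` shapes and the `resolveAutoPolicies` name are hypothetical and far smaller than the real Grafana objects; the 3h threshold and the 'agg10m' policy are taken from the prototype above.

```typescript
// Hypothetical, trimmed-down shapes; the real Grafana objects carry many more fields.
interface Target {
  policy: string;
  measurement: string;
}

interface QueryOptions {
  rangeSeconds: number;
  targets: Target[];
}

// Returns a copy of the options in which every '_auto_' target has a concrete policy.
// The caller's object is left untouched, so '_auto_' survives for the next time range.
function resolveAutoPolicies(options: QueryOptions): QueryOptions {
  const resolved = options.rangeSeconds >= 10800 ? "agg10m" : "default";
  return {
    ...options,
    targets: options.targets.map((t) =>
      t.policy === "_auto_" ? { ...t, policy: resolved } : { ...t }
    ),
  };
}
```

Because only shallow copies of each target are made, this is enough when the policy is the only field being changed; nested fields would still be shared with the original.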

@rubycut

Contributor

commented May 19, 2016

@dennisjac, I am not sure settings is the best place to add this. This should go into the datasource definition. During the next week, I'll try to set up a Grafana development environment and play with this.

@anlutro

commented May 24, 2016

> This should go into the datasource definition.

Can't retention policies be defined on a per-measurement (table) basis?

@dennisjac

Author

commented May 24, 2016

@anlutro, I guess you could override them there, but I'm not sure how many people define different retention policies for individual measurements.

The reason I'm interested in this is that right now I'd pretty much have to create each dashboard three times, once for each of the three retention policies I'm using, and then manually select the right one based on the amount of time I want to display. This is extremely cumbersome at best, and completely infeasible if you have more than a few dashboards (or hosts).

This is what my retention policies and continuous queries currently look like:

ALTER RETENTION POLICY default ON metrics DURATION 4w REPLICATION 1 SHARD DURATION 1d DEFAULT
CREATE RETENTION POLICY agg5m ON metrics DURATION 24w REPLICATION 1 SHARD DURATION 1w
CREATE RETENTION POLICY agg1h ON metrics DURATION 144w REPLICATION 1 SHARD DURATION 4w

CREATE CONTINUOUS QUERY agg5m ON metrics BEGIN SELECT mean(value) AS value INTO metrics."agg5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m), * END
CREATE CONTINUOUS QUERY agg1h ON metrics BEGIN SELECT mean(value) AS value INTO metrics."agg1h".:MEASUREMENT FROM /.*/ GROUP BY time(1h), * END

With this definition I get the appropriate "density" of data for short-, mid- and long-term views, and thanks to the backreference in the CQs everything stays perfectly dynamic.
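
Since the two continuous queries differ only in the policy name and the bucket size, they could be generated from a small list. The following is a sketch only; the `CqTier` shape and the `continuousQueries` function are hypothetical, and the statement text simply follows the pattern of the two queries quoted above.

```typescript
// Hypothetical description of one downsampling tier.
interface CqTier {
  policy: string;   // target retention policy, also used as the CQ name
  interval: string; // GROUP BY time() bucket, e.g. "5m"
}

// Builds one CREATE CONTINUOUS QUERY statement per tier, following the
// pattern of the two statements quoted above.
function continuousQueries(database: string, tiers: CqTier[]): string[] {
  return tiers.map(
    (t) =>
      `CREATE CONTINUOUS QUERY ${t.policy} ON ${database} BEGIN ` +
      `SELECT mean(value) AS value INTO ${database}."${t.policy}".:MEASUREMENT ` +
      `FROM /.*/ GROUP BY time(${t.interval}), * END`
  );
}
```

Calling `continuousQueries("metrics", [{ policy: "agg5m", interval: "5m" }, { policy: "agg1h", interval: "1h" }])` reproduces the two statements above.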

With the auto-selection in Grafana the last piece would fit into place to make all of this work end-to-end.

@anlutro

commented May 24, 2016

Yep, I'm in the same situation. I think I might want to store different measurements for different lengths of time, but I'd be fine with keeping it consistent on the database level.

As an easier, temporary solution I was thinking about allowing template variables in the RP/measurement.

@dennisjac

Author

commented May 24, 2016

I wasn't aware that you couldn't use template variables in the retention policy.
Even if this were possible, you'd still have to choose the policy manually. If you made a mistake and selected e.g. a time range of 6 months before selecting the right policy, you might end up pulling gigabytes of data from InfluxDB, most likely overloading both InfluxDB and the browser.

@dennisjac

Author

commented May 24, 2016

btw here is an updated version of the patch that I'm using right now.
I added the selection of the policy as a function (the array would later live in the datasource configuration):

  determineAutoPolicy(interval) {
    // >48h = agg5m
    // >1month = agg1h
    var retentionPolicies = [
      {interval: 0, policy: 'default'},
      {interval: 172800, policy: 'agg5m'},
      {interval: 2592000, policy: 'agg1h'}
    ];

    var prevEntry = retentionPolicies[0];
    for (var idx = 0; idx < retentionPolicies.length; idx++) {
      if (retentionPolicies[idx].interval > interval) {
        return prevEntry.policy;
      }
      prevEntry = retentionPolicies[idx];
    }
    return prevEntry.policy;
  }

and this bit at the beginning of the query() function:

    var rangediff = options.range.to.diff(options.range.from) / 1000;
    options = _.cloneDeep(options);
    for (var idx = 0; idx < options.targets.length; idx++) {
      if (options.targets[idx].policy === '_auto_') {
        options.targets[idx].policy = this.determineAutoPolicy(rangediff);
      }
    }

I'm cloning the options variable because I need to modify it yet also need to preserve the original '_auto_' value, so that the correct policy can be chosen again when a new time range is selected.

@torkelo

Member

commented May 28, 2016

@anlutro best check InfluxDB docs

@XANi

commented Oct 13, 2016

I think that could be solved more generically by having a time-range-dependent variable in the template.

I can already set up a templated variable and use, say, $range as the parameter for the retention policy. All that would be needed is the ability to specify if $time_start < now() - 7d { $range = "10m" } else { $range = "default" } via the UI.

That way, if it were just a generic variable, it could be used for any data source, not only Influx.

@schmurfy

commented Aug 18, 2017

I just stumbled on this problem while trying to set up Grafana. Was there any progress on this?

@XANi

commented Aug 18, 2017

AFAIK the best you can do is to make a template variable with your retention periods, then use it everywhere

@schmurfy

commented Aug 18, 2017

But this is static, right? The retention policy needs to be chosen depending on the selected range to be really useful.

@XANi

commented Aug 21, 2017

Yes. As I said, there is no other way to do it right now unless Grafana introduces conditional variables ( if time > ( now - 30 days) {$retention=short} else {$retention = long} or something ) or InfluxDB adds views or some smart way to auto-choose retention based on times.

@schmurfy

commented Aug 22, 2017

I would prefer not to, but I have no issue patching my Grafana installation. However, I could not find where exactly I should add my custom code.
Could you point me to where the query is actually built when using proxy mode? Looking at the network monitor, it looks like the variables are replaced inside the JavaScript code, but I have tried to add code on both sides (Go and JavaScript) and can't seem to add a new replace pattern :(

I have tried (I searched for $timeFilter as reference):

  • pkg/tsdb/influxdb/query.go in Build
  • public/app/plugins/datasource/influxdb/datasource.ts in annotationQuery

After that I rebuilt the assets with grunt and rebuilt the Go sources, but neither my fmt.Println nor my console.log logs anything. I also tried to add the replace (for $policy) directly in case the logs were sent somewhere else, but no luck with that either.

@XANi

commented Aug 22, 2017

Sorry, writing JS makes me want to vomit, so all we have is a template field in each dashboard with "default" and "longterm" options for retention

@schmurfy

commented Aug 23, 2017

I'm not a big fan of JavaScript, but the build process used in Grafana is not helping either...

I finally found where the replace code for my use case is. It was obscured by the fact that the $timeFilter replacement is done in two locations on the JavaScript side and once on the Go side. I really don't like that design, but it does not matter much; I now have what I need to patch this myself.

@nicolas17

commented Mar 23, 2018

I don't think implementing this at the datasource level would work. It doesn't seem so far-fetched to have multiple or different retention policies for individual measurements. For example you could have a RP+CQ storing the average of a measurement and a separate RP+CQ storing the maximum. In that case, a graph for max(foo) would want to use the aggregated-max RP.

I thought of trying to implement this a while ago but I don't know Go at all. I now realize it might be possible with JS code alone, maybe I should give it a try.
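
To illustrate the point about separate RP+CQ pairs per aggregate, here is a minimal sketch in which the policy is looked up per aggregate function rather than once per datasource. All names (`AggregateTier`, `AggregateTiers`, `policyFor`, and the `agg5m_mean` / `agg5m_max` policies) are made up for the example.

```typescript
// Hypothetical per-measurement configuration: each aggregate function can have
// its own list of downsampled retention policies.
interface AggregateTier {
  minRangeSeconds: number; // use this policy once the range is at least this long
  policy: string;
}

type AggregateTiers = { [aggregate: string]: AggregateTier[] | undefined };

// Made-up policy names: separate RPs for averaged and maximum values of "foo".
const fooTiers: AggregateTiers = {
  mean: [{ minRangeSeconds: 172800, policy: "agg5m_mean" }],
  max: [{ minRangeSeconds: 172800, policy: "agg5m_max" }],
};

// Picks the tier with the largest threshold that the range still reaches.
// Aggregates without any configured tier fall back to "default".
function policyFor(tiers: AggregateTiers, aggregate: string, rangeSeconds: number): string {
  const list = tiers[aggregate] || [];
  let chosen = "default";
  let chosenMin = -1;
  for (const tier of list) {
    if (rangeSeconds >= tier.minRangeSeconds && tier.minRangeSeconds > chosenMin) {
      chosen = tier.policy;
      chosenMin = tier.minRangeSeconds;
    }
  }
  return chosen;
}
```

Under these assumptions a one-week graph of max(foo) would read from agg5m_max, while an aggregate with no downsampled copy keeps using the default policy.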

@XANi

commented Mar 23, 2018

It would help to look at what people do with it.

We have 2: "default", which keeps full resolution for the last month, and "longterm", which has downsampled data (to 10-minute intervals) and keeps it for much longer, but I dunno how common a setup like that is.

@schmurfy

commented Mar 26, 2018

In my case I use InfluxDB as the datasource, and in it I have multiple retention policies to reduce the number of datapoints for older records:
raw stores a point every 10s for 3 weeks
down_1min stores a point every minute for 3 months
down_5min stores a point every 5 minutes for 6 months

Depending on the size of the range and/or the period I want to show, I want to use the best retention policy. If I want to show the last 2 weeks, there is not much point using data from raw since it will fetch way too many points. Also, if I want to show data from last month, raw simply will not return any data.
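
The two constraints described here (the range must not produce too many points, and its start must still be within the policy's retention) can be sketched as follows. This is not the patch mentioned in this comment; the `Policy` shape, the `choosePolicy` function, the `maxPoints` budget and the 30-day month are all assumptions made for illustration.

```typescript
interface Policy {
  name: string;
  stepSeconds: number;      // spacing between stored points
  retentionSeconds: number; // how long points are kept
}

const DAY = 86400;

// The three policies described above; months are approximated as 30 days.
const policies: Policy[] = [
  { name: "raw", stepSeconds: 10, retentionSeconds: 21 * DAY },
  { name: "down_1min", stepSeconds: 60, retentionSeconds: 90 * DAY },
  { name: "down_5min", stepSeconds: 300, retentionSeconds: 180 * DAY },
];

// Picks the finest policy that still holds data at the start of the range and
// stays under maxPoints; falls back to the coarsest policy otherwise.
function choosePolicy(
  sorted: Policy[],       // ordered from finest to coarsest
  fromAgeSeconds: number, // how far in the past the range starts
  rangeSeconds: number,   // width of the range
  maxPoints: number       // hypothetical per-series point budget
): string {
  for (const p of sorted) {
    const covers = fromAgeSeconds <= p.retentionSeconds;
    const smallEnough = rangeSeconds / p.stepSeconds <= maxPoints;
    if (covers && smallEnough) {
      return p.name;
    }
  }
  const coarsest = sorted[sorted.length - 1];
  return coarsest !== undefined ? coarsest.name : "default";
}
```

With a budget of 5000 points, the last 6 hours resolve to raw, the last 2 weeks to down_5min, and a one-day window starting 31 days ago to down_1min, because raw no longer holds that period.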

For now I just patched Grafana to implement this logic in the InfluxDB datasource, but this is not configurable. I also expose a few scopedVars to be able to create the correct queries.

If you are interested I can show you my patch, it might help find a more generic solution.

@awaw

commented Jul 25, 2018

I've also been running a patched Grafana for some time now. InfluxDB with more than a single retention policy is infeasible to use without such a modification. With this patch against v5.2.1 you can scroll sideways or zoom, and the correct retention policy and interval (as configured) are chosen automatically. I'm posting it here as this issue is often referenced in other discussions.

diff --git a/public/app/plugins/datasource/influxdb/datasource.ts b/public/app/plugins/datasource/influxdb/datasource.ts
index f971ac2..3f05ae5 100644
--- a/public/app/plugins/datasource/influxdb/datasource.ts
+++ b/public/app/plugins/datasource/influxdb/datasource.ts
@@ -16,6 +16,9 @@ export default class InfluxDatasource {
   basicAuth: any;
   withCredentials: any;
   interval: any;
+  retentionPolicy: any;
+  retentionBefore: any;
+  retentionInterval: any;
   supportAnnotations: boolean;
   supportMetrics: boolean;
   responseParser: any;
@@ -34,6 +37,9 @@ export default class InfluxDatasource {
     this.basicAuth = instanceSettings.basicAuth;
     this.withCredentials = instanceSettings.withCredentials;
     this.interval = (instanceSettings.jsonData || {}).timeInterval;
+    this.retentionPolicy = (instanceSettings.jsonData || {}).retentionPolicy;
+    this.retentionBefore = (instanceSettings.jsonData || {}).retentionBefore;
+    this.retentionInterval = (instanceSettings.jsonData || {}).retentionInterval;
     this.supportAnnotations = true;
     this.supportMetrics = true;
     this.responseParser = new ResponseParser();
@@ -47,6 +53,20 @@ export default class InfluxDatasource {
     var queryModel;
     var i, y;
 
+    var timestamp = new Date().getTime() / 1000;
+    var beforeTs = timestamp - options.range.from.unix();
+    var retentionPolicy = this.retentionPolicy ? this.retentionPolicy.split(";") : {};
+    var retentionBefore = this.retentionBefore ? this.retentionBefore.split(";") : {};
+    var retentionInterval = this.retentionInterval ? this.retentionInterval.split(";") : {};
+    var policy, interval;
+
+    for (y = 0; y < retentionBefore.length; y++) {
+      if (beforeTs > parseInt(retentionBefore[y])) {
+        policy = retentionPolicy[y];
+        interval = retentionInterval[y];
+      }
+    }
+
     var allQueries = _.map(targets, target => {
       if (target.hide) {
         return '';
@@ -54,6 +74,11 @@ export default class InfluxDatasource {
 
       queryTargets.push(target);
 
+      if (policy) {
+        scopedVars.autopolicy = { value: policy };
+        scopedVars.autointerval = { value: interval };
+      }
+
       // backward compatibility
       scopedVars.interval = scopedVars.__interval;
 
diff --git a/public/app/plugins/datasource/influxdb/influx_query.ts b/public/app/plugins/datasource/influxdb/influx_query.ts
index 2ef7417..f4395c3 100644
--- a/public/app/plugins/datasource/influxdb/influx_query.ts
+++ b/public/app/plugins/datasource/influxdb/influx_query.ts
@@ -23,6 +23,18 @@ export default class InfluxQuery {
     target.groupBy = target.groupBy || [{ type: 'time', params: ['$__interval'] }, { type: 'fill', params: ['null'] }];
     target.select = target.select || [[{ type: 'field', params: ['value'] }, { type: 'mean', params: [] }]];
 
+    if (target.policy === 'auto') {
+      target.policy = 'default';
+      if (typeof scopedVars !== 'undefined' && typeof scopedVars.autopolicy !== 'undefined') {
+        target.policy = scopedVars.autopolicy.value;
+        var interval = kbn.interval_to_seconds(scopedVars.autointerval.value);
+        if (interval > kbn.interval_to_seconds(scopedVars.__interval.value)) {
+          scopedVars.__interval = { value: kbn.secondsToHms(interval) };
+          scopedVars.interval = scopedVars.__interval;
+        }
+      }
+    }
+
     this.updateProjection();
   }
 
diff --git a/public/app/plugins/datasource/influxdb/partials/config.html b/public/app/plugins/datasource/influxdb/partials/config.html
index a70a1de..19d80a7 100644
--- a/public/app/plugins/datasource/influxdb/partials/config.html
+++ b/public/app/plugins/datasource/influxdb/partials/config.html
@@ -49,3 +49,23 @@
 		</div>
 	</div>
 </div>
+
+<h4 class="page-heading">Retention Policy &quot;auto&quot;</h4>
+
+<div class="gf-form-group">
+	<div class="gf-form max-width-32">
+		<span class="gf-form-label width-11">Retention Policy</span>
+		<input type="text" class="gf-form-input width-20" ng-model="ctrl.current.jsonData.retentionPolicy" spellcheck='false' placeholder="sample5m;sample1h;sample1d"></input>
+		<i class="fa fa-question-circle" bs-tooltip="'Retention policies (delimited by semicolons)'" data-placement="right"></i>
+	</div>
+	<div class="gf-form max-width-32">
+		<span class="gf-form-label width-11">Before Timestamp</span>
+		<input type="text" class="gf-form-input width-20" ng-model="ctrl.current.jsonData.retentionBefore" spellcheck='false' placeholder="86400;2592000;31536000"></input>
+		<i class="fa fa-question-circle" bs-tooltip="'Before which timestamp to use above retention policies (in seconds, delimited by semicolons)'" data-placement="right"></i>
+	</div>
+	<div class="gf-form max-width-32">
+		<span class="gf-form-label width-11">Minimum Interval</span>
+		<input type="text" class="gf-form-input width-20" ng-model="ctrl.current.jsonData.retentionInterval" spellcheck='false' placeholder="30s;5m;1h"></input>
+		<i class="fa fa-question-circle" bs-tooltip="'Minimum group by time interval to use with above policies (delimited by semicolons)'" data-placement="right"></i>
+	</div>
+</div>
diff --git a/public/app/plugins/datasource/influxdb/query_builder.ts b/public/app/plugins/datasource/influxdb/query_builder.ts
index 2b19aa8..84ba90b 100644
--- a/public/app/plugins/datasource/influxdb/query_builder.ts
+++ b/public/app/plugins/datasource/influxdb/query_builder.ts
@@ -52,7 +52,7 @@ export class InfluxQueryBuilder {
       if (!measurement.match('^/.*/')) {
         measurement = '"' + measurement + '"';
 
-        if (policy && policy !== 'default') {
+        if (policy && policy !== 'default' && policy !== 'auto') {
           policy = '"' + policy + '"';
           measurement = policy + '.' + measurement;
         }
@@ -69,7 +69,7 @@ export class InfluxQueryBuilder {
         measurement = '"' + measurement + '"';
       }
 
-      if (policy && policy !== 'default') {
+      if (policy && policy !== 'default' && policy !== 'auto') {
         policy = '"' + policy + '"';
         measurement = policy + '.' + measurement;
       }
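
The selection loop in the patch can be read in isolation as the following standalone sketch. It is a restatement for illustration, not part of the patch: the `AutoChoice` shape and the `pickAuto` function are hypothetical, and the sample values are the placeholders from the config form above.

```typescript
interface AutoChoice {
  policy: string;
  interval: string;
}

// Splits the three semicolon-delimited settings and returns the entry for the
// last threshold that the start of the range is older than, mirroring the loop
// in the patch. Returns undefined when no threshold applies.
function pickAuto(
  retentionPolicy: string,
  retentionBefore: string,
  retentionInterval: string,
  fromAgeSeconds: number // seconds between "now" and the start of the range
): AutoChoice | undefined {
  const policies = retentionPolicy.split(";");
  const thresholds = retentionBefore.split(";").map((s) => parseInt(s, 10));
  const intervals = retentionInterval.split(";");
  let choice: AutoChoice | undefined;
  for (let i = 0; i < thresholds.length; i++) {
    // Extra bounds checks (not in the patch) guard against lists of unequal length.
    if (fromAgeSeconds > thresholds[i] && i < policies.length && i < intervals.length) {
      choice = { policy: policies[i], interval: intervals[i] };
    }
  }
  return choice;
}
```

With "sample5m;sample1h;sample1d", "86400;2592000;31536000" and "30s;5m;1h", a range starting 7 days ago selects sample5m with a 30s minimum interval, and a range starting one hour ago selects nothing, so the query keeps its default policy.
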
@talek

commented Mar 22, 2019

In Grafana v6.0.2 I'm using a workaround which seems to work pretty well. The system I use has four retention policies defined:

> show retention policies
name    duration  shardGroupDuration replicaN default
----    --------  ------------------ -------- -------
autogen 1h0m0s    1h0m0s             1        true
d       24h0m0s   1h0m0s             1        false
m       744h0m0s  24h0m0s            1        false
y       8928h0m0s 168h0m0s           1        false

Because in Grafana there is no conditional variable, the idea was to have in InfluxDB a mapping between the time interval and the corresponding retention policy and to return the proper retention policy in Grafana via a "Query" variable.

So, in InfluxDB I've created an unlimited duration retention policy where I've manually defined a measurement to keep this mapping:

CREATE RETENTION POLICY "forever" ON telegraf DURATION INF REPLICATION 1
INSERT INTO forever rp_config,idx=1 rp="autogen",start=0i,end=3600000i -9223372036854775806
INSERT INTO forever rp_config,idx=2 rp="d",start=3600000i,end=86400000i -9223372036854775806
INSERT INTO forever rp_config,idx=3 rp="m",start=86400000i,end=2592000000i -9223372036854775806
INSERT INTO forever rp_config,idx=4 rp="y",start=2592000000i,end=3110400000000i -9223372036854775806

The mapping measurement is the following:

> select * from forever.rp_config
name: rp_config
time                           end           idx rp      start
----                           ---           --- --      -----
1677-09-21T00:12:43.145224194Z 3600000       1   autogen 0
1677-09-21T00:12:43.145224194Z 3110400000000 4   y       2592000000
1677-09-21T00:12:43.145224194Z 2592000000    3   m       86400000
1677-09-21T00:12:43.145224194Z 86400000      2   d       3600000

Then, in Grafana I've defined the following variable:
(image: Grafana variable definition)

The $rp variable should change its value according to the time range interval selected in the dashboard. Then, it's just a matter of prefixing all measurements with the $rp variable.
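
The variable definition itself was posted as an image, so the exact query is not reproduced here. The lookup it has to perform against the rp_config rows can be sketched as below, assuming that start is an exclusive lower bound and end an inclusive upper bound on the dashboard's range width in milliseconds. The `RpRow` shape and `lookupRp` function are hypothetical.

```typescript
interface RpRow {
  rp: string;
  start: number; // lower bound of the time range width, in milliseconds
  end: number;   // upper bound of the time range width, in milliseconds
}

// The rows inserted into forever.rp_config above.
const rpConfig: RpRow[] = [
  { rp: "autogen", start: 0, end: 3600000 },
  { rp: "d", start: 3600000, end: 86400000 },
  { rp: "m", start: 86400000, end: 2592000000 },
  { rp: "y", start: 2592000000, end: 3110400000000 },
];

// Assumed bounds: start is exclusive and end is inclusive, so adjacent rows never overlap.
function lookupRp(rows: RpRow[], fromMs: number, toMs: number): string | undefined {
  const span = toMs - fromMs;
  for (const row of rows) {
    if (span > row.start && span <= row.end) {
      return row.rp;
    }
  }
  return undefined;
}
```

Under these assumptions a 30-minute range resolves to autogen, a 6-hour range to d, a 7-day range to m, and a 90-day range to y.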

@danielllek

commented Apr 3, 2019

@talek Awesome! No patching needed, works for me!

@kotls

commented Apr 3, 2019

@talek I can't figure out the last step you mentioned, namely prefixing all measurements with the $rp variable. Do you have same field names between retention policies? Or are they like field, d_field, m_field, etc.

@talek

commented Apr 3, 2019

@kotls Yes, I keep the same field names across RPs so that I can reuse the same queries in my Grafana dashboards. Of course, it is possible to add new fields to the down-sampled data, for example if I also want to have the min() and the max() under new fields.

@kotls

commented Apr 3, 2019

@talek Could you give an example of the continuous query you are using in InfluxDB? Is there any way to implement such a query without explicitly defining each field, i.e. SELECT median("field") as field?

I tried running this query and it works fine, except that it adds the prefix mean_ to all my fields, which I would very much like to avoid.
CREATE CONTINUOUS QUERY "cq_1m_for_30d" ON "db_name" BEGIN SELECT mean(*) INTO "db_name"."1m_for_30d".:MEASUREMENT FROM /.*/ GROUP BY time(1m),* END

@talek

commented Apr 3, 2019

@kotls I'm using Kapacitor to down-sample the data. The downside of this is that I have a separate TICKscript for each measurement. For example:

// Parameters
var interval_ds = 5m
var look_behind = 1h
var target_rp = 'm'
var source_rp = 'autogen'

// Dataframe
batch
  |query('''
      select
        mean(blocked) as blocked, min(blocked) as min_blocked, max(blocked) as max_blocked,
        mean(dead) as dead, min(dead) as min_dead, max(dead) as max_dead,
        mean(idle) as idle, min(idle) as min_idle, max(idle) as max_idle,
        mean(paging) as paging, min(paging) as min_paging, max(paging) as max_paging,
        mean(running) as running, min(running) as min_running, max(running) as max_running,
        mean(sleeping) as sleeping, min(sleeping) as min_sleeping, max(sleeping) as max_sleeping,
        mean(stopped) as stopped, min(stopped) as min_stopped, max(stopped) as max_stopped,
        mean(total) as total, min(total) as min_total, max(total) as max_total,
        mean(total_threads) as total_threads, min(total_threads) as min_total_threads, max(total_threads) as max_total_threads,
        mean(unknown) as unknown, min(unknown) as min_unknown, max(unknown) as max_unknown,
        mean(zombies) as zombies, min(zombies) as min_zombies, max(zombies) as max_zombies
      from telegraf.''' + source_rp + '''.processes
      ''')
    .period(look_behind)
    .every(interval_ds)
    .groupBy(time(interval_ds), *)
    .fill('previous')
  |influxDBOut()
    .database('telegraf')
    .retentionPolicy(target_rp)
    .measurement('processes')

For generic CQs, what you're looking for is this.
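
The field list in the TICKscript above repeats one pattern per field, so when maintaining one script per measurement it could be generated rather than typed. A small sketch, with `downsampleSelect` as a hypothetical helper name:

```typescript
// Builds the "mean(f) as f, min(f) as min_f, max(f) as max_f" list used in the
// query above for an arbitrary set of field names.
function downsampleSelect(fields: string[]): string {
  return fields
    .map((f) => `mean(${f}) as ${f}, min(${f}) as min_${f}, max(${f}) as max_${f}`)
    .join(",\n");
}
```

For example, `downsampleSelect(["blocked", "dead"])` yields the first two lines of the select list above, separated by a comma and a newline.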

@genebean

commented Apr 22, 2019

@kotls This issue is also relevant on the CQ side of things: "Alias field key backreference when using wildcard"
