Frontend performance with InfluxDB #2634

heyciao · 2015-08-31T22:45:43Z

Hi!

SW-versions:

collectd 5.4.1
graphite 0.9.13
influxdb 0.9.2.1
grafana 2.1.1

Config:

collectd sends metrics from a bunch of VMs every 10 seconds to both "graphite/carbon/whisper" AND "influxdb".
influxdb: keeps all data at 10-second resolution for 2 years.
graphite/carbon/whisper: keeps data at 10-second resolution for 2 months, then aggregates it (with "peak") to 1-minute resolution and keeps that for 2 years ("retentions = 10s:60d,1m:2y").
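To put these retention settings in numbers, here is a small Python sketch that counts how many points each archive holds per series. The helper names are hypothetical, a year is taken as 365 days, and the influxdb setup is simply expressed in the same "precision:duration" notation for comparison (it is not an InfluxDB config format).

```python
# Hypothetical helpers: count points per series for Graphite-style retention strings.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}

def to_seconds(spec):
    """Convert '10s', '60d', '2y', ... to seconds (a year is taken as 365 days)."""
    return int(spec[:-1]) * UNITS[spec[-1]]

def points_per_archive(retentions):
    """Return (archive, points per series) for each 'precision:duration' pair."""
    result = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        result.append((archive, to_seconds(duration) // to_seconds(precision)))
    return result

print(points_per_archive("10s:60d,1m:2y"))    # graphite/whisper: [('10s:60d', 518400), ('1m:2y', 1051200)]
print(points_per_archive("10s:2y"))           # influxdb setup, same notation: [('10s:2y', 6307200)]
print(to_seconds("7d") // to_seconds("10s"))  # raw 10 s points in a 7-day range: 60480
```

So a single series at 10-second resolution holds about 6.3 million points over 2 years, and even a 7-day range is 60,480 raw points per series if nothing downsamples the data before it reaches the browser.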

Situation:
I'm comparing the influxdb vs. graphite backends.
"graphite/carbon/whisper" is the nicest one to use with grafana, but it generates a lot of I/O, I don't like its architecture, and queries over large timespans cause huge CPU usage on the server (especially if I don't use the 1-minute aggregation for ranges longer than 2 months).
I therefore also started evaluating influxdb as a grafana data source.

Problem:
when using influxdb and issuing queries over large timespans (from 7 days upwards; in the future I'll query the full 2 years), grafana generates a lot of load in the browser, to the point that the whole browser hangs (I mainly use Firefox; Chrome copes better, but performance still deteriorates with larger timespans).

Comparison:
when doing the same with graphite, the page still hangs for a while, but only because it's waiting for the server to answer (there is no CPU usage on the client).

Assumptions & hints:
I assume that with influxdb the browser is overloaded by all the data points it receives?
If that assumption is correct: I am aware of the "maxDataPoints"/"Max data points" option for a "graphite/carbon/whisper" setup. Is there something similar for influxdb?
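For reference, the idea behind Graphite's maxDataPoints is to merge consecutive points on the server until the series fits the requested size. A minimal Python sketch of that consolidation, assuming averaging (which, as far as I know, is Graphite's default consolidation function); `consolidate` is a hypothetical name, not Graphite's code:

```python
import math

def consolidate(values, max_data_points):
    """Merge consecutive points so that at most max_data_points remain.

    Each group of ceil(len(values) / max_data_points) raw points is replaced by
    its average; None entries (gaps) are ignored, and an all-None group stays None.
    """
    if len(values) <= max_data_points:
        return list(values)
    per_point = math.ceil(len(values) / max_data_points)
    consolidated = []
    for start in range(0, len(values), per_point):
        group = [v for v in values[start:start + per_point] if v is not None]
        consolidated.append(sum(group) / len(group) if group else None)
    return consolidated

raw = [float(i) for i in range(60480)]  # 7 days of 10 s samples for one series
print(len(consolidate(raw, 1000)))      # 992 points instead of 60480
```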

Thanks a lot for reading this - I love grafana's look and its query builder!!!!

torkelo · 2015-09-01T08:26:39Z

When you issue the influxdb query that hangs the browser, what group by time interval do you have? InfluxDB does not have a maxDataPoints parameter, but it has a group by time feature. If you use the Grafana influxdb query editor, Grafana will generate a query with group by time($interval); the $interval variable is replaced with an appropriate interval based on the time range and the width of the graph in pixels.
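A rough Python sketch of how such an automatic interval can be derived from the time range and the graph width. This is not Grafana's actual code: the list of "nice" intervals, the 10-second minimum and both function names are assumptions for illustration.

```python
# Hypothetical sketch of an automatic GROUP BY time() interval (not Grafana's code).
NICE_INTERVALS = [10, 30, 60, 300, 600, 1800, 3600, 3 * 3600, 6 * 3600, 12 * 3600, 86400, 7 * 86400]

def auto_interval(range_seconds, width_px, min_interval=10):
    """Smallest 'nice' interval giving at most about one point per pixel."""
    raw = max(range_seconds / width_px, min_interval)
    for nice in NICE_INTERVALS:
        if nice >= raw:
            return nice
    return NICE_INTERVALS[-1]  # very long ranges: fall back to the largest interval

def to_influx_duration(seconds):
    """Format seconds as an InfluxQL duration literal such as 30m or 1d."""
    for suffix, size in (("d", 86400), ("h", 3600), ("m", 60)):
        if seconds % size == 0:
            return f"{seconds // size}{suffix}"
    return f"{seconds}s"

interval = to_influx_duration(auto_interval(7 * 86400, 1000))
print(f'SELECT mean(value) FROM "contextswitch_value" '
      f"WHERE time > now() - 7d GROUP BY time({interval}) fill(none)")
```

With a 1000-pixel-wide graph, a 7-day range yields time(30m) in this sketch, i.e. 336 points per series instead of 60,480 raw 10-second points.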

If you use 10s data for 2 years without rollups, InfluxDB is going to be crazy slow when you issue queries over large time spans.

Graphite is amazingly fast for queries thanks to its integrated rollups. Sure, you need an SSD and 2-4 carbon agents to scale it to millions of metrics per minute, but that is pretty easy. I would suggest sticking with Graphite if you want fast queries without having to worry about what time range you use.

InfluxDB 0.9 does not have any generic way to do rollups yet; you need to specify one continuous query per measurement and rollup period.
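Since that means one continuous query per measurement and rollup period, the statements can be generated. A Python sketch, assuming 0.9-style `CREATE CONTINUOUS QUERY ... BEGIN ... END` syntax; the database name "collectd", the `<measurement>_<period>` target names, the "cpu_value" measurement and the "host" tag are hypothetical, and the exact syntax should be checked against the documentation for the InfluxDB version in use.

```python
def rollup_queries(database, measurements, periods, tags=("host",)):
    """Build one CREATE CONTINUOUS QUERY statement per measurement and rollup period.

    Target names like "<measurement>_<period>" are a hypothetical naming scheme;
    listing the tags in GROUP BY keeps them in the rolled-up series.
    """
    statements = []
    for measurement in measurements:
        for period in periods:
            group_by = ", ".join([f"time({period})"] + [f'"{tag}"' for tag in tags])
            statements.append(
                f'CREATE CONTINUOUS QUERY "cq_{measurement}_{period}" ON "{database}" BEGIN '
                f'SELECT mean(value) AS value INTO "{measurement}_{period}" '
                f'FROM "{measurement}" GROUP BY {group_by} END'
            )
    return statements

for statement in rollup_queries("collectd", ["contextswitch_value", "cpu_value"], ["1m", "1h"]):
    print(statement)
```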

heyciao · 2015-09-01T12:55:01Z

Thanks torkelo
This is exactly what I needed to know.

I have checked the parameters that I'm using with InfluxDB and there is something weird happening.
Basically, I think that when using the "derivative" functions my browser/grafana gets overloaded because InfluxDB always sends back all data points at the maximum resolution of 10 seconds, no matter what I put into "group by time ($interval)".
I tried writing a simple query and, to simplify the test further, queried InfluxDB directly instead of going through Grafana; please see the screenshots:

The measurement "contextswitch_value" contains values at 10-second resolution.
When I run the query "count the values I have for the last 10 minutes and group them into 5-minute buckets"...

SELECT count(value) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(null)

...I get only ~2 results, which I think is correct.
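The "~2" is expected: GROUP BY time() buckets start at fixed boundaries (multiples of 5 minutes here), not at now() - 10m, so a 10-minute window typically touches two or three buckets, some of them partial. A small Python simulation of count() per bucket, using a hypothetical fixed "now" and a hypothetical helper name:

```python
from collections import Counter

def count_per_bucket(timestamps, bucket_seconds):
    """Count samples per bucket; buckets start at multiples of bucket_seconds."""
    counts = Counter(t - t % bucket_seconds for t in timestamps)
    return sorted(counts.items())

now = 1_441_111_000                          # hypothetical "now" (Unix seconds)
samples = list(range(now - 590, now + 1, 10))  # time > now() - 10m, one sample every 10 s
print(count_per_bucket(samples, 300))
# [(1441110300, 19), (1441110600, 30), (1441110900, 11)]
```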
On the other hand, when I run the exact same query with "derivative" or "non_negative_derivative" instead of "count", I get back all the individual values, not grouped:

SELECT non_negative_derivative(value) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(null)
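Applied to raw values, a derivative produces one result per pair of adjacent points, which is consistent with the ungrouped output described above: 60 samples give up to 59 rates rather than one per 5-minute bucket. A Python sketch of non_negative_derivative() semantics (rate per `unit` seconds, default 1 s, negative results dropped); the function is a stand-in for illustration, not InfluxDB's implementation:

```python
def non_negative_derivative(points, unit=1.0):
    """Rate of change between consecutive (unix_seconds, value) points, per `unit` seconds.

    Negative rates (for example after a counter reset) are dropped.
    Each output is stamped with the later point's time.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit
        if rate >= 0:
            rates.append((t1, rate))
    return rates

counter = [(0, 100), (10, 150), (20, 230), (30, 10)]  # last sample: counter reset
print(non_negative_derivative(counter))  # [(10, 5.0), (20, 8.0)]

raw = [(t, 3 * t) for t in range(0, 600, 10)]  # 60 raw 10 s samples
print(len(non_negative_derivative(raw)))       # 59
```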

Am I making a mistake in the query?
I've only been using InfluxDB for 2 days, so I'm definitely not an expert.

Many thanks

heyciao · 2015-09-01T13:08:39Z

p.s.
I realize that this is more of an InfluxDB question, so I will upgrade InfluxDB to the latest version and test again; if I still have the problem, I will post the question on InfluxDB's GitHub and then update this thread with the outcome.

heyciao · 2015-09-01T14:35:52Z

Upgraded, but I still had the same problem.
I looked at the issue again and may have found the solution.

This was my original query, which did not aggregate anything:

SELECT non_negative_derivative(value) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(null)

2 changes are needed:
A) Aggregate before running the derivative function, e.g. write "non_negative_derivative(mean(value))" or "non_negative_derivative(max(value))", etc.
B) Do not use "fill(null)", otherwise I only get a result when all values are populated (none is missing); replace it with something else, e.g. "fill(none)", "fill(0)" or "fill(previous)".
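For reference, the fill() options differ only in what they emit for a bucket that has no data. A Python sketch with a hypothetical `apply_fill` helper; it illustrates only the fill semantics, not the derivative behavior described in B:

```python
def apply_fill(buckets, mode):
    """Apply an InfluxQL-style fill() to [(bucket_start, value or None), ...].

    mode is "null", "none", "previous", or a number such as 0.
    """
    filled = []
    previous = None
    for start, value in buckets:
        if value is not None:
            filled.append((start, value))
            previous = value
        elif mode == "none":
            continue                          # empty bucket is dropped
        elif mode == "null":
            filled.append((start, None))      # empty bucket kept as null
        elif mode == "previous":
            filled.append((start, previous))  # repeat the last non-empty value
        else:
            filled.append((start, mode))      # constant, e.g. fill(0)
    return filled

buckets = [(0, 5.0), (300, None), (600, 7.0)]
for mode in ("null", "none", 0, "previous"):
    print(mode, apply_fill(buckets, mode))
```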

End result:

SELECT non_negative_derivative(sum(value)) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(none)
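This query aggregates first and then differentiates the per-bucket values, so it returns one rate per pair of adjacent 5-minute buckets. A Python sketch of non_negative_derivative(sum(value)) ... GROUP BY time(5m) fill(none), with a hypothetical helper name:

```python
from collections import defaultdict

def sum_then_non_negative_derivative(points, bucket_seconds, unit=1.0):
    """Per-bucket sum() followed by non_negative_derivative(), with fill(none).

    points are (unix_seconds, value) pairs; buckets start at multiples of
    bucket_seconds. Buckets without points never appear, as with fill(none).
    """
    sums = defaultdict(float)
    for t, value in points:
        sums[t - t % bucket_seconds] += value
    buckets = sorted(sums.items())
    rates = []
    for (t0, v0), (t1, v1) in zip(buckets, buckets[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit
        if rate >= 0:                         # drop negative rates (counter resets)
            rates.append((t1, rate))
    return rates

raw = [(t, float(t)) for t in range(0, 900, 10)]   # 90 raw samples, three 5-minute buckets
print(sum_then_non_negative_derivative(raw, 300))  # 2 rates instead of 89
```

One caveat: sum() of a cumulative counter also grows with the number of samples in a bucket, so partial buckets at the edges of the time range can distort the rate; mean() or max(), as suggested in A, are less sensitive to that.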

torkelo closed this as completed Sep 1, 2015
