Frontend performance with InfluxDB #2634

Closed
heyciao opened this issue Aug 31, 2015 · 4 comments

Comments

heyciao commented Aug 31, 2015

Hi!

SW-versions:

  • collectd 5.4.1
  • graphite 0.9.13
  • influxdb 0.9.2.1
  • grafana 2.1.1

Config:

  • collectd sends metrics every 10 seconds from a bunch of VMs to both "graphite/carbon/whisper" AND "influxdb".
  • influxdb: keeps all data at 10-second resolution for 2 years.
  • graphite/carbon/whisper: keeps data at 10-second resolution for 2 months, then aggregates it (with "peak") to 1-minute resolution for 2 years ("retentions = 10s:60d,1m:2y").

Situation:
I'm comparing the influxdb vs. graphite backends.
"graphite/carbon/whisper" is the nicest one to use with grafana. On the other hand, it generates a lot of I/O, I don't like its architecture, and CPU usage on the server is very high when I issue queries for large timespans (especially if I don't use the 1-minute aggregation for ranges bigger than 2 months).
I therefore started evaluating influxdb as a grafana data source as well.

Problem:
when using influxdb and issuing queries for large timespans (from 7 days onwards, and in the future I'll issue queries even for the full 2 years), grafana generates a lot of load in the browser, or the whole browser hangs (I mainly use Firefox; Chrome copes best, but its performance still deteriorates with larger timespans).

Comparison:
when doing the same with graphite, the browser page still hangs for a while, but just because it's waiting for an answer from the server (no CPU usage on the client).

Assumptions & hints:
I assume that with influxdb the browser is overloaded by all the data points that it gets?
If the previous assumption is correct, I am aware of the "maxDataPoints"/"Max data points" option for a "graphite/carbon/whisper" setup => is there something similar for influxdb?
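
To put rough numbers on that assumption, here is a minimal arithmetic sketch. It assumes one sample every 10 seconds per series, as in the config above; the function name `points_per_series` is made up for illustration.

```python
# Rough point-count arithmetic; assumes one sample every 10 seconds per series.
SAMPLE_INTERVAL_S = 10


def points_per_series(days: int, sample_interval_s: int = SAMPLE_INTERVAL_S) -> int:
    """Number of raw samples a single series holds over `days` days."""
    return days * 24 * 60 * 60 // sample_interval_s


week = points_per_series(7)            # 60480 raw points per series
two_years = points_per_series(2 * 365)  # 6307200 raw points per series
print(week, two_years)
```

If the raw points were returned ungrouped, a 7-day panel would already receive tens of thousands of points per series, far more than a graph has pixels to draw.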

Thanks a lot for reading this - I love grafana's look and its query builder!!!!

torkelo (Member) commented Sep 1, 2015

When you issue the InfluxDB query that hangs the browser, what group by time interval do you have? InfluxDB does not have a maxDataPoints parameter, but it has a group by time feature. If you use the Grafana InfluxDB query editor, Grafana will generate a query with group by time ($interval). The $interval variable will be replaced with an appropriate interval depending on the time range and the width in pixels of the graph.
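
As an illustration of the idea behind $interval, the following is a hypothetical sketch and not Grafana's actual algorithm. It assumes the interval is simply the time range divided by the panel width in pixels, never going below the 10-second sample rate; `group_by_interval_s` and its parameters are made-up names.

```python
import math


def group_by_interval_s(range_s: int, width_px: int, min_interval_s: int = 10) -> int:
    """Pick a bucket size (seconds) that yields roughly one point per pixel.

    Hypothetical helper: the real $interval logic may round differently.
    """
    raw = math.ceil(range_s / width_px)
    return max(raw, min_interval_s)


seven_days = 7 * 24 * 60 * 60
interval = group_by_interval_s(seven_days, width_px=1000)
buckets = math.ceil(seven_days / interval)
print(interval, buckets)
```

Under these assumptions a 7-day range on a 1000-pixel panel is grouped into about 1000 buckets instead of 60480 raw points per series.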

If you use 10s data for 2 years without rollups, InfluxDB is going to be crazy slow when you issue queries over large time spans.

Graphite is amazingly fast for queries due to its integrated rollups. Sure, you have to have an SSD and 2-4 carbon agents to scale it to millions of metrics per minute, but that is pretty easy. I would suggest sticking with Graphite if you want fast queries and don't want to worry about what time range you use.

InfluxDB 0.9 does not have any generic way to do rollups yet; you need to specify one continuous query per measurement and rollup period.

torkelo closed this as completed Sep 1, 2015

heyciao (Author) commented Sep 1, 2015

Thanks torkelo
This is exactly what I needed to know.

I have checked the parameters that I'm using with InfluxDB and there is something weird happening.
Basically, I think that when I use the "derivative" functions my browser/grafana is overloaded because InfluxDB always sends back all data points at the maximum resolution of 10 seconds, no matter what I put into "group by time ($interval)".
I tried to write a simple query, and to simplify the test further I queried InfluxDB directly instead of going through Grafana. Please see the screenshots:
[Screenshots: function_count, function_derivative]

The measurement "contextswitch_value" contains values at 10-second resolution.
When I run the query "count the values I have for the last 10 minutes and group them into 5-minute buckets"...


SELECT count(value) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(null)


...I get only ~2 results, which I think is correct.
On the other hand, when I run the exact same query but use "derivative" or "non_negative_derivative" instead of "count", I get back all the individual values, not grouped:


SELECT non_negative_derivative(value) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(null)


Am I making a mistake in the query?
I have only been using InfluxDB for 2 days, so I'm definitely not an expert with it.

Many thanks

heyciao (Author) commented Sep 1, 2015

P.S.
I realize that this is more a question for InfluxDB => I will upgrade InfluxDB to the latest version and test it again. If I still have the problem, I will post the question on InfluxDB's GitHub and then update this thread with the outcome.

heyciao (Author) commented Sep 1, 2015

Upgraded, but I still had the same problem.
I looked at the issue again and I may have found the solution.

This was my original query, which did not aggregate anything:


SELECT non_negative_derivative(value) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(null)


2 changes are needed:
A) Do an aggregation before running the derivative function, e.g. write "non_negative_derivative(mean(value))" or "non_negative_derivative(max(value))", etc.
B) Do not use "fill(null)", otherwise I get a result only when all values are populated (none is missing). Replace it with anything else, e.g. "fill(none)", "fill(0)" or "fill(previous)".

End result:


SELECT non_negative_derivative(sum(value)) FROM "contextswitch_value" WHERE "host" = 'vmmonitor' AND time > now() - 10m GROUP BY time(5m) fill(none)
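
Why change A reduces the number of returned points can be sketched outside InfluxDB. The code below is a simplified model, not InfluxDB's implementation: it averages 10-second samples into 5-minute buckets and then takes the non-negative derivative between consecutive buckets. The helper names and the per-second unit (`unit_s=1`) are assumptions made for illustration.

```python
from collections import defaultdict


def bucket_means(points, bucket_s):
    """Group (timestamp_s, value) pairs into fixed buckets and average each one."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_s].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def non_negative_derivative(series, unit_s):
    """Rate of change per `unit_s` between consecutive points; negative rates are dropped."""
    out = []
    for (t0, v0), (t1, v1) in zip(series, series[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit_s
        if rate >= 0:
            out.append((t1, rate))
    return out


# A counter sampled every 10 s for 15 minutes, increasing by 50 per sample.
raw = [(t, 5 * t) for t in range(0, 900, 10)]

ungrouped = non_negative_derivative(raw, unit_s=1)   # one rate per raw sample pair
aggregated = bucket_means(raw, bucket_s=300)         # three 5-minute buckets
rates = non_negative_derivative(aggregated, unit_s=1)
print(len(ungrouped), rates)
```

In this model the derivative of the raw series yields 89 points, while aggregating first yields only 2, both showing the same rate of 5 per second.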

