InfluxDB: take advantage of influx chunked reponses #863

Closed
sanga opened this issue Sep 24, 2014 · 14 comments
Labels
area/datasource, datasource/InfluxDB, needs more info, prio/low, type/feature-request

Comments

@sanga

sanga commented Sep 24, 2014

Fair warning up front: this is possibly not that trivial, and I'm not even entirely sure about the feasibility of doing this, but anyway...

Influx supports chunked HTTP responses, so it will send data back in, well, chunks, as it calculates the data. According to the docs, it should send all data for the time period it has calculated and then move on to the next chunk of time from the requested time period. So I'm wondering: might it be possible to read the data in, in chunks, and paint the graph chunk by chunk (a very cursory glance at the flot docs would appear to suggest this is possible, at least)?
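A rough sketch of the client side of this idea, in Python for illustration only. It assumes (an assumption, not a description of Influx's actual wire format) that each chunk arrives as one JSON object per line carrying a list of `[time, value]` points, and that network reads do not line up with object boundaries. `ChunkDecoder`, `render_incrementally` and the `points` key are hypothetical names. Each completed object is handed back as soon as its line ends, so a graph could append and repaint per chunk instead of waiting for the whole response.

```python
import json
from typing import Any


class ChunkDecoder:
    """Hypothetical incremental decoder for a newline-delimited JSON stream.

    A network read can split an object anywhere, so any incomplete trailing
    text is buffered until the rest of its line arrives.
    """

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, data: str) -> list[Any]:
        """Add raw text from the network; return every object now complete."""
        self._buffer += data
        *complete, self._buffer = self._buffer.split("\n")
        return [json.loads(line) for line in complete if line.strip()]

    def close(self) -> list[Any]:
        """Parse whatever is left once the response has ended."""
        rest, self._buffer = self._buffer, ""
        return [json.loads(rest)] if rest.strip() else []


def render_incrementally(reads: list[str]) -> list[int]:
    """Simulate painting: record how many points are plotted after each read."""
    decoder = ChunkDecoder()
    plotted: list[list[int]] = []
    sizes: list[int] = []
    for data in reads:
        for chunk in decoder.feed(data):
            plotted.extend(chunk["points"])  # a real graph would repaint here
        sizes.append(len(plotted))
    for chunk in decoder.close():
        plotted.extend(chunk["points"])
    sizes.append(len(plotted))
    return sizes
```

With the reads `'{"points": [[1, 10], [2, 11]]}\n{"poi'`, `'nts": [[3, 12]]}\n'` and `'{"points": [[4, 13]]}'`, the graph would show 2, then 3, then 3, then 4 points: the half-received second object is only plotted once its line is complete.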

The problem I'm trying to get around is basically this. I have some graphs that plot an awful lot of data, and they take a long time to paint. During that time we currently just get a spinner in the graph. It would be much nicer if the graph gradually "filled in", i.e. painted backwards in time (like Splunk, if you ever happen to have used that tool).

What do you think? Reasonable use of time/complexity to implement?

@torkelo
Member

torkelo commented Sep 24, 2014

Have you considered using InfluxDB continuous queries to pre-aggregate and speed up the queries? I haven't used InfluxDB in a production setup yet, but with Graphite (in production, with hundreds of thousands of metrics) I have yet to find queries that take more than a second, even for long time ranges and many metrics.
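To make the pre-aggregation idea concrete: a continuous query periodically computes something like a per-bucket mean over raw points and stores the result, so dashboards read the small rolled-up series instead of the raw data. Below is a minimal Python sketch of that rollup step only, not InfluxDB's implementation; `downsample` and the 300-second bucket width are illustrative assumptions.

```python
from collections import defaultdict


def downsample(points: list[tuple[int, float]], bucket_seconds: int) -> list[tuple[int, float]]:
    """Roll raw (timestamp, value) points up into per-bucket means.

    Each timestamp (in seconds) is floored to the start of its bucket,
    roughly like a GROUP BY time(...) aggregation.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % bucket_seconds
        sums[bucket] += value
        counts[bucket] += 1
    return [(bucket, sums[bucket] / counts[bucket]) for bucket in sorted(sums)]
```

For example, ten minutes of per-second points (600 values) collapse into two 5-minute buckets, so a dashboard reading the rolled-up series fetches 2 points instead of 600.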

Also, are you using columns to distinguish metrics, or a series per metric (with a single value column)? I think that using a where clause on columns like "host" is very bad for performance in InfluxDB.

As for streaming in the results, it would definitely be possible, but more people would have to +1 it for me to spend time on it (it might be 1-2 days of work at least).

@sahilthapar

I get this too... Maybe I should look into continuous queries, but I get this issue with single-value columns.

@sanga
Author

sanga commented Nov 11, 2014

Sorry I never replied to this earlier; apparently it was lost in a sea of open browser tabs... Anyway, continuous queries would, I guess, work fine if I knew beforehand what I wanted to query. However, most of the time I spend in Grafana is exploring performance problems, so I don't know what is interesting before I start exploring.

Having said that, a 90% solution is just to use a long enough group_by period, which drastically reduces the amount of data sent back from Influx (cursory investigation suggests that Influx's speed is inversely proportional to the raw amount of data it needs to pass back). The other benefit is that Grafana needs to store less in memory, so the UI remains snappy.
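The arithmetic behind the group_by workaround: a week of per-second data is 7 × 86,400 = 604,800 points per series, while grouping by 5 minutes returns 2,016. The Python sketch below picks the narrowest group_by that stays under a point budget. `pick_group_by`, the candidate interval list and the 1,000-point budget are illustrative assumptions, not how Grafana computes its interval.

```python
import math

# Candidate group_by widths in seconds (an illustrative list, not Grafana's).
NICE_INTERVALS = [1, 10, 30, 60, 300, 600, 1800, 3600, 3 * 3600, 6 * 3600, 86400]


def points_returned(range_seconds: int, interval_seconds: int) -> int:
    """Number of buckets, and hence points per series, for a time range."""
    return math.ceil(range_seconds / interval_seconds)


def pick_group_by(range_seconds: int, max_points: int) -> int:
    """Smallest candidate interval keeping each series at or under max_points.

    Falls back to the widest candidate if none fits.
    """
    for interval in NICE_INTERVALS:
        if points_returned(range_seconds, interval) <= max_points:
            return interval
    return NICE_INTERVALS[-1]
```

For one week with a 1,000-point budget this returns 1800 seconds (336 points per series); 600 seconds would still give 1,008 points, just over the budget.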

@nbrownus
Copy link

Support for this would clear up #2266, which the InfluxDB team has identified as the root issue (see influxdata/influxdb#3242).

@sknaumov
Copy link

I have a related question: could Grafana be configured to retrieve data in chunks, using a new request per chunk? The problem is that I have a lot of metrics stored in a blob in the DB (write-optimized and storage-size-optimized; reads are infrequent and quite expensive), and a dashboard where, for each metric, Grafana creates an HTTP request for a long period of time (say, 1 day or 1 week). A natural thing would be to try to aggregate these requests and process all metrics simultaneously, since ultimately they point to the very same blob, but modern browsers allow only about 6 concurrent connections to a server by default. Caching the blob decompression and parsing results would resolve the problem, but for long periods of data (say, 1 week) this is no longer an option: the data for the first 6 requests has to be processed before the follow-up requests for the other metrics are sent, so I would need to cache the whole week of per-second data for all possible metrics.

If Grafana were able to retrieve data in configurable chunks (say, no more than 1 hour of data per request), and for all metrics ask for the first hour first, then, once that completes, the second hour, and so on, it would resolve the problem. Are there any configuration options like this?
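The request order proposed above can be sketched as follows. This only illustrates the schedule; nothing in this thread says Grafana supports it, and `time_windows`, `plan_requests` and the one-hour chunk are hypothetical. Every metric's first window forms one wave, and the next wave starts only after the previous one completes, so a server-side cache only needs to hold the decoded blob for the current window rather than the whole week.

```python
from typing import Iterator


def time_windows(start: int, end: int, chunk_seconds: int) -> Iterator[tuple[int, int]]:
    """Split [start, end) into consecutive windows of at most chunk_seconds."""
    window_start = start
    while window_start < end:
        window_end = min(window_start + chunk_seconds, end)
        yield window_start, window_end
        window_start = window_end


def plan_requests(
    metrics: list[str], start: int, end: int, chunk_seconds: int
) -> list[list[tuple[str, int, int]]]:
    """Group requests so every metric's first window is fetched before any second window.

    Each inner list is one wave: the caller would send it, wait for all of it
    to complete, and only then move on to the next wave.
    """
    return [
        [(metric, w_start, w_end) for metric in metrics]
        for w_start, w_end in time_windows(start, end, chunk_seconds)
    ]
```

For two metrics over 9,000 seconds with one-hour chunks, this yields three waves; the last one covers the remaining 1,800 seconds.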

@ryantxu
Member

ryantxu commented Jul 10, 2019

With the new streaming infrastructure, this is now something we can consider (it is still a ways off!), but there is a path for it.

The things we need are:

However, I am a bit skeptical that the browser will be able to do anything useful if there is too much data.

@vpapavas

vpapavas commented Nov 7, 2019

Hi everyone! What is the status on supporting chunked responses? I am working on developing a plugin for a streaming data source whose responses are chunked.

@torkelo
Member

torkelo commented Nov 7, 2019

The datasource query function can return an rxjs Observable to stream results.

aocenas added the prio/low label Sep 2, 2020
daniellee changed the title from "take advantage of influx chunked reponses" to "InfluxDB: take advantage of influx chunked reponses" Nov 9, 2020
gabor added the needs investigation label Jul 16, 2021
gabor self-assigned this Jul 16, 2021
@gabor
Contributor

gabor commented Jul 16, 2021

@sknaumov I think the use case you described in #863 (comment) is slightly different from what is requested in this issue; could you please open a separate feature request to discuss it? Thanks!

@gabor
Contributor

gabor commented Jul 16, 2021

@sanga I'm trying to understand your use case better:

  • Is it about InfluxDB returning too much data? If so, I'm worried that even if this gets solved, the browser will simply stop working when it receives tens of thousands of data points; as you also mentioned, longer group_by periods should solve that problem.
  • Is it about InfluxDB returning the results very slowly, so that with chunked responses we would see the start of the data in the graph sooner?

@gabor
Copy link
Contributor

gabor commented Jul 16, 2021

NOTE: in general, I wonder how well this use case is still supported in InfluxDB 2. I did some tests with InfluxDB 2.0.7:

gabor added the needs more info label and removed the needs investigation label Jul 16, 2021
@gabor
Contributor

gabor commented Jul 16, 2021

NOTE: in Flux mode, we consume the Flux response CSV row by CSV row (https://github.com/grafana/grafana/blob/main/pkg/tsdb/influxdb/flux/executor.go#L78), so at least in theory there is a way to return partial data to the browser. Still, the question remains how useful this would be.
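A minimal Python sketch of what returning partial data from a row-by-row CSV reader could look like. The linked Grafana executor is Go and works differently; `stream_partial_frames` and `batch_size` are hypothetical. The sketch assumes a single CSV table and simply skips empty rows and rows whose first cell starts with `#` (the annotation rows of Flux's annotated CSV).

```python
import csv
from typing import Iterable, Iterator, Optional


def stream_partial_frames(lines: Iterable[str], batch_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield the rows parsed so far every batch_size rows, then once at the end.

    Assumes a single table: the first non-annotation row is the header.
    """
    header: Optional[list[str]] = None
    rows: list[dict[str, str]] = []
    for record in csv.reader(lines):
        if not record or record[0].startswith("#"):
            continue  # blank line or annotation row
        if header is None:
            header = record
            continue
        rows.append(dict(zip(header, record)))
        if len(rows) % batch_size == 0:
            yield list(rows)  # a partial frame the browser could render early
    if rows and len(rows) % batch_size != 0:
        yield list(rows)  # the final, complete frame
```

With `batch_size=2` and three data rows, the consumer receives a 2-row partial frame followed by the complete 3-row frame; whether drawing that first frame early is worth the extra re-renders is exactly the open question above.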

gabor removed their assignment Jul 16, 2021
@sanga
Author

sanga commented Jul 18, 2021

@gabor to be honest, this ticket is so old that I can no longer recall precisely what my problem was. I think it was a combination of both, which is to say: slowness caused by a large amount of data. I agree with your assertion that a large amount of data will probably make the browser unusably slow in any case. Given that, and the fact that chunked queries don't exist anymore, I think this ticket can be closed.

@gabor
Contributor

gabor commented Jul 19, 2021

@sanga thanks for the info, closing it then.

Projects
None yet
Development

No branches or pull requests

10 participants