
Information needed #2142

Closed
AsenZahariev opened this issue Dec 5, 2017 · 9 comments

@AsenZahariev

AsenZahariev commented Dec 5, 2017

Hello folks,
I saw the new option REMOTE_BUFFER_SIZE.
Can we have more information about this? What exactly is this 1024 * 1024 value?

Thank you!
Asen Z

@deniszh
Member

deniszh commented Dec 5, 2017

Hello @AsenZahariev ,
This is part of the rather new PR #2136.
Citing it here:

Currently pickle & msgpack use lots of small reads to decode the responses from remote hosts, which causes slowdowns for the remote host and can drastically increase the time taken to receive large responses.
This PR adds a new BufferedHTTPReader class that can be used to wrap the result before passing it to load(). It reads from the underlying response object in chunks to keep memory usage reasonable without slowing down the producer.

REMOTE_BUFFER_SIZE is the default buffer size for HTTP calls to remote hosts in the cluster. The default is 1024 * 1024 = 1048576 bytes, i.e. 1 MB, which should be fine for most clusters.
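To illustrate the idea behind the BufferedHTTPReader (a simplified sketch, not the actual graphite-web implementation): the decoder's many tiny read() calls are served from an in-memory buffer that is refilled from the socket in large chunks.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, mirroring the REMOTE_BUFFER_SIZE default

class BufferedReader:
    """Read from `raw` in CHUNK_SIZE pieces; serve small reads from memory."""
    def __init__(self, raw, chunk_size=CHUNK_SIZE):
        self.raw = raw
        self.chunk_size = chunk_size
        self.buf = b""

    def read(self, n=-1):
        if n < 0:
            # Drain the buffer plus whatever remains in the source
            return self.buf + self.raw.read()
        # Refill the buffer in large chunks until we can satisfy the request
        while len(self.buf) < n:
            chunk = self.raw.read(self.chunk_size)
            if not chunk:
                break
            self.buf += chunk
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

# A pickle/msgpack decoder issuing many tiny reads now only triggers
# one large read per megabyte of payload.
reader = BufferedReader(io.BytesIO(b"abcdef" * 10))
print(reader.read(4))  # → b'abcd'
```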

@deniszh
Member

deniszh commented Dec 5, 2017

Are you experiencing any issues, or just curious?

@AsenZahariev
Author

AsenZahariev commented Dec 5, 2017

First, thank you for the quick answer. Just curious, because we have a large cluster and very heavy queries.
Let me briefly describe what we have and what we have done so far, so you can get the picture behind my curiosity.

We have 4 graphite clusters running on 0.9.15 (I may be slightly off about the exact commit version); combined, these servers receive around 4 million metrics per minute. All carbon components run on these nodes (5 relays and 10 caches per node), together with the graphite webapp using memcached for 5 minutes (600 seconds). In front of everything there is a load balancer. The relays use consistent hashing to distribute the metrics, and each relay can send metrics to all caches (10 caches per node, 40 caches combined). No aggregation. Everything runs on PyPy, which, by the way, helped a lot in reducing CPU and RAM utilization for carbon's relay and cache. Each carbonlink_host points to its local caches and instances.

What we have done to increase read speed for Grafana and overall stability is build a separate box with only the latest graphite webapp from master.
For the configuration of that box, we run a local memcached with a configured cache policy (e.g. 0,60; 7200,120; 21600,180; and so on).
In the cluster_servers directive we put the IP addresses of the 4 nodes with their respective ports.
The first issue we encountered was that queries started to time out, so we set large timeout values and increased the retries.
We started to use POST requests (REMOTE_STORE_USE_POST = True).
We use the new option (REMOTE_BUFFER_SIZE = 1024 * 1024), and yes, it helped a lot!
On the graphite nodes we only enabled REMOTE_PREFETCH_DATA.
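For illustration, the front-end box settings described above might look roughly like this in local_settings.py. This is a sketch only: the host addresses and ports are placeholders, and option names can differ between graphite-web releases.

```python
# Sketch of a local_settings.py for a front-end webapp box, assuming
# graphite-web 1.x option names; values are illustrative, not recommendations.

# The four backend cluster nodes (placeholder addresses)
CLUSTER_SERVERS = [
    "10.0.0.1:8080",
    "10.0.0.2:8080",
    "10.0.0.3:8080",
    "10.0.0.4:8080",
]

# Local memcached with a tiered cache policy:
# (query time range in seconds, cache TTL in seconds)
MEMCACHE_HOSTS = ["127.0.0.1:11211"]
DEFAULT_CACHE_POLICY = [(0, 60), (7200, 120), (21600, 180)]

# Use POST for remote fetches (no URL-length limit, unlike GET)
REMOTE_STORE_USE_POST = True

# 1 MB read buffer for responses from remote hosts
REMOTE_BUFFER_SIZE = 1024 * 1024
```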

I would be forever grateful if this thread could become a guide for building a sustainable graphite cluster, because we searched and experimented a lot before even thinking of starting a thread here.

P.S.

  1. This is one of the clusters; the other one is even ...heavier.
  2. We have two instances of Grafana (2.5 and 4.4.6), both with 700 dashboards and around 5k-6k graphs, with crazy queries using multiple wildcards (*).
  3. Seyren v1.5 with nearly 400 checks.
  4. We can't update the graphite nodes to a newer version.

Kind regards,
Asen Z.

@deniszh
Member

deniszh commented Dec 5, 2017

Cool, thanks for sharing!
Please note that (IIRC) this behavior is now enabled by default and doesn't require any additional variables.
REMOTE_PREFETCH_DATA and REMOTE_STORE_USE_POST are of course useful for cluster tuning.

@AsenZahariev
Author

AsenZahariev commented Dec 5, 2017

Sorry, IIRC? I'm a little bit new :)
Anyway, do you have any other recommendations/best practices for a setup like mine?

@deniszh
Member

deniszh commented Dec 5, 2017

IIRC is an acronym for "If I Recall (or Remember) Correctly".
And it looks like I don't, because REMOTE_PREFETCH_DATA is not used anymore (after #2093). REMOTE_STORE_USE_POST can be useful but is not mandatory for the features above; POST has no limit on request size, contrary to GET.
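To illustrate the GET size limitation with a rough sketch (the metric names here are made up): a query whose wildcards expand to hundreds of targets produces a query string far longer than common URL-length limits, while the same parameters fit fine in a POST body.

```python
from urllib.parse import urlencode

# Hypothetical wildcard expansion: 500 matched series
targets = [("target", "servers.host%04d.cpu.load" % i) for i in range(500)]
query = urlencode(targets)

# Over 16 KB of query string -- beyond the ~8 KB URL limit many
# servers enforce for GET, but unproblematic as a POST request body.
print(len(query))
```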

@AsenZahariev
Author

AsenZahariev commented Dec 6, 2017

Hey Denis,
We switched back to POST, since GET has some limitations: in general it is perfect for small queries, but not so much for something with multiple wildcards.
About REMOTE_PREFETCH_DATA: yes, we saw that change, and we only use it on our graphite nodes (version 0.9.15). Do you think there is some limitation, given that we use 0.9.15 (back-end graphite webapp nodes) and 1.1.0 (front-end graphite webapp)?

@deniszh
Member

deniszh commented Dec 6, 2017

For 0.9.15 PREFETCH is still valid, of course.

@AsenZahariev
Author

AsenZahariev commented Dec 15, 2017

@deniszh Thank you for your feedback! I can confirm that putting a graphite webapp in front of your graphite cluster with the settings we have (of course, once again, it depends on your environment/infrastructure) makes reading better. I have a couple more questions, but they will go in a separate thread. Cheers!
