
REPLICATION_FACTOR sends metrics to carbon-caches on the same machine #333

Closed
toni-moreno opened this issue Nov 28, 2014 · 9 comments

@toni-moreno

We are testing the following Graphite scheme. Our goal is to have all data on graphite1 replicated (with graphite2 as a backup server). After some tests we can see some metrics on only one machine (we are also using the carbon-lookup tool from carbonate to check where the relay is sending each metric). It turns out that REPLICATION_FACTOR + consistent_hashing knows nothing about host machines, only carbon destinations. That is, the relay sends the same metric twice to the same machine, with the following results:

  • no data backup
  • disk overload

How can we configure this scheme to avoid this result?

[image: diagram of the proposed Graphite scheme]
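The behaviour described above can be reproduced with a toy consistent-hash ring. This is only an illustrative sketch (carbon's actual hashing differs in detail): destinations are (host, port) pairs, and because the ring never looks at the host, REPLICATION_FACTOR=2 can land both copies on the same machine.

```python
import hashlib
from bisect import bisect

def _hash(key):
    # 32-bit hash derived from md5; purely illustrative.
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

class Ring:
    """Toy consistent-hash ring over (host, port) destinations."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (_hash("%s:%d:%d" % (host, port, i)), (host, port))
            for host, port in nodes
            for i in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    def replicas(self, metric, factor):
        # Walk clockwise from the metric's hash, collecting distinct
        # *destinations*; the host a destination lives on is never considered.
        out = []
        i = bisect(self.keys, _hash(metric))
        while len(out) < factor:
            node = self.ring[i % len(self.ring)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

# Four carbon-cache destinations, two per machine (ports are made up):
nodes = [("graphite1", 2103), ("graphite1", 2104),
         ("graphite2", 2103), ("graphite2", 2104)]
ring = Ring(nodes)

# Count metrics whose two replicas both land on a single host:
both_on_one_host = [
    m for m in ("metric.%d" % n for n in range(1000))
    if len({host for host, _ in ring.replicas(m, 2)}) == 1
]
print(len(both_on_one_host), "of 1000 metrics have both replicas on one host")
```

With a destination-level ring the fraction ends up near one third, which is exactly the "no backup plus double disk load" symptom reported here.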

@deniszh
Member

deniszh commented Nov 28, 2014

@toni-moreno,
Usually you need a 2-tier relay config:
Tier-1 relays are deployed on both hosts with an identical config: consistent-hashing, REPLICATION_FACTOR=2, and two DESTINATIONS, namely the Tier-2 relays on host1 and host2.
Tier-2 relays are also deployed on both hosts but have only the local carbon-caches as DESTINATIONS.
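A minimal sketch of what that 2-tier layout could look like in carbon.conf terms (instance names and all ports here are made up; adjust to your deployment):

```ini
# Tier-1 relay (one per host): with consistent-hashing, RF=2 and exactly
# two destinations, every metric is replicated to both hosts.
[relay:tier1]
LINE_RECEIVER_PORT = 2013
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 2
DESTINATIONS = graphite1:2023:t2, graphite2:2023:t2

# Tier-2 relay (one per host): spreads metrics over the local caches only.
[relay:tier2]
LINE_RECEIVER_PORT = 2023
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b
```

The key point is that replication across hosts and hashing across local caches are decided by different relays, so the host-blindness of the hash ring no longer matters.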

@deniszh
Member

deniszh commented Nov 28, 2014

Something like this:

[image: two-tier relay scheme (graphite1)]

@toni-moreno
Author

@deniszh, we have already tried this scheme, but it is not good enough to scale.

What happens if you want to split the load across 2/3/4 relays on each machine in any of the tiers?

How can I split the load across several relays and still maintain the replication factor across different machines?

@deniszh
Member

deniszh commented Nov 28, 2014

Scaling graphite is painful, agreed.

What happens if you want to split the load across 2/3/4 relays on each machine in any of the tiers?

Yep, that sucks with the normal relay. We are using https://github.com/grobian/carbon-c-relay, which is pretty fast and scales across CPU cores.

How can I split the load across several relays and still maintain the replication factor across different machines?

Sorry, I did not understand the question. Or is it related to the previous one?

@toni-moreno
Author

@deniszh, my desired scenario would be the following:

  • Only 1 relay tier: the relay should act as a load balancer while also replicating data to the other center.

I prefer only one tier, if possible, to avoid a lot of queued data across several tiers, and also because an extra aggregation tier might need to be added.

Do you think carbon-c-relay could do this job?
Which relay daemon are you using?

[image: diagram of the desired single-tier scenario]
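For what it's worth, carbon-c-relay's config language can plausibly express this single-tier layout, because one match rule can send to several clusters, each of which hashes independently. A rough, untested sketch (hosts, ports and instance names are made up):

```
# Hash within each DC; the match rule sends every metric to both
# clusters, so each DC holds its own consistently-hashed full copy.
cluster dc1
    carbon_ch replication 1
        graphite1:2103=a
        graphite1:2104=b
    ;
cluster dc2
    carbon_ch replication 1
        graphite2:2103=a
        graphite2:2104=b
    ;
match *
    send to dc1 dc2
    stop
    ;
```

This sidesteps the original problem because replication is expressed as "send to both clusters" rather than as a replication factor on one host-blind ring.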

@deniszh
Member

deniszh commented Nov 29, 2014

@toni-moreno,
As I mentioned, we are using carbon-c-relay in the 2-tier scheme I described. We are not using aggregation, but we could do that on some Tier-1 relay (though then we would need to use only that single relay, whereas we currently round-robin across them in a load balancer). I see no problem with a two-tier or even three-tier scheme (if we need a separate aggregator layer). Carbon-c-relay is quite fast for that (but even the normal relay can be used; I see no significant queues on the relay at all).
But for your architecture I can't see how it would work even theoretically. If you set RF=2 in the relay, then both carbon-caches could end up in a single DC and you would have no redundancy. You would need RF=5 (the number of carbon-caches in one DC + 1) to guarantee minimal redundancy, but then you would have no performance gain. That's why the Tier-1 relays are needed: they just duplicate/replicate data between servers or DCs. Tier-2 relays are needed to spread load across the carbon-caches on a single server, for proper I/O utilization, because carbon-caches are usually I/O-bound.
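In carbon-c-relay terms, the two-tier split described above might look roughly like this (a sketch; the ports are assumed, and the `forward` cluster type sends each metric to all members):

```
# --- Tier-1 relay config (one per host) ---
# Mirror everything to the Tier-2 relays on both hosts.
cluster mirror
    forward
        graphite1:2023
        graphite2:2023
    ;
match * send to mirror stop;

# --- Tier-2 relay config (one per host) ---
# Consistent-hash over the local caches only.
cluster local_caches
    carbon_ch replication 1
        127.0.0.1:2103=a
        127.0.0.1:2104=b
    ;
match * send to local_caches stop;
```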

@toni-moreno
Author

OK...

What capacity do you have?
How many metrics per second? How big is your hardware?

Is carbon-c-relay instrumented the way carbon-relay.py is?

@deniszh
Member

deniszh commented Nov 29, 2014

I can give you numbers, but YMMV, of course. You need to test your cluster with some load tool.
A single Dell R710 (16 cores with HT, 256 GB RAM) with SAN disks over iSCSI gives me about 1 million metrics/min with ease. The problem is that you need a fast disk subsystem: RAID10 over SSDs (which die quite fast under such load) or some good SAN storage. Spinning disks are too slow for big loads; you can use them, but then you need to spread the load across many servers. You can check this video for a Graphite scaling example at booking.com: https://www.youtube.com/watch?v=dtNUeqQovnA
carbon-c-relay is instrumented as well.

@toni-moreno
Author

We ended up using carbon-c-relay as a proxy to the other carbon servers.

Thank you very much @deniszh.
