
REPLICATION_FACTOR sends metrics to carbon-caches on the same machine #333

Closed
toni-moreno opened this issue Nov 28, 2014 · 9 comments

@toni-moreno

We are testing the following Graphite scheme. Our goal is to have all data on graphite1 replicated (with graphite2 as a backup server). After some tests we can see some metrics on only one machine (we are also using the carbon-lookup tool from carbonate to check where the relay is sending each metric). It turns out that REPLICATION_FACTOR + consistent_hashing knows nothing about host machines, only carbon destinations. That is, the relay sends the same metric twice to the same machine, with the following results:

  • no data backup
  • disk overload

How can we configure this scheme to avoid this result?

[image: diagram of the proposed Graphite scheme]
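The behaviour described above can be reproduced with a toy consistent-hash ring. This is only an illustrative sketch (carbon's actual hashing differs in detail): destinations are (host, port) pairs, and because the ring never looks at the host, REPLICATION_FACTOR=2 can land both copies on the same machine.

```python
import hashlib
from bisect import bisect

def _hash(key):
    # 32-bit hash derived from md5; purely illustrative.
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

class Ring:
    """Toy consistent-hash ring over (host, port) destinations."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (_hash("%s:%d:%d" % (host, port, i)), (host, port))
            for host, port in nodes
            for i in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    def replicas(self, metric, factor):
        # Walk clockwise from the metric's hash, collecting distinct
        # *destinations*; the host a destination lives on is never considered.
        out = []
        i = bisect(self.keys, _hash(metric))
        while len(out) < factor:
            node = self.ring[i % len(self.ring)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

# Four carbon-cache destinations, two per machine (ports are made up):
nodes = [("graphite1", 2103), ("graphite1", 2104),
         ("graphite2", 2103), ("graphite2", 2104)]
ring = Ring(nodes)

# Count metrics whose two replicas both land on a single host:
both_on_one_host = [
    m for m in ("metric.%d" % n for n in range(1000))
    if len({host for host, _ in ring.replicas(m, 2)}) == 1
]
print(len(both_on_one_host), "of 1000 metrics have both replicas on one host")
```

With a destination-level ring the fraction ends up near one third, which is exactly the "no backup plus double disk load" symptom reported here.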

@deniszh
Member

deniszh commented Nov 28, 2014

@toni-moreno,
Usually you need a 2-tier relay config:
Tier-1 relays are deployed on both hosts with an identical config: consistent-hashing, REPLICATION_FACTOR=2, and two DESTINATIONS, namely the Tier-2 relays on host1 and host2.
Tier-2 relays are also deployed on both hosts but have only the local carbon-caches as DESTINATIONS.
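A minimal sketch of what that 2-tier layout could look like in carbon.conf terms (instance names and all ports here are made up; adjust to your deployment):

```ini
# Tier-1 relay (one per host): with consistent-hashing, RF=2 and exactly
# two destinations, every metric is replicated to both hosts.
[relay:tier1]
LINE_RECEIVER_PORT = 2013
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 2
DESTINATIONS = graphite1:2023:t2, graphite2:2023:t2

# Tier-2 relay (one per host): spreads metrics over the local caches only.
[relay:tier2]
LINE_RECEIVER_PORT = 2023
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b
```

The key point is that replication across hosts and hashing across local caches are decided by different relays, so the host-blindness of the hash ring no longer matters.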

@deniszh
Member

deniszh commented Nov 28, 2014

Something like this:

[image: two-tier relay scheme (graphite1)]

@toni-moreno
Author

@deniszh, we have already tried this scheme, but it is not good enough to scale.

What happens if you want to split the load across 2/3/4 relays on each machine in any of the tiers?

How can I split the load across several relays and still maintain the replication factor across different machines?

@deniszh
Member

deniszh commented Nov 28, 2014

Scaling graphite is painful, agreed.

What happens if you want to split the load across 2/3/4 relays on each machine in any of the tiers?

Yep, that sucks with the normal relay. We are using https://github.com/grobian/carbon-c-relay, which is pretty fast and scales across CPU cores.

How can I split the load across several relays and still maintain the replication factor across different machines?

Sorry, I did not understand the question. Or is it related to the previous one?

@toni-moreno
Author

@deniszh, my desired scenario would be the following:

  • Only 1 relay tier: the relay should act as a load balancer while also replicating data to the other center.

I prefer only one tier, if possible, to avoid a lot of queued data across several tiers, and also because an extra aggregation tier might need to be added.

Do you think carbon-c-relay could do this job?
Which relay daemon are you using?

[image: diagram of the desired single-tier scenario]
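For what it's worth, carbon-c-relay's config language can plausibly express this single-tier layout, because one match rule can send to several clusters, each of which hashes independently. A rough, untested sketch (hosts, ports and instance names are made up):

```
# Hash within each DC; the match rule sends every metric to both
# clusters, so each DC holds its own consistently-hashed full copy.
cluster dc1
    carbon_ch replication 1
        graphite1:2103=a
        graphite1:2104=b
    ;
cluster dc2
    carbon_ch replication 1
        graphite2:2103=a
        graphite2:2104=b
    ;
match *
    send to dc1 dc2
    stop
    ;
```

This sidesteps the original problem because replication is expressed as "send to both clusters" rather than as a replication factor on one host-blind ring.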

@deniszh
Member

deniszh commented Nov 29, 2014

@toni-moreno,
As I mentioned, we are using carbon-c-relay in the 2-tier scheme I described. We are not using aggregation, but we could do that on some Tier-1 relay (though then we would need to use only that single relay, whereas we currently round-robin across them in a load balancer). I see no problem with a two-tier or even three-tier scheme (if we need a separate aggregator layer). Carbon-c-relay is quite fast for that (but even the normal relay can be used; I see no significant queues on the relay at all).
But for your architecture I can't see how it would work even theoretically. If you set RF=2 in the relay, then both carbon-caches could end up in a single DC and you would have no redundancy. You would need RF=5 (the number of carbon-caches in one DC + 1) to guarantee minimal redundancy, but then you would have no performance gain. That's why the Tier-1 relays are needed: they just duplicate/replicate data between servers or DCs. Tier-2 relays are needed to spread load across the carbon-caches on a single server, for proper I/O utilization, because carbon-caches are usually I/O-bound.
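In carbon-c-relay terms, the two-tier split described above might look roughly like this (a sketch; the ports are assumed, and the `forward` cluster type sends each metric to all members):

```
# --- Tier-1 relay config (one per host) ---
# Mirror everything to the Tier-2 relays on both hosts.
cluster mirror
    forward
        graphite1:2023
        graphite2:2023
    ;
match * send to mirror stop;

# --- Tier-2 relay config (one per host) ---
# Consistent-hash over the local caches only.
cluster local_caches
    carbon_ch replication 1
        127.0.0.1:2103=a
        127.0.0.1:2104=b
    ;
match * send to local_caches stop;
```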

@toni-moreno
Author

OK...

What capacity do you have?
How many metrics per second? How big is your hardware?

Is carbon-c-relay instrumented the way carbon-relay.py is?

@deniszh
Member

deniszh commented Nov 29, 2014

I can give you numbers, but YMMV, of course. You need to test your cluster with some load tool.
A single Dell R710 (16 cores with HT, 256 GB RAM) with SAN disks over iSCSI gives me about 1 million metrics/min with ease. The problem is that you need a fast disk subsystem: RAID10 over SSDs (which die quite fast under such load) or some good SAN storage. Spinning disks are too slow for big loads; you can use them, but then you need to spread the load across many servers. You can check this video for a Graphite scaling example at booking.com: https://www.youtube.com/watch?v=dtNUeqQovnA
carbon-c-relay is instrumented as well.

@toni-moreno
Author

We ended up using carbon-c-relay as a proxy to the other carbon servers.

Thank you very much @deniszh.
