
explaining a HA + scalable setup? #1500

Closed
KlavsKlavsen opened this issue Mar 24, 2016 · 19 comments

@KlavsKlavsen

Hi guys,

I've been looking at replacing Graphite for some time now, though we're pretty bound to the push model here.

I'm trying to figure out how to do a high-availability setup with Prometheus. I was planning on running it on 2 physical servers, and I'd like my Grafana dashboards and collection to keep working when one physical server goes down. Load sharing would be nice, but I'd be fine with DRBD between the two if need be.

Can you point me to anything explaining how this works with Prometheus? I read about federation, but I can't see that giving me HA - only manual scaling, meaning the Grafana graphs would need to be manually edited to query the correct Prometheus server (the one that actually has the data), as I understand it?

Is there any routing/discovery, so Grafana could query just one node and the request would be routed to the server that has the data?

@matthiasr
Contributor

For HA, simply run the two Prometheus servers independently with the same configuration. They'll have the same data, modulo sample timings.

Set up a way to query one of the Prometheus servers that lets you switch to the other. We mostly do this with simple CNAMEs in DNS: when a server dies, we just switch the CNAME to the other one. If you want it to be more automatic, use VIPs or load balancers. Configure this name in Grafana, i.e. pretend to Grafana that there is only one server.

The only downside is that while one server is down, it won't collect, so it will have holes in its data for that time. This would be the same with federation.
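
A rough sketch of what that looks like in practice (using current config syntax; the job name, targets, and the `prometheus.example.com` CNAME below are just placeholders): both machines get an identical `prometheus.yml`, and Grafana only ever talks to the shared name.

```yaml
# prometheus.yml -- deployed unchanged to both prometheus-a and prometheus-b.
# Both servers scrape the same targets independently, so each ends up with a
# full (if not byte-identical) copy of the data.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'host01:9100'
          - 'host02:9100'
```

In Grafana you'd then configure the data source URL as e.g. `http://prometheus.example.com:9090`, where `prometheus.example.com` is the CNAME (or VIP / load balancer address) you flip from `prometheus-a` to `prometheus-b` when one of them dies.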

@matthiasr
Contributor

For scaling, on the other hand, there are some things you can do with relabelling and hashmod, but we normally just split a Prometheus that gets too loaded by giving services their own Prometheus. This obviously only works well if you have many services with reasonable metric/instance counts each, so you have some line to split along.
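
For reference, the relabelling/hashmod approach looks roughly like this (a sketch only; the job name, targets, and shard count are made up). Each shard runs the same scrape config but keeps only the targets whose address hashes to its own shard number:

```yaml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['host01:9100', 'host02:9100', 'host03:9100', 'host04:9100']
    relabel_configs:
      # Hash each target address into one of 2 buckets...
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_hash
        action: hashmod
      # ...and keep only the targets belonging to this server's shard.
      # The second shard runs the identical config with regex: '1'.
      - source_labels: [__tmp_hash]
        regex: '0'
        action: keep
```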

@KlavsKlavsen
Author

Thank you for the feedback. The issue with "just running two" servers is that you'll have to collect twice (i.e. double the scrape load on the servers you collect from). But as long as the agents (hopefully) just collect and keep the values in memory, it won't be much of a load, since they won't gather the details twice from the hosts we're collecting from. I'd probably have to go with graphite_collector, since opening the firewall from the Prometheus server to ALL servers will be an uphill fight (and to be fair it is a security risk if anyone finds a bug in the listening daemon - they'd have to own the Prometheus server first though, but still).

It would be a very cool feature if federation worked like a "shared cloud" kind of thing (like LDAP/MS AD, for example), i.e. each server knew which stat keys the others had (in Graphite those would be the paths, e.g. servers.server01.cpu...) - so you could query any server and get forwarded (HTTP 301, or just have the request proxied) to a server that has the data.

@brian-brazil
Contributor

so you could query any server and get forwarded (HTTP 301, or just have the request proxied) to a server that has the data

This is not a feature we plan to add. Prometheus is intended for your critical monitoring, and having a full mesh of communication between servers opens up risks around cascading failures and all the complexity of having to synchronise metadata across servers.

In practice you'll almost always know which Prometheus server has the data you're looking for, so it's not a major problem in the real world.

@shortdudey123

you'll almost always know

How do you deal with the cases in which you don't? What is best practice for those cases?

@juliusv
Member

juliusv commented Apr 11, 2016

@shortdudey123 I think the "best practice" currently is to set up your topology in such a way that you know where to find the metrics you are interested in. In some situations a label can give you a clue about the source: for example, when you're doing hierarchical federation from per-zone Prometheus servers to a global one, the data coming from the per-zone Prometheuses gets a zone label attached, which tells you where to go to find more detail.

It would be interesting to think about better auto-discovery (UI?) features in the future, as long as those don't lead to stronger coupling of components, reliability-wise.
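
As a sketch of that hierarchical setup (zone names, hostnames, and the match[] selector below are illustrative, using current config syntax): each per-zone server sets a zone external label, and the global server scrapes every zone's /federate endpoint, so the federated series carry a `zone` label pointing back at the server that has the full-resolution data.

```yaml
# On each per-zone Prometheus:
global:
  external_labels:
    zone: zone-a                     # zone-b, zone-c, ... on the other servers

# On the global Prometheus:
scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}'     # typically pre-aggregated recording rules
    static_configs:
      - targets:
          - 'prometheus.zone-a.example.com:9090'
          - 'prometheus.zone-b.example.com:9090'
```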

@shortdudey123

you know where to find the metrics you are interested in

This is a perfect-world scenario that does not always happen, even with the best planning.

An example that was mentioned above: when a physical server is downed. When that host is brought back up, it will have holes in its dataset. In this case, you would need something to tell you to go to the other host in the pair to get that data.

@juliusv
Member

juliusv commented Apr 11, 2016

This is correct. There may be some difference in expectations here. If you need 100% accuracy in data reporting, Prometheus is not for you (see also http://prometheus.io/docs/introduction/overview/#when-does-it-not-fit). We generally tolerate occasional gaps in collection (caused by network problems, downed Prometheuses, etc.), as long as there is always one Prometheus server available for the purpose of current fault detection. Prometheus's local storage is generally not considered durable, replicated, or arbitrarily long-term storage, even if it works well for weeks, months, or sometimes even years in practice.

"Real" long-term storage is still in the works and aims to address individual down servers and replication of data better. It will be decoupled enough from the main Prometheus servers that these won't have to hard-depend on it though.

@shortdudey123

We generally tolerate occasional gaps in collection

Gotcha, thanks for the quick replies :)

@juliusv
Member

juliusv commented Apr 11, 2016

@shortdudey123 Note that I wouldn't worry about gaps too much in practice. They should still be pretty rare, so I would still use Prometheus for trending over reasonable periods of time, just not if the trend data is really critical to my business.

@shortdudey123

They should still be pretty rare

Unless you run the server in the cloud :p

@juliusv
Member

juliusv commented Apr 11, 2016

We generally recommend running Prometheus servers in the same failure domain as the jobs they monitor. Otherwise you won't get a very reliable monitoring setup no matter what solution you are using. For example, run one Prometheus server (or HA pair) in each zone / cluster, monitoring only the jobs in that zone. Have a set of global Prometheus servers that monitor (federate from) the per-cluster ones, and meta-monitoring on top of that, if your topology is even that big.

Or maybe you mean you run everything in the cloud - the cloud itself shouldn't be that unreliable though? :)

Or maybe you mean you're dynamically re-scheduling even your Prometheus servers all the time via some cluster manager, in which case, yes, you're kind of screwed unless you have good network file systems or block devices that get shared across instance migrations. :)

@willbryant

If you run two nodes to provide HA, does that mean that you'll get twice the alerts?

@brian-brazil
Contributor

No, the alertmanager will deduplicate them.

@willbryant

But at the moment alertmanager has no native clustering, so if you run prometheus on two nodes, won't you have two alertmanagers?

@brian-brazil
Contributor

You'd run a single alertmanager in that scenario. While you may have many Prometheus servers, you'd only have one alertmanager so manual failover is sufficient for HA.
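
Concretely, that might look like this (a sketch using current config syntax; the hostname is a placeholder): both HA Prometheus servers evaluate the same alerting rules and point at the same single Alertmanager, which collapses the duplicate alerts into one notification.

```yaml
# In the prometheus.yml of *both* HA Prometheus servers:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.com:9093']
```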

@willbryant

Hmm OK. So the dashboard might be HA, but alerts won't be - if the server hosting alerts goes down, then we won't get an alert to tell us about it :).

@brian-brazil
Contributor

It's always important to have independent monitoring of your monitoring system. It's all HA, even with manual failover.
