explaining a HA + scalable setup? #1500
For HA, simply run the two Prometheus servers independently with the same configuration. They'll have the same data, modulo sample timings. Provide a way to query one of the Prometheus servers so that you can switch to the other. We're mostly doing this with simple CNAMEs in DNS; when a server dies, we just switch the CNAME to the other. If you want it to be more automatic, use VIPs or load balancers. Configure this name in Grafana, i.e. pretend to Grafana that there is only one server. The only downside is that while one server is down, it won't collect, so it will have holes in its data for that time. This would be the same with federation.
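A minimal sketch of that HA setup, assuming both servers simply receive the identical configuration file (hostnames and the job name here are placeholders):

```yaml
# prometheus.yml — deployed identically to both servers (e.g. prom-a and prom-b).
# Because both scrape the same targets, each holds a full copy of the data.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['host1:9100', 'host2:9100']
```

Grafana would then point at a single name such as `prometheus.example.com`, a CNAME that resolves to whichever of the two servers is currently healthy.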
For scaling on the other hand, there are some things you can do with relabelling and hashmod, but we normally just split a Prometheus that gets too loaded by giving services their own Prometheus. This obviously only works well if you have many services with reasonable metric/instance counts each, so you have some line to split along.
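The hashmod approach mentioned above can be sketched roughly like this: each server hashes every target's address into a bucket and keeps only its own bucket, so two servers with identical target lists split the scrape load between them (targets, job name, and modulus are illustrative):

```yaml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['host1:9100', 'host2:9100', 'host3:9100', 'host4:9100']
    relabel_configs:
      # Hash each target's address into one of 2 buckets.
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_hash
        action: hashmod
      # Keep only bucket 0 on this server; the second server keeps bucket 1.
      - source_labels: [__tmp_hash]
        regex: '0'
        action: keep
```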
Thank you for the feedback. The issue with "just running two" servers is that you'll have to collect twice (i.e. double the load on the servers you collect from). But as long as the agents (hopefully) just collect and store in memory, it won't be much of a load (since they won't fetch the details twice from the hosts we're collecting from).
I'd probably have to go with graphite_collector, since opening the firewall from the Prometheus server to ALL servers will be an uphill fight (and, to be fair, it is a security risk if anyone finds a bug in the listening daemon; they would have to own the Prometheus server first, though, but still).
It would be a very cool feature if federation worked like a "shared cloud" (like LDAP/MS AD, for example), i.e. each server knew which stat keys the others had (in Graphite these would be the paths, e.g. servers.server01.cpu...), so you could query any server and get forwarded (HTTP 301, or just a proxied request) to a server that has the data.
This is not a feature we plan to add. Prometheus is intended for your critical monitoring, and having a full mesh of communication between servers opens risks around cascading failures and all the complexities of having to synchronise metadata across servers. In practice you'll almost always know which Prometheus server has the data you're looking for, so it's not a major problem in the real world.
How do you deal with the cases in which you don't? What is best practice for those cases?
@shortdudey123 I think the "best practice" currently is to set up your topology in such a way that you know where to find the metrics you are interested in. In some situations, a label can give you a clue about the source (for example, when you're doing hierarchical federation from per-zone Prometheus servers to a global one, the data coming from the per-zone Prometheuses would get a zone label attached).
It would be interesting to think about better auto-discovery (UI?) features in the future, as long as those don't lead to stronger coupling of components, reliability-wise.
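A sketch of the hierarchical-federation pattern described above, under the assumption that each per-zone server attaches a zone label via `external_labels` and the global server scrapes the `/federate` endpoint (server names, the zone value, and the match expression are illustrative):

```yaml
# Per-zone server: every series it exposes via federation carries this label.
global:
  external_labels:
    zone: eu-west-1

---
# Global server: pull series from each per-zone server.
scrape_configs:
  - job_name: 'federate'
    honor_labels: true          # preserve the zone label from downstream
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'         # illustrative: federate everything
    static_configs:
      - targets: ['prom-eu-west-1:9090', 'prom-us-east-1:9090']
```

On the global server, the `zone` label then tells you which per-zone Prometheus a given series came from.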
This is a perfect-world scenario that does not always happen, even with the best planning. Example that was mentioned above:
This is correct. There may be some difference in expectations here. If you need 100% accuracy in data reporting, Prometheus is not for you (see also http://prometheus.io/docs/introduction/overview/#when-does-it-not-fit). We generally tolerate occasional gaps in collection (caused by network problems, downed Prometheuses, etc.), as long as there is always one Prometheus server available for the purpose of current fault detection. Prometheus's local storage is generally not considered durable, replicated, or arbitrarily long-term storage, even if it works well for weeks, months, or sometimes even years in practice. "Real" long-term storage is still in the works and aims to better address individual down servers and replication of data. It will be decoupled enough from the main Prometheus servers that these won't have to hard-depend on it, though.
Gotcha, thanks for the quick replies :)
@shortdudey123 Note that I wouldn't worry about gaps too much in practice. They should still be pretty rare, so I would still use Prometheus for trending over reasonable periods of time, just not if the trend data is really critical to my business.
Unless you run the server in the cloud :p |
We generally recommend running Prometheus servers in the same failure domain as the jobs they monitor. Otherwise you won't get a very reliable monitoring setup no matter what solution you are using. For example, run one Prometheus server (or HA pair) in each zone / cluster, monitoring jobs only in that zone. Have a set of global Prometheus servers that monitor (federate from) the per-cluster ones. Have meta-monitoring on top of that. If your topology is even that big. Or maybe you mean you run everything in the cloud - the cloud itself shouldn't be that unreliable though? :) Or maybe you mean you're dynamically re-scheduling even your Prometheus servers all the time via some cluster manager, in which case, yes, you're kind of screwed unless you have good network file systems or block devices that get shared across instance migrations. :) |
If you run two nodes to provide HA, does that mean that you'll get twice the alerts?
No, the alertmanager will deduplicate them.
But at the moment alertmanager has no native clustering, so if you run Prometheus on two nodes, won't you have two alertmanagers?
You'd run a single alertmanager in that scenario. While you may have many Prometheus servers, you'd only have one alertmanager, so manual failover is sufficient for HA.
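A sketch of that single-alertmanager setup, assuming a modern Prometheus configuration file (the hostname is a placeholder; at the time of this thread the same thing was done with a command-line flag rather than this config block):

```yaml
# prometheus.yml on every Prometheus server, the HA pair included:
# all of them point at the same single Alertmanager, which deduplicates
# the identical alerts it receives from each server.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.com:9093']
```

Because both halves of the HA pair fire the same alert with the same label set, the Alertmanager treats them as one alert and sends a single notification.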
Hmm OK. So the dashboard might be HA, but alerts won't be: if the server hosting alerts goes down, then we won't get an alert to tell us about it :).
It's always important to have independent monitoring of your monitoring system. It's still HA, even with manual failover.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Hi guys,
I've been looking at replacing Graphite for some time, being pretty bound to the push model here, though.
I'm trying to figure out how to do high-availability setups with Prometheus. I was planning on setting it up on 2 physical servers, and I'd like my Grafana dashboard and collection to keep working when a physical server goes down. Load sharing would be nice, but I'd be fine with DRBD between the two if need be.
Can you point to anything explaining how this works with Prometheus? I read about federation, but can't see that giving me HA. Manual scaling, though, meaning that the Grafana graphs would need to be manually edited to query the correct Prometheus server (the one which actually has the data), as I understand it?
Is there any routing/discovery, so Grafana could query just one node and it would route the request to the server that has the data?