explaining an HA + scalable setup? #1500

Closed
KlavsKlavsen opened this Issue Mar 24, 2016 · 18 comments

@KlavsKlavsen

KlavsKlavsen commented Mar 24, 2016

Hi guys,

I've been looking at replacing Graphite for some time, though we're pretty bound to the push model here.

I'm trying to figure out how to do a high-availability setup with Prometheus. I was planning on setting it up on two physical servers, and I'd like my Grafana dashboards and collection to keep working when one physical server goes down. Load sharing would be nice, but I'd be fine with DRBD between the two if need be.

Can you point me to anything explaining how this works with Prometheus? I've read about federation, but I can't see that giving me HA - only manual scaling, meaning that the Grafana graphs would need to be manually edited to query the correct Prometheus server (the one that actually has the data), as I understand it?

Is there any routing/discovery, so Grafana could query just one node and the request would be routed to the server that has the data?

@matthiasr

Contributor

matthiasr commented Mar 24, 2016

For HA, simply run the two Prometheus servers independently with the same configuration. They'll have the same data, modulo sample timings.

Set up a way to query one of the Prometheus servers that you can switch over to the other. We're mostly doing this with simple CNAMEs in DNS; when a server dies, we just switch the CNAME to the other one. If you want it to be more automatic, use VIPs or load balancers. Configure this name in Grafana, i.e. pretend to Grafana that there is only one server.

The only downside is that while one server is down, it won't collect, so it will have holes in its data for that time. This would be the same with federation.
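
As a minimal sketch of that setup (hostnames, targets, and the prometheus.example.com name below are hypothetical, not from this thread), both servers run an identical configuration and Grafana only ever sees one name:

```yaml
# prometheus.yml -- identical on both HA servers (e.g. prometheus-a and prometheus-b)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['web01:9100', 'web02:9100']  # same target list on both servers
```

Grafana's datasource would then point at something like http://prometheus.example.com:9090, where prometheus.example.com is the CNAME (or VIP/load balancer) described above; failing over is a DNS change, with no dashboard edits.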

@matthiasr

Contributor

matthiasr commented Mar 24, 2016

For scaling on the other hand, there are some things you can do with relabelling and hashmod, but we normally just split a Prometheus that gets too loaded by giving services their own Prometheus. This obviously only works well if you have many services with reasonable metric/instance counts each, so you have some line to split along.
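
As a rough illustration of the relabelling + hashmod approach (the job name, target list, and shard count below are assumptions for the example), each of the N servers keeps only the targets whose address hashes into its own bucket:

```yaml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['web01:9100', 'web02:9100', 'web03:9100']  # or any service discovery
    relabel_configs:
      # Hash each target's address into one of 2 buckets.
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_hash
        action: hashmod
      # This server keeps only bucket 0; the second server would keep bucket 1.
      - source_labels: [__tmp_hash]
        regex: '0'
        action: keep
```

Each shard then scrapes roughly half the targets, at the cost of having to know which shard holds a given target when querying.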

@KlavsKlavsen

KlavsKlavsen commented Mar 30, 2016

Thank you for the feedback. The issue with "just running two" servers is that you'll have to collect twice (i.e. double the load on the servers you collect from), but as long as the agents (hopefully) just collect and keep the values in memory, it won't be much of a load (since they won't fetch the details twice from the hosts we're collecting from). I'd probably have to go with graphite_collector, since opening the firewall from the Prometheus server to ALL servers will be an uphill fight (and, to be fair, it is a security risk if anyone finds a bug in the listening daemon - although they'd have to own the Prometheus server first.. but still).

It would be a very cool feature if federation worked like a "shared cloud" (like LDAP or MS AD, for example), i.e. each server knew which stat keys the others had (in Graphite these would be the paths, e.g. servers.server01.cpu...), so you could query any server and get forwarded (HTTP 301 or just a proxied request) to a server that has the data.

@brian-brazil

Member

brian-brazil commented Mar 30, 2016

> so you could query any server and get forwarded (HTTP 301 or just a proxied request) to a server that has the data

This is not a feature we plan to add. Prometheus is intended for your critical monitoring, and having a full-mesh of communication between servers opens risks around cascading failures and all the complexities of having to synchronise metadata across servers.

In practice you'll almost always know which Prometheus server has the data you're looking for, so it's not a major problem in the real world.

@shortdudey123

shortdudey123 commented Apr 11, 2016

> you'll almost always know

How do you deal with the cases in which you don't? What is best practice for those cases?

@juliusv

Member

juliusv commented Apr 11, 2016

@shortdudey123 I think the "best practice" currently is to set up your topology in such a way that you know where to find the metrics you are interested in. In some situations a label can give you a clue about the source: for example, when you're doing hierarchical federation from per-zone Prometheus servers to a global one, the data coming from the per-zone Prometheuses gets a zone label attached, which tells you where to go to find more detail.

It would be interesting to think about better auto-discovery (UI?) features in the future, as long as those don't lead to stronger coupling of components, reliability-wise.
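
A hedged sketch of that hierarchical setup (zone names, ports, and the match[] selector are illustrative, not taken from this thread): each per-zone Prometheus carries a zone external label, and the global server scrapes their /federate endpoints while preserving those labels:

```yaml
# prometheus.yml on a per-zone server (here the zone is assumed to be "eu-west")
global:
  external_labels:
    zone: 'eu-west'
```

```yaml
# prometheus.yml on the global server: federate from the per-zone servers
scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'   # which series to pull up; you'd usually narrow this down
    static_configs:
      - targets:
          - 'prometheus-eu-west:9090'
          - 'prometheus-us-east:9090'
```

Series arriving at the global server then carry zone="eu-west" (or "us-east"), which is the clue @juliusv describes for knowing where to drill down.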

@shortdudey123

shortdudey123 commented Apr 11, 2016

> you know where to find the metrics you are interested in

This is a perfect-world scenario that does not always happen, even with the best planning.

Take the example that was mentioned above: a physical server is downed. When the host is brought back up, it will have holes in its dataset, and in that case you would need something to tell you to go to the other host in the pair to get that data.

@juliusv

Member

juliusv commented Apr 11, 2016

This is correct. There may be some difference in expectations here. If you need 100% accuracy in data reporting, Prometheus is not for you (see also http://prometheus.io/docs/introduction/overview/#when-does-it-not-fit). We generally tolerate occasional gaps in collection (caused by network problems, downed Prometheus servers, etc.), as long as there is always one available Prometheus server for the purpose of current fault detection. Prometheus's local storage is generally not considered durable, replicated, or arbitrarily long-term storage - even if it works well for weeks, months, or sometimes even years in practice.

"Real" long-term storage is still in the works and aims to address individual down servers and replication of data better. It will be decoupled enough from the main Prometheus servers that these won't have to hard-depend on it though.

@shortdudey123

shortdudey123 commented Apr 11, 2016

> We generally tolerate occasional gaps in collection

Gotcha, thanks for the quick replies :)

@juliusv

Member

juliusv commented Apr 11, 2016

@shortdudey123 Note that I wouldn't worry about gaps too much in practice. They should still be pretty rare, so I would still use Prometheus for trending over reasonable periods of time, just not if the trend data is really critical to my business.

@shortdudey123

shortdudey123 commented Apr 11, 2016

> They should still be pretty rare

Unless you run the server in the cloud :p

@juliusv

Member

juliusv commented Apr 11, 2016

We generally recommend running Prometheus servers in the same failure domain as the jobs they monitor; otherwise you won't get a very reliable monitoring setup, no matter what solution you are using. For example, run one Prometheus server (or HA pair) in each zone/cluster, monitoring jobs only in that zone. Have a set of global Prometheus servers that monitor (federate from) the per-cluster ones, and have meta-monitoring on top of that, if your topology is even that big.

Or maybe you mean you run everything in the cloud - the cloud itself shouldn't be that unreliable though? :)

Or maybe you mean you're dynamically re-scheduling even your Prometheus servers all the time via some cluster manager, in which case, yes, you're kind of screwed unless you have good network file systems or block devices that get shared across instance migrations. :)
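
For the meta-monitoring layer, a sketch under assumptions (the job name, targets, alert name, and use of the current rule-file format are all illustrative): the global Prometheus scrapes the per-zone Prometheus servers themselves and alerts when one stops answering:

```yaml
# On the global Prometheus: scrape the per-zone servers' own /metrics endpoints
scrape_configs:
  - job_name: 'prometheus-meta'
    static_configs:
      - targets:
          - 'prometheus-eu-west:9090'
          - 'prometheus-us-east:9090'
```

```yaml
# Rule file (current rule-file syntax): fire when a monitored Prometheus is unreachable
groups:
  - name: meta-monitoring
    rules:
      - alert: PrometheusDown
        expr: up{job="prometheus-meta"} == 0
        for: 5m
        labels:
          severity: critical
```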

@fabxc fabxc added kind/question and removed question labels Apr 28, 2016

@willbryant

willbryant commented May 11, 2016

If you run two nodes to provide HA, does that mean that you'll get twice the alerts?

@brian-brazil

Member

brian-brazil commented May 11, 2016

No, the Alertmanager will deduplicate them.
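
For illustration, a sketch assuming a current Prometheus version (in the versions around at the time of this thread the Alertmanager address was given as a command-line flag instead) - both servers of the HA pair evaluate the same rules and point at the same Alertmanager, and because the resulting alerts carry identical labels, Alertmanager treats them as one:

```yaml
# prometheus.yml on both HA servers (the Alertmanager hostname is hypothetical)
rule_files:
  - 'alert.rules.yml'   # identical alerting rules on both servers

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.com:9093']
```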

@willbryant

willbryant commented May 12, 2016

But at the moment Alertmanager has no native clustering, so if you run Prometheus on two nodes, won't you have two Alertmanagers?

@brian-brazil

Member

brian-brazil commented May 12, 2016

You'd run a single Alertmanager in that scenario. While you may have many Prometheus servers, you'd only have one Alertmanager, so manual failover is sufficient for HA.

@willbryant

willbryant commented May 12, 2016

Hmm, OK. So the dashboards might be HA, but alerts won't be - if the server hosting the Alertmanager goes down, then we won't get an alert to tell us about it :).

@brian-brazil

Member

brian-brazil commented May 12, 2016

It's always important to have independent monitoring of your monitoring system. It's still HA, even with manual failover.
