Consul SD: Include health check information in meta data #770
Comments
beorn7 assigned fabxc on Jun 3, 2015
beorn7 added the feature-request label on Jun 3, 2015
There can be multiple health checks per node. The check IDs do not match our labelname constraints. What would make sense is …
Just in case, there can be multiple health checks per node AND per service.
See #948 (comment) for an example of output with multiple health checks.
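For context, a rough sketch (not taken from #948; the check names and statuses are illustrative) of the kind of data Consul's health endpoint reports for a single node: a node-level check alongside one or more service-level checks, with CheckIDs such as "service:web" that contain characters which are not legal in Prometheus label names.

```yaml
# Illustrative only: roughly what Consul's health API reports for one node,
# rendered as YAML. One node-level check plus a service-level check.
checks:
  - CheckID: serfHealth        # node-level check
    Status: passing
  - CheckID: "service:web"     # service-level check; ":" is not a legal
    Status: warning            # character in a Prometheus label name
```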
fabxc added the kind/enhancement label and removed the feature request label on Apr 28, 2016
dominikschulz added a commit to dominikschulz/prometheus that referenced this issue on Oct 16, 2016
dominikschulz referenced this issue on Oct 16, 2016: Consul SD: Include health check information in meta data #2085 (closed)
brian-brazil added the component/service discovery, priority/P3, help wanted, and low hanging fruit labels and removed the help wanted label on Jul 14, 2017
gouthamve unassigned fabxc on Jul 21, 2017
gouthamve added the hacktoberfest label on Sep 28, 2017
nfirvine commented on Apr 11, 2018
Any movement on this? I see that #2085 was closed because it caused too many RPCs. For me the impact is with Prometheus finding its Alertmanagers using Consul SD: even though the node/service is unhealthy, Prometheus still attempts to send alerts to it. #2085 (comment) indicates you should keep trying to scrape services even if they're unhealthy, which I'm not quite sure I understand, but maybe. On the other hand, sending alerts to unhealthy Alertmanagers seems wrong in general, and at least deserves the ability to opt out, I would think.
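For reference, a minimal sketch of the setup being described, assuming a local Consul agent and a service registered under the name alertmanager (both are placeholders): Prometheus discovers its Alertmanagers through Consul SD, and without a health meta label there is nothing to relabel on in order to drop unhealthy instances.

```yaml
# Sketch of Alertmanager discovery via Consul SD (service name and server
# address are assumptions). All discovered instances are used for alerting;
# there is no health-based meta label to filter on.
alerting:
  alertmanagers:
    - consul_sd_configs:
        - server: 'localhost:8500'
          services: ['alertmanager']
```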
There's no problem with doing that.
nfirvine commented on Apr 13, 2018
No, it's not really a problem in terms of Prometheus doing its job, but the logs are filled with spurious errors. Also, I was hoping to use the series …
That sounds like something we can improve; can you file an issue?
If your Alertmanagers are unhealthy that often, you have bigger problems.
nfirvine commented on Apr 17, 2018
I can, but from my point of view, avoiding enumerating those Alertmanagers in the first place seems like the right fix. If we rip out those log messages, there'd really be no way to tell whether an Alertmanager that reports healthy in Consul is actually failing to accept alerts. Honestly, I'm struggling to understand the resistance to figuring out a way to exclude unhealthy Consul services. It seems like a pretty desirable feature, and the design of scraping unhealthy targets is unintuitive. Kubernetes SD for endpoints has the …
Yes, a bigger problem that I would love to be able to alert on using my meta-monitoring. ;)
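The truncated comparison presumably points at the endpoint readiness meta label exposed by Kubernetes endpoints SD. A hedged sketch of how that pattern is typically used to keep only ready endpoints (the job around it is assumed, not part of this thread):

```yaml
# Sketch: keep only endpoints Kubernetes reports as ready, using the
# readiness meta label from Kubernetes endpoints SD.
scrape_configs:
  - job_name: 'example'            # placeholder job name
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoint_ready]
        regex: 'true'
        action: keep
```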
Which is why it's important to scrape non-healthy instances :) Something is wrong with your setup; this is a very weak use case.
nfirvine commented on Apr 17, 2018
I am scraping them. But I need to know how many Prometheus considers a valid place to send alerts, not just whether something is a valid target for scraping. That information doesn't seem to exist. Anyway, I seem unable to convey this, so let's drop it. Can we talk about the other aspect, though? I think there are valid reasons for something like …
There's no such distinction.
beorn7 commented on Jun 3, 2015
So that users who need that information can relabel it onto their metrics.
__meta_consul_health or something...
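A hedged sketch of what the request would enable, assuming a meta label named roughly __meta_consul_health existed; the label name and its values are hypothetical here, since that label is exactly the feature being asked for.

```yaml
# Hypothetical: __meta_consul_health is the requested meta label, not an
# existing one. Server address and job name are placeholders.
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
    relabel_configs:
      # Copy the (hypothetical) health status onto every scraped series ...
      - source_labels: [__meta_consul_health]
        target_label: consul_health
      # ... or drop targets whose checks are not passing.
      - source_labels: [__meta_consul_health]
        regex: 'passing'
        action: keep
```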