Use only ip:port or host:port in instance label #493
Comments
Would it not also make sense to have a label that is always the FQDN of the host? My reasoning: alerts don't want to know what the endpoint is, they want the host. For example, suppose I have an endpoint reporting whether a specific process is in an operational state. You end up with metrics stored as something like

In our case, we're using Puppet exported resources to collect alert rules, and in our situation every rule we create must pertain to that specific host, rather than to all instances of a specific metric. Concrete example: we need to monitor the number of running worker processes of a certain daemon, which happens to be 4 on one host but 8 on another. So we need two separate alert rules, one for each host.

Actually, adding the host label may have to go into whatever is collecting metrics, e.g.
Follow-up question: Until this is resolved, would it make sense for us to devote one job per host, and rely on the job name as the host name?
On 22 February 2015 at 03:09, Alexander Staubo notifications@github.com
What I've been thinking is to have a template function that can go from
It's best to think about services, not machines. Don't think host42 runs
I suspect you may be trying to translate directly from another monitoring system to Prometheus, which means you might miss many of the benefits of Prometheus's power. Why don't you pop into IRC on #prometheus on Freenode so we can help see if there's a better way to solve your problems?
I am not trying to translate from an existing monitoring system. But for as long as we're using Puppet, we are forced to follow the Puppet model where everything is host-oriented; exported resources are always host-based, and it would be complicated to de-duplicate them. To explain — I don't know how well you know Puppet — at the moment every host "exports" an alert file for each thing (such as processes or network ports) to monitor. The alert file is something as simple as:
This alert file, being exported, is something that can be read by the config manifest for any other host in the Puppet system. Thus, the Puppet manifest for the Prometheus server "collects" all those alert files into a directory; there is then a template that generates

This way, everything is automated: If we add a service on any host, the correct rule appears on the Prometheus server.

What you're suggesting is that I don't include a host at all, but check the processes in the aggregate. That means that when a host exports its alerts, I will end up with duplicate alert files, because we want one alert per process, not per-process-per-host.

I can work around Puppet's limitations in a number of ways. For example, instead of exporting alert files, I can export a simple file which

But I'm not sure how to write an alert if the host name is not part of the alert, and the target metric is. I must be missing something. For example, given a reported metric:
...how do I write an
Or did you mean that I write an alert like this:
If you meant the latter, that means I still need to emit one alert entry per host, since the host is the only one that knows what rvalue to insert here. But you're right in that I don't have to include the host name. I do have to de-duplicate: If I have N hosts that all have a minimum of 8, I wouldn't want N alert entries with the same check.
I really think you should join us over in IRC; there are better ways to monitor this, but they are more difficult to explain in a less interactive medium such as this. In general you want to autogenerate your list of targets, and your rules should almost always be completely static. What you want to export is something like:

and then have a single alert rule that's:
Ah, the fact that expressions are able to join was the missing part I was looking for. I assume here that alert rule expressions match on all labels except the instance name. The fact that I can do this means I can simplify the rules to be non-host-specific. I'm not on IRC, and don't want to go through the hassle of setting up a client just for this, sorry.
Sorry, I meant the time series name. Got it. Thanks.
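For readers following along, here is a rough Python sketch of the matching idea discussed above: two sets of series are paired up on their labels, ignoring the time series name, and compared. This is only a simulation of the concept, not how Prometheus implements it. The metric names `running_workers` and `min_workers`, the instance labels, and the helper `below_threshold` are all made up for illustration; the 4-versus-8 worker counts are borrowed from the example earlier in the thread.

```python
from typing import Dict, FrozenSet, List, Tuple

# A label set without the series name, e.g. {instance="host1:9100", job="daemon"}.
Labels = FrozenSet[Tuple[str, str]]
Series = Dict[Labels, float]


def labels(**kv: str) -> Labels:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


# Hypothetical samples. The series name is the variable name; it is NOT
# part of the label set, so it plays no role in the matching below.
running_workers: Series = {
    labels(instance="host1:9100", job="daemon"): 4.0,
    labels(instance="host2:9100", job="daemon"): 6.0,
}

# Hypothetical per-host thresholds, exported by each host alongside its metrics.
min_workers: Series = {
    labels(instance="host1:9100", job="daemon"): 4.0,
    labels(instance="host2:9100", job="daemon"): 8.0,
}


def below_threshold(actual: Series, threshold: Series) -> List[Dict[str, str]]:
    """Return the label sets whose actual value is below the matching threshold.

    Series are paired when their label sets are identical; series with no
    partner on the other side are silently dropped.
    """
    firing = []
    for label_set, value in actual.items():
        limit = threshold.get(label_set)
        if limit is not None and value < limit:
            firing.append(dict(label_set))
    return firing


# One static "rule" covers every host; only the exported thresholds differ.
firing = below_threshold(running_workers, min_workers)
print(firing)
```

The point of the sketch is that the comparison itself never mentions a host: each host contributes its own threshold series, and the single rule picks up whichever pairs happen to exist.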
Yeah, you can do all kinds of fancy expression language stuff to select, aggregate, and match your time series to get useful alerts. Since Prometheus is quite unique here, I think we'll need more intro-level guides to this stuff... (but time!)
beorn7 closed this May 11, 2015
simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue Oct 12, 2017
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
juliusv commented Jan 30, 2015
This makes instance labels smaller and only include parts that are (usually) used to identify an instance. The exact items to include might also vary depending on the type of static configuration or service discovery. TBD.