add metric for alertrule template rendering failure #4634

Closed
juliantaylor opened this issue Sep 19, 2018 · 2 comments

@juliantaylor juliantaylor commented Sep 19, 2018

Alert rules can contain templated values that are filled in from metric labels, for example in summaries.
Template rendering can fail because of a typo in the template. In that case no alert is sent to the Alertmanager, which can be a major problem.

E.g. forgetting the .Labels causes:

Sep 19 13:11:47 po-prometheus-live-bs01 prometheus[23155]: level=warn ts=2018-09-19T11:11:47.902437202Z caller=alerting.go:220 component="rule manager" alert=k8s_puppet_inconsistent_environment msg="Expanding alert template failed" err="error executing template __alert_k8s_puppet_inconsistent_environment: template: __alert_k8s_puppet_inconsistent_environment:1:68: executing \"__alert_k8s_puppet_inconsistent_environment\" at <.group>: can't evaluate field group in type struct { Labels map[string]string; Value float64 }" data="unsupported value type"
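
The failure in the log above can be reproduced outside Prometheus with Go's standard `text/template` package. The following is a minimal sketch, not Prometheus code: the `alertData` type and the `expand` helper are hypothetical names, and the struct only mirrors the shape quoted in the error message (`struct { Labels map[string]string; Value float64 }`).

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// alertData is a hypothetical type that mirrors the shape named in the
// error message: struct { Labels map[string]string; Value float64 }.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// expand is a hypothetical helper that parses and executes a template
// against alertData, returning the rendered text or an error.
func expand(name, text string, data alertData) (string, error) {
	tmpl, err := template.New(name).Parse(text)
	if err != nil {
		return "", fmt.Errorf("error parsing template %s: %w", name, err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("error executing template %s: %w", name, err)
	}
	return buf.String(), nil
}

func main() {
	data := alertData{Labels: map[string]string{"group": "web"}, Value: 1}

	// Correct: the label is looked up through .Labels.
	out, err := expand("ok", "group {{ .Labels.group }} is inconsistent", data)
	fmt.Println(out, err)

	// Typo: .group is not a field of alertData. Parsing succeeds,
	// but execution fails with a "can't evaluate field" error.
	_, err = expand("bad", "group {{ .group }} is inconsistent", data)
	fmt.Println(err)
}
```

Note that the typo is only detected when the template is executed against real data, which is why it surfaces at rule evaluation time rather than when the rule file is loaded.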

There should be a metric that is incremented on rule template rendering failures, similar to prometheus_rule_evaluation_failures_total and prometheus_notifications_errors_total, so that one can alert on that failure.

We are currently using Prometheus 2.3.2.

Contributor

@mucahitkurt mucahitkurt commented Oct 16, 2018

@simonpasquier I would like to help with this issue. I think a metric like prometheus_template_expand_failures_total could be added and incremented when a parse error occurs in the Expand() and ExpandHTML() methods.

Member

@simonpasquier simonpasquier commented Oct 16, 2018

@mucahitkurt sure, you're more than welcome! You're mostly correct. Maybe we can limit the counter to Expand(), as ExpandHTML() is only used for rendering web pages, not alerts.

mucahitkurt added a commit to mucahitkurt/prometheus that referenced this issue Oct 16, 2018
…ate expanding errors prometheus#4634

Signed-off-by: Mucahit Kurt <mucahitkurt@gmail.com>
@lock lock bot locked and limited conversation to collaborators May 5, 2019