Add basic arithmetic functions to templating funcmap #1188

Open
roganartu opened this issue Jan 9, 2018 · 18 comments

Comments

@roganartu

Why

I link to Grafana dashboards from my Prometheus alerts, using some alert labels as Grafana variables in the URL to narrow down the dashboard queries. I would like to link to a specific range (e.g. 30 minutes before and after) around StartsAt and/or EndsAt instead of having to adjust the timeline after opening the link, but this requires basic arithmetic functions (i.e. add/sub). In addition, Grafana URL parameters expect Unix timestamps in milliseconds, which makes StartsAt and EndsAt unusable as they are and forces me to return to the alert and manually copy the times into Grafana via a date picker.

Proposal

Add the following self-explanatory arithmetic functions to DefaultFuncs in https://github.com/prometheus/alertmanager/blob/master/template/template.go

  1. add
  2. sub
  3. div
  4. mul

From a user perspective, I don't want to worry about types here, so these functions will require type assertions to differentiate between floats and ints (and maybe strconv.Atoi if string, but I'm not sold on the usefulness of that). Still, this should be relatively simple to implement.
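
A minimal Go sketch of what such functions might look like. It assumes every operand is converted to float64 instead of keeping ints and floats separate, which is a simplification of the type handling described above. The names toFloat, arith and arithFuncs are hypothetical; nothing here is part of Alertmanager's DefaultFuncs.

```go
package main

import (
	"fmt"
	"os"
	"text/template"
)

// toFloat converts the numeric kinds a template is likely to pass in.
// The bool result reports whether the conversion succeeded.
func toFloat(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case float64:
		return n, true
	}
	return 0, false
}

// arith wraps a binary float operation so that it accepts ints or floats.
func arith(op func(a, b float64) float64) func(a, b interface{}) (float64, error) {
	return func(a, b interface{}) (float64, error) {
		x, okA := toFloat(a)
		y, okB := toFloat(b)
		if !okA || !okB {
			return 0, fmt.Errorf("unsupported operand types %T and %T", a, b)
		}
		return op(x, y), nil
	}
}

// arithFuncs is a hypothetical FuncMap, not Alertmanager's own.
var arithFuncs = template.FuncMap{
	"add": arith(func(a, b float64) float64 { return a + b }),
	"sub": arith(func(a, b float64) float64 { return a - b }),
	"mul": arith(func(a, b float64) float64 { return a * b }),
	"div": arith(func(a, b float64) float64 { return a / b }),
}

func main() {
	// Millis stands in for a timestamp already converted to milliseconds.
	tmpl := template.Must(template.New("demo").Funcs(arithFuncs).Parse(
		`{{ printf "%.0f" (sub .Millis 1800000) }}` + "\n"))
	data := map[string]interface{}{"Millis": int64(1515456000000)}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```

One design point to watch: text/template passes a piped value as the last argument, so with this definition `.Millis | sub 1800000` would compute 1800000 minus the timestamp. The sketch therefore calls `sub .Millis 1800000` directly.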

Add the following two functions to allow the use of the above arithmetic functions to manipulate the timestamps in StartsAt and EndsAt:

  1. toUnix
  2. fromUnix

This is similar to #603 but that request has other concerns about manipulating/accessing current dates, sorting lists etc, so I thought it worth separating the two.

@brian-brazil
Contributor

Anything like this should be done down in alert templates in Prometheus.

@roganartu
Author

The problem with putting this in Prometheus alert templates is that it will cause a huge amount of duplication in this case.

To solve my given example with my proposal I would need to add something like the following to a single shared template in alertmanager:

{{ $url := (printf "%s?from=%s" $url (.StartsAt | toUnix | sub 1800000)) }}
{{ $url := (printf "%s&to=%s" $url (.EndsAt | toUnix | add 1800000)) }}

That I can then include wherever needed with {{ template "grafana.link.href.partial" . }}.

To achieve the same if these functions instead existed in Prometheus alert templates would require adding the following to every single rule:

annotations:
  start_unix: "{{ .StartsAt | toUnix | sub 1800000 }}"
  end_unix: "{{ .EndsAt | toUnix | add 1800000 }}"

As well as having a line in the alertmanager template to extract the annotation anyway.

Additionally, timestamp data seemingly isn't exposed to Prometheus alert templates. Unless I'm missing something, only labels and the raw sample value are exposed: https://github.com/prometheus/prometheus/blob/master/rules/alerting.go#L196-L202
Even if this timestamp data were to be exposed to the template here it wouldn't be the ActiveAt timestamp (same as StartsAt in alertmanager?), which is the useful one for this example.

@brian-brazil
Contributor

The StartsAt and EndsAt aren't exactly reliable, and may be zero depending on the current state of the alert. They're more an implementation detail than anything.

Usually you also want context on an alert, not merely when it gets bad enough to start firing. What I'd suggest is creating links to Grafana with fixed parameters such as &from=now-6h&to=now or rely on the defaults for the dashboard which (presumably) have an appropriate value for the time range already.

@carlosflorencio

Ugly workaround for now:

groups:
- name: testalert
  rules:
  - record: grafanaFrom
    expr: vector((time() - (30*60))*1000)
  - record: grafanaTo
    expr: vector((time() + (30*60))*1000)
  - alert: IgnoreAlert
    expr: vector(1)
    for: 10s
    labels:
      severity: major
      grafana: "http://grafana.board.local?{{ printf \"from=%.0f&to=%.0f\" (query \"grafanaFrom\" | first | value) (query \"grafanaTo\" | first | value) }}"
    annotations:
      summary: Daily alert test summary
      description: Daily alert test description

@simonpasquier
Member

Note that the grafana link should be an annotation and not a label (see prometheus/prometheus#4652 for the details).

@ServerNinja

ServerNinja commented Feb 6, 2019

> The StartsAt and EndsAt aren't exactly reliable, and may be zero depending on the current state of the alert. They're more an implementation detail than anything.
>
> Usually you also want context on an alert, not merely when it gets bad enough to start firing. What I'd suggest is creating links to Grafana with fixed parameters such as &from=now-6h&to=now or rely on the defaults for the dashboard which (presumably) have an appropriate value for the time range already.

I would argue that being able to produce a graph attached to an alert with the timeframe the alert occurred as opposed to (now-6h to now) would be ideal for gathering data and graphs to prepare for postmortems. It seems like it would be very beneficial.

Ideally, one would do something like this in the alert template:

https://grafana.url:xxx/dashboard?var-pod_name={{ .Labels.pod_name }}&from={{ .StartsAt | UnixDate }}-15m&to={{ .EndsAt | UnixDate }}

@Tyson1986

Tyson1986 commented Jul 9, 2019

Any updates? I tried to put Splunk and Grafana links with timestamps into a Splunk alert template. I still haven't found a good solution.
IMHO, using relative links like now-6h to now is bad practice. Sometimes you'd like to use the link later, for example after the weekend.
As of now, the closest solution is to use:
{{ with query "time()" }}{{ . | first | value | printf "%.0f"}}{{ end }}

@Yapcheekian

> Any updates? I tried to put Splunk and Grafana links with timestamps into a Splunk alert template. I still haven't found a good solution.
> IMHO, using relative links like now-6h to now is bad practice. Sometimes you'd like to use the link later, for example after the weekend.
> As of now, the closest solution is to use:
> {{ with query "time()" }}{{ . | first | value | printf "%.0f"}}{{ end }}

Do you have any idea how to trim the whitespace at the beginning and end of the timestamp?
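
One possible answer, assuming the whitespace comes from the template text around the action and not from the value itself: Go's text/template supports trim markers, `{{-` and `-}}`, which strip adjacent whitespace. The sketch below is a standalone Go program, not a Prometheus rule; the render helper and the TS field are hypothetical stand-ins.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes a template string against data and returns the output.
func render(text string, data interface{}) (string, error) {
	tmpl, err := template.New("demo").Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := map[string]string{"TS": "1562630400"}
	// The first template keeps the surrounding spaces; the second uses
	// trim markers to remove them on both sides of the action.
	for _, text := range []string{"from= {{ .TS }} 000", "from= {{- .TS -}} 000"} {
		out, err := render(text, data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", out)
	}
}
```

Trim markers only affect whitespace in the template source next to the action. They will not remove spaces that are part of the rendered value.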

@bastibrunner

I found this thread while searching for how to set a Grafana time range in Alertmanager templates. This is my solution, maybe it helps someone else:

&time={{- (index .Alerts 0).StartsAt.Unix -}}000&time.window=600000

@Alexander-Bartosh

Guys
I needed StartsAt - 10m

This
{{ (.StartsAt.Add -600000000000 ).Unix }}000
Did the trick for me with Grafana.

Logs: <{{ $.ExternalURL }}/explore?orgId=1&left=%5B%22{{ (.StartsAt.Add -600000000000 ).Unix }}000%22,%22{{if eq .Status "firing" }}now{{ else }}{{ .EndsAt.Unix }}000{{ end }}%22,%22Loki%22,%7B%22expr%22:%22{{ urlquery .Annotations.logsExpr | reReplaceAll "\+" "%20" | reReplaceAll "%5C" "%5C%5C" | reReplaceAll "%22" "%5C%22" }}%22%7D%5D|:chart_with_upwards_trend: Graph>

@roidelapluie
Member

I also use {{ (.StartsAt.Add -600000000000).Unix }}000.

I think we can close this issue.

@hanikesn

I think it makes sense to document the workaround in the official documentation as it isn't obvious for most people.

@ismarslomic

I spent many hours finding this issue and its workarounds, so I definitely think the official docs should be updated with examples and tips. Linking to a Grafana dashboard with a time range is crucial. Even better would be variables and functions that support this directly.

Thanks to all contributing with useful workarounds!

@diversario

There's still no basic math available, though.

@grobinson-grafana
Contributor

There are no integer or decimal fields in the template data as far as I can tell, so in what situations would having Math functions be useful? (template.go#L296-L317)

@nikita2206

@grobinson-grafana there is {{ $value }}, take for example kube_job_status_start_time and you could use that value (unix ts) to generate a link to logs with sensible timestamp bounds

@grobinson-grafana
Contributor

@nikita2206 There is $value in Prometheus. However, this issue is talking about Alertmanager, and there is no $value in Alertmanager as far as I know?

@nikita2206

@grobinson-grafana To be more specific, here is my use case: (including the workaround)

  - alert: KubeCronJobFailing2Hours
    expr: |
      (kube_job_failed{condition="true"} > 0)
        * on (job_name) group_right ()
          label_replace(kube_job_owner{owner_kind="CronJob"}, "cronjob", "$0", "owner_name", ".*")
        * on (job_name) group_left ()
          kube_job_status_start_time
      unless on (cronjob)
        label_replace(
          present_over_time(kube_job_status_completion_time[2h]),
          "cronjob", "$1", "job_name", "^(.+)-\\d+$")
    annotations:
      type: Job
      cronjob: "{{ $labels.cronjob }}"
      message: >
        CronJob `{{ $labels.cronjob }}` is failing and hasn't completed successfully for at least 2 hours,
        last attempt was at {{ $value | humanizeTimestamp }},
        <https://logs-backend.internal/logs?filter=trace-id%3D%27{{ $labels.job_name }}%27&startTime={{ (printf "vector(%f - 10)" .Value) | query | first | value | printf "%.0f" }}&endTime={{ (printf "vector(%f + 1800)" .Value) | query | first | value | printf "%.0f" }}|logs here>.

As you can see, I would like to include a link to the logs, which needs time bounds. Sensible time bounds, given that the start timestamp of the Job is known, would be something like '10 seconds before the job started' until '30 minutes after the job started'.
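
For reference, the arithmetic that the workaround delegates to `query` amounts to the following when written outside any template. This is only a Go sketch of the intended bounds; logsURL is a hypothetical helper, the job name in main is made up, and the start time is assumed to be in Unix seconds as reported by kube_job_status_start_time.

```go
package main

import (
	"fmt"
	"net/url"
)

// logsURL builds a log-search link bounded to 10 seconds before and
// 30 minutes after jobStart (Unix seconds).
func logsURL(jobName string, jobStart float64) string {
	start := int64(jobStart) - 10
	end := int64(jobStart) + 30*60
	// QueryEscape encodes "=" and "'" the same way the annotation above
	// writes them by hand (%3D and %27).
	filter := url.QueryEscape(fmt.Sprintf("trace-id='%s'", jobName))
	return fmt.Sprintf(
		"https://logs-backend.internal/logs?filter=%s&startTime=%d&endTime=%d",
		filter, start, end)
}

func main() {
	// Example values only; neither comes from the issue.
	fmt.Println(logsURL("nightly-backup-27912345", 1700000000))
}
```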
