
How to configure an alert that can immediately trigger to the firing state #4724

Closed
pulord opened this Issue Oct 11, 2018 · 10 comments

pulord commented Oct 11, 2018

Bug Report

What did you do?
I want to configure an alert that can trigger immediately and be sent to Alertmanager.

What did you expect to see?
Alertmanager receives the alert immediately.

What did you see instead? Under which circumstances?
Sometimes it works, but sometimes it doesn't.

Environment
docker

  • Prometheus version:

root@prometheus-76f4b8668c-28qc6:/# prometheus --version
prometheus, version 2.4.0-rc.0 (branch: master, revision: d075d78)
build user: putaohong@putaohodeMBP719.frontnode.net
build date: 20180907-06:58:06
go version: go1.11

  • Alertmanager version:

root@alertmanager-5664c7dc9c-9xskx:/# alertmanager --version
alertmanager, version 0.15.0-rc.2 (branch: master, revision: 4208663af3850b2ac07d53e6c3152058eb4a815e)
build user: root@7964facceeb6
build date: 20180612-09:21:34
go version: go1.9.1

  • Prometheus configuration file:
global:
  scrape_interval: 30s
  scrape_timeout: 30s


# A scrape configuration for running Prometheus on a Kubernetes cluster.
# This uses separate scrape configs for cluster components (i.e. API server, node)
# and services to allow each to use different authentication configs.
#
# Kubernetes labels will be added as Prometheus labels on metrics via the
# `labelmap` relabeling action.
#
# If you are using Kubernetes 1.7.2 or earlier, please take note of the comments
# for the kubernetes-cadvisor job; you will need to edit or remove this job.



alerting:
  alertmanagers:
    - static_configs:
      - targets: ["alertmanager.infra:9093"]

rule_files:
  - "/etc/prometheus/alert-rule.yml"
  - "/etc/prometheus/record-rule.yml"

scrape_configs:
  - job_name: "nginx-exporter"
    scrape_interval: 15s
    honor_labels: true
    static_configs:
      - targets: ['k8s-dev-db.yingmi-inc.com:10201']

  • Alert rule file (/etc/prometheus/alert-rule.yml):
groups:
  - name: Prod-Alerts
    rules:
    - alert: Http-Request-Status-Code-500
      expr: nginx_http_request_status_seconds{status_code="500",interface_name!~".*/check(/)?$"} >= 1
      labels:
        severity: High
        current_value: "{{ $value }}"
        standard_value: "0"
        interface_name: '{{ $labels.interface_name }}'
        group_tag_one: "{{ $labels.interface_name }}"
        service: "{{ $labels.service }}"
        cluster: "{{ $labels.cluster }}"
        http_x_request_id: "{{ $labels.http_x_request_id }}"
        grafana_url: "http://ymon.yingmi-inc.com/d/ElRrXkOiz/error-list?orgId=1"
      annotations:
        summary: "The request status code of {{ $labels.cluster }},{{ $labels.interface_name }} is 500"
        description: "{{ $labels.interface_name }} is 500"
  • Grafana Query:
nginx_http_request_status_seconds{status_code="500",interface_name!~".*/check(/)?$"} >= 1


I can see that Prometheus stored the metrics, but I can't find the corresponding alerts for them.
**Sometimes it works.**

**Expected:**

  • How to set up an alert that can trigger immediately (not found in the official documentation)
  • How to solve this problem (sometimes it works, sometimes it doesn't)

simonpasquier (Member) commented Oct 11, 2018

Alerting rules are evaluated every minute by default which could explain why your alert doesn't fire if the metric goes up and back to zero in less than a minute. Please note also that your alert definition is prone to flapping as you don't have any for: clause.
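
For illustration, a minimal sketch of the rule from this issue with a for: clause added (the 2m duration is an arbitrary example, not something from this thread):

groups:
  - name: Prod-Alerts
    rules:
    - alert: Http-Request-Status-Code-500
      expr: nginx_http_request_status_seconds{status_code="500",interface_name!~".*/check(/)?$"} >= 1
      # Keep the alert in "pending" until the condition has held for 2 minutes;
      # this trades slower firing for less flapping.
      for: 2m
      labels:
        severity: High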

I'm closing it for now. If you have further questions, please use our user mailing list, which you can also search.

pulord (Author) commented Oct 11, 2018

Sorry, maybe the issue description was not clear.

I just want to know how to configure the alert so that it fires immediately.

I found a suggested solution in "Prometheus: understanding the delays on alerting".

So I removed the for: clause, but it works sometimes and sometimes it doesn't.

So what should I do? Thanks!

Should I try setting evaluation_interval = 0s?

simonpasquier (Member) commented Oct 11, 2018

Setting evaluation_interval to zero won't work as it will default to 1 minute. You can set it to scrape_interval / 2 (anything lower doesn't make sense as the value won't change between scrapes). But in general this isn't a recommended approach as it will generate lots of noise.
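
As a rough sketch of that suggestion, assuming the 30s scrape_interval from the configuration pasted above:

global:
  scrape_interval: 30s
  # Evaluate rules every scrape_interval / 2; anything lower is pointless
  # because sample values only change once per scrape.
  evaluation_interval: 15s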

pulord (Author) commented Oct 15, 2018

> Setting evaluation_interval to zero won't work as it will default to 1 minute. You can set it to scrape_interval / 2 (anything lower doesn't make sense as the value won't change between scrapes). But in general this isn't a recommended approach as it will generate lots of noise.

As you said, setting evaluation_interval = scrape_interval / 2 will generate lots of noise.
So, any better idea?

Alternatively, should the alert just fire immediately without waiting for an evaluation? Does Prometheus support that?

simonpasquier (Member) commented Oct 15, 2018

> Alternatively, should the alert just fire immediately without waiting for an evaluation? Does Prometheus support that?

I don't get what you mean. What is your use case for having an alert fire immediately? The general Prometheus alerting guidelines are described at https://prometheus.io/docs/practices/alerting/, in particular this excerpt: "Allow for slack in alerting to accommodate small blips".

pulord (Author) commented Oct 16, 2018

My case is:
I monitor the production system's nginx logs. We want to send an alert whenever an HTTP status code of 500 occurs, so that developers get the alert info (such as interface name, environment, and service) and can fix the issue quickly. That's why I expect the Http-Request-Status-Code-500 alert to be sent immediately.

The expression:
expr: nginx_http_request_status_seconds{status_code="500",interface_name!~".*/check(/)?$"} >= 1

simonpasquier (Member) commented Oct 16, 2018

This isn't exactly what Prometheus is meant for, but you can try this:

max_over_time(nginx_http_request_status_seconds{status_code="500",interface_name!~".*/check(/)?$"}[5m]) >= 1

This would fire the alert immediately and keep it firing for 5 minutes. You can of course adjust the 5-minute window to your needs.
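
Dropped into the rule group from this issue, a sketch of that rule could look like the following (labels trimmed for brevity; the 5m window is just the value suggested above):

groups:
  - name: Prod-Alerts
    rules:
    - alert: Http-Request-Status-Code-500
      # max_over_time keeps the condition true for the whole 5m lookback window,
      # so a short spike is still caught by the next rule evaluation and keeps the alert firing.
      expr: max_over_time(nginx_http_request_status_seconds{status_code="500",interface_name!~".*/check(/)?$"}[5m]) >= 1
      labels:
        severity: High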

pulord (Author) commented Oct 16, 2018

> This isn't exactly what Prometheus is meant for, but you can try this:
>
> max_over_time(nginx_http_request_status_seconds{status_code="500",interface_name!~".*/check(/)?$"}[5m]) >= 1
>
> This would fire the alert immediately and keep it firing for 5 minutes. You can of course adjust the 5-minute window to your needs.

Good idea, let me try it!

pulord (Author) commented Oct 17, 2018

Thanks @simonpasquier!
It works perfectly. The alert fires after about 2 minutes, but no alerts are missed anymore!

lock bot locked and limited conversation to collaborators on Apr 15, 2019
