Hysteresis alert like Zabbix #863

Open
annProg opened this issue Sep 2, 2016 · 2 comments


annProg commented Sep 2, 2016

Hi,
I tested Kapacitor and have some questions.

1. With window() but no all()

var win = 5s
var origin = stream
    |from()
        .database('test')
        .retentionPolicy('default')
        .measurement('ka')
        .groupBy('app')

origin
    |window()
        .period(win)
        .every(1s)

    |alert()
        .id('HTTP_CODE:{{ index .Tags "app" }}')
        .message('')
        .crit(lambda: "code_match" == 0)
        .log('/tmp/alerts.log')

test data

i1=(1 1 0 1 0 1 1 1 1 1)

result:

window   level       duration (ns)
1,1,0    "CRITICAL"  0
1,1,0,1  "CRITICAL"  0
1,0,1,0  "CRITICAL"  0
0,1,0,1  "CRITICAL"  0
1,0,1,1  "CRITICAL"  2034708173
0,1,1,1  "CRITICAL"  2034708173
1,1,1,1  "OK"        7107346394

With a window but without all(), Kapacitor triggers CRITICAL as soon as one bad value (0) enters the window and stays CRITICAL until every bad value has left the window.

2. With window() and all()

var win = 5s
var origin = stream
    |from()
        .database('test')
        .retentionPolicy('default')
        .measurement('ka')
        .groupBy('app')

origin
    |window()
        .period(win)
        .every(1s)

    |alert()
        .id('HTTP_CODE:{{ index .Tags "app" }}')
        .message('')
        .all()
        .crit(lambda: "code_match" == 0)
        .log('/tmp/alerts.log')

test data

i1=(1 0 0 0 0 1 1 1 1 1 0 1 1)

result

window   level       duration (ns)
0,0,0,0  "CRITICAL"  0
0,0,0,1  "OK"        1022011392

With a window and all(), Kapacitor triggers CRITICAL only when every point in the window is 0, but returns to OK as soon as a single 1 enters the window.

Hysteresis in Zabbix

description: Sometimes a trigger must have different conditions for different states. For example, we would like to define a trigger which would become PROBLEM when server room temperature is higher than 20C while it should stay in the state until temperature will not become lower than 15C.

I want alerts that behave like Zabbix hysteresis. Is there any way to do that?

As an example, given these test points:

i1=(1 0 0 0 0 1 0 1 1 1 1 1)

I want a result like this:

1,0,0,0  no trigger
0,0,0,0  "CRITICAL"
0,0,0,1  "CRITICAL"
0,0,1,0  "CRITICAL"
0,1,0,1  "CRITICAL"
1,0,1,1  "CRITICAL"
0,1,1,1  "CRITICAL"
1,1,1,1  "OK"

Furthermore, if I want the trigger to stay CRITICAL until all points are 1 over a span of 2*window, is there any way to achieve that?

@nathanielc
Contributor

@annProg Thanks for the detailed write-up!

If I understand correctly, you want the alert to go CRITICAL only if all points in the window are 0, and then to recover to the OK state only once all points are 1?

In the latest 1.0 version there are reset expressions, which prevent the alert state from being lowered unless the reset expression is true.

For your example you can use these reset expressions in combination with taking the mean of the window to do what you want.

var win = 5s
var origin = stream
    |from()
        .database('test')
        .retentionPolicy('default')
        .measurement('ka')
        .groupBy('app')

origin
    |window()
        .period(win)
        .every(1s)
    |mean('code_match')
        .as('code_match')
    |alert()
        .id('HTTP_CODE:{{ index .Tags "app" }}')
        .message('')
        .crit(lambda: "code_match" == 0)
        // Alert cannot leave CRITICAL state until critReset is true, in other words all values in window must be equal to 1.
        .critReset(lambda: "code_match" == 1)
        .log('/tmp/alerts.log')
        // Notice .all is no longer needed since we aggregated the window into a single point.

Here are the docs on the reset expressions.

Hope that helps get you started.
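
The follow-up question, staying CRITICAL until every point is 1 across 2*window, is not covered by the script above. One possible sketch, untested and resting on assumptions, is to take the mean over two windows of different lengths and join the results, so that .crit() reads the short window and .critReset() reads the long one. The names shortMean, longMean, 'short' and 'long' are illustrative, and the sketch assumes both windows emit points at the same timestamps so that the join can pair them.

```
// Sketch only (untested): go CRITICAL on a 5s window, recover on a 10s window.
var origin = stream
    |from()
        .database('test')
        .retentionPolicy('default')
        .measurement('ka')
        .groupBy('app')

// Mean over the short window decides when to enter CRITICAL.
var shortMean = origin
    |window()
        .period(5s)
        .every(1s)
    |mean('code_match')
        .as('value')

// Mean over a window twice as long decides when to leave CRITICAL.
var longMean = origin
    |window()
        .period(10s)
        .every(1s)
    |mean('code_match')
        .as('value')

shortMean
    |join(longMean)
        // Joined fields are prefixed: "short.value" and "long.value".
        .as('short', 'long')
    |alert()
        .id('HTTP_CODE:{{ index .Tags "app" }}')
        .message('')
        // Enter CRITICAL when every point in the short window is 0.
        .crit(lambda: "short.value" == 0)
        // Leave CRITICAL only when every point in the long window is 1.
        .critReset(lambda: "long.value" == 1)
        .log('/tmp/alerts.log')
```

If the two streams turn out not to line up exactly in time, the join's .tolerance() property may be worth trying.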


annProg commented Sep 4, 2016

@nathanielc I tested the reset expressions in version 1.0 together with the InfluxQLNode mean, and it works fine. But the InfluxQLNode discards the other fields:

Without the InfluxQLNode, all columns exist:

"data": {
    "series": [
      {
        "values": [
          [
            "2016-09-04T15:16:12.661354758Z",
            0,
            200
          ],
          [
            "2016-09-04T15:16:13.673172831Z",
            0,
            200
          ],
          [
            "2016-09-04T15:16:14.689260839Z",
            0,
            200
          ],
          [
            "2016-09-04T15:16:15.709873033Z",
            1,
            200
          ]
        ],
        "columns": [
          "time",
          "code_match",
          "http_code"
        ],
        "tags": {
          "app": "cmdb"
        },
        "name": "ka"
      }
    ]
  },

With the InfluxQLNode, the field http_code disappears:

"data": {
    "series": [
      {
        "values": [
          [
            "2016-09-04T15:20:05.76924853Z",
            1
          ]
        ],
        "columns": [
          "time",
          "code_match"
        ],
        "tags": {
          "app": "cmdb"
        },
        "name": "ka"
      }
    ]
  },

I want to use http_code in the alert email.

I also filed this as #864.
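
One possible workaround, sketched here under assumptions rather than taken from this thread, is to select http_code with a second InfluxQL node such as last() and join it back onto the mean. Joined fields are prefixed with the names passed to .as(), so 'agg' and 'code' below are illustrative names, and the message template is only an example of how the joined field might be referenced. The sketch is untested.

```
// Sketch only (untested): keep http_code alongside the aggregated code_match.
var win = 5s
var windowed = stream
    |from()
        .database('test')
        .retentionPolicy('default')
        .measurement('ka')
        .groupBy('app')
    |window()
        .period(win)
        .every(1s)

// Aggregate code_match as in the reset-expression example.
var matched = windowed
    |mean('code_match')
        .as('code_match')

// Carry the most recent http_code of each window in a separate stream.
var lastCode = windowed
    |last('http_code')
        .as('http_code')

matched
    |join(lastCode)
        // Fields become "agg.code_match" and "code.http_code".
        .as('agg', 'code')
    |alert()
        .id('HTTP_CODE:{{ index .Tags "app" }}')
        .message('{{ .ID }} is {{ .Level }}, last http_code: {{ index .Fields "code.http_code" }}')
        .crit(lambda: "agg.code_match" == 0)
        .critReset(lambda: "agg.code_match" == 1)
        .log('/tmp/alerts.log')
```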
