
`promtool test rules` fails for alerts composed of recorded metrics with heavy input_series #4838

Closed
bititanb opened this Issue Nov 8, 2018 · 5 comments


bititanb commented Nov 8, 2018

Bug Report

What did you do?

Executed `promtool test rules` for alerts composed of recorded metrics (recording rules) whose input time series contain many values.
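
Concretely, the invocation was along these lines (assuming the files below are saved as alerts.yml and test.yml):

  promtool test rules test.yml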

What did you expect to see?

  SUCCESS

What did you see instead? Under which circumstances?

I get:

  FAILED:
    alertname:InstanceDown, time:10m0s,
        exp:"[Labels:{alertname=\"InstanceDown\", instance=\"localhost:9090\", job=\"prometheus\", severity=\"page\"} Annotations:{description=\"localhost:9090 of job prometheus has been down for more than 5 minutes.\", summary=\"Instance localhost:9090 down\"}]",
        got:"[]"

With the following configuration:

alerts.yml:

  - record: test:testrecord:avg3m
    expr: avg_over_time(up[3m])

  - alert: InstanceDown
    expr: test:testrecord:avg3m == 0
test.yml:

      input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 500+0x900'
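
The snippets above are abridged; a complete test.yml in the documented unit-test format would look roughly like the following. The exp_labels/exp_annotations are reconstructed from the expected output above, and alerts.yml would additionally need the usual groups:/rules: wrapper plus a for: 5m and matching labels/annotations on the alert; the exact files are in the gist linked below.

  rule_files:
    - alerts.yml

  evaluation_interval: 1m

  tests:
    - interval: 1m
      input_series:
        - series: 'up{job="prometheus", instance="localhost:9090"}'
          values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 500+0x900'
      alert_rule_test:
        - eval_time: 10m
          alertname: InstanceDown
          exp_alerts:
            - exp_labels:
                severity: page
                instance: localhost:9090
                job: prometheus
              exp_annotations:
                summary: "Instance localhost:9090 down"
                description: "localhost:9090 of job prometheus has been down for more than 5 minutes."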

Environment

  • System information:

Linux 4.15.0-36-generic x86_64

  • promtool version:

  promtool, version 2.5.0 (branch: master, revision: bda9781ccda2b9c6051f30eb5f00e66059fe640a)
    build user:       root@prometheus
    build date:       20181107-09:52:29
    go version:       go1.10.1

  • promtool configuration file:

I took the configs from the docs (https://github.com/prometheus/prometheus/blob/master/docs/configuration/unit_testing_rules.md) and modified them to reproduce the issue; see the last revision:
https://gist.github.com/bititanb/017517528069ae1d0d70502bf95f6e26/revisions?diff=unified

Might help:

I found that it fails here because minValidTime becomes too high, so the metric gets discarded. This is the state when ErrOutOfBounds is returned (values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 500+0x900'):

(dlv) print lset[0].Value
"test:testrecord:avg3m"
(dlv) print t
0
(dlv) print a
*github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.headAppender {
        head: *github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.Head {
                chunkRange: 86400000,
                minTime: 0,
                maxTime: 54900000,
                ...}
        minValidTime: 11700000,
        mint: 9223372036854775807,
        maxt: -9223372036854775808,
        ...}

And this is when it works fine (values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'):

(dlv) print lset[0].Value
"test:testrecord:avg3m"
(dlv) print t
0
(dlv) print a
*github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.headAppender {
        head: *github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.Head {
                chunkRange: 86400000,
                minTime: 0,
                maxTime: 840000,
                ...}
        minValidTime: -42360000,
        mint: 9223372036854775807,
        maxt: -9223372036854775808,
        ...}
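
As a sanity check on those numbers: in both dumps minValidTime is exactly maxTime − chunkRange/2, which would explain why the recorded sample appended at t=0 for test:testrecord:avg3m is rejected only in the heavy case. A tiny standalone Go snippet reproducing that arithmetic (the maxTime − chunkRange/2 relation is an assumption inferred from the dumps, not taken from the tsdb source):

  package main

  import "fmt"

  // Assumption (inferred from the two dlv dumps above): the head appender only
  // accepts samples with t >= maxTime - chunkRange/2, i.e.
  // minValidTime = maxTime - chunkRange/2.
  func main() {
      const chunkRange int64 = 86400000 // from the dlv output

      // failing case (heavy input_series) and working case (15 samples)
      for _, maxTime := range []int64{54900000, 840000} {
          minValidTime := maxTime - chunkRange/2
          var t int64 = 0 // the recorded samples here start at t=0
          fmt.Printf("maxTime=%d -> minValidTime=%d, sample at t=%d accepted: %v\n",
              maxTime, minValidTime, t, t >= minValidTime)
      }
  }

This prints minValidTime=11700000 (t=0 rejected) for the heavy case and minValidTime=-42360000 (t=0 accepted) for the light case, matching the two dumps.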

Thanks.
CC @codesome

bititanb changed the title from "`promtool test rules` for alerts composed of recorded metrics with timeseries with many vals fails" to "`promtool test rules` fails for alerts composed of recorded metrics with heavy input_series" on Nov 8, 2018


codesome commented Nov 9, 2018

@bititanb Thanks for reporting this, I will have a look.


codesome commented Nov 9, 2018

After digging a bit, I found that the reason is not minValidTime. The number you saw is normal, and maxTime: 54900000 just means there are 916 samples (15 + 901 values at the 1m interval, so the last timestamp is 915 × 60000 ms = 54900000).

But I haven't found the actual issue yet. Still checking.


codesome commented Nov 9, 2018

Sorry, my bad, you are right. Recording rules are evaluated after minValidTime is already incremented.


bititanb commented Nov 12, 2018

Checked #4851. Works fine for me even with much more complex expressions. Thank you, this really helped me!
Should I close?


codesome commented Nov 12, 2018

Don't close it until that PR is merged.
