count() should result in 0 if no timeseries found #4982

Closed
sgeisbacher opened this issue Dec 9, 2018 · 3 comments


sgeisbacher commented Dec 9, 2018

Proposal

Use case. Why is this important?
For dynamic metrics such as node_exporter's node_filesystem*, where the number of time series grows with each mountpoint, I use count() in alert rules like this:

count(node_filesystem_avail_bytes{instance="nas.example.com",mountpoint=~"/srv/nas/(backups|files)"}) < 2

to get notified when one of them is no longer mounted.
This works perfectly if one mount is missing: count() then returns 1 and the rule fires. But it does not fire if both are missing, because then count() returns no data.
The workaround is to additionally check with absent(). On the one hand it is annoying to double-check in every rule, and on the other hand count() should be able to "count" zero time series too:

count(node_filesystem_avail_bytes{instance="nas.example.com",mountpoint=~"/srv/nas/(backups|files)"}) < 2
 or absent(node_filesystem_avail_bytes{instance="nas.example.com",mountpoint=~"/srv/nas/(backups|files)"})
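
For comparison, the same check can be sketched with a fallback value instead of absent(). This is an assumption based on the general PromQL pattern of `or vector(0)`; it is not taken from this report, and the selector is simply copied from the rule above:

```
# Sketch: fall back to a single sample with value 0 when count() returns nothing.
# vector(0) carries no labels, which matches the label-less output of count()
# when no by (...) clause is used.
# The parentheses matter: comparison operators bind tighter than "or".
(
  count(node_filesystem_avail_bytes{instance="nas.example.com",mountpoint=~"/srv/nas/(backups|files)"})
  or vector(0)
) < 2
```

With both mounts missing, this evaluates to 0 < 2 and returns one sample, so a rule built on it would fire. The sketch applies only to the label-less form used here; with a by (...) clause the fallback would not line up with the individual groups.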

Bug Report

What did you do?
Used count(...) in alert rules to get notified about missing metrics.
What did you expect to see?
count(...) should return 0 if there is no result.
What did you see instead? Under which circumstances?
count(...) returns no data if there are no matching metrics, so the alert rule (count(...) < 2) does not fire.

  • System information:
    Linux 4.14.70-v7+ armv7l
  • Prometheus version:
    prometheus 2.4.2
  • Alertmanager version:
    alertmanager, version 0.15.2
  • Alertmanager configuration file:
  - alert: important_nas_mounts_missing
    expr: count(node_filesystem_avail_bytes{instance="nas.example.com",mountpoint=~"/srv/nas/(backups|files)"}) < 2
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: IMPORTANT NAS MOUNTS MISSING
      description: 'backups or files mount on nas.example.com missing'
Contributor

brian-brazil commented Dec 9, 2018 via email

@sgeisbacher
Author

Oh OK, thanks, but to be honest that's not very intuitive, and I guess every new count() user will fall into this trap again 😅
Nevertheless, I'm comparing the result of count(), not comparing inside it as in your example. I'm not filtering with an operator but with a regex on a label.
So I think bool does not work here. Is there some other workaround?

Contributor

brian-brazil commented Dec 9, 2018 via email

lock bot locked and limited conversation to collaborators Nov 6, 2019