promtool unittests fail with rate() & time() #4817

Closed
sbueringer opened this issue Nov 2, 2018 · 10 comments

sbueringer commented Nov 2, 2018

Bug Report

What did you do?

I ran promtool in a directory containing the following two files:

promtool test rules *.test.yaml

github.rules

groups:
- name: thanos-compact
  rules:
  - alert: ThanosCompactBucketOperationsFailed
    expr: rate(thanos_objstore_bucket_operation_failures_total{app="thanos-compact"}[5m]) > 0.01
    labels:
      severity: warning
      tenant: "{{$labels.tenant}}"
    annotations:
      summary: Thanos Compact bucket operations are failing
      impact: Long term storage queries will be slower and Minio fills up
      action: Check {{ $labels.kubernetes_pod_name }} pod logs in {{ $labels.kubernetes_namespace}} namespace
  - alert: ThanosCompactNotRunIn24Hours
    expr: (time() - max(thanos_objstore_bucket_last_successful_upload_time{app="thanos-compact"}) ) /60/60 > 24
    labels:
      severity: warning
      tenant: "{{$labels.tenant}}"
    annotations:
      summary: Thanos Compaction has not been run in 24 hours
      impact: Long term storage queries will be slower and Minio fills up
      action: Check {{ $labels.kubernetes_pod_name }} pod logs in {{ $labels.kubernetes_namespace}} namespace

github.rules.test.yaml


rule_files:
- github.rules

tests:

- interval: 1m
  input_series:
  - series: 'thanos_objstore_bucket_operation_failures_total{app="thanos-compact",tenant="i13as",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '0+0x10 1+1x5'
  - series: 'thanos_objstore_bucket_last_successful_upload_time{app="thanos-compact",tenant="i13as",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '1531152794+0x20'

  alert_rule_test:
  - alertname: ThanosCompactBucketOperationsFailed
    eval_time: 11m
    exp_alerts:
      - exp_labels:
          severity: warning
          tenant: i13as
          app: thanos-compact
          kubernetes_namespace: monitoring
          kubernetes_pod_name: thanos-compact-0
        exp_annotations:
          action: "Check thanos-compact-0 pod logs in monitoring namespace"
          impact: "Long term storage queries will be slower and Minio fills up"
          summary: "Thanos Compact bucket operations are failing"
  - alertname: ThanosCompactNotRunIn24Hours
    eval_time: 10m
    exp_alerts:
      - exp_labels:
          severity: warning
          tenant: i13as
          app: thanos-compact
          kubernetes_namespace: monitoring
          kubernetes_pod_name: thanos-compact-0
        exp_annotations:
          action: "Check thanos-compact-0 pod logs in monitoring namespace"
          impact: "Long term storage queries will be slower and Minio fills up"
          summary: "Thanos Compact bucket operations are failing"

What did you expect to see?

I expected the unit test to succeed.

What did you see instead? Under which circumstances?

The unit test failed.

Environment

Linux Fedora with promtool 2.5.0-rc2

  • Logs:
Unit Testing:  github.rules.test.yaml
  FAILED:
    alertname:ThanosCompactNotRunIn24Hours, time:10m0s, 
        exp:"[Labels:{alertname=\"ThanosCompactNotRunIn24Hours\", app=\"thanos-compact\", kubernetes_namespace=\"monitoring\", kubernetes_pod_name=\"thanos-compact-0\", severity=\"warning\", tenant=\"i13as\"} Annotations:{action=\"Check thanos-compact-0 pod logs in monitoring namespace\", impact=\"Long term storage queries will be slower and Minio fills up\", summary=\"Thanos Compact bucket operations are failing\"}]", 
        got:"[]"
    alertname:ThanosCompactBucketOperationsFailed, time:11m0s, 
        exp:"[Labels:{alertname=\"ThanosCompactBucketOperationsFailed\", app=\"thanos-compact\", kubernetes_namespace=\"monitoring\", kubernetes_pod_name=\"thanos-compact-0\", severity=\"warning\", tenant=\"i13as\"} Annotations:{action=\"Check thanos-compact-0 pod logs in monitoring namespace\", impact=\"Long term storage queries will be slower and Minio fills up\", summary=\"Thanos Compact bucket operations are failing\"}]", 
        got:"[]"

@johannesfrey

Just to verify that rate is correctly evaluated in the unit tests, I used the following values:

  - series: 'thanos_objstore_bucket_operation_failures_total{app="thanos-compact",tenant="c01p005",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '0+1x10'

So the initial values seem to be too low to trigger the alert.
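
To make the comparison above concrete, here is a rough Python sketch of the arithmetic. It is not promtool's implementation: `expand` and `simple_rate` are hypothetical helpers, `expand` assumes the `a+bxn` shorthand yields `n+1` samples, and `simple_rate` only takes the slope between the first and last sample in the 5m window, ignoring counter resets and Prometheus's extrapolation rules.

```python
def expand(notation):
    """Expand the 'a+bxn' shorthand into a, a+b, ..., a+n*b (assumed: n+1 samples)."""
    values = []
    for segment in notation.split():
        start, rest = segment.split("+")
        step, count = rest.split("x")
        values += [float(start) + i * float(step) for i in range(int(count) + 1)]
    return values


def simple_rate(values, interval_s, eval_s, window_s):
    """Per-second slope between the first and last sample in [eval - window, eval].

    Simplification: no counter-reset handling and no extrapolation.
    """
    points = [(i * interval_s, v) for i, v in enumerate(values)]
    in_window = [(t, v) for t, v in points if eval_s - window_s <= t <= eval_s]
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Values from the original report, evaluated at 11m with a 5m window.
original = simple_rate(expand("0+0x10 1+1x5"), 60, 11 * 60, 5 * 60)
# Values from this comment: the counter grows by 1 every minute.
steady = simple_rate(expand("0+1x10"), 60, 11 * 60, 5 * 60)

print(original > 0.01, steady > 0.01)
```

Under these assumptions the original series rises by only 1 over the window (about 0.0033 per second), while the steadily increasing series sits at roughly 1/60 per second, above the 0.01 threshold.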

@sbueringer
Contributor Author

What kind of information is needed?

@sbueringer
Contributor Author

@simonpasquier I retested it with promtool 2.7.2. The first test (ThanosCompactBucketOperationsFailed) now works. The second one for the Alert ThanosCompactNotRunIn24Hours still doesn't work. I'm not sure what more information you need?

I expected the time() function to resolve to some value during the unit test, so that the rule would raise an alert and the test would succeed.

@codesome
Member

codesome commented Mar 3, 2019

I will take a look at it in the coming few days, not sure how I missed it.

@karlem

karlem commented Jul 2, 2019

I have the same issue. The time() function does not return the actual time during tests.
Is there any update on this issue?

@shric

shric commented Jul 17, 2019

time() works for me in tests. It appears to return eval_time in seconds, which is kind of intuitive. For example, given eval_time: 10m, time() returns 600.
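
If time() does resolve to eval_time in seconds, as observed above, the ThanosCompactNotRunIn24Hours expression from the report can be worked through by hand. The sketch below only mirrors the arithmetic of that expression; `hours_since_upload` is a hypothetical helper, and the "adjusted" values are an illustrative assumption, not a fix confirmed in this thread.

```python
SECONDS_PER_HOUR = 60 * 60


def hours_since_upload(eval_time_s, last_upload_ts):
    """Mirror (time() - max(last_upload)) / 60 / 60, assuming time() == eval_time in seconds."""
    return (eval_time_s - last_upload_ts) / SECONDS_PER_HOUR


# Values from the report: eval_time 10m, upload timestamp 1531152794.
reported = hours_since_upload(10 * 60, 1531152794)
print(reported > 24)  # False: the result is a large negative number

# Hypothetical adjustment: last upload at t=0 of the test, evaluated 26h later.
adjusted = hours_since_upload(26 * 60 * 60, 0)
print(adjusted > 24)  # True
```

In other words, a wall-clock timestamp in input_series lies far in the test's "future", so the difference is negative and the `> 24` condition cannot hold at a 10m eval_time.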

@MalKeshar

The behavior of the time() function in unit tests breaks tests for alert rules such as this one: https://www.robustperception.io/get-alerted-before-your-ssl-certificates-expire

@brian-brazil
Contributor

This has been open a while, and I'm not seeing any bug here. I'd suggest taking it to the prometheus-users mailing list.

@sbueringer
Contributor Author

Okay, so I assume the intended behavior is that time() returns the eval_time. Then it's indeed not a bug.

@MitchelNijdam-Rockstars

I noticed that day_of_week() returns 4 when running tests, which confused me at first, but it makes sense since that's the weekday on which the Unix epoch starts (Thursday, 1 January 1970).

Since that wasn't mentioned here and I've been searching for it for a while, I thought it would be good to put it here as a reference for other people that might be stumbling on this. I'm also trying to get this in the Unit test documentation: prometheus/docs#1464.
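
The weekday observation can be double-checked outside Prometheus with a small Python sketch. `prometheus_day_of_week` is a hypothetical helper that maps a Unix timestamp to Prometheus's numbering (0 = Sunday through 6 = Saturday); it assumes the test clock starts at the epoch, as described above.

```python
from datetime import datetime, timezone


def prometheus_day_of_week(unix_seconds):
    """Weekday in Prometheus numbering (0 = Sunday ... 6 = Saturday) for a UTC timestamp."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.isoweekday() % 7  # isoweekday(): Monday = 1 ... Sunday = 7


print(prometheus_day_of_week(0))        # 4: 1 January 1970 was a Thursday
print(prometheus_day_of_week(10 * 60))  # 4: an eval_time of 10m is still the same day
```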

@lock lock bot locked and limited conversation to collaborators Apr 15, 2020