promtool unittests fail with rate() & time() #4817

Closed
sbueringer opened this issue Nov 2, 2018 · 10 comments

@sbueringer sbueringer commented Nov 2, 2018

Bug Report

What did you do?

I executed promtool in a dir with the following 2 files:

promtool test rules *.test.yaml

github.rules

groups:
- name: thanos-compact
  rules:
  - alert: ThanosCompactBucketOperationsFailed
    expr: rate(thanos_objstore_bucket_operation_failures_total{app="thanos-compact"}[5m]) > 0.01
    labels:
      severity: warning
      tenant: "{{$labels.tenant}}"
    annotations:
      summary: Thanos Compact bucket operations are failing
      impact: Long term storage queries will be slower and Minio fills up
      action: Check {{ $labels.kubernetes_pod_name }} pod logs in {{ $labels.kubernetes_namespace}} namespace
  - alert: ThanosCompactNotRunIn24Hours
    expr: (time() - max(thanos_objstore_bucket_last_successful_upload_time{app="thanos-compact"}) ) /60/60 > 24
    labels:
      severity: warning
      tenant: "{{$labels.tenant}}"
    annotations:
      summary: Thanos Compaction has not been run in 24 hours
      impact: Long term storage queries will be slower and Minio fills up
      action: Check {{ $labels.kubernetes_pod_name }} pod logs in {{ $labels.kubernetes_namespace}} namespace

github.rules.test.yaml


rule_files:
- github.rules

tests:

- interval: 1m
  input_series:
  - series: 'thanos_objstore_bucket_operation_failures_total{app="thanos-compact",tenant="i13as",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '0+0x10 1+1x5'
  - series: 'thanos_objstore_bucket_last_successful_upload_time{app="thanos-compact",tenant="i13as",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '1531152794+0x20'

  alert_rule_test:
  - alertname: ThanosCompactBucketOperationsFailed
    eval_time: 11m
    exp_alerts:
      - exp_labels:
          severity: warning
          tenant: i13as
          app: thanos-compact
          kubernetes_namespace: monitoring
          kubernetes_pod_name: thanos-compact-0
        exp_annotations:
          action: "Check thanos-compact-0 pod logs in monitoring namespace"
          impact: "Long term storage queries will be slower and Minio fills up"
          summary: "Thanos Compact bucket operations are failing"
  - alertname: ThanosCompactNotRunIn24Hours
    eval_time: 10m
    exp_alerts:
      - exp_labels:
          severity: warning
          tenant: i13as
          app: thanos-compact
          kubernetes_namespace: monitoring
          kubernetes_pod_name: thanos-compact-0
        exp_annotations:
          action: "Check thanos-compact-0 pod logs in monitoring namespace"
          impact: "Long term storage queries will be slower and Minio fills up"
          summary: "Thanos Compact bucket operations are failing"

What did you expect to see?

I expected the unit test to succeed.

What did you see instead? Under which circumstances?

The unit test failed.

Environment

Linux Fedora with promtool 2.5.0-rc2

  • Logs:
Unit Testing:  github.rules.test.yaml
  FAILED:
    alertname:ThanosCompactNotRunIn24Hours, time:10m0s, 
        exp:"[Labels:{alertname=\"ThanosCompactNotRunIn24Hours\", app=\"thanos-compact\", kubernetes_namespace=\"monitoring\", kubernetes_pod_name=\"thanos-compact-0\", severity=\"warning\", tenant=\"i13as\"} Annotations:{action=\"Check thanos-compact-0 pod logs in monitoring namespace\", impact=\"Long term storage queries will be slower and Minio fills up\", summary=\"Thanos Compact bucket operations are failing\"}]", 
        got:"[]"
    alertname:ThanosCompactBucketOperationsFailed, time:11m0s, 
        exp:"[Labels:{alertname=\"ThanosCompactBucketOperationsFailed\", app=\"thanos-compact\", kubernetes_namespace=\"monitoring\", kubernetes_pod_name=\"thanos-compact-0\", severity=\"warning\", tenant=\"i13as\"} Annotations:{action=\"Check thanos-compact-0 pod logs in monitoring namespace\", impact=\"Long term storage queries will be slower and Minio fills up\", summary=\"Thanos Compact bucket operations are failing\"}]", 
        got:"[]"

@johannesfrey johannesfrey commented Nov 5, 2018

Just to verify that rate is correctly evaluated in the unit tests, I used the following values:

  - series: 'thanos_objstore_bucket_operation_failures_total{app="thanos-compact",tenant="c01p005",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '0+1x10'

So the initial values seem to be wrong for triggering the alert.
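
For reference, the `values` strings use promtool's expanding notation, in which `a+bxn` stands for the n+1 samples a, a+b, ..., a+n*b. The sketch below expands both strings from this thread so the difference is visible. `expand` is a hypothetical helper written for illustration, not part of promtool, and it does not handle the special `_` and `stale` tokens.

```python
import re

# Matches one expanding-notation token such as "0+1x10" or "5-2x3".
_EXPANDING = re.compile(r"^(-?\d+(?:\.\d+)?)([+-])(\d+(?:\.\d+)?)x(\d+)$")


def expand(values):
    """Expand a promtool-style values string into a flat list of samples.

    Hypothetical helper: plain numbers are kept as-is, and "a+bxn"
    becomes the n+1 samples a, a+b, ..., a+n*b.
    """
    samples = []
    for token in values.split():
        match = _EXPANDING.match(token)
        if match is None:
            samples.append(float(token))
            continue
        start, sign, step, count = match.groups()
        delta = float(step) if sign == "+" else -float(step)
        samples.extend(float(start) + i * delta for i in range(int(count) + 1))
    return samples


original = expand("0+0x10 1+1x5")  # series from the bug report
steady = expand("0+1x10")          # series used in this comment

# With interval: 1m, list index i is the sample at minute i.
print(original[6:12])  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
print(steady[6:11])    # [6.0, 7.0, 8.0, 9.0, 10.0]
```

With a 1m interval, the original string keeps the counter at 0 until the 11m sample, so the increase inside the 5m window at eval_time 11m is only 1 (roughly 0.003 per second, under the 0.01 threshold). The `0+1x10` series rises by one per minute (roughly 0.017 per second).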

Author

@sbueringer sbueringer commented Nov 17, 2018

What kind of information is needed?

Author

@sbueringer sbueringer commented Mar 3, 2019

@simonpasquier I retested it with promtool 2.7.2. The first test (ThanosCompactBucketOperationsFailed) now works. The second one for the Alert ThanosCompactNotRunIn24Hours still doesn't work. I'm not sure what more information you need?

I expected the time() function to resolve to some value during the unit test, so that the rule would fire an alert and the test would succeed.

Member

@codesome codesome commented Mar 3, 2019

I will take a look at it in the coming few days, not sure how I missed it.


@karlem karlem commented Jul 2, 2019

I have the same issue. The time() function does not return the actual time during tests.
Is there any update on this issue?


@shric shric commented Jul 17, 2019

time() works for me in tests. It appears to return eval_time in seconds, which is kind of intuitive. For example, given eval_time: 10m, time() returns 600.
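
Taking that observation at face value (the test clock starts at 0, so time() equals the eval_time offset in seconds), the arithmetic of the ThanosCompactNotRunIn24Hours expression can be checked by hand. The sketch below mirrors `(time() - max(...)) / 60 / 60 > 24`; the function name and the 25h scenario are illustrative assumptions, not something from promtool.

```python
def hours_since_upload(test_time_seconds, last_upload_timestamp):
    """Mirror the rule expression (time() - last_upload) / 60 / 60."""
    return (test_time_seconds - last_upload_timestamp) / 60 / 60


def would_fire(test_time_seconds, last_upload_timestamp):
    """True when the expression exceeds the 24 hour threshold."""
    return hours_since_upload(test_time_seconds, last_upload_timestamp) > 24


# Test from the bug report: eval_time 10m, series value 1531152794.
# time() is 600, so the difference is a large negative number.
print(would_fire(10 * 60, 1531152794))  # False

# Assumed alternative: upload recorded at 0, evaluated at 25h.
print(would_fire(25 * 60 * 60, 0))      # True
```

Under this assumption, the input series has to hold timestamps relative to the start of the test (small numbers near 0) rather than a real wall-clock Unix timestamp, and eval_time has to be more than 24h after the recorded upload.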


@MalKeshar MalKeshar commented Jul 24, 2019

The behavior of the time() function in unit tests breaks tests for alert rules such as this one: https://www.robustperception.io/get-alerted-before-your-ssl-certificates-expire

Member

@brian-brazil brian-brazil commented Jul 24, 2019

This has been open a while, and I'm not seeing any bug here. I'd suggest
taking this to the prometheus-users mailing list.

Author

@sbueringer sbueringer commented Jul 24, 2019

Okay, so I assume the intended behavior is that time() returns the eval_time. Then it's indeed not a bug.


@MitchelNijdam-Rockstars MitchelNijdam-Rockstars commented Oct 15, 2019

I noticed that day_of_week() returns 4 when doing tests, which confused me at first, but it makes sense since that's the day of the week at the start of the epoch (Thursday, 1 January 1970).

Since that wasn't mentioned here and I've been searching for it for a while, I thought it would be good to put it here as a reference for other people that might be stumbling on this. I'm also trying to get this in the Unit test documentation: prometheus/docs#1464.
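
The value 4 can be reproduced outside promtool. PromQL's day_of_week() numbers days from 0 (Sunday) to 6 (Saturday) in UTC, so a rough Python equivalent, assuming the test clock starts at Unix time 0, looks like the sketch below. The `day_of_week` function here is an illustrative stand-in, not Prometheus code.

```python
from datetime import datetime, timezone


def day_of_week(unix_seconds):
    """Return the UTC weekday with 0 = Sunday ... 6 = Saturday."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    # isoweekday() is 1 = Monday ... 7 = Sunday; modulo 7 maps Sunday to 0.
    return moment.isoweekday() % 7


DAY = 24 * 60 * 60

print(day_of_week(0))        # 4, Thursday 1 January 1970
print(day_of_week(3 * DAY))  # 0, Sunday 4 January 1970
```

If eval_time is an offset from the epoch, as the comments above suggest, a test that needs a particular weekday can pick eval_time accordingly, for example 3d to land on a Sunday.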
