promtool unittests fail with rate() & time() #4817

Open
sbueringer opened this Issue Nov 2, 2018 · 4 comments

sbueringer commented Nov 2, 2018

Bug Report

What did you do?

I executed promtool in a dir with the following 2 files:

promtool test rules *.test.yaml

github.rules

groups:
- name: thanos-compact
  rules:
  - alert: ThanosCompactBucketOperationsFailed
    expr: rate(thanos_objstore_bucket_operation_failures_total{app="thanos-compact"}[5m]) > 0.01
    labels:
      severity: warning
      tenant: "{{$labels.tenant}}"
    annotations:
      summary: Thanos Compact bucket operations are failing
      impact: Long term storage queries will be slower and Minio fills up
      action: Check {{ $labels.kubernetes_pod_name }} pod logs in {{ $labels.kubernetes_namespace}} namespace
  - alert: ThanosCompactNotRunIn24Hours
    expr: (time() - max(thanos_objstore_bucket_last_successful_upload_time{app="thanos-compact"}) ) /60/60 > 24
    labels:
      severity: warning
      tenant: "{{$labels.tenant}}"
    annotations:
      summary: Thanos Compaction has not been run in 24 hours
      impact: Long term storage queries will be slower and Minio fills up
      action: Check {{ $labels.kubernetes_pod_name }} pod logs in {{ $labels.kubernetes_namespace}} namespace

github.rules.test.yaml


rule_files:
- github.rules

tests:

- interval: 1m
  input_series:
  - series: 'thanos_objstore_bucket_operation_failures_total{app="thanos-compact",tenant="i13as",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '0+0x10 1+1x5'
  - series: 'thanos_objstore_bucket_last_successful_upload_time{app="thanos-compact",tenant="i13as",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '1531152794+0x20'

  alert_rule_test:
  - alertname: ThanosCompactBucketOperationsFailed
    eval_time: 11m
    exp_alerts:
      - exp_labels:
          severity: warning
          tenant: i13as
          app: thanos-compact
          kubernetes_namespace: monitoring
          kubernetes_pod_name: thanos-compact-0
        exp_annotations:
          action: "Check thanos-compact-0 pod logs in monitoring namespace"
          impact: "Long term storage queries will be slower and Minio fills up"
          summary: "Thanos Compact bucket operations are failing"
  - alertname: ThanosCompactNotRunIn24Hours
    eval_time: 10m
    exp_alerts:
      - exp_labels:
          severity: warning
          tenant: i13as
          app: thanos-compact
          kubernetes_namespace: monitoring
          kubernetes_pod_name: thanos-compact-0
        exp_annotations:
          action: "Check thanos-compact-0 pod logs in monitoring namespace"
          impact: "Long term storage queries will be slower and Minio fills up"
          summary: "Thanos Compact bucket operations are failing"

What did you expect to see?

I expected the unit test to succeed.

What did you see instead? Under which circumstances?

The unit test failed.

Environment

Linux Fedora with promtool 2.5.0-rc2

  • Logs:
Unit Testing:  github.rules.test.yaml
  FAILED:
    alertname:ThanosCompactNotRunIn24Hours, time:10m0s, 
        exp:"[Labels:{alertname=\"ThanosCompactNotRunIn24Hours\", app=\"thanos-compact\", kubernetes_namespace=\"monitoring\", kubernetes_pod_name=\"thanos-compact-0\", severity=\"warning\", tenant=\"i13as\"} Annotations:{action=\"Check thanos-compact-0 pod logs in monitoring namespace\", impact=\"Long term storage queries will be slower and Minio fills up\", summary=\"Thanos Compact bucket operations are failing\"}]", 
        got:"[]"
    alertname:ThanosCompactBucketOperationsFailed, time:11m0s, 
        exp:"[Labels:{alertname=\"ThanosCompactBucketOperationsFailed\", app=\"thanos-compact\", kubernetes_namespace=\"monitoring\", kubernetes_pod_name=\"thanos-compact-0\", severity=\"warning\", tenant=\"i13as\"} Annotations:{action=\"Check thanos-compact-0 pod logs in monitoring namespace\", impact=\"Long term storage queries will be slower and Minio fills up\", summary=\"Thanos Compact bucket operations are failing\"}]", 
        got:"[]"

johannesfrey commented Nov 5, 2018

Just to verify that rate is correctly evaluated in the unit tests, I used the following values:

  - series: 'thanos_objstore_bucket_operation_failures_total{app="thanos-compact",tenant="c01p005",kubernetes_namespace="monitoring",kubernetes_pod_name="thanos-compact-0"}'
    values: '0+1x10'

So the initial values seem to be wrong for triggering the alert.
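
To illustrate why the two inputs could behave differently, here is a minimal Python sketch that expands the `values` notation and computes a simplified per-second rate over the 5m window ending at the 11m evaluation time. The helpers `expand` and `simple_rate` are hypothetical names, not part of promtool, and the calculation skips the extrapolation and counter-reset handling that the real `rate()` performs, so treat the numbers as approximate.

```python
def expand(notation):
    """Expand 'a+bxn' segments into a flat list of sample values.

    Only the 'a+bxn' form used in this issue is handled.
    """
    values = []
    for segment in notation.split():
        start_step, count = segment.split("x")
        start, step = start_step.split("+")
        # 'a+bxn' yields the initial value plus n further samples.
        values += [float(start) + float(step) * i for i in range(int(count) + 1)]
    return values


def simple_rate(values, interval_s, eval_s, window_s):
    """Per-second increase between the first and last sample in the window.

    Simplification (assumption): no extrapolation, no counter-reset handling.
    """
    points = [(i * interval_s, v) for i, v in enumerate(values)]
    in_window = [(t, v) for t, v in points if eval_s - window_s <= t <= eval_s]
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# interval: 1m, eval_time: 11m, range selector: [5m]
original = simple_rate(expand("0+0x10 1+1x5"), 60, 11 * 60, 5 * 60)
adjusted = simple_rate(expand("0+1x10"), 60, 11 * 60, 5 * 60)

print(original, original > 0.01)
print(adjusted, adjusted > 0.01)
```

Under these assumptions the original input gives roughly 0.0033 per second, below the 0.01 threshold, while `0+1x10` gives roughly 0.0167 per second, above it.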

Author

sbueringer commented Nov 17, 2018

What kind of information is needed?

Author

sbueringer commented Mar 3, 2019

@simonpasquier I retested it with promtool 2.7.2. The first test (ThanosCompactBucketOperationsFailed) now works. The second one for the Alert ThanosCompactNotRunIn24Hours still doesn't work. I'm not sure what more information you need?

I expected the time() function to resolve to some value during the unit test, so that the rule would fire an alert and the test would succeed.
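
One possible explanation, offered as an assumption rather than confirmed behaviour: if the test clock starts at Unix time 0, then `time()` at `eval_time: 10m` would be 600 rather than the current wall-clock time. The Python sketch below works through the rule's arithmetic under that assumption; `hours_since_upload` is a hypothetical helper name that mirrors the expression `(time() - max(...)) /60/60`.

```python
def hours_since_upload(now_s, last_upload_s):
    """Mirror of the rule expression: (time() - last_upload) / 60 / 60."""
    return (now_s - last_upload_s) / 60 / 60


LAST_UPLOAD = 1531152794  # constant series value from github.rules.test.yaml
EVAL_OFFSET = 10 * 60     # eval_time: 10m, in seconds

# Assumption: time() equals the offset from the start of the test (epoch 0).
hours = hours_since_upload(EVAL_OFFSET, LAST_UPLOAD)
fires = hours > 24
print(hours, fires)

# Under the same assumption, an upload timestamp of 0 checked at 25h would fire.
hours_alt = hours_since_upload(25 * 60 * 60, 0)
fires_alt = hours_alt > 24
print(hours_alt, fires_alt)
```

If that assumption holds, the difference is a large negative number of hours, so the `> 24` comparison is never true for the input series as written.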

Member

codesome commented Mar 3, 2019

I will take a look at it in the coming few days, not sure how I missed it.
