
promtool doesn't always bubble up PromQL engine errors #5473

Open
wutianchen opened this Issue Apr 17, 2019 · 2 comments

wutianchen commented Apr 17, 2019

I have the following recording rule and alert rule:

- record: yace_cloudwatch_requests_1h_incremental
  expr: increase(yace_cloudwatch_requests_total[1h])

- alert: prom_yace_cost_anomaly
  expr: yace_cloudwatch_requests_1h_incremental - quantile_over_time(0.5, yace_cloudwatch_requests_1h_incremental[7d]) > 0
  for: 10m
  labels:
    severity: warning
  annotations:
    value: '{{humanize $value}}'
    summary: yace_cloudwatch_requests_1h_incremental is deviated from normal value
    description: yace_cloudwatch_requests_1h_incremental is deviated from normal value

and the following promtool unit test:

rule_files:
  - /home/ec2-user/iot-infra-mon-prometheus/config/prometheus/rules/monitoring_services.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'yace_cloudwatch_requests_1h_incremental'
        values: '1+0x10080 100+0x100'

    alert_rule_test:
      - alertname: prom_yace_cost_anomaly
        eval_time: 10155m
        exp_alerts:
          - exp_labels:
              severity: warning
            exp_annotations:
              description: 'yace_cloudwatch_requests_1h_incremental is deviated from normal value'
              summary: 'yace_cloudwatch_requests_1h_incremental is deviated from normal value'
              value: 99
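As a sanity check on the expected value of 99, here is a small Python sketch (not promtool code; the function names are hypothetical) that expands the series notation and evaluates both sides of the alert expression at eval_time 10155m. It assumes that each 'a+bxn' term expands to n+1 samples spaced by the test interval, that the instant selector returns the latest sample at or before the evaluation time, and that the [7d] range includes both ends, as in Prometheus 2.x. The quantile uses linear interpolation between neighbouring ranks, which we believe matches PromQL's quantile functions.

```python
import math


def expand(notation: str) -> list[float]:
    """Expand promtool-style series notation such as '1+0x10080 100+0x100'.

    Each 'a+bxn' term yields n+1 values: a, a+b, ..., a+n*b.
    Only the 'a+bxn' form used in this issue is handled.
    """
    values: list[float] = []
    for term in notation.split():
        start, rest = term.split("+")
        step, count = rest.split("x")
        a, b, n = float(start), float(step), int(count)
        values.extend(a + i * b for i in range(n + 1))
    return values


def quantile(q: float, values: list[float]) -> float:
    """phi-quantile with linear interpolation between neighbouring ranks."""
    vs = sorted(values)
    rank = q * (len(vs) - 1)
    lower = math.floor(rank)
    upper = min(len(vs) - 1, lower + 1)
    weight = rank - lower
    return vs[lower] * (1 - weight) + vs[upper] * weight


def evaluate(notation: str, interval_min: int, eval_min: int, range_min: int):
    """Return (instant value, median over the range, difference) at eval_min."""
    samples = [(i * interval_min, v) for i, v in enumerate(expand(notation))]
    # Instant selector: latest sample at or before the evaluation time
    # (the lookback window and staleness are ignored in this sketch).
    current = [v for t, v in samples if t <= eval_min][-1]
    # Range selector: both ends inclusive (assumed Prometheus 2.x behaviour).
    window = [v for t, v in samples if eval_min - range_min <= t <= eval_min]
    median = quantile(0.5, window)
    return current, median, current - median


print(evaluate("1+0x10080 100+0x100", 1, 10155, 7 * 24 * 60))
# (100.0, 1.0, 99.0)
```

At 10155m the newest sample is 100, while about 10,000 of the roughly 10,080 samples in the 7d window are still 1, so the median is 1 and the alert expression should evaluate to 99.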

It doesn't fire as expected.

However, if I change the alert expression to just yace_cloudwatch_requests_1h_incremental, the alert fires in the unit test with value 100, which means yace_cloudwatch_requests_1h_incremental = 100.

Similarly, if I change the alert expression to quantile_over_time(0.5, yace_cloudwatch_requests_1h_incremental[7d]), it fires in the unit test with value 1.

Both of these results are expected. However, the full expression

yace_cloudwatch_requests_1h_incremental - quantile_over_time(0.5, yace_cloudwatch_requests_1h_incremental[7d]) fires with value 0, which would mean that

yace_cloudwatch_requests_1h_incremental == quantile_over_time(0.5, yace_cloudwatch_requests_1h_incremental[7d]) is mathematically true, contradicting the two results above.

Is there any explanation for this? Thanks.

wutianchen changed the title from "Promtool tells 1 = 100" to "Promtool tells me 1 = 100" on Apr 17, 2019

simonpasquier (Member) commented Apr 18, 2019

There is a bug in the promtool unit test runner: it swallows any error returned by rule evaluation. In this particular case, the test creates too many samples, which causes the evaluation to fail with "query processing would load too many samples into memory in query execution". I'm sending a PR so that promtool surfaces such errors. After that, we can think more about how, or whether, the limits can be increased.

In the meantime, you can still reduce the number of samples by doubling the interval, like this:

  - interval: 2m
    input_series:
      - series: 'yace_cloudwatch_requests_1h_incremental'
        values: '1+0x5040 100+0x100'
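The effect of the 2m workaround can be estimated by counting how many samples the [7d] range selector loads at eval_time 10155m. The sketch below assumes samples sit at multiples of the interval starting at 0 and that both ends of the range are inclusive (Prometheus 2.x behaviour). HYPOTHETICAL_LIMIT is an illustrative value, not a documented promtool setting; check the engine options used by your promtool version.

```python
WEEK_MIN = 7 * 24 * 60  # the [7d] range in minutes
HYPOTHETICAL_LIMIT = 10_000  # illustrative per-query sample limit (assumption)


def samples_in_range(interval_min: int, eval_min: int, range_min: int) -> int:
    """Count timestamps k*interval (k >= 0) inside [eval - range, eval]."""
    start = max(0, eval_min - range_min)
    first = -(-start // interval_min)  # ceil(start / interval)
    last = eval_min // interval_min
    return last - first + 1


for interval in (1, 2):
    n = samples_in_range(interval, 10155, WEEK_MIN)
    print(f"interval={interval}m samples={n} over_limit={n > HYPOTHETICAL_LIMIT}")
# interval=1m samples=10081 over_limit=True
# interval=2m samples=5040 over_limit=False
```

Doubling the interval roughly halves the samples per evaluation while keeping the same 7d history, so the expected alert value at 10155m is unchanged.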

simonpasquier referenced a pull request that will close this issue on Apr 18, 2019: cmd/promtool: return errors from rule evaluations #5483 (Open)

simonpasquier changed the title from "Promtool tells me 1 = 100" to "promtool doesn't always bubble up PromQL engine errors" on Apr 19, 2019

simonpasquier (Member) commented Apr 19, 2019

I've edited the issue's title to reflect the current status.
