
Explain step parameter in query api #2564

Closed
bobrik opened this Issue Apr 4, 2017 · 8 comments


bobrik commented Apr 4, 2017

What did you do?

It seems that the finer the step for the rate query, the more incorrect the results are. For a 1m scraping interval and a 2m rate function, it looks like the following:

  • 1m resolution: [screenshot: graph at 1m step]

  • 5s resolution: [screenshot: graph at 5s step]

What did you expect to see?

For a rate over 2m, I expect the real resolution to be 1m. I also expect the area under the graph to represent volume (CPU time in my case). With a resolution finer than 1m, I get skewed results.

What did you see instead? Under which circumstances?

I saw square, step-like graphs. I looked at the documentation for an explanation, but all I saw was:

[screenshot of the query API documentation]

While I understand that dots connected by straight lines are not necessarily what happens in the real world, vertical jumps do not represent it either. I think the former is closer to reality.

At the very minimum, the current behavior should be explained.

It'd also be nice to have the ability to get datapoints at the native resolution of the raw data. If we scrape every 10s, that means datapoints every 10s. If we take a 2m rate, that means datapoints 1m apart. If it were possible to set a maximum step, then Grafana users would be able to get an automatic step based on their timespan and graph width, with the minimal step equal to the native Prometheus resolution. Example for a 60s scraping interval (see the sketch after the list):

  • Grafana asks for min-step=30s -> 60s step in response.
  • Grafana asks for min-step=60s -> 60s step in response.
  • Grafana asks for min-step=90s -> 90s step in response.
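Below is a minimal sketch of the snapping rule these examples imply, assuming a hypothetical server-side min-step parameter (Prometheus does not actually support this; the `effectiveStep` helper and all names here are illustrative only):

```go
package main

import (
	"fmt"
	"time"
)

// effectiveStep snaps a requested minimum step up to the native resolution
// of the data: the returned step is never finer than the scrape interval.
// Hypothetical helper; Prometheus has no such server-side feature.
func effectiveStep(minStep, scrapeInterval time.Duration) time.Duration {
	if minStep < scrapeInterval {
		return scrapeInterval
	}
	return minStep
}

func main() {
	scrape := 60 * time.Second
	for _, ms := range []time.Duration{30 * time.Second, 60 * time.Second, 90 * time.Second} {
		fmt.Printf("min-step=%v -> step=%v\n", ms, effectiveStep(ms, scrape))
	}
	// Output:
	// min-step=30s -> step=1m0s
	// min-step=1m0s -> step=1m0s
	// min-step=1m30s -> step=1m30s
}
```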

Hopefully this makes sense.

It's entirely possible that I'm talking nonsense, but at least the next person will be able to find some history.

Environment

  • Prometheus version:
prometheus, version 1.5.2 (branch: master, revision: 16a512fe91926e2e3d0b1d2da6e7e16ceeab5f02)
  build user:       bamboo@53b1348075ea
  build date:       20170328-13:52:43
  go version:       go1.8

bobrik commented Apr 4, 2017

It's also worth mentioning that the finer the step, the higher the latency:

  • 225ms response time for 5s step
  • 40ms response time for 1m step

This is over loopback.


juliusv commented Apr 4, 2017

The output in both graphs is "correct". Let me explain. You scrape data at 1m intervals. You use the function irate(), which looks back at only the last two samples under a sliding window; the delta between those two samples determines the function's output. That delta stays the same for a full minute, though, since you scrape only once per minute. Once a new datapoint arrives and the sliding window slides over it, you get an updated (jumpy) rate.

The thing here is that you are looking at the result of an irate() expression at a resolution (5s) that is much finer than the scraped data. If you want to smooth the results over more than two collected datapoints, use the rate() function instead (with an appropriately large time window, e.g. [5m]).

> It's also worth mentioning that the finer the step, the higher the latency:

Yes, the finer the resolution that you ask Prometheus to assemble for you, the more work it has to do. At a query resolution of 5s, Prometheus has to run 12 times more instant evaluations than at a resolution of 1 minute, even if the underlying data only exists at a 1m resolution.
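For reference, the step is just a parameter of the range-query HTTP API (`GET /api/v1/query_range?query=...&start=...&end=...&step=...`). Here is a sketch of issuing such a query with the official Go client, `github.com/prometheus/client_golang`; the address and metric name are placeholders, and the three-value return of `QueryRange` matches recent client versions (older ones omit the warnings):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder address; point this at your Prometheus server.
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)

	// Step is the query resolution discussed in this thread: a 5s step makes
	// Prometheus evaluate the expression 12 times more often than a 1m step
	// over the same range, regardless of the underlying scrape interval.
	r := v1.Range{
		Start: time.Now().Add(-time.Hour),
		End:   time.Now(),
		Step:  time.Minute,
	}
	val, warnings, err := promAPI.QueryRange(context.Background(), `rate(some_metric_total[5m])`, r)
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(val)
}
```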

Closing as this is behaving as expected.

@juliusv juliusv closed this Apr 4, 2017


bobrik commented Apr 5, 2017

> Yes, the finer the resolution that you ask Prometheus to assemble for you, the more work it has to do. At a query resolution of 5s, Prometheus has to run 12 times more instant evaluations than at a resolution of 1 minute, even if the underlying data only exists at a 1m resolution.

Prometheus doesn't have more datapoints to evaluate when I ask for a resolution finer than the scraping interval. It shouldn't be more work to just fill the gaps between datapoints after evaluation.

You also skipped the part where I asked about native resolution.


redbaron commented Apr 5, 2017

The thing is, the scrape interval is best effort; nothing guarantees that a datapoint will be available at the "planned" moment. Therefore Prometheus has to go point by point and assemble samples into sliding windows.


juliusv commented Apr 5, 2017

@bobrik Yeah, that's maybe a bit confusing. Generally, samples from different series can be at arbitrary intervals and not time-aligned with each other, etc., but you still need to be able to select multiple series and aggregate over them, so they need to be artificially aligned. The way PromQL does this is by having an independent evaluation interval (the resolution step that you chose) for the query as a whole, independent of the underlying details of the data. Then at every step (in your case, every 5s), the PromQL expression is executed. At every resolution timestep, for every series the PromQL expression references, the last sample value is chosen (if it is not older than the staleness period of 5 minutes). If you have steps of 5s, but you only get new data in every minute, then you will still evaluate the PromQL expression every 5s, but only see an actually new value every minute.
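A minimal sketch of that per-step selection rule as just described (the types and names here are illustrative, not Prometheus's actual implementation):

```go
package main

import (
	"fmt"
	"time"
)

// sample is one scraped value of a single series.
type sample struct {
	t time.Time
	v float64
}

// valueAt returns the value an evaluation at timestamp ts would see:
// the most recent sample at or before ts, unless that sample is older
// than the staleness window (5 minutes in Prometheus 1.x).
func valueAt(series []sample, ts time.Time, staleness time.Duration) (float64, bool) {
	for i := len(series) - 1; i >= 0; i-- {
		if !series[i].t.After(ts) {
			if ts.Sub(series[i].t) > staleness {
				return 0, false // latest sample is stale: no value at this step
			}
			return series[i].v, true
		}
	}
	return 0, false // no sample at or before ts
}

func main() {
	base := time.Now().Truncate(time.Minute)
	series := []sample{{base, 100}, {base.Add(time.Minute), 160}}

	// With samples 1m apart and a 5s step, every evaluation between the two
	// samples returns the same value: the step is finer than the data, so
	// values repeat until a new sample enters the picture.
	for off := 0 * time.Second; off < time.Minute; off += 5 * time.Second {
		v, ok := valueAt(series, base.Add(off), 5*time.Minute)
		fmt.Println(off, v, ok)
	}
}
```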

So there is no "native resolution": in general, the underlying timestamps of the different series involved don't match up and can even be irregular, but they have to be made to match up (for graphing, but also for aggregations, etc.).

This is not a Prometheus issue anymore, but a question. If you have further questions, please take them to our community channels like the mailing lists or IRC: https://prometheus.io/community/


bobrik commented Apr 7, 2017

OpenTSDB can evaluate aggregations without converting timeseries onto a single independent evaluation interval, but it's fair enough that Prometheus chose to do things differently. It'd just be nice to have this in the docs instead of in issue comments.

I opened an issue for Grafana (grafana#8065), as it seems to be the right place to pick the step value.

Thanks for clearing this up for me.


juliusv commented Apr 7, 2017

@bobrik Good point, we should explain this somewhere at least. I filed prometheus/docs#699 for better docs.

Do you happen to know more about how the OpenTSDB approach to aggregating unaligned series works?

bobrik added a commit to bobrik/grafana that referenced this issue Apr 13, 2017

Change prometheus semantics from step to min step
Previously, the `Step` parameter set a hard value for any zoom level.

Now it is renamed to `Min step` and sets the minimal value of the `step` parameter
of the Prometheus query. Users will usually want to set it to the scrape interval
of the target metric, to avoid sharp cliffs on graphs and extra load
on Prometheus. The actual `step` value is calculated as the maximum of the
automatically selected step (based on zoom level) and the user-provided minimal
step. If the user did not provide a min step, the automatic value is used as is.

Example behavior for `60s` scrape intervals:

* `5s` automatic interval, no user specified min step:
  * Before: `step=5`
  * After: `step=5`
* `5s` automatic interval, `1m` user specified min step:
  * Before: `step=5`
  * After: `step=60`
* `5m` automatic interval, `1m` user specified min step:
  * Before: `step=60` (not really visible, too dense)
  * After: `step=300` (automatic value is picked)

See:

* grafana#8065
* prometheus/prometheus#2564

torkelo added a commit to grafana/grafana that referenced this issue Apr 14, 2017

Change prometheus semantics from step to min step (#8073)