
Specify desired step in Prometheus in dashboard panels #9705

Closed
zemek opened this issue Oct 27, 2017 · 61 comments · Fixed by #36422
Labels
area/datasource · datasource/Prometheus · effort/small · onboarding · prio/medium (Important over the long term, but may not be staffed and/or may need multiple releases to complete.) · type/feature-request

Comments

@zemek

zemek commented Oct 27, 2017

We used to be able to specify the step parameter for Prometheus queries; this was changed in #8073.

It would be nice to get this functionality back so that brief spikes can be displayed in graphs.

@bergquist bergquist changed the title [Feature request] Specify desired step in Prometheus Specify desired step in Prometheus Oct 31, 2017
@atonkyra

atonkyra commented Nov 1, 2017

This would be pretty awesome to have back. Right now some fast-moving gauges are impossible to render correctly.

The animation at https://gyazo.com/f83dc32078209a7e6a4a87efe5b4e81b illustrates the problem pretty well.

@MarcMagnin

This would be quite nice to have back. Any idea when this could be done, if it is planned?

@MarcMagnin

MarcMagnin commented Nov 14, 2017

For reference, there is a discussion on the Prometheus GitHub as well: prometheus/prometheus#2364
Basically, without an enforced step there is no way to render a chart consistently on each refresh for a metric that is not a counter (i.e. one that can't use the rate trick) when the time range is longer than 1h.

@seeruk

seeruk commented Nov 30, 2017

I've also just left a comment on that issue explaining what I've found. Realistically I think this is something that Prometheus should handle, but it's something that Grafana may be able to deal with on its own too - I'm still experimenting with it.

Edit: After experimenting a little, Grafana could handle this even if Prometheus doesn't, by ensuring that steps always contain the same points. That is, if the step size is 15s, maybe each step should cover one of these parts of a minute: 0-15s, 15-30s, 30-45s, or 45-60s. So, if "now" were 36 seconds into a minute, Prometheus would be queried up to 45s into that same minute (i.e. into the future). In this case, Prometheus does actually seem to return the latest information for the last point, which means that the last point will change, but once we get past it, it would always remain the same from then on.

This behaves somewhat like pre-calculated buckets: in some cases the value in a bucket might be updated, in others it may be added to over time, but historic buckets would never change, because their bounds wouldn't be moving targets like they are now.
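
For illustration, here is a minimal Go sketch of this alignment idea (not Grafana's actual implementation; the alignEnd helper name and behavior are illustrative only): snap the end of the query range up to the next step boundary, so repeated refreshes evaluate the same timestamps.

```go
package main

import (
	"fmt"
	"time"
)

// alignEnd snaps a timestamp up to the next multiple of step, so a refresh at
// 36s into a minute with a 15s step queries up to the 45s mark of that minute.
// Hypothetical helper, not Grafana code.
func alignEnd(t time.Time, step time.Duration) time.Time {
	aligned := t.Truncate(step)
	if aligned.Before(t) {
		aligned = aligned.Add(step) // may lie slightly in the future
	}
	return aligned
}

func main() {
	step := 15 * time.Second
	now := time.Now()
	end := alignEnd(now, step)
	start := end.Add(-5 * time.Minute) // 5m is a multiple of 15s, so start stays aligned too
	fmt.Printf("start=%d end=%d step=%d\n", start.Unix(), end.Unix(), int(step.Seconds()))
}
```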

@thenayr

thenayr commented Dec 4, 2017

With this change in place I can't get metric resolution finer than 15s intervals (we actually use 10s resolution for all metrics).

I also have all of the issues mentioned above with spiky charts jumping around needlessly.

@dprittie

dprittie commented Dec 5, 2017

I am also unable to get a metric resolution finer than 15s - we have some Prometheus targets that scrape every 500 ms and we are unable to properly observe this data, which is a serious breaking change for us :(

If anyone has figured out a way to get finer than 15s, please share!

@torkelo
Member

torkelo commented Dec 5, 2017

Have you set the min step/interval option on the datasource options page? You can also override it at the per-panel or per-query level.

@atonkyra

atonkyra commented Dec 5, 2017

@torkelo the problem is that it is a min step. We can define the minimum step but not the maximum step, which causes us to end up stepping over high-frequency polled data. (See my picture above for an example; there the min step was set to 1s on data polled at a 10s interval.)

@seeruk

seeruk commented Dec 5, 2017

One option to help with this issue is to use an *_over_time aggregation query and set the time period to $__interval; this will scale with the step size. Unfortunately, it doesn't completely solve the problem.

@torkelo
Member

torkelo commented Dec 5, 2017

Not sure I understand - you want Grafana to query for more data points than there are pixels in the graph? If you set a max step, your query will return too much data.

Prometheus has range selectors and functions that should allow you to get what you want; using the interval variable can help.

@atonkyra

atonkyra commented Dec 5, 2017

@torkelo okay, I'll try to explain the problem (as I understand it).

I have data with a resolution of 1 value per 10 seconds. Now, if I want to show all the data and see the spikes, what would be the logical step in a 5-minute graph? I'd say it's 10 seconds.

Okay, now let's set min step to 10s for the fun of it. The graph keeps changing completely on every reload!? Okay, let's examine what we send to Prometheus:

GET /grafana/api/datasources/proxy/1/api/v1/query_range?query=...&start=1512506717&end=1512507017&step=15

See the step=15? That basically means we are stepping over existing values. Let's look at this on a timeline.

0s  10s  20s  30s  40s  50s  60s
+----+----+----+----+----+----+---- ...
0    1   100   0    1   100   0
^      ^       ^      ^       ^

+ = value interval
^ = where step happens to land at

Now looking at that, we see that the spikes of 100 may be jumped over due to the misaligned step: Prometheus might report 1 or 100 at the 15-second marker depending on which side the step happens to land on. Now what happens if our data is at 1-second resolution? Well, we miss 14 seconds of values, and especially on very unstable gauges the graph looks completely different on each refresh. :)

I hope this explains the problem better.
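
To make the skipping concrete, here is a toy Go simulation of the timeline above (assuming, roughly, that query_range evaluates at start + k·step and at each evaluation timestamp returns the most recent sample at or before it, ignoring lookback/staleness details):

```go
package main

import "fmt"

func main() {
	// One sample every 10s; values follow the timeline above.
	samples := map[int]float64{
		0: 0, 10: 1, 20: 100, 30: 0, 40: 1, 50: 100, 60: 0,
	}

	step := 15
	for t := 0; t <= 60; t += step {
		last := (t / 10) * 10 // most recent 10s-aligned sample at or before t
		fmt.Printf("eval@%2ds -> sample@%2ds value=%v\n", t, last, samples[last])
	}
	// The spikes of 100 at 20s and 50s are never returned: no evaluation
	// timestamp falls in [20s,30s) or [50s,60s).
}
```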

@atonkyra

atonkyra commented Dec 5, 2017

Also, please have a look at the image here. This is a 10-second-interval dataset with min step set to 10 (or 1 - Grafana sends 15 regardless). The two graphs are from the same data; the only difference is that I pressed refresh 3 seconds after taking the left screenshot.

[screenshot: step_broken]

I'd say there are enough pixels to show the other points as well... :P

@bmildren

bmildren commented Dec 5, 2017

@atonkyra just to confirm, did you say you had already adjusted the scrape interval on the data source?

[screenshot of the datasource scrape interval setting]

@atonkyra

atonkyra commented Dec 5, 2017

@bmildren okay, I didn't even know there was such a setting; setting it to the smallest scrape interval of any scrape job you have will fix the problem.

I still think defaulting to anything hard-coded is utterly broken (on the lower bound). At the very least it should be configurable per graph.

@atonkyra

atonkyra commented Dec 5, 2017

If we wanted some automation, one could use the /api/v1/status/config API endpoint on Prometheus. The contents are YAML, so that would need parsing. Grafana could then match on the job key and, failing that, fall back to the global scrape interval in the config.
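
A rough Go sketch of that automation, under the assumption that the endpoint returns the config as a YAML string inside a JSON envelope; the Prometheus URL is a placeholder and the structs model only the keys of interest:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"

	"gopkg.in/yaml.v3"
)

type statusConfig struct {
	Data struct {
		YAML string `json:"yaml"`
	} `json:"data"`
}

type promConfig struct {
	Global struct {
		ScrapeInterval string `yaml:"scrape_interval"`
	} `yaml:"global"`
	ScrapeConfigs []struct {
		JobName        string `yaml:"job_name"`
		ScrapeInterval string `yaml:"scrape_interval"`
	} `yaml:"scrape_configs"`
}

func main() {
	// Placeholder address; error handling kept minimal for brevity.
	resp, err := http.Get("http://prometheus:9090/api/v1/status/config")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var sc statusConfig
	if err := json.NewDecoder(resp.Body).Decode(&sc); err != nil {
		panic(err)
	}

	var cfg promConfig
	if err := yaml.Unmarshal([]byte(sc.Data.YAML), &cfg); err != nil {
		panic(err)
	}

	for _, job := range cfg.ScrapeConfigs {
		interval := job.ScrapeInterval
		if interval == "" {
			interval = cfg.Global.ScrapeInterval // fall back to the global interval
		}
		fmt.Printf("job=%s scrape_interval=%s\n", job.JobName, interval)
	}
}
```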

@torkelo
Member

torkelo commented Dec 6, 2017

I still think defaulting to anything hard-coded is utterly broken (on the lower bound). At the very least it should be configurable on the graph.

Grafana defaults to the same default Prometheus uses, and nothing is hard-coded, as you can change it. However, I can still see a problem here: Grafana should align the step to even intervals of the min step / scrape interval option.

@bergquist
Contributor

FYI, that setting is only available in the nightly builds.

@atonkyra

atonkyra commented Dec 6, 2017

I'd say we still need a per-graph (or per-query?) "scrape interval" field and, when that is absent, default to the global value on the datasource. Let's say I have a Prometheus instance with a default scrape interval of 10s, but I happen to have some data at 1s and some at 60s.

A single global default just doesn't make sense for many Prometheus installations, IMHO.

bergquist added a commit that referenced this issue Dec 6, 2017
This commit makes it possible to set min interval per panel.
Overrides the value configured on the datasource.

ref #9705
@bergquist
Contributor

It's now possible to set min interval per panel in the nightly build
[screenshot of the per-panel min interval setting]

@zemek
Author

zemek commented Dec 8, 2017

FWIW my original issue is actually solved by using max_over_time(my_metric{}[$__interval])

Although you can't take the max_over_time() of a rate() unless you make a recording rule first, which is slightly annoying

@matejzero

This is a showstopper for us. We are trying to migrate to Prometheus from Graphite and not having consistent graphs doesn't work for us:)

It's really hard to debug an issue on the system when graphs are constantly changing.

Will follow this thread and provide info if needed.

@thenayr

thenayr commented Dec 14, 2017

I believe the Grafana 4.6.3 release today addresses this: 8a16163

@matejzero

matejzero commented Dec 14, 2017

I upgraded to 4.6.3 but it doesn't seem to fix my issue.

I have set the scrape interval in the data source to 10s and this is what I get:
[animated screenshot: output_jhx70k]

These are two graphs with a reload time of 1 minute.

@bmildren

In this case, isn't that just an artifact of irate? irate is based on the last two data points in the range vector ( https://prometheus.io/docs/prometheus/latest/querying/functions/#irate() ); here you're looking at the graph 1 minute later, so the last two data points in each of the 5m ranges are going to be different no matter what you set your scrape interval to. 🤔

@matejzero

Could be, now that I think of it (I'm a total Prometheus noob). Should using the rate() function solve the problem? Because I see the same problem with rate().

@atonkyra

atonkyra commented Dec 14, 2017

@matejzero does resolution 1/1 help at all?

I think we still have a problem here, which I believe is that the dashboard refreshes aren't aligned to the step (which @torkelo mentioned earlier). Example:

scrape interval 10s, min step 15s

S = scraped value
! = step

                   S-----S-----S-----S-----S-----S ...
initial            !        !        !        !    ...
reload after 5s       !        !        !        ! ...
reload after 10s         !        !        !       ...

So basically, when we have uneven scrape/step intervals, we ultimately have a situation where the step causes us to hit completely different values on each refresh.
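
A tiny Go illustration of that misalignment (timestamps arbitrary): with step=15s, the evaluation timestamps shift with every refresh unless the start is first snapped down to a multiple of the step.

```go
package main

import "fmt"

// evalTimestamps lists the timestamps a query_range with the given step would evaluate.
func evalTimestamps(start, end, step int64) (ts []int64) {
	for t := start; t <= end; t += step {
		ts = append(ts, t)
	}
	return ts
}

func main() {
	const step = int64(15)
	base := int64(1512506700) // already a multiple of 15s
	for _, offset := range []int64{0, 5, 10} { // seconds between refreshes
		start := base + offset
		aligned := start - start%step // snap down to a step boundary
		fmt.Println("raw:    ", evalTimestamps(start, start+60, step))
		fmt.Println("aligned:", evalTimestamps(aligned, aligned+60, step))
	}
}
```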

@free

free commented Jun 8, 2018

FYI, I've just made a Prometheus 2.3.0 + xrate release, at https://github.com/free/prometheus/releases/tag/xrate_v2.3.0

Prometheus 2.3.0 has significantly improved the performance of range queries (which is where Grafana and xrate come together), so you may want to give it a whirl.

@matejzero

Great! I'm already running it on our Prometheus and so far it looks good:)

@zemek
Author

zemek commented Jul 5, 2018

Is this effectively resolved in Grafana 5.2 with #10434?

(My original request was resolved by using max_over_time(), but it seems this discussion has shifted to how rate/irate graphs end up moving around a lot.)

@gjcarneiro

I have similar problems due to step. My metric is something like:

sum(max_over_time(pricefeed_num_clients[15m]))

Now, if I tick the Instant checkbox, I get a URL that includes time=xxxx, and that's it.

If I untick the Instant checkbox, I get a URL that includes start=xxxx&end=yyy&step=300. Due to the step=300, I actually get a max_over_time, over a period of a few hours, that can be lower than the max_over_time I get in Instant mode - which should be mathematically impossible.

I just want to get rid of the step=300 because it's messing up the calculation. I tried the subquery syntax, sum(max_over_time(pricefeed_num_clients[15m:1m])), but it makes no difference.

@zekth

zekth commented Jan 27, 2020

Still no solution for this?

@davkal
Contributor

davkal commented May 10, 2020

If you want to set a fixed step, you can do this in Explore. But for dashboard panels we have not settled on a solution yet.

@aocenas aocenas changed the title Specify desired step in Prometheus Specify desired step in Prometheus in dashboard panels Jul 1, 2020
@sksingh20

Does this mean Grafana is not the right tool for viewing historical data, as it represents the data wrongly and will completely mislead analysis and reporting?

Changing the step value on the fly changes the full data view, resulting in wrong reporting... Please confirm so that the same can be communicated to CXO forums.

@leoluk

leoluk commented May 27, 2021

Changing the step value on the fly changes the full data view, resulting in wrong reporting... Please confirm so that the same can be communicated to CXO forums.

Prometheus is fine for analytics, but as with any data source, it's important to understand its properties/limitations. For example, you'll want to use an aggregation function like sum_over_time if you need accurate reporting.

The problem discussed here is inherent to any time-series DB that uses sampling; it just happens to be very visible here.

@sksingh20

@leoluk I think you have misread the problem. It appears to me that the query is built dynamically at the Grafana level, not the Prometheus level. So:
(1) How is this a limitation of the data source?
(2) Grafana is adding a step value to the query to reduce data points, which ends up losing key data values!!
(3) Can you help with a sample dashboard to find all power outages in the last 7 days, or any longer duration, on a dashboard?

@leoluk

leoluk commented May 27, 2021

This sounds like a question for the community or commercial support through a company like Robust Perception.

When you query a data source to build a graph, you need either sampling (i.e. only looking at every n-th data point) or aggregation (min/max/quantiles) to reduce the number of data points to something that can be displayed in a graph. Grafana instructs Prometheus to sample data - via the step parameter - depending on the zoom level to render data at an appropriate resolution. If you zoom out, resolution will go down. There's a hardcoded limit on how many data points Prometheus will return in a single query.
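
As a rough sketch of that calculation (not Grafana's exact logic, which also rounds to "nice" intervals): the step is roughly the time range divided by the number of points the panel can display, clamped to the configured minimum interval.

```go
package main

import (
	"fmt"
	"time"
)

// calcStep approximates how a dashboard derives the query step: about one
// point per displayable data point, but never finer than the min interval.
func calcStep(timeRange time.Duration, maxDataPoints int, minInterval time.Duration) time.Duration {
	step := timeRange / time.Duration(maxDataPoints)
	if step < minInterval {
		step = minInterval
	}
	return step
}

func main() {
	// 7 days on a panel with ~1500 data points and a 10s min interval:
	fmt.Println(calcStep(7*24*time.Hour, 1500, 10*time.Second)) // ≈ 6m43s
	// 5 minutes on the same panel: clamped up to the 10s min interval.
	fmt.Println(calcStep(5*time.Minute, 1500, 10*time.Second)) // 10s
}
```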

If your use case is finding, say, power outages of 5m in a 7-day graph, then no, that won't work (with any DB).

This issue is about noisy graphs being unstable since a different set of data points are sampled each time.

@sksingh20

@leoluk This imposes a limitation on the gauge data type in Grafana. A gauge is meant to show all data points; I see the sole objective of a dashboard as showing the actual gauge values.
Even if this is a limitation of Grafana's technical approach, the business case is quite clear. If something can't be done using Grafana, then it's just a matter of confirming and communicating that.

Can we summarize that Grafana has a limitation in showing gauge values properly for data older than 12 hours, and that there is a high probability that it will show errors?

@leoluk

leoluk commented May 28, 2021

Your monitor wouldn't even have enough pixels to show all data points, depending on the resolution. This is not a Grafana limitation; it's a fundamental property of working with time-series data.

@aocenas
Member

aocenas commented Jun 16, 2021

I guess we could do something like a dropdown select with (max|exact|min) [step] so that the user can decide how the step param should be evaluated.
