Specify desired step in Prometheus in dashboard panels #9705
Comments
This would be pretty awesome to have back. Right now some fast-moving gauges are impossible to render correctly. The animation at https://gyazo.com/f83dc32078209a7e6a4a87efe5b4e81b illustrates the problem pretty well.
It would be quite nice to have this back. Any idea when this could be done, if it is planned?
For reference, there is a discussion on the Prometheus GitHub as well: prometheus/prometheus#2364
I've also just left a comment on that issue explaining what I've found. Realistically I think this is something that Prometheus should handle, but Grafana may be able to deal with it on its own too; I'm still experimenting with it. Edit: After experimenting a little, I think Grafana could handle this if Prometheus doesn't, by ensuring that steps always cover the same points. For example, if the step size is 15s, each step would be one of these parts of a minute: 0-15s, 15-30s, 30-45s, or 45-60s. So if now were 36 seconds into a minute, Prometheus would be queried up to 45s into that same minute (i.e. into the future). In this case, Prometheus does seem to return the latest information for the last point, which means that the last point will change, but once we get past it, it will always remain the same from then on. This behaves like pre-calculated buckets: in some cases the value in a bucket might be updated, in other cases it may be added to over time, but historic buckets will never change because their bounds wouldn't be moving targets like they are now.
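The bucket idea above can be sketched roughly as follows. This is only an illustration, not Grafana's actual code: `align_range` is a hypothetical helper, and the timestamps are made-up values in seconds.

```python
from typing import Tuple


def align_range(start: int, end: int, step: int) -> Tuple[int, int]:
    """Snap a query window to multiples of `step` (all values in seconds).

    The start is rounded down and the end is rounded up, so every
    evaluation point falls on a fixed bucket boundary regardless of
    when the dashboard happens to refresh.
    """
    aligned_start = start - start % step
    aligned_end = end + (-end) % step
    return aligned_start, aligned_end


# Hypothetical example: "now" is 36 seconds into a minute, 5 minute window, 15s step.
minute_start = 1200              # a timestamp on a minute boundary
now = minute_start + 36
window = align_range(now - 300, now, 15)
print(window)                    # the end lands 45s into the minute
```

With this kind of alignment only the newest bucket can still change between refreshes; older evaluation points always hit the same timestamps.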
With this change in place I can't get a metric resolution lower than 15s (we actually use 10s resolution for all metrics). I also have all of the issues mentioned above, with spiky charts jumping around needlessly.
I am also unable to get a metric resolution lower than 15s. We have some Prometheus targets that are scraped every 500 ms, and we are unable to properly observe this data, which is a serious breaking change for us :( If anyone has figured out a way to get lower than 15s, please share!
Have you set the min step/interval option on the datasource options page? You can also override it at the per-panel or per-query level.
@torkelo the problem is the fact that it is a min step. We can define the minimum step but not the maximum step, which in turn causes us to end up overstepping data that is polled at high frequency. (See my picture above for an example; there the min step was set to 1s on data polled at a 10s interval.)
One option to help with this issue is to use an
Not sure I understand. You want Grafana to query for more data points than there are pixels in the graph? If you set a max step, your query will return too much data. Prometheus has range selectors and functions that should allow you to get what you want; using the interval variable can help.
@torkelo okay, I'll try to explain the problem (as I understand it). I have data with a resolution of 1 value per 10 seconds. Now, if I want to show all the data and see the spikes, what would be the logical step in a 5 minute graph? I'd say it's 10 seconds. Okay, now let's set min step to 10s for the fun of it. The graph keeps changing completely on every reload!? Okay, let's examine what we send to Prometheus:
See the step 15? That basically means that we are stepping over existing values. Let's look at this on a timeline.
Looking at that, we see that the spikes of 100 may be jumped over due to the misaligned step. Prometheus might report 1 or 100 at the 15 second marker depending on which side the step happens to land on. Now what happens if our data is at 1 second resolution? Well, we miss 14 seconds of values, and especially on very unstable gauges the graph looks completely different on each refresh. :) I hope this explains the problem better.
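To make the effect concrete, here is a small simulation under simplifying assumptions. The gauge values are invented, and `value_at` is a deliberately simplified model of range-query evaluation (take the most recent sample at or before each step timestamp, ignoring staleness handling). It is not Prometheus code.

```python
import bisect
from typing import List

SCRAPE_INTERVAL = 10  # seconds between samples, as in the comment above
STEP = 15             # the step that ends up in the query

# Hypothetical gauge: mostly 1, with a spike of 100 on every third scrape
# (at t = 20, 50, 80, ...).
samples = [(t, 100 if (t // SCRAPE_INTERVAL) % 3 == 2 else 1)
           for t in range(0, 300, SCRAPE_INTERVAL)]
timestamps = [t for t, _ in samples]


def value_at(t: int) -> int:
    """Return the most recent sample at or before t (simplified model)."""
    i = bisect.bisect_right(timestamps, t) - 1
    return samples[i][1]


def range_query(start: int, end: int, step: int) -> List[int]:
    """Evaluate the gauge at start, start+step, ... up to and including end."""
    return [value_at(t) for t in range(start, end + 1, step)]


first = range_query(60, 240, STEP)
second = range_query(65, 245, STEP)  # same panel, refreshed 5 seconds later

print(max(first), max(second))
```

In this made-up data set the first query never lands on a spike, while the same query shifted by 5 seconds hits several of them, so the two renders of the same data look completely different.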
Also, please have a look at the image here. This is a data set with a 10 second interval and min step set to 10 (or 1; Grafana sends 15 regardless). The 2 graphs are from the same data, the only difference being that I pressed refresh 3 seconds after taking the left screenshot. I'd say there are enough pixels to show the other points as well... :P
@atonkyra just to confirm, did you say you had already adjusted the scrape interval on the data source?
@bmildren okay, I didn't even know there was such a setting. Setting it to the smallest scrape interval of any scrape job you have will fix the problem. I still think defaulting to anything hard-coded is utterly broken (on the lower bound). At the very least it should be configurable on the graph.
If we wanted some automation, one could use the /api/v1/status/config API endpoint on Prometheus. The contents are in YAML, so that would need parsing. Grafana could then match on the job key and, failing that, fall back to the global scrape interval in the config.
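A rough sketch of that lookup, assuming the YAML from /api/v1/status/config has already been parsed into a dict by some YAML library. `scrape_interval_for` is a hypothetical helper and the sample config is invented; only the `global`, `scrape_configs`, `job_name` and `scrape_interval` keys are taken from the Prometheus configuration format.

```python
from typing import Any, Dict, Optional


def scrape_interval_for(config: Dict[str, Any], job: str) -> Optional[str]:
    """Return the scrape interval for `job`, else the global one, else None."""
    for scrape_config in config.get("scrape_configs", []):
        if scrape_config.get("job_name") == job and "scrape_interval" in scrape_config:
            return scrape_config["scrape_interval"]
    # No per-job override found: fall back to the global setting.
    return config.get("global", {}).get("scrape_interval")


# Invented example config, shaped like a parsed prometheus.yml.
parsed = {
    "global": {"scrape_interval": "10s"},
    "scrape_configs": [
        {"job_name": "fast", "scrape_interval": "1s"},
        {"job_name": "default"},
    ],
}

print(scrape_interval_for(parsed, "fast"))     # per-job override
print(scrape_interval_for(parsed, "default"))  # falls back to global
```

The returned value is still a duration string such as "10s", so a caller would need to convert it to seconds before using it as a step.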
Grafana defaults to the same default that Prometheus uses, and nothing is hard-coded, as you can change it. However, I can still see a problem here, as Grafana should align the step to even multiples of the min step / scrape interval option.
FYI that setting is only available in the nightly builds.
I'd say we still need a per-graph (or per-query?) "scrape interval" field that, when absent, defaults to the global one on the datasource. Let's say I have a Prometheus instance with a default scrape interval of 10s, but I happen to have some data at 1s and some at 60s. A single global default just doesn't make sense for many Prometheus installations IMHO.
This commit makes it possible to set min interval per panel. Overrides the value configured on the datasource. ref #9705
FWIW my original issue is actually solved by using Although you can't take the
This is a showstopper for us. We are trying to migrate to Prometheus from Graphite, and not having consistent graphs doesn't work for us :) It's really hard to debug an issue on the system when graphs are constantly changing. Will follow this thread and provide info if needed.
I believe Grafana
In this case, isn't that just an artifact of irate? irate is based on the last two data points in the range vector ( https://prometheus.io/docs/prometheus/latest/querying/functions/#irate() ). Here you're looking at the graph 1 minute later, so the last two data points in each of the 5m ranges are going to be different no matter what you set your scrape interval to. 🤔
Could be, now that I think of it (I'm a total Prometheus noob). Should using the rate() function solve the problem? I see the same problem with rate().
@matejzero does resolution 1/1 help at all? I think we still have a problem with this which I believe is that the dashboard refreshes aren't aligned to the step (which I believe @torkelo mentioned earlier), example:
So basically, when we have uneven scrape/step intervals, we ultimately have a situation where the step causes us to hit completely different values on each refresh.
FYI, I've just made a Prometheus 2.3.0 + Prometheus 2.3.0 has significantly improved the performance of range queries (which is where Grafana and
Great! I'm already running it on our Prometheus and so far it looks good :)
Is this effectively resolved in Grafana 5.2 with #10434? (my original request was resolved with using
I have similar problems due to step. My metric is something like:
Now, if I tick the Instant checkbox, I get a URL that includes time=xxxx, and that's it. If I untick the Instant checkbox, I get a URL that includes start=xxxx&end=yyy&step=300. Due to the step=300, the max_over_time I get over a period of a few hours can actually be lower than the max_over_time I get in Instant mode, which is mathematically impossible. I just want to get rid of the
Still no solution for this?
If you want to set a fixed step, you can do this in Explore. But for dashboard panels we have not settled on a solution yet.
Does this mean Grafana is not the right tool for viewing historical data, since it represents the data incorrectly and will completely mislead analysis and reporting? Changing the step value on the fly changes the entire data view, resulting in wrong reporting... Please confirm so that this can be communicated to CXO forums.
Prometheus is fine for analytics, but as with any data source, it's important to understand its properties/limitations. For example, you'll want to use an aggregation function like sum_over_time if you need accurate reporting. The problem discussed here is inherent to any time-series DB that uses sampling; it just happens to be very visible.
@leoluk I think you have misread the problem. It appears to me that the query is built dynamically at the Grafana level, not at the Prometheus level. So
This sounds like a question for the community or commercial support through a company like Robust Perception. When you query a data source to build a graph, you need either sampling (i.e. only looking at every n-th data point) or aggregation (min/max/quantiles) to reduce the number of data points to something that can be displayed in a graph. Grafana instructs Prometheus to sample data, via the step parameter, depending on the zoom level, so that data is rendered at an appropriate resolution. If you zoom out, resolution will go down. There's a hardcoded limit on how many data points Prometheus will return in a single query. If your use case is finding, say, power outages of 5m in a 7-day graph, then no, that won't work (with any DB). This issue is about noisy graphs being unstable since a different set of data points is sampled each time.
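As a rough illustration of how a step might be derived from the zoom level, here is a simplified sketch. It is not Grafana's actual interval calculation, which applies additional rounding and options. `choose_step` and its parameters are hypothetical, and the 11,000 points-per-series figure is the limit Prometheus is generally reported to enforce; treat it as an assumption here.

```python
import math

# Assumed Prometheus limit on points per series in one range query.
MAX_POINTS_PER_SERIES = 11_000


def choose_step(range_seconds: int, panel_width_px: int, min_step: int) -> int:
    """Pick a step (seconds): roughly one point per pixel, never below min_step."""
    step = max(min_step, math.ceil(range_seconds / panel_width_px))
    # Widen the step further if the query would exceed the point limit.
    return max(step, math.ceil(range_seconds / MAX_POINTS_PER_SERIES))


five_minutes = choose_step(5 * 60, panel_width_px=1000, min_step=15)
seven_days = choose_step(7 * 24 * 3600, panel_width_px=1000, min_step=15)
print(five_minutes, seven_days)
```

Under these assumptions a 5 minute graph is limited by the 15s minimum step, while a 7-day graph on a 1000 pixel panel ends up with a step of about 10 minutes, which is why a 5 minute event can fall between two evaluation points.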
@leoluk This puts a limitation on the gauge data type in Grafana. A gauge is meant to show all data points, and I see the sole objective of a dashboard as showing the actual gauge values. Can we summarize that Grafana has a limitation in showing gauge values properly for data older than 12 hours, and that there is a high probability it will show errors?
Your monitor wouldn't even have enough pixels to show all data points, depending on the resolution. This is not a Grafana limitation, it's a fundamental property of working with time-series data.
I guess we could do something like a dropdown select with (max|exact|min)[step] so that the user can decide how the step param should be evaluated.
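One possible reading of that dropdown, sketched under assumptions: `resolve_step` is a hypothetical function, `user_step` is the value typed into the panel, and `auto_step` is whatever step would otherwise be calculated from the time range and panel width. None of this is an existing Grafana option.

```python
def resolve_step(mode: str, user_step: int, auto_step: int) -> int:
    """Combine the user's step with the automatic one according to `mode`."""
    if mode == "min":
        # The step may grow when zooming out, but never drops below user_step.
        return max(user_step, auto_step)
    if mode == "max":
        # The step may shrink when zooming in, but never exceeds user_step.
        return min(user_step, auto_step)
    if mode == "exact":
        # Always use the user's step, regardless of zoom level.
        return user_step
    raise ValueError(f"unknown step mode: {mode!r}")


# With 10s data and an automatic step of 15s:
print(resolve_step("min", 10, 15))    # current behaviour: the step becomes 15
print(resolve_step("exact", 10, 15))  # the step stays at 10
```

The "exact" and "max" modes would let a query return more points than there are pixels, so they trade rendering cost for stable, complete graphs.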
Previously we were able to specify the step parameter in Prometheus. This was changed in #8073
It would be nice to get this functionality back so that brief spikes can be displayed in graphs.