Align the requested period with Interval if Interval templating is used #3781

Closed
PeterZaitsev opened this issue Jan 18, 2016 · 12 comments
Labels: help wanted, prio/medium, stale, type/feature-request

Comments

@PeterZaitsev

Currently Grafana bases relative time ranges on the current time. For example, at 13:34:56, asking for the last 24 hours queries the range starting at 13:34:56 on the previous day.

When a user has Interval templating enabled, one would expect the data to be aligned on the interval boundary. For example, if I am looking at the last 24 hours with Interval set to 1 hour, I would rather see data from 14:00:00 on the previous day to 14:00:00 today than have potentially misleading information for the first interval.

This is especially valuable for accumulated values. If I am looking at average network throughput, it does not matter much whether the hour is complete. If I want to see how much traffic in GB I am producing per hour, viewing full hours would be preferred.
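The requested behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not Grafana code: the function name `align_range` and the choice to round the end of the range up to the next boundary (to match the 14:00:00 example above) are assumptions.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def align_range(now, span, interval):
    """Return (start, end) where `end` is `now` rounded up to an interval boundary.

    Hypothetical helper: boundaries are multiples of `interval` counted
    from the Unix epoch, so a 1-hour interval lands on full UTC hours.
    """
    step = interval.total_seconds()
    elapsed = (now - EPOCH).total_seconds()
    # Round up to the next multiple of the interval (unchanged if already aligned).
    aligned_end = math.ceil(elapsed / step) * step
    end = EPOCH + timedelta(seconds=aligned_end)
    return end - span, end

now = datetime(2016, 1, 18, 13, 34, 56, tzinfo=timezone.utc)
start, end = align_range(now, timedelta(hours=24), timedelta(hours=1))
print(start.isoformat(), end.isoformat())
```

With the values from the example, the range runs from 14:00:00 on the previous day to 14:00:00 on the current day, so every 1-hour bucket in the range has the same width.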

@PeterZaitsev
Author

Filed at Torkel's request https://groups.io/g/grafana/message/1221

@torkelo added the type/feature-request, prio/medium, and help wanted labels on Jan 18, 2016
@torkelo
Member

torkelo commented Apr 21, 2018

Is this issue still relevant?

@PeterZaitsev
Author

I think this is still relevant as an option that offers some "stability". Currently, if I am looking at 5-minute intervals over the last hour, for example, I see the intervals shifting all the time: sometimes an interval runs from minute 43 to 48, sometimes from 45 to 50, and so on.

As a result, where I would expect the graph to simply shift sideways on refresh with the history staying the same, it can actually change dramatically. Imagine, for example, that I had an issue with high latency from minute 43 to minute 47 of the hour, viewed with a 5-minute interval. When the intervals fall on 40-45-50, I see the spike spread over 2 intervals, but when a refresh shifts them to 43-48-53, I now have only one interval showing double the spike, which looks crazy.

I recognize not everyone might want this, but for me, fixed interval alignment that shifts by a complete interval at a time would often be preferable.
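The 43-to-47-minute scenario can be reproduced with a small Python sketch. The per-minute sample values and the helper `bucket_sums` are made up for illustration; the point is only that the same data, bucketed with different boundary offsets, produces differently shaped spikes.

```python
def bucket_sums(samples, width, offset):
    """Sum per-minute samples into `width`-minute buckets.

    `offset` shifts the bucket boundaries: offset 0 gives 40-45-50,
    offset 3 gives 43-48-53. Keys of the result are bucket start minutes.
    """
    buckets = {}
    for minute, value in samples.items():
        start = (minute - offset) // width * width + offset
        buckets[start] = buckets.get(start, 0) + value
    return dict(sorted(buckets.items()))

# Hypothetical data: an incident contributes 100 per minute during minutes 43..46.
samples = {m: (100 if 43 <= m < 47 else 0) for m in range(30, 60)}

aligned = bucket_sums(samples, width=5, offset=0)
shifted = bucket_sums(samples, width=5, offset=3)

print({k: v for k, v in aligned.items() if v})  # spike split over two buckets
print({k: v for k, v in shifted.items() if v})  # whole spike in one bucket
```

With boundaries at 40-45-50 the incident shows up as two buckets of equal height; with boundaries at 43-48-53 it collapses into one bucket twice as tall, even though the underlying samples are identical.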

@leszekeljasz

leszekeljasz commented Jul 4, 2018

The problem with doing it on the data source side is that it would have to be implemented in every data source that suffers from this issue in combination with Grafana. I'm not sure how many of them have this issue, but I'm sure there is more than one, and at least the developers behind Prometheus do not seem to be interested in adding this type of functionality.

I might be wrong, but I would like to express my opinion here in the hope of helping to solve the issue that @PeterZaitsev brought up, which I believe is a major issue right now, at least in combination with the Prometheus data source.

There are many bug reports and proposals on the Prometheus issue tracker related to getting Prometheus and Grafana to work together nicely, but as of now it's impossible to achieve, because every proposed solution has a major flaw. It's not an isolated issue; it's basically an everyday thing for Grafana / Prometheus users, and it can get pretty extreme, especially for very irregular / spiky metrics. Here is an example from our dashboard, refreshing every few seconds, last 6h view:
[Animated screenshot: ezgif-2-d7cd81b330]

Another metric, last 1h view:
[Animated screenshot: ezgif com-gif-maker]
This is the effect of the lack of data alignment and the inaccuracy of the rate() function in Prometheus; the graphs are really not telling me much about what is going on in RabbitMQ.
Here are some examples: prometheus/prometheus#2364, prometheus/prometheus#3746, and there are more.

I'm trying to think through whether the problem is on the Grafana side or the Prometheus side, and I think it's actually both.
One problem is that the rate() function in Prometheus can get very inaccurate. I don't want to elaborate on this here; let's assume the rate() function is just peculiar but we are OK with it. There is a fork of Prometheus (https://github.com/free/prometheus) that adds a modified version of the rate() function, xrate(), which is supposed to address the limitations, but that's not Grafana's issue anyway.
There is another problem: the lack of data alignment produces very confusing results, and when thinking about where it should be addressed, I'm trying to take into account the philosophy of both tools.
I believe the original idea behind plotting is to make the numbers much easier for humans to digest. Thanks to visualisation we can observe trends or anomalies much more easily than by reading raw numbers. Correct me if I'm wrong, but isn't providing quality graphs the aim of Grafana? And the purpose of quality graphs is to let us humans observe trends and identify anomalies, but this can only be achieved if the graphs are stable. If the graph behaves like the one I pasted above, then instead of simplifying observations it basically creates confusion, so it just fails to meet the goal.
The problem with the lack of data alignment is not limited to relative time periods, like "last 6h" or "last 30 mins", and refreshing the dashboard. The same issue appears when digging in: it's not really feasible to select a perfectly aligned time window when zooming in, and when data is aligned differently, it may (and usually does) look completely different. Yes, I could just open the time picker and type in a perfectly aligned start and end date, but that's not really an easy way of navigating when one tries to dig in and correlate events across the system. I think we all agree this is not the way it was meant to be used.

OK, but this still doesn't prove the problem should be solved on the Grafana side; Prometheus could do that, right? Well, I think it's not going to happen in Prometheus anytime soon, for two reasons:

  1. Lack of data alignment does not mean that the results from Prometheus are wrong. These results are valid; they are just confusing for humans. And Prometheus doesn't really try to address this problem, which is why it leaves visualisation to other tools, like Grafana. Yes, it does offer simple graphs, but that part has a different purpose. I can't speak for the Prometheus developers, but the way I see it, they are not even trying to solve the visualisation part, so why would they try to solve little details that are related to just that functionality? I think Prometheus's aim is instead more like "ask me anything about time series data, and I will give it to you".
  2. If Prometheus were to provide alignment, it would have to guess how to align the data, and that is not obvious. The Prometheus API has the step parameter, which can be set to anything: it can be 5s, or 7s, or 183s. When a human asks for a graph with a 60s interval, it's pretty obvious how to align it, but Prometheus does not really know that a human is going to be looking at the result; it could be anything. Even if it made an assumption about this, how should the data be aligned if the interval is, let's say, 183s? It's not going to be easy to end up with a good universal approach.
    So my point is, Prometheus is just a database with a query language (plus other features); it doesn't try to solve the data presentation part. It's not even meant to know how the data is going to be presented, and even if it were, it would have to make assumptions about it, which is not easy with the current design. Grafana could send a hint as a parameter, but then we would already be making the alignment decision on the client side. So in the end, the system that presents the data should know the best way of doing it, and in this case that would be Grafana.
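One possible client-side convention that works for any step, including 183s, is to snap the start and end of the query down to multiples of the step counted from the Unix epoch. The Python sketch below illustrates that convention only; the helper names are hypothetical, and it is not claimed to be how Grafana or Prometheus implement alignment.

```python
def aligned_query_range(now_s, span_s, step_s):
    """Snap start and end down to multiples of `step_s` since the Unix epoch.

    All arguments are whole seconds. The returned triple mirrors the
    start / end / step parameters of a range query.
    """
    end = now_s // step_s * step_s
    start = (now_s - span_s) // step_s * step_s
    return start, end, step_s

def eval_times(start, end, step):
    """Timestamps at which the range query would be evaluated."""
    return list(range(start, end + 1, step))

# Two dashboard refreshes 10 seconds apart, with an awkward 183s step.
first = aligned_query_range(now_s=183 * 1000 + 5, span_s=1830, step_s=183)
second = aligned_query_range(now_s=183 * 1000 + 15, span_s=1830, step_s=183)

print(first == second)  # both refreshes ask for the same evaluation points
print(all(t % 183 == 0 for t in eval_times(*first)))
```

Because the evaluation points only change when a full step has elapsed, a refresh either returns the same points or appends one new point and drops the oldest, which is the "shift sideways" behaviour asked for earlier in the thread.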

I'm sorry for the long and boring post, but after spending 2 days on this issue, I thought it would be worth giving my 2 cents. I'm hoping this will at least start a debate that eventually leads us to a solution.

@PeterZaitsev
Author

Torkel,

It is still relevant. In fact, I forgot about this issue and posted a similar forum message a couple of days ago:

https://community.grafana.com/t/is-it-possible-to-align-interval-start-with-interval-length/8441

@PeterZaitsev
Author

@leszekeljasz

In PMM we did a lot of work to prevent some of this. The key thing we do is use $interval for the step in the Prometheus query, so the step is always aligned with the rate() computation interval. This means that if we are computing an average over, say, 5 minutes and sampling it every 5 minutes, each data point provides the average rate for exactly those 5 minutes.
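As a rough sketch of that pairing, the Python snippet below builds a query whose rate() window equals the step. The helper name `matched_rate_query` and the metric name are illustrative assumptions, not taken from PMM; in a Grafana panel the same idea would be expressed with `$interval` in both places.

```python
def matched_rate_query(metric, interval_s):
    """Build a PromQL expression whose rate() window equals the query step.

    Hypothetical helper: with window == step, consecutive data points
    cover back-to-back windows instead of overlapping or leaving gaps.
    """
    return {
        "query": f"rate({metric}[{interval_s}s])",
        "step": interval_s,
    }

# Illustrative metric name; any counter would do.
params = matched_rate_query("node_network_receive_bytes_total", 300)
print(params["query"], "step =", params["step"])
```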

For gauge metrics we do not use the plain value (unless it is constant) but rather avg_over_time() or max_over_time(), depending on what we are trying to see.

In my experience, many people do not understand how data "fitting" works by default in Grafana: it does not do some magic averaging or other processing but simply picks every nth value, which of course can produce bizarre views when you have short but very relevant spikes.
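The difference between picking every nth value and reducing over each window can be shown with a toy series in Python. The data and helper names are invented for illustration, and the "every nth value" behaviour is taken from the description above rather than from Grafana's source.

```python
def pick_every_nth(values, n):
    """Keep one raw sample per window of n and discard the rest."""
    return values[::n]

def reduce_windows(values, n, fn):
    """Apply `fn` (e.g. max or a mean) to each consecutive window of n samples."""
    return [fn(values[i:i + n]) for i in range(0, len(values), n)]

def mean(window):
    return sum(window) / len(window)

# Twenty flat samples with one short spike at index 7.
values = [1] * 20
values[7] = 50

print(pick_every_nth(values, 5))       # the spike falls between picked samples
print(reduce_windows(values, 5, max))  # similar in spirit to max_over_time()
print(reduce_windows(values, 5, mean)) # similar in spirit to avg_over_time()
```

Picking every fifth sample misses the spike entirely, while the max and mean reductions both show that something happened in the second window.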

The interval alignment problem remains an issue.

@leszekeljasz

@PeterZaitsev just today I found alignment has been added in Grafana 5.2.0:
https://github.com/grafana/grafana/blob/master/CHANGELOG.md#520-beta1-2018-06-05
I just upgraded, looks fantastic. Finally stable graphs :-)

@funlake

funlake commented Feb 15, 2019

@leszekeljasz Thank you so much, I just upgraded Grafana to 5.2.0 and this annoying crap is just gone. :)

@stevenh

stevenh commented Feb 23, 2020

Looks like this was only for Prometheus; InfluxDB still seems to have the same issue.

Contributor

This issue has been automatically marked as stale because it has not had activity in the last year. It will be closed in 30 days if no further activity occurs. Please feel free to leave a comment if you believe the issue is still relevant. Thank you for your contributions!

@github-actions bot added the stale label on Jan 19, 2024
Contributor

This issue has been automatically closed because it has not had any further activity in the last 30 days. Thank you for your contributions!

@github-actions bot closed this as not planned on Feb 20, 2024
@stevenh

stevenh commented Feb 20, 2024

A real shame to see bug reports just closed because they are old :(
