Gaps in recording rules series, but data is available when querying the rule expression #4229
Comments
Just wanted to add a few more details:
Can you share the raw time series data backing this and the full Prometheus configuration and rule group please?
Ah wait, I see what it is now. You're using a 2m rate on data with a 1m scrape interval. That's going to be flaky; you need at least a 4m rate in those circumstances.
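To illustrate the point about a 2m rate on a 1m scrape interval: rate() needs at least two samples inside the range window to produce a value. The sketch below is a simplified model, not Prometheus's actual implementation. It only counts the samples that fall into the window at evaluation time. The timestamps and the helper names `samples_in_window` and `rate_has_result` are made up for illustration, and the window is assumed to be closed on both ends.

```python
def samples_in_window(timestamps, eval_time, window):
    """Return the sample timestamps inside [eval_time - window, eval_time]."""
    start = eval_time - window
    return [t for t in timestamps if start <= t <= eval_time]


def rate_has_result(timestamps, eval_time, window):
    """A rate can only be computed when the window holds at least 2 samples."""
    return len(samples_in_window(timestamps, eval_time, window)) >= 2


# Hypothetical scrape timestamps in seconds: nominal 60s interval,
# but the third scrape lands 5s late (at 125 instead of 120).
scrapes = [0, 60, 125, 180, 240]

eval_time = 124  # an evaluation that happens just before the late sample

two_min = rate_has_result(scrapes, eval_time, 120)   # window [4, 124]: only 60
four_min = rate_has_result(scrapes, eval_time, 240)  # window [-116, 124]: 0 and 60

print("2m window has result:", two_min)
print("4m window has result:", four_min)
```

With a window of only two scrape intervals, a single late or missed scrape leaves one sample in the window and the evaluation yields nothing. A 4m window tolerates that.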
Hi @brian-brazil! Thanks for the quick response.
In the issue post, I'm showing the graph of the recording rule and of the actual expression I use in that recording rule. As you can see there, the recording rule is mostly gaps, whereas the actual expression shows a nice, continuous graph with no gaps.
That's a good point about the [2m] rate interval; we will indeed increase it to get more accurate results (considering the 1m scrape interval). But this still doesn't explain why the underlying recording rule expression shows a nice graph while the recording rule doesn't. Also, as I mentioned, this doesn't happen on all series generated by this recording rule.
Here are the extra details you requested.
Raw time series data: I'm not sure exactly what I should provide here and how, could you guide me please?
Rule group:
Full configuration:
They're being run at different times, so that's sufficient to explain the issue. If you refresh a few times you should see the issue.
I've been hitting Execute in the Graph UI for a few hours now (using the record expression as the query) and I always see a nice graph with no gaps, while the recording rule is mostly gaps. And this issue is limited to a specific set of labels in the time series generated by the record, specifically on series with this label
It might be a slow host, but a 2m rate on a 1m interval is going to have issues like this.
The rule seems to evaluate quite quickly, I see
Anyway, I'm going to modify the recording rules and set the interval to
Will update with the results.
After modifying the rate interval to
Thanks for the analysis and support! What struck me as strange is that I got consistent results from the record expression but not from the record itself. I guess we were hitting some corner case due to the rate interval being too short.
Thanks again and cheers!
dannyk81 closed this Jun 6, 2018
brian-brazil added the kind/question label Jun 6, 2018
alexf101 commented Feb 8, 2019
I was having a similar issue and thought it must be a bug, but this makes sense to me now, thanks for the analysis.
@dannyk81 My understanding is that the recording rule data will have been generated once shortly after the scrape and then cached forever, whereas the other query will be looking back over a history that has since become complete, which would explain this behaviour.
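The explanation above can be sketched as a small simulation. It is a toy model under stated assumptions, not Prometheus code: each sample carries the time at which it became queryable, and a slow target is assumed to make one sample visible several seconds after its timestamp. The data and the helper `visible_in_window` are hypothetical.

```python
WINDOW = 120  # seconds, standing in for a [2m] range

# Hypothetical samples: (sample_timestamp, wall_time_when_it_became_queryable).
# The sample stamped 120 only becomes visible at 127 (assumed slow scrape).
samples = [(0, 1), (60, 61), (120, 127), (180, 181)]


def visible_in_window(samples, eval_time, query_wall_time, window=WINDOW):
    """Timestamps inside [eval_time - window, eval_time] that were already
    queryable at query_wall_time."""
    start = eval_time - window
    return [
        ts
        for ts, visible_at in samples
        if start <= ts <= eval_time and visible_at <= query_wall_time
    ]


eval_time = 125

# The rule is evaluated once, in real time, and its result is stored as-is.
rule_view = visible_in_window(samples, eval_time, query_wall_time=125)

# An ad hoc query for the same instant, run an hour later, sees full history.
later_view = visible_in_window(samples, eval_time, query_wall_time=3600)

print("rule evaluation saw:", rule_view)   # one sample: no rate, hence a gap
print("later query saw:", later_view)      # two samples: rate can be computed
```

In this model the stored rule result keeps the gap permanently, while re-running the expression later over the same instant succeeds, which matches the behaviour reported in the issue.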
dannyk81 commented Jun 6, 2018
Bug Report
What did you do?
Defined a Recording Rule:
What did you expect to see?
Expected to see continuous graphs from the recording rule, in the same way as I see when querying the underlying expression directly.
What did you see instead? Under which circumstances?
Instead I see mostly gaps with no data on some series of the recording rule (I did find that all the gaps are for series of a specific kubernetes_io_hostname), but when querying the record expression above for the same data I get the results and there are no gaps.
Query:
pod_name label selection)
Query:
Environment
System information:
Linux 4.14.32-coreos x86_64
Prometheus version:
Alertmanager version:
N/A
Prometheus configuration file:
I pasted below the kubernetes-nodes job configuration (as seen in the Prometheus UI), as this is where these metrics are coming from.
I reviewed the logs we have from Prometheus over the last few days and there was nothing except the occasional WAL truncation completed, head GC completed and compact blocks messages, like so: