Should Prometheus pretend to know the future? #4966
Comments
beorn7 added kind/question and component/promql labels on Dec 6, 2018
beorn7 referenced this issue on Dec 6, 2018: Current values shows 0 for Prometheus with 5.x Grafana #13419 (closed)
It used to do that via interpolation, and that behaviour was removed. Predictions are a different matter; we have predict_linear for that. I don't think there's anything we can sanely/safely do here, so I'd suggest that Grafana not request data from the future.
We do accept scrapes containing metrics up to 10 minutes into the future as a safety measure, to allow for time skew. Any of the proposed changes could break such users, and 3 violates timestamp and step semantics.
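The "up to 10 minutes into the future" safety measure can be sketched as a simple timestamp check at ingestion time. This is an illustrative model only, not Prometheus's actual code; the function name and structure are invented:

```python
from time import time

# Tolerance for clock skew: samples may be time-stamped up to 10 minutes
# ahead of the ingesting server's own clock (mirrors the comment above).
FUTURE_TOLERANCE_SECONDS = 10 * 60

def accept_sample(sample_ts: float, now=None) -> bool:
    """Return True if the sample's timestamp is within the allowed skew.

    Hypothetical sketch: accept a scraped sample only if its timestamp is
    not more than FUTURE_TOLERANCE_SECONDS ahead of the current time.
    """
    if now is None:
        now = time()
    return sample_ts <= now + FUTURE_TOLERANCE_SECONDS
```

Any change that rejects or reinterprets future timestamps at query time would interact with samples legitimately accepted under this tolerance.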
brian-brazil added priority/Pmaybe and removed kind/question labels on Dec 6, 2018
To be clear: this is not about actual prediction. The title refers to the fact that Prometheus returns a potentially very problematic result when asked for a sample from the future, without any flagging.
One thing we could maybe do is return the maximum timestamp currently in the TSDB as an additional piece of information in query responses.
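For illustration, the suggestion above could look like an extra field attached to a query response. Everything below, including the field name `maxTimestamp`, is a hypothetical sketch and not part of Prometheus's actual HTTP API:

```python
# Hypothetical sketch: annotate a query response with the newest sample
# timestamp currently in storage, so clients can detect when they have
# asked for data "from the future". Field names are invented.
def annotate_response(result: list, max_tsdb_timestamp: float) -> dict:
    return {
        "status": "success",
        "data": {"resultType": "vector", "result": result},
        # Extra, hypothetical field: newest timestamp present in the TSDB.
        "maxTimestamp": max_tsdb_timestamp,
    }
```

A client such as Grafana could then compare its requested evaluation time against `maxTimestamp` and flag or trim results accordingly.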
The Grafana problem has been solved by grafana/grafana#16110. I'll leave this issue open nevertheless. I think adding the maximum timestamp currently in the TSDB, as suggested above, is a good first step to at least make the problem of “querying the future” discoverable. I also think we should explicitly document the stability of query results over time: the grace period for samples added with timestamps in the past and the future, ingestion delay (how large can it become in practice, actually? Prometheus 1.x had a problem with indexing delay, which shouldn't exist anymore, but I guess there will still be some finite ingestion delay), rule evaluation delay, mutation of the database by the new vertical block merges, …
Those shouldn't be an issue, as it's either a compaction or a reload. |
I don't understand the relevance. If I add an overlapping block, I'll “change the past” and thus it is important to keep in mind if a user of the query API wants to do caching. No? |
Ah right. Deletion and retention also will have that effect. |
beorn7 commented on Dec 6, 2018
So far, the query model is very clear:
For any point in time tn, we process the most recent sample of each series at or before tn, looking back at most --query.lookback-delta. The returned results are time-stamped at tn.
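The lookback behaviour described above can be sketched as follows. This is a simplified model, not Prometheus's implementation; the 5-minute default of --query.lookback-delta is assumed:

```python
DEFAULT_LOOKBACK_DELTA = 5 * 60  # seconds; default of --query.lookback-delta

def instant_value(samples, tn, lookback=DEFAULT_LOOKBACK_DELTA):
    """Return the value of the most recent (timestamp, value) sample at or
    before tn, but only if it lies within the lookback window; else None.

    The returned result is conceptually time-stamped at tn, regardless of
    the timestamp of the underlying sample.
    """
    candidates = [(ts, v) for ts, v in samples if tn - lookback <= ts <= tn]
    if not candidates:
        return None
    return max(candidates)[1]  # tuple max picks the latest timestamp
```

With this model, evaluating at a tn more than `lookback` past the newest sample yields `None`, which is exactly the "results start to disappear" effect described below.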
If tn is in the future, however, weird things happen:
- Once tn is more than --query.lookback-delta ahead of the most recent samples, results will start to disappear, which is especially problematic if the query is an aggregation, so that only some of the expected series are included.
- rate and friends will stop yielding results once there is only one data point left in the range. In other cases, there will be no result anymore if no data point is in the range. This again is especially problematic with aggregations.
State of the art is, IMHO, that we consider queries from the future an accident, resulting from time skew (we even warn in the web UI about time skew), and we return results as described above.
As you might have noticed, Grafana is now using future query times deliberately. It does so to get consistently aligned query times in range queries, e.g. a range query with a step of 30s will always query for data points at the full and half minute. To not miss the most recent data, Grafana will, in this case, query up to 30s into the future and then (presumably) still cut the displayed graph off at the present. This strategy collides with the implications listed above, creating issues.
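Grafana's alignment strategy can be illustrated like this. It is a simplified sketch of the behaviour described above, not Grafana's actual code:

```python
import math

def aligned_range(now, window, step):
    """Align the start/end of a range query to step boundaries.

    The end is rounded *up* to the next step boundary, so it may lie up to
    one full step in the future relative to `now`. The start is rounded
    down so the requested window is fully covered.
    """
    end = math.ceil(now / step) * step      # may be in the future
    start = math.floor((now - window) / step) * step
    return start, end
```

For example, with `step=30`, any `now` between two half-minute marks produces an `end` at the next half-minute mark, i.e. up to 30s ahead of the wall clock, which is exactly the future-query behaviour at issue here.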
One might argue that Grafana's approach here is simply wrong, and indeed, the feature is somewhat controversial, see the discussion in the relevant PR.
However, it triggered some thinking on my side about whether we should handle queries from the future differently. Even if we disregard Grafana's current use case, there will be more users that ask Prometheus to predict the future, accidentally or deliberately.
Mutually exclusive options that come to mind:
1. Return an error if a query (or any step of a range query) refers to a time in the future.
2. Return no results for times in the future.
3. Evaluate queries for future times as if they were for the current time, but still time-stamp the results with the requested evaluation time.
There are certainly many pros and cons. 1. would break current Grafana hard. 2. would directly counteract Grafana's strategy to also get the freshest result. 3. has the advantage that it does what probably most deliberate users of future timestamps want.
Of course, there is the overarching question of whether this is a breaking change only possible in Prometheus 3, even if we believe it is a change we want.