Metrics created by rules should keep the timestamp #386
Comments
On what order of magnitude is the lag you're seeing?
It's probably just the rule evaluation interval being out of sync with the scrapes.
Yeah, my assumption was that it's the rule evaluation time visible here. Given a low scrape interval (15s for example), I'd expect to see the data …
Thanks for reporting this. This is indeed worth fixing as a matter of … Can you give some more detail about the load type the server is … The pinning of times to samples is a no-brainer, but handling the …
@grobie I think there are some misunderstandings here about how rules work. The timestamp assigned to recorded rules is already independent of how long it takes to evaluate the rules. A timestamp is chosen at the beginning of a rule evaluation cycle, the expression is evaluated at that point in time, and then the same timestamp is attached to the resulting samples. Even if a rule took 5 minutes to evaluate, its timestamp would be the same as that of one that took 1 second.

The reason why the rule timestamp conceptually cannot be the timestamp of the underlying samples is that the underlying samples have many different timestamps. Imagine you are summing over the time series from 100 different targets. There is then not 1 scrape timestamp, but 100, yet the new samples recorded by the rule can only have one timestamp in the end. For that reason, rule evaluations and their timestamps are decoupled from scrapes and happen at regular intervals.

@grobie In your case, you are scraping targets every 30 seconds and also evaluating rules every 30 seconds. Complete rule evaluation cycles are taking 0.5s at the 50th percentile and 2s at the 99th percentile, so the time it takes to evaluate the rules is certainly not connected to the lag you are seeing. I think what you are seeing is a combination of two things: the rule evaluation cycle happening up to 30s after the scrapes, and Prometheus' handling of chains of rules (data dependencies) not being very strong yet. I.e. what can happen is that if you have rules where one rule's input is another rule's output, like in the sketch below:
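A minimal sketch of such a chain, in the rule-file syntax of that era (the `foo` metric and the `job:foo_by_bar:sum` rule name are taken from the report below; `foo:sum` is a hypothetical second rule added here purely for illustration):

```
# First rule: aggregates directly over scraped data.
job:foo_by_bar:sum = sum(foo) by (bar)

# Hypothetical second rule: its input is the output of the first rule,
# not scraped data, so the two rules form a chain of data dependencies.
foo:sum = sum(job:foo_by_bar:sum)
```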
Then, at the time when a rule that depends on another rule's output is evaluated, the samples produced by that other rule in the same cycle may not have been appended to storage yet (storage appends happen asynchronously), so the dependent rule can end up computing over the previous cycle's data.

We should probably detect data dependencies like this and have something akin to a "memory barrier" (http://en.wikipedia.org/wiki/Memory_barrier) take effect for them. For the time being, if you want the most time-accurate results you can get, I guess you'll want to avoid such chains of rules and write out the full expression in each of them instead, i.e. having data dependencies only on scraped data, not on rule-computed data.
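As a sketch of that workaround, using the same hypothetical `foo:sum` rule as above: instead of referencing the recorded series, the dependent rule repeats the full expression over the scraped data, so its result no longer depends on another rule's output within the same cycle.

```
# Chained version (avoid): reads another rule's recorded output.
# foo:sum = sum(job:foo_by_bar:sum)

# Inlined version: depends only on scraped data, so it is unaffected
# by when the other rule's samples are appended to storage.
foo:sum = sum(foo)
```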
There's also #17, which is closely related to this, but probably not the problem in this case, as your rules are already topologically sorted in your file.
The asynchronous storage appends are fixed (except for the case of completely new time series appearing during a rule evaluation cycle and not being indexed yet), and the other timestamp issues here are basically working as intended for now (or tracked in other issues). Closing this one.
juliusv closed this Jul 22, 2015

simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue Oct 12, 2017
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
grobie commented Mar 28, 2014
When using a rule expression to create a metric, it shouldn't matter how long the server needs to compute the new data; the data points should be added at the same time. At the moment, rule-created metrics lag behind.
These two graphs show the same data: one queries the time series directly with `sum(foo) by (bar)`, the other uses the rule-defined metric `job:foo_by_bar:sum`.
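For reference, a recording rule producing `job:foo_by_bar:sum` from that expression would look roughly like this (a sketch in the rule-file syntax of the time; the actual rule file is not shown here):

```
# Record the aggregation under a new metric name so it can be
# queried as a precomputed series.
job:foo_by_bar:sum = sum(foo) by (bar)
```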