Drop chunks at the beginning of persistence queue #1119
Comments
I'd be very cautious here. "Sample data becomes exponentially less important over time" is a big assumption. Throwing data away to ingest new data could be very surprising to some users and adverse to their use case. If at all, this needs to be configurable by (yet another) flag.

In general, I'd set this to low priority, as you are usually in very bad shape anyway if you run into this situation. You are either consistently overloading your server or you have a broken disk (as in this case). Both cases will most likely trigger other problems anyway, so I doubt you would be able to continue monitoring based on the most recent data for much longer even with the change suggested here.

Whoever wants to implement this: it will break invariants of the storage model because it will create holes in the middle of a time series. A dropped chunk needs to get its chunk descriptor removed, too, and offsets (like the persist watermark and the chunk desc offset) need to be adjusted. Dropping chunks that are waiting to be persisted will most certainly render the time series "dirty", triggering more frequent checkpointing, which doesn't work anyway because of the overloaded/broken disk... but then we are only trying to keep monitoring running for a bit longer here.
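For readers unfamiliar with that bookkeeping, here is a rough, hypothetical sketch of what the last paragraph alludes to. The types and field names (chunkDesc, memorySeries, persistWatermark, chunkDescsOffset) are simplified stand-ins for the concepts named above, not the actual storage/local implementation:

```go
package sketch

// chunkDesc is a hypothetical, heavily simplified descriptor for one chunk.
type chunkDesc struct {
	// chunk data and metadata elided
}

// memorySeries models only the bookkeeping relevant to this discussion; the
// real series type carries much more state.
type memorySeries struct {
	chunkDescs       []*chunkDesc
	persistWatermark int  // index of the first chunk not yet persisted
	chunkDescsOffset int  // how many older descriptors already live only on disk
	dirty            bool // series needs to be checkpointed again
}

// dropOldestUnpersisted removes the oldest chunk still waiting for
// persistence, i.e. the one right at the persist watermark.
func (s *memorySeries) dropOldestUnpersisted() {
	i := s.persistWatermark
	if i >= len(s.chunkDescs) {
		return // nothing is waiting for persistence
	}
	// Remove the descriptor itself; any index-based invariant referring to
	// descriptors after position i is now off by one and must be revisited.
	s.chunkDescs = append(s.chunkDescs[:i], s.chunkDescs[i+1:]...)
	// The watermark stays correct only because we removed exactly the chunk
	// it pointed at. The series now has a hole between persisted and
	// in-memory data, which is what "dirty" signals.
	s.dirty = true
}
```

The point is not the code itself but that every offset-based invariant around the chunk descriptors has to be revisited once holes can appear in the middle of a series.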
If the persistence queue is full, Prometheus is already running in degraded mode. Something is broken, and either way we are going to lose data. I don't see how this breaks invariants: if samples are dropped a bit further ahead, it causes holes in the time series in the same manner. I see no reason for this to be configurable. If the choice is between what my CPU load was 45 minutes ago and what it is now, for monitoring the answer is simply "now".
Trying to resolve some confusion here: I guess @fabxc is talking about changing the behavior of this bit of code: https://github.com/prometheus/prometheus/blob/master/storage/local/storage.go#L540-L549, i.e. the case where samples haven't even been added to a chunk yet, but are waiting to be appended and are blocking newer values. Is that correct? That piece of code could instead drop the currently waiting samples upon backlog and favor new ones. I agree, though, that in this degraded mode things are likely screwed up enough not to matter much either way.
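For illustration only, a minimal sketch of that alternative with a bounded, hypothetical sample backlog (this is not the actual storage.go code; all names are made up): when the backlog is full, the oldest pending sample is discarded so the newest one always gets in, instead of blocking ingestion.

```go
package sketch

// sample is a hypothetical stand-in for a scraped value.
type sample struct {
	Timestamp int64
	Value     float64
}

// appendQueue is a bounded backlog of samples waiting to be appended.
type appendQueue struct {
	pending chan sample
}

func newAppendQueue(capacity int) *appendQueue {
	return &appendQueue{pending: make(chan sample, capacity)}
}

// Append never blocks the ingestion path: if the backlog is full, the oldest
// pending sample is dropped in favor of the new one.
func (q *appendQueue) Append(s sample) {
	for {
		select {
		case q.pending <- s:
			return
		default:
			// Backlog full: discard the oldest pending sample and retry.
			select {
			case <-q.pending:
			default:
			}
		}
	}
}
```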
I share Beorn's concerns here.
The invariant I was speaking about is something the internal implementation of the storage layer depends on, not an external assumption about storage behavior. It's just a heads-up for whoever implements this: something has to change in many places to avoid creating a lot of inconsistencies.

@juliusv The code you have referenced is the code that appends the most recent sample. With the suggestion here, we would not block there but drop the oldest chunks of each time series that are waiting for persistence. Semantically, that's pretty clear. It's just that the internal implementation is not good at creating holes in the middle of a time series (while simply not letting samples enter the storage layer does not cause any trouble for the storage layer's consistency: it's not a "hole", it's more like "has never happened"). But to be clear: I'm talking about two completely different aspects, the externally visible semantics and the internal implementation effort.

In any case, I believe we are discussing a fringe case of little relevance here.
barkerd427 commented Sep 28, 2015
What does the HA story look like for Prometheus? This seems like an issue
Yes, another reason why this is an issue of low relevance. To truly defend against single-host failure, you would run two Prometheis in parallel, scraping the same targets.
The usual cause of failure is going to be too many samples, or rules that are too expensive. Running another identical Prometheus isn't going to help you there. The best you can do is keep an eye on the capacity of your monitoring and, when things go bad, detect the situation with meta-monitoring and get a human in to fix it. I'm not sure it's worth doing anything here.
barkerd427 commented Sep 28, 2015
From an operations standpoint, it would be ideal to run two Prometheis with
As a reminder, this issue was triggered by a disk failure. That's the case I was referring to when describing the HA setup with two (or more) identically configured Prometheis.
I agree that in the context of the implementation details this is nothing
fabxc closed this Sep 29, 2015
fabxc commented Sep 27, 2015
We just had a case of suspected disk failure. Chunks were piling up waiting for persistence. When the capacity was reached, no new chunks were created and ingestion essentially stopped.
As sample data becomes exponentially less important over time, the oldest chunks waiting for persistence should be dropped. This way the most recent data is at least in memory and the current state of the world remains queryable.
@beorn7
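For concreteness, a minimal sketch of the proposed behavior under assumptions of my own (persistQueue and chunk here are hypothetical types, not the actual Prometheus persistence queue): when the queue of chunks waiting for persistence is at capacity, the oldest waiting chunk is dropped instead of blocking ingestion, so the most recent data stays in memory and queryable.

```go
package sketch

// chunk is a hypothetical stand-in for a completed chunk waiting to be
// written to disk.
type chunk struct {
	SeriesRef uint64
	MinTime   int64
	MaxTime   int64
	// encoded sample data elided
}

// persistQueue holds chunks waiting for persistence, bounded by maxWaiting.
type persistQueue struct {
	waiting    []chunk
	maxWaiting int
	dropped    int // count of chunks discarded because the queue was full
}

// enqueue adds a chunk for persistence. If the queue is full, the chunk at
// the front (the oldest data) is dropped so ingestion never stalls.
func (q *persistQueue) enqueue(c chunk) {
	if len(q.waiting) >= q.maxWaiting {
		q.waiting = q.waiting[1:] // drop the oldest waiting chunk
		q.dropped++
	}
	q.waiting = append(q.waiting, c)
}
```

Whether anything like this is worth the extra complexity is exactly what the discussion above questions, since a full persistence queue already means the disk or the ingest rate is in serious trouble.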