
MaxBlockDuration is 31 days when only using size based retention configuration #6857

Open
richardwilko opened this issue Feb 21, 2020 · 10 comments

@richardwilko

cfg.tsdb.MaxBlockDuration defaults to 31 days, not 10% of the retention period, when only size based retention is used.

The bug is on line 313 of cmd/prometheus/main.go: the 10% scaling is only applied when the retention duration is non-zero, but the retention duration is zero when only storage.tsdb.retention.size is set.

This is currently an issue on master.
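
For reference, the logic in question looks roughly like this (a paraphrased, runnable sketch of the behaviour described above, not the verbatim upstream source):

```go
package main

import (
	"fmt"
	"time"
)

// pickMaxBlockDuration paraphrases the default-selection logic discussed
// above (around line 313 of cmd/prometheus/main.go); it is a sketch, not
// the verbatim upstream code.
func pickMaxBlockDuration(configured, retention time.Duration) time.Duration {
	if configured != 0 {
		return configured // explicitly set via --storage.tsdb.max-block-duration
	}
	maxBlockDuration := 31 * 24 * time.Hour // default: 31d
	// The 10% cap is only applied when a time-based retention is set, so a
	// size-only configuration (retention == 0) keeps the full 31d default.
	if retention != 0 && retention/10 < maxBlockDuration {
		maxBlockDuration = retention / 10
	}
	return maxBlockDuration
}

func main() {
	// Size-only retention: no time retention, so the result is the full 31d.
	fmt.Println(pickMaxBlockDuration(0, 0)) // 744h0m0s
	// With --storage.tsdb.retention.time=34d the cap applies: 34d/10 = 3.4d.
	fmt.Println(pickMaxBlockDuration(0, 34*24*time.Hour)) // 81h36m0s
}
```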

@brian-brazil
Contributor

That seems right to me; we want the default to be 31d in that case.

@richardwilko
Author

I currently have a size based cutoff (50GB), and the first retention clean-up deleted almost all my metric history, because the largest block contained almost all of it.

Maybe 50GB is pretty small compared to a usual case, but it's quite unexpected to lose almost all my history.

Clearly I can set both a time based and a size based retention to 'fix' this, as it will force smaller block sizes, but it's not obvious. Maybe it's just a case of updating the docs to make this clear?

@brian-brazil
Contributor

If you want to keep 30d of history, you're going to need ~60d of disk space given how everything works: a block is only deleted once its entire range falls outside the retention window, so with blocks approaching 30d long you can briefly hold close to double the retention. Changing the retention period doesn't really change that.

@dprittie

@brian-brazil - I don't think that is true. Surely if storage.tsdb.retention.time = 34d then this bit of code from cmd/prometheus/main.go comes into effect: maxBlockDuration = cfg.tsdb.RetentionDuration / 10. The max block duration would then be set to 3.4 days, so when the retention time of 34d is exceeded a 3.4 day block would be removed, leaving at least 30 days of data always visible.

I have exactly the same problem as @richardwilko. I am currently only using size based retention, and I see blocks created which are as big as 45% of my retention size, whereas ideally this would never be larger than 10%.

Would it not be possible to implement a similar strategy for max block size as is currently done for database retention, i.e. consider both size and length and take whichever limit is hit first? I am going to take a look and see if I can put together a PR for that, but obviously it would not be worth doing if you don't think this is a valid approach.

@dprittie

@richardwilko - my current workaround is to set storage.tsdb.max-block-duration=2d. This is obviously not ideal, as it requires writing a little logic into whatever method you use to launch Prometheus to determine how many days' worth of data your size based retention can handle, then multiplying that by 0.10. If, like us, the amount of data being ingested varies quite a bit over time, you need to restart Prometheus regularly in order to recalculate a sensible value for storage.tsdb.max-block-duration.
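
For illustration, the launch-time calculation this implies might look like the following sketch (the function name and the ingestion-rate figure are made up; you'd substitute your own measured bytes-per-day):

```go
package main

import (
	"fmt"
	"time"
)

// estimateMaxBlockDuration returns 10% of the time the size-based retention
// can hold, given an estimated ingestion rate. Both inputs are things you
// have to measure yourself; this helper is hypothetical, not a Prometheus API.
func estimateMaxBlockDuration(retentionSizeBytes, bytesPerDay int64) time.Duration {
	daysRetained := float64(retentionSizeBytes) / float64(bytesPerDay)
	return time.Duration(daysRetained * 0.10 * 24 * float64(time.Hour))
}

func main() {
	const gib = int64(1) << 30
	// e.g. 50GiB retention at ~2.5GiB/day ingested => ~20 days retained,
	// so max-block-duration comes out at roughly 2 days (48h).
	d := estimateMaxBlockDuration(50*gib, int64(2.5*float64(gib)))
	fmt.Printf("--storage.tsdb.max-block-duration=%s\n", d)
}
```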

@brian-brazil
Contributor

The problem is that if there's a size configured but no time, we have no idea what the size translates to in time terms.

@dprittie

So it's not possible to work out how much data has been written to a block and stop writing once you hit 10% of your storage.tsdb.retention.size?

Does that mean that the storage.tsdb.retention.size setting is intended to only ever be used in conjunction with storage.tsdb.retention.time?

@brian-brazil
Contributor

That's not how compaction works; once we've chosen to compact, we work series by series rather than in time slices.

@bboreham
Member

bboreham commented Apr 9, 2024

Reviewing this at the bug scrub, we agreed both with the sentiment that 31 days is a very big block for most people, and that Prometheus can't easily target a size in bytes.

Some suggestions came up in discussion:

  • If a block is already over 10% of the max retention size, then don't include it in further compaction (see the sketch after this list). This will avoid the worst symptoms.
  • Drop the default max from 31 days to something more like 4 days; then it will work fine for most people, and those who really want enormous blocks can configure them. This might need to be a Prometheus 3.0 change.
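
As a very rough sketch of the first suggestion (all names and types below are hypothetical, not Prometheus's actual compaction planner):

```go
package tsdbsketch

// blockMeta stands in for real tsdb block metadata; the type and the
// function below are hypothetical illustrations of the idea above.
type blockMeta struct {
	sizeBytes int64
}

// filterOversizedBlocks drops blocks that are already over 10% of the
// size-based retention from the candidate set, so they are never selected
// for further compaction.
func filterOversizedBlocks(blocks []blockMeta, retentionSizeBytes int64) []blockMeta {
	limit := retentionSizeBytes / 10
	kept := blocks[:0]
	for _, b := range blocks {
		if b.sizeBytes > limit {
			continue // already oversized: exclude from future compactions
		}
		kept = append(kept, b)
	}
	return kept
}
```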

@frittentheke

frittentheke commented Apr 10, 2024

@bboreham I know everybody loves their own bugs the most, but while reading your suggestions I remembered running into and reporting yet another issue in relation to size based retention: #11112. My issue is more about running out of disk space during rotations and compactions, but I'd still love for Prometheus to be able to just work with the given space (volume), without any manual tuning, adjustments or guesses about how much churn there is or how large compaction might become.
