disk size based retention #124
Comments
I considered that initially but we dropped the idea. You either alert on
running out of disk space or on retention becoming too short.
Which one you pick doesn't really matter; the point is that an explicit retention
gives more predictable and consistent behavior for end users.
…On Sun, Aug 20, 2017, 10:03 AM Goutham Veeramachaneni wrote:
With talk about 2.0 defaulting to +Inf retention time, there is also talk
about adding a flag which does retention based on the data size on disk. Do
we want to support that?
If yes, a simple hacky approach would be to check the size of a block
after persisting and then updating meta.json with that info. On startup and
reload, update the metric. But one obvious problem with that is the WAL
size.
Yes, having disk-size-based retention would be very valuable, and possibly more useful than a time-based setup for many installation methods. From the "SRE hat" perspective, having systems automatically behave rather than having to rely on alerting and operator action is highly desirable. Supporting both time and byte retention flags is very much desired.
Retention getting shortened is something that needs operator intervention
in the general case. Just deleting old data without explicit consent is not
graceful degradation. I think the disk-based alert is much more desirable,
especially since predict_linear will work reliably with days
to weeks of notice in essentially all cases.
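As a sketch of the alert-based approach, a rule like the following uses predict_linear over node exporter filesystem metrics to fire days before the disk fills. The group name, thresholds, mountpoint, and the metric name (node_filesystem_free_bytes is the current node_exporter name; older versions exposed node_filesystem_free) are all illustrative:

```yaml
groups:
  - name: disk
    rules:
      - alert: PrometheusDiskWillFillIn4Days
        # Extrapolate the last 6h of free-space samples 4 days into the future;
        # fire if the prediction crosses zero.
        expr: predict_linear(node_filesystem_free_bytes{mountpoint="/prometheus"}[6h], 4 * 86400) < 0
        for: 30m
        annotations:
          summary: "Prometheus data disk predicted to fill within 4 days"
```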
We can give quite reliable formulas to estimate required storage size now,
so this should very rarely need adjustment, and if it does, it should happen
consciously.
Also, reducing retention is not the only or best way to deal with limited disk
space.
Increasing the scrape interval or deleting subsets of older data are just
as valid and often the better alternative. This alone seems like sufficient
reason to me not to make this a first-class concept. One can totally write
a small script that does this, triggered by an alert or similar.
There's really no graceful degradation either way. You can choose between a crash loop without ingesting new data and losing old data; both are near pathological. Especially in testing scenarios and for aggressive federation, a cap on size rather than time seems to make sense. If both are set, whichever limit is hit first should apply. Also, we talked about exposing the local storage size and the currently oldest time series as metrics.
Having a flag for retention in bytes is explicit consent for doing exactly this.
Is there any possibility of getting an approximate date for this feature?
With talk about 2.0 defaulting to +Inf retention time, there is also talk about adding a flag which does retention based on the data size on disk. Do we want to support that?
If yes, a simple, hacky approach would be to check the size of a block after persisting it and then update meta.json with that info. On startup and reload, update the metric. But one obvious problem with that is the WAL size.