disk size based retention #124
Comments
I considered that initially but we dropped the idea. You either alert on
running out of disk space or on retention becoming too short.
Which one you pick doesn't really matter; the point is that an explicit retention
gives more predictable and consistent behavior for end users.
…On Sun, Aug 20, 2017, 10:03 AM Goutham Veeramachaneni wrote:
With talk about 2.0 defaulting to +Inf retention time, there is also talk
about adding a flag which does retention based on the data size on disk. Do
we want to support that?
If yes, a simple hacky approach would be to check the size of a block
after persisting and then updating meta.json with that info. On startup and
reload, update the metric. But one obvious problem with that is the WAL
size.
Yes, having disk-size-based retention would be very valuable, and possibly more useful than a time-based setup for many installation methods. From the "SRE hat" perspective, having systems automatically behave rather than having to rely on alerting and operator action is highly desirable. Supporting both time and byte retention flags is very much desired.
Retention getting shortened is something that needs operator intervention
in the general case. Just deleting old data without explicit consent is not
graceful degradation. I think the disk-based alert is much more desirable,
especially since predict_linear will work reliably with days
to weeks of notice in essentially all cases.
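As a sketch of the alert-based approach, a rule like the following uses predict_linear over node exporter filesystem metrics to fire days before the disk fills. The group name, thresholds, mountpoint, and the metric name (node_filesystem_free_bytes is the current node_exporter name; older versions exposed node_filesystem_free) are all illustrative:

```yaml
groups:
  - name: disk
    rules:
      - alert: PrometheusDiskWillFillIn4Days
        # Extrapolate the last 6h of free-space samples 4 days into the future;
        # fire if the prediction crosses zero.
        expr: predict_linear(node_filesystem_free_bytes{mountpoint="/prometheus"}[6h], 4 * 86400) < 0
        for: 30m
        annotations:
          summary: "Prometheus data disk predicted to fill within 4 days"
```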
We can give quite reliable formulas to estimate required storage size now,
so this should very rarely need adjustment, and if it does, it should happen
consciously.
Also, reducing retention is not the only or best way to deal with limited disk
space.
Increasing the scrape interval or deleting subsets of older data are just
as valid and often the better alternative. This alone seems like sufficient
reason to me not to make this a first-class concept. One can totally write
a small script that does this, triggered by an alert or similar.
There's really no graceful degradation either way. You can choose between a crash loop without ingesting new data and losing old data; both are near pathological. Especially in testing scenarios and for aggressive federation, a cap on size rather than time seems to make sense. If both are set, whichever limit is hit first should apply. Also, we talked about exposing the local storage size and the currently oldest time series as metrics.
Having a flag for retention in bytes is explicit consent for doing exactly this.
Is there any possibility of getting an approximate date for this feature?
With talk about 2.0 defaulting to +Inf retention time, there is also talk about adding a flag which does retention based on the data size on disk. Do we want to support that?
If yes, a simple, hacky approach would be to check the size of a block after persisting it and then update meta.json with that info. On startup and reload, update the metric. But one obvious problem with that is the WAL size.