This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

on disk replay log for metrics #275

Closed
woodsaj opened this issue Aug 6, 2016 · 2 comments
Comments

@woodsaj
Member

woodsaj commented Aug 6, 2016

Because metrics are buffered in memory as "chunks" and only periodically flushed to Cassandra, restarting the metrictank process loses any data that has not yet been flushed.

Although we plan to solve this by replaying data from Kafka, I think it would be great to provide a generic solution that works for all input options, in particular carbon ingestion. This would allow single-server installations to survive crashes and restarts without data loss, or at least to limit the loss to a few seconds.

What I propose is writing metrics to disk in an append-only log, with one file per chunk window (aka chunkspan). Alongside the log, we keep an on-disk index of all series seen during the chunk window, with a flag indicating whether each one has been saved.

A background task would "compact" the logs once we are halfway through the next chunk window. The compaction process would start by reading the index to identify the unsaved chunks. If there are no unsaved chunks, the whole log can be deleted (the ideal case); otherwise, it is copied to a new file with the saved series excluded.
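The compaction decision can be sketched as follows, assuming the log has already been decoded into a slice of records and the index is a series-to-saved map. `compactDue` and `compact` are hypothetical helpers; like the proposal, the delete path relies on the index listing every series written to the log.

```go
package main

import "fmt"

// Point is a hypothetical decoded log record.
type Point struct {
	Series string
	Ts     uint32
	Val    float64
}

// compactDue reports whether the log for the window starting at winStart may
// be compacted at time now, i.e. once we are halfway through the next window.
func compactDue(winStart, chunkspan, now uint32) bool {
	return now >= winStart+chunkspan+chunkspan/2
}

// compact decides what to do with one window's log given that window's index
// (series -> saved flag). If every indexed series was saved it returns
// deleteAll=true; otherwise it returns the records whose series are not
// flagged as saved, to be copied into a new file.
func compact(entries []Point, saved map[string]bool) (kept []Point, deleteAll bool) {
	allSaved := true
	for _, isSaved := range saved {
		if !isSaved {
			allSaved = false
			break
		}
	}
	if allSaved {
		return nil, true
	}
	for _, p := range entries {
		if !saved[p.Series] {
			kept = append(kept, p)
		}
	}
	return kept, false
}

func main() {
	const chunkspan = 600
	entries := []Point{{"a", 0, 1}, {"b", 10, 2}, {"a", 20, 3}}
	saved := map[string]bool{"a": true, "b": false}
	fmt.Println(compactDue(0, chunkspan, 899), compactDue(0, chunkspan, 900)) // prints: false true
	kept, deleteAll := compact(entries, saved)
	fmt.Println(len(kept), kept[0].Series, deleteAll) // prints: 1 b false
}
```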

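After a restart, we would then have at most 1.5 * chunkspan worth of data to replay (one full chunk window that has not yet been compacted, plus up to half of the current one), by streaming it from disk and discarding the series we know were already saved for their chunk window.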
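A rough sketch of the replay step, assuming each retained window is available as decoded records plus its index. The `window` type, `replay` and `maxReplaySeconds` are hypothetical; `ingest` stands in for whatever feeds points back into the in-memory chunks.

```go
package main

import "fmt"

// Point is a hypothetical decoded log record.
type Point struct {
	Series string
	Ts     uint32
	Val    float64
}

// window is one retained log segment plus its index. After compaction at most
// two remain: the previous window (only if it still had unsaved series) and the
// current, partially filled one.
type window struct {
	Start  uint32
	Points []Point
	Saved  map[string]bool // series -> chunk already persisted
}

// maxReplaySeconds is the worst-case span of data to replay: one full window
// that has not been compacted yet plus up to half of the current window.
func maxReplaySeconds(chunkspan uint32) uint32 {
	return chunkspan + chunkspan/2
}

// replay streams the windows in the order given (oldest first) and hands every
// point whose series was not saved for that window to ingest. It returns how
// many points it fed.
func replay(windows []window, ingest func(Point)) int {
	n := 0
	for _, w := range windows {
		for _, p := range w.Points {
			if w.Saved[p.Series] {
				continue // this series' chunk for w is already in the store
			}
			ingest(p)
			n++
		}
	}
	return n
}

func main() {
	windows := []window{
		{Start: 0, Points: []Point{{"a", 10, 1}, {"b", 20, 2}}, Saved: map[string]bool{"a": true}},
		{Start: 600, Points: []Point{{"a", 610, 3}}, Saved: map[string]bool{}},
	}
	var got []Point
	n := replay(windows, func(p Point) { got = append(got, p) })
	fmt.Println(maxReplaySeconds(600), n, got) // prints: 900 2 [{b 20 2} {a 610 3}]
}
```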

@Dieterbe
Contributor

Dieterbe commented Aug 8, 2016

Seems useful, but also quite an undertaking.
I see two solutions to the problem of restarting without data loss when not using Kafka: one is a WAL like you describe; the other is simply to instruct people to spin up another instance, wait as long as needed, and then switch the primary role to it, at which point the old one can be stopped.
It's arguably a bit more complicated, but then again, metrictank is not aimed at "small-scale" installs.
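For the handover alternative, "as long as needed" can be estimated. A rough sketch, assuming a single chunkspan and that the old primary keeps saving chunks until the switch, taking a hypothetical `flushDelay` seconds after a window closes to persist it. `earliestPromotion` is an illustrative helper, not part of metrictank.

```go
package main

import "fmt"

// earliestPromotion returns the earliest unix time at which a standby that began
// ingesting at start can take over the primary role without leaving a gap.
// Assumptions (not from metrictank): a single chunkspan, and the old primary
// keeps saving chunks until then, needing flushDelay seconds after a window
// closes to persist it.
func earliestPromotion(start, chunkspan, flushDelay uint32) uint32 {
	// The standby holds complete chunks only from the first window boundary at
	// or after start; the window before it must be saved by the old primary.
	firstFull := start - start%chunkspan
	if firstFull < start {
		firstFull += chunkspan
	}
	return firstFull + flushDelay
}

func main() {
	// A standby started 200s into a 600s window, with a 30s flush delay.
	fmt.Println(earliestPromotion(1470470000, 600, 30)) // prints: 1470470430
}
```

With rollups stored at longer chunkspans, the wait would be governed by the longest one rather than the raw chunkspan.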

@stale

stale bot commented Apr 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Apr 4, 2020
stale bot closed this as completed Apr 11, 2020