This repository has been archived by the owner on Aug 23, 2023. It is now read-only.
Because metrics are buffered in memory as "chunks" and only periodically flushed to Cassandra, restarting the metrictank process results in data loss.
Though we plan to solve this by replaying data from Kafka, I think it would be great if we could provide a generic solution that works for all input options, specifically carbon ingestion. This would allow single-server installations to survive crashes/restarts without data loss, or at least limit the loss to a few seconds.
What I propose is writing metrics to disk in an append-only log, with one file per chunk window (aka chunkspan). In addition to the log, we keep an on-disk index of all series seen during the chunk window, with a flag per series indicating whether its chunk has been saved.
A background task would "compact" the logs once we are halfway through the next chunk window. Compaction starts by reading the index to identify the unsaved chunks. If there are no unsaved chunks, the whole log can be deleted (the ideal case); otherwise it is copied to a new file with the saved series excluded.
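That compaction step could look roughly like the following self-contained sketch (again with hypothetical file names and a JSON-lines format, not metrictank's):

```python
import json
import os
import tempfile

WAL_DIR = tempfile.mkdtemp()

def compact(window):
    """Delete the window's log if every series was saved; otherwise
    rewrite it with the saved series filtered out."""
    log = os.path.join(WAL_DIR, f"{window}.log")
    idx_path = os.path.join(WAL_DIR, f"{window}.idx")
    with open(idx_path) as f:
        saved = json.load(f)          # series name -> saved flag
    if all(saved.values()):
        os.remove(log)                # ideal case: nothing left to replay
        os.remove(idx_path)
        return
    tmp = log + ".tmp"
    with open(log) as src, open(tmp, "w") as dst:
        for line in src:
            if not saved[json.loads(line)["series"]]:
                dst.write(line)       # keep only unsaved series
    os.replace(tmp, log)              # atomic swap, never a half-written log

# demo: one saved series, one unsaved
with open(os.path.join(WAL_DIR, "1200.log"), "w") as f:
    f.write(json.dumps({"series": "cpu.user", "ts": 1200, "value": 0.5}) + "\n")
    f.write(json.dumps({"series": "cpu.sys", "ts": 1205, "value": 0.1}) + "\n")
with open(os.path.join(WAL_DIR, "1200.idx"), "w") as f:
    json.dump({"cpu.user": True, "cpu.sys": False}, f)
compact(1200)
with open(os.path.join(WAL_DIR, "1200.log")) as f:
    print(f.read().strip())  # only the cpu.sys point survives
```

Writing to a temp file and swapping it in with `os.replace` means a crash mid-compaction leaves the original log intact rather than a truncated one.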
After a restart we would then have at most 1.5 * chunkspan worth of data to replay, by streaming from disk and discarding series whose chunks we know were already saved.
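The replay step on startup could then be sketched as follows (same hypothetical layout as above; in practice you would iterate over every surviving window log, oldest first):

```python
import json
import os
import tempfile

WAL_DIR = tempfile.mkdtemp()

def replay(window):
    """Stream one window's log from disk, yielding only points whose
    series were not already flushed before the restart."""
    with open(os.path.join(WAL_DIR, f"{window}.idx")) as f:
        saved = json.load(f)
    with open(os.path.join(WAL_DIR, f"{window}.log")) as log:
        for line in log:
            point = json.loads(line)
            if not saved.get(point["series"], False):
                yield point  # only unsaved series re-enter memory

# demo: replay a window where cpu.user was saved but cpu.sys was not
with open(os.path.join(WAL_DIR, "1200.log"), "w") as f:
    f.write(json.dumps({"series": "cpu.user", "ts": 1200, "value": 0.5}) + "\n")
    f.write(json.dumps({"series": "cpu.sys", "ts": 1205, "value": 0.1}) + "\n")
with open(os.path.join(WAL_DIR, "1200.idx"), "w") as f:
    json.dump({"cpu.user": True, "cpu.sys": False}, f)
replayed = list(replay(1200))
print([p["series"] for p in replayed])  # → ['cpu.sys']
```

Since compaction only runs halfway through the following window, at most the current window plus half the previous one can be on disk, which is where the 1.5 * chunkspan bound comes from.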
Seems useful, but also quite an undertaking.
I see two solutions to the restarts-without-data-loss-when-not-using-kafka problem: one is a WAL like you describe; the other is simply to instruct people to spin up another instance, wait however long is needed, and then switch the primary role, at which point you can stop the old one.
It's arguably a bit more complicated, but then again, metrictank is also not aimed at "small-scale" installs.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.