Post restore reset #545
This commit introduces an interface which acts as a handler for a leaky abstraction in the structure of underlying log stores. In order to properly handle post-snapshot-restore cleanup for log stores generically, we need some awareness of whether the underlying store permits gaps. Boltdb allows for gaps in log store indices, but to handle them it requires a freelist, which is written on every commit. This is costly, particularly when the freelist is large. Completely resetting the LogStore after a snapshot restore would grow the freelist and degrade performance. The MonotonicLogStore interface is implemented by LogStores that guarantee sequential/monotonic indices, like raft-wal; stores that do not implement it, like boltdb, keep the old behavior. This also requires special handling within LogStore wrappers (like LogCache), to ensure that the type assertion is passed to the underlying store.
This commit makes use of the MonotonicLogStore type assertion to delete all entries from the LogStore after a snapshot restore.
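For readers skimming the thread, here is a minimal sketch of the optional-interface pattern the commit message describes, including the wrapper forwarding. It is an illustration only: the method name `IsMonotonic`, the helper `isMonotonic`, and the types `walStore`, `boltStore`, and `cacheWrapper` are assumptions made for this sketch, not the code in this PR.

```go
package main

import "fmt"

// MonotonicLogStore is an optional interface a log store can implement to
// declare that its indexes are sequential, with no gaps.
// The method name IsMonotonic is an assumption made for this sketch.
type MonotonicLogStore interface {
	IsMonotonic() bool
}

// walStore stands in for a gap-free store such as raft-wal (hypothetical type).
type walStore struct{}

func (walStore) IsMonotonic() bool { return true }

// boltStore stands in for a store that tolerates gaps (hypothetical type).
// It deliberately does not implement MonotonicLogStore.
type boltStore struct{}

// cacheWrapper stands in for a wrapper such as LogCache (hypothetical type).
type cacheWrapper struct {
	inner interface{}
}

// IsMonotonic forwards the question to the wrapped store, so wrapping a
// store does not hide (or invent) the guarantee.
func (c cacheWrapper) IsMonotonic() bool {
	if m, ok := c.inner.(MonotonicLogStore); ok {
		return m.IsMonotonic()
	}
	return false
}

// isMonotonic reports whether a store has opted in to the guarantee.
func isMonotonic(store interface{}) bool {
	if m, ok := store.(MonotonicLogStore); ok {
		return m.IsMonotonic()
	}
	return false
}

func main() {
	fmt.Println(isMonotonic(walStore{}))                       // true
	fmt.Println(isMonotonic(boltStore{}))                      // false
	fmt.Println(isMonotonic(cacheWrapper{inner: walStore{}}))  // true
	fmt.Println(isMonotonic(cacheWrapper{inner: boltStore{}})) // false
}
```

Without the forwarding method, a type assertion against the wrapper would fail even when the wrapped store is gap-free, which is the special handling the message refers to.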
@mpalmi Nice. This is also super close, but as it stands it has a huge issue we should fix!
In the latest changes we added the log-clearing code to the startup restore, which is wrong and will lose committed data!
Let me know if it's not clear why that is or if I've misunderstood something here - restoring the on-disk snapshot is very different and not at all problematic. We are only trying to fix the problem of restoring from an external snapshot which has the effect of making all the current state invalid and useless!
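The distinction in this comment can be stated as a small decision rule. The sketch below is only an illustration of that rule; the names `restoreSource`, `startupRestore`, `userRestore`, and `shouldClearLogs` are hypothetical and do not come from the PR.

```go
package main

import "fmt"

// restoreSource says where a snapshot being restored came from.
// All names in this sketch are hypothetical.
type restoreSource int

const (
	// startupRestore: the node reloads its own on-disk snapshot at boot.
	// Logs after the snapshot may hold committed data and must be kept.
	startupRestore restoreSource = iota
	// userRestore: an external snapshot is installed, which invalidates
	// all current state, including the existing logs.
	userRestore
)

// shouldClearLogs returns true only for an external restore into a store
// that guarantees gap-free indexes.
func shouldClearLogs(src restoreSource, monotonic bool) bool {
	return src == userRestore && monotonic
}

func main() {
	fmt.Println(shouldClearLogs(startupRestore, true)) // false
	fmt.Println(shouldClearLogs(userRestore, true))    // true
	fmt.Println(shouldClearLogs(userRestore, false))   // false
}
```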
…onotonicLogStore Co-authored-by: Paul Banks <banks@banksco.de>
…t to ensure logs are not deleted.
Great work here by all!
…tic during user restore.
Looks great! Just a couple of questions that are not showstoppers, but one is related to a comment from @banks, so might be worth addressing.
…. re-use monotonic cluster options in test.
LGTM. Thanks @jmurret.
I had some minor comment suggestions and a test naming suggestion inline, but nothing blocking.
@dhiaayachi do you want to do a final pass at this?
I had this branch running against Consul's restore integration test yesterday, so I feel happy we resolved the issues here (in combination with hashicorp/raft-wal#24).
Great Job All!!
I added a nit but I'm ok with merging this as is.
Co-authored-by: Paul Banks <banks@banksco.de>
This PR introduces an interface which acts as a handler for a leaky
abstraction in the structure of underlying log stores. In order to
properly handle post-snapshot-restore cleanup for log stores
generically, we need some awareness of whether the underlying store
permits gaps.
Boltdb allows for gaps in log store indexes, but to handle them it
requires a freelist, which is written on every commit. This is costly,
particularly when the freelist is large. By completely resetting the
LogStore after snapshot restore, we grow the size of the freelist, which would result in performance degradation.
The MonotonicLogStore interface is implemented by LogStores with
guarantees of sequential/monotonic indexes, like raft-wal; we revert
to the old behavior for LogStores with index holes, like Boltdb.
The interface also requires special handling within LogStore wrappers (like LogCache), to ensure that the type assertion is passed to the underlying store.
We then use the MonotonicLogStore type assertion to delete all
entries from the LogStore after snapshot restore.
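As a rough sketch of the final step, assuming a store that exposes `FirstIndex`, `LastIndex`, and `DeleteRange` and an optional `IsMonotonic` method: the helper `resetAfterUserRestore` and the in-memory types `memStore` and `seqStore` are hypothetical and exist only to make the example runnable.

```go
package main

import "fmt"

// logStore is the subset of a raft-style log store this sketch needs.
type logStore interface {
	FirstIndex() (uint64, error)
	LastIndex() (uint64, error)
	DeleteRange(min, max uint64) error
}

// MonotonicLogStore is the optional opt-in interface; the method name
// IsMonotonic is an assumption made for this sketch.
type MonotonicLogStore interface {
	IsMonotonic() bool
}

// memStore is a hypothetical in-memory store that may contain gaps.
type memStore struct {
	entries map[uint64]string
}

// FirstIndex returns the lowest stored index, or 0 if the store is empty.
func (s *memStore) FirstIndex() (uint64, error) {
	var first uint64
	for idx := range s.entries {
		if first == 0 || idx < first {
			first = idx
		}
	}
	return first, nil
}

// LastIndex returns the highest stored index, or 0 if the store is empty.
func (s *memStore) LastIndex() (uint64, error) {
	var last uint64
	for idx := range s.entries {
		if idx > last {
			last = idx
		}
	}
	return last, nil
}

// DeleteRange removes every entry with min <= index <= max.
func (s *memStore) DeleteRange(min, max uint64) error {
	for idx := range s.entries {
		if idx >= min && idx <= max {
			delete(s.entries, idx)
		}
	}
	return nil
}

// seqStore is a hypothetical gap-free store that opts in to the guarantee.
type seqStore struct{ *memStore }

func (seqStore) IsMonotonic() bool { return true }

// resetAfterUserRestore deletes all entries when the store is monotonic and
// reports whether a reset was attempted. Other stores are left untouched.
func resetAfterUserRestore(store logStore) (bool, error) {
	m, ok := store.(MonotonicLogStore)
	if !ok || !m.IsMonotonic() {
		return false, nil
	}
	first, err := store.FirstIndex()
	if err != nil {
		return false, err
	}
	last, err := store.LastIndex()
	if err != nil {
		return false, err
	}
	if last == 0 {
		return true, nil // nothing stored, nothing to delete
	}
	return true, store.DeleteRange(first, last)
}

func newEntries() map[uint64]string {
	return map[uint64]string{1: "a", 2: "b", 3: "c"}
}

func main() {
	seq := seqStore{&memStore{entries: newEntries()}}
	gap := &memStore{entries: newEntries()}

	cleared, _ := resetAfterUserRestore(seq)
	fmt.Println(cleared, len(seq.entries)) // true 0

	cleared, _ = resetAfterUserRestore(gap)
	fmt.Println(cleared, len(gap.entries)) // false 3
}
```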