-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: clean up staging files on deletion #7912
Conversation
Context here is that there's a cluster that has several orphaned files, and in their logs I see:
In my runs of the test, I couldn't reliably reproduce the adjacent segment compaction merging abort that I expected to, but I'm fairly certain this is these code path being hit. Open to further test suggestions. I also considered making a more wholistic change that passed an out-parameter to callers the populate files to clean up, but opted to go with a less invasive approach to start. |
@@ -2620,6 +2620,12 @@ FIXTURE_TEST(write_truncate_compact, storage_test_fixture) { | |||
info("produce_done"); | |||
truncate.get(); | |||
info("truncate_done"); | |||
|
|||
// Ensure we've cleaned up all our staging segments such that a removal of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to do a more localized unit test of the compaction code that twiddles the generation to force the abort path and validates deletion? Perhaps not, but it would be nice to have a test that deterministically exercises it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it'd be nice to have a better way to reproduce this bug, though I changed approaches so a more targeted test makes a bit less sense for this PR.
Would be awesome to get this in for v22.3.10, scheduled 6h Jan, @andrwng |
Will keep that in mind. I needs some updates though; after injecting the failure I'm still seeing some files leftover. |
37f6726
to
81da6d3
Compare
In manually twiddling the generation ID condition to always trigger the aborted adjacent segment compaction path, I found more edge cases in cleanup that made this change a bit trickier. To boot, I found myself chasing down a staging file that I ultimately couldn't find the source of (and thus couldn't find a place to clean it up). Our implementation of using staging files seems a little brittle, so if going down the route of cleaning up after abort, perhaps we should tackle this even more holistically (e.g. alongside any crash-consistency efforts). For now, I've changed approaches to just clean up staging files on removal. It's not the best approach, but it is an improvement over what we have today. EDIT: as I was typing this up, I felt a nagging that we should still be doing some cleanup if we can, so I've also brought back the cleanup in the initial draft. |
e9d2041
to
68a4a24
Compare
ducktape test failure: #8072 |
CI failure is #8084 |
@@ -470,6 +470,10 @@ ss::future<std::optional<size_t>> do_self_compact_segment( | |||
"generation: {}, skipping compaction", | |||
s->get_generation_id(), | |||
segment_generation); | |||
const ss::sstring staging_file = s->reader().path().to_staging(); | |||
if (co_await ss::file_exists(staging_file)) { | |||
co_await ss::remove_file(staging_file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's log an error here so that we notice if it is happening in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logged at the removal site (here just indicates concurrent compaction which isn't problematic)
68a4a24
to
a3720c0
Compare
If the generation_id is updated while we write the compaction output, we end up returning early without keeping track of the staging files. This could result in files being left over, even after removal of the partition since we currently don't allow removing the NTP directory while any unexpected files exist. This commit addresses this by removing all files suffixed with ".staging" when a partition is deleted. I considered an alternate fix wherein we kept track of all staging files while compacting, but opted to scrap the approach, as it became a fairly invasive change with several edge cases (e.g. staging files when compacting a staged segment), and this fix will likely need to be backported, so a simpler approach is preferrable.
If the generation_id is updated while we write the compaction output, we end up returning early without keeping track of the staging files. This could result in files being left over, even after removal of the partition. This commit addresses this by immediately removing files that may go unused upon exiting early out of a compaction due to a generation ID mismatch.
a3720c0
to
4a2cd2c
Compare
if (co_await ss::file_exists(ss::sstring(f))) { | ||
co_await ss::remove_file(ss::sstring(f)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the race condition here is a concern, you could call remove_file
and ignore an exception containing an ENOENT error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this seems like a good idea. It's unclear to what extent these operations race with one another, but I can imagine there being some race with truncation that results in weird behavior. Will revisit this, since it looks like there are still some leftover files.
/backport v22.3.x |
/backport v22.2.x |
If the generation_id is updated while we write the compaction output, we
end up returning early without keeping track of the staging files. This
could result in files being left over, even after removal of the
partition since we currently don't allow removing the NTP directory
while any unexpected files exist.
This PR addresses this in two ways:
The latter approach as implemented by this PR doesn't completely cover every instance of aborted compactions, just the ones seen in the wild and commonly seen in the storage unit test. Tackling this more holistically will be a broader change that will take more time and be harder to backport.
Backports Required
UX Changes
Release Notes
Bug Fixes