
storage: clean up staging files on deletion #7912

Merged: 3 commits into redpanda-data:dev, Jan 10, 2023

Conversation

andrwng (Contributor) commented Dec 22, 2022

If the generation_id is updated while we write the compaction output, we
end up returning early without keeping track of the staging files. This
can leave files behind even after the partition is removed, since we
currently don't allow removing the NTP directory while any unexpected
files exist.

This PR addresses this in two ways:

  • by removing all files suffixed with ".staging" when a partition is deleted
  • by immediately removing staging files if exiting out of compaction early

The latter approach, as implemented in this PR, doesn't cover every instance of an aborted compaction, just the cases seen in the wild and commonly hit in the storage unit tests. Tackling this more holistically will be a broader change that takes more time and is harder to backport.
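
As a rough illustration of the first bullet above, the cleanup boils down to filtering the NTP directory for the ".staging" suffix before the directory itself is removed. The sketch below is not the PR's actual code: it uses synchronous std::filesystem rather than Seastar's asynchronous file API, assumes C++20, and the function name is hypothetical.

#include <filesystem>
#include <string>
#include <string_view>

// Illustrative only: delete any leftover compaction staging files in an NTP
// directory so that removing the directory afterwards doesn't fail on
// "unexpected" files.
inline void remove_staging_files(const std::filesystem::path& ntp_dir) {
    constexpr std::string_view staging_suffix = ".staging";
    for (const auto& entry : std::filesystem::directory_iterator(ntp_dir)) {
        if (!entry.is_regular_file()) {
            continue;
        }
        const std::string name = entry.path().filename().string();
        if (std::string_view(name).ends_with(staging_suffix)) {
            std::filesystem::remove(entry.path());
        }
    }
}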

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

Release Notes

Bug Fixes

  • Files left over from aborted compactions will now be cleaned up more robustly.

andrwng (Contributor, Author) commented Dec 22, 2022

Context here is that there's a cluster that has several orphaned files, and in their logs I see:

2022-12-09T21:30:56+09:00 {} 2022-12-09T12:30:56.667153985Z stderr F INFO 2022-12-09 12:30:56,666 [shard 0] storage-gc - disk_log_impl.cc:529 - Aborting compaction of a segment: {offset_tracker:{term:1, base_offset:117199576, committed_offset:133010063, dirty_offset:133010063}, compacted_segment=1, finished_self_compaction=1, generation={68371}, reader={/var/lib/redpanda/data/<ntp>/1_173872/117199576-1-v1.log, (1932550142 bytes)}, writer=nullptr, cache=nullptr, compaction_index:nullopt, closed=1, tombstone=1, index={file:/var/lib/redpanda/data/<ntp>/1_173872/117199576-1-v1.base_index, offsets:{0}, index:{header_bitflags:0, base_offset:{0}, max_offset:{133010063}, base_timestamp:{timestamp: 1670492269909}, max_timestamp:{timestamp: 1670501374930}, index(52261,52261,52261)}, step:32768, needs_persistence:0}}. Generation id mismatch, previous generation: 68370

In my runs of the test, I couldn't reliably reproduce the adjacent segment compaction merging abort that I expected to, but I'm fairly certain this is the code path being hit. Open to further test suggestions.

I also considered making a more holistic change that passed an out-parameter for callers to populate with files to clean up, but opted to go with a less invasive approach to start.

@@ -2620,6 +2620,12 @@ FIXTURE_TEST(write_truncate_compact, storage_test_fixture) {
info("produce_done");
truncate.get();
info("truncate_done");

// Ensure we've cleaned up all our staging segments such that a removal of
A reviewer (Contributor) commented:

Is it possible to do a more localized unit test of the compaction code that twiddles the generation to force the abort path and validates deletion? Perhaps not, but it would be nice to have a test that deterministically exercises it.

andrwng (Contributor, Author) replied:

Agreed, it'd be nice to have a better way to reproduce this bug, though I changed approaches so a more targeted test makes a bit less sense for this PR.

jcsp previously approved these changes Dec 22, 2022
piyushredpanda added this to the v22.3.10 milestone Dec 22, 2022
piyushredpanda (Contributor) commented:

Would be awesome to get this in for v22.3.10, scheduled 6th Jan, @andrwng

andrwng (Contributor, Author) commented Dec 23, 2022

> Would be awesome to get this in for v22.3.10, scheduled 6th Jan, @andrwng

Will keep that in mind. It needs some updates though; after injecting the failure I'm still seeing some leftover files.

andrwng changed the title from "storage: clean up after aborted compaction" to "storage: clean up staging files on deletion" on Jan 5, 2023
andrwng (Contributor, Author) commented Jan 5, 2023

In manually twiddling the generation ID condition to always trigger the aborted adjacent segment compaction path, I found more edge cases in cleanup that made this change a bit trickier. To boot, I found myself chasing down a staging file that I ultimately couldn't find the source of (and thus couldn't find a place to clean it up). Our implementation of using staging files seems a little brittle, so if going down the route of cleaning up after abort, perhaps we should tackle this even more holistically (e.g. alongside any crash-consistency efforts).

For now, I've changed approaches to just clean up staging files on removal. It's not the best approach, but it is an improvement over what we have today.

EDIT: as I was typing this up, I felt a nagging sense that we should still do some cleanup where we can, so I've also brought back the cleanup from the initial draft.

andrwng marked this pull request as ready for review January 5, 2023 23:30
andrwng (Contributor, Author) commented Jan 6, 2023

ducktape test failure: #8072

andrwng (Contributor, Author) commented Jan 6, 2023

CI failure is #8084

@@ -470,6 +470,10 @@ ss::future<std::optional<size_t>> do_self_compact_segment(
"generation: {}, skipping compaction",
s->get_generation_id(),
segment_generation);
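// The aborted pass may have left a partially written staging output behind;
// remove it now so it doesn't block a later removal of the NTP directory.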
const ss::sstring staging_file = s->reader().path().to_staging();
if (co_await ss::file_exists(staging_file)) {
co_await ss::remove_file(staging_file);
A reviewer (Contributor) commented:

Let's log an error here so that we notice if it is happening in tests.

andrwng (Contributor, Author) replied:

Logged at the removal site (hitting this path just indicates a concurrent compaction, which isn't problematic).

If the generation_id is updated while we write the compaction output, we
end up returning early without keeping track of the staging files. This
could result in files being left over, even after removal of the
partition since we currently don't allow removing the NTP directory
while any unexpected files exist.

This commit addresses this by removing all files suffixed with
".staging" when a partition is deleted.

I considered an alternate fix wherein we kept track of all staging files
while compacting, but opted to scrap the approach, as it became a fairly
invasive change with several edge cases (e.g. staging files when
compacting a staged segment), and this fix will likely need to be
backported, so a simpler approach is preferable.

If the generation_id is updated while we write the compaction output, we
end up returning early without keeping track of the staging files. This
could result in files being left over, even after removal of the
partition.

This commit addresses this by immediately removing files that may go
unused upon exiting early out of a compaction due to a generation ID
mismatch.
Comment on lines +590 to +591
if (co_await ss::file_exists(ss::sstring(f))) {
co_await ss::remove_file(ss::sstring(f));
A reviewer (Member) commented:

if the race condition here is a concern, you could call remove_file and ignore an exception containing an ENOENT error.

andrwng (Contributor, Author) replied:

Yeah this seems like a good idea. It's unclear to what extent these operations race with one another, but I can imagine there being some race with truncation that results in weird behavior. Will revisit this, since it looks like there are still some leftover files.
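
For reference, the reviewer's suggestion might look roughly like the sketch below. This is not the PR's code: it assumes Seastar reports the failed unlink as a std::system_error carrying the errno (adjust the catch if a different exception type surfaces), and the helper name is made up.

#include <seastar/core/coroutine.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/sstring.hh>

#include <cerrno>
#include <system_error>

namespace ss = seastar;

// Sketch: remove a staging file, treating "already gone" as success so that a
// concurrent removal (or an earlier cleanup pass) doesn't turn into a failure.
ss::future<> remove_staging_file_if_present(ss::sstring path) {
    try {
        co_await ss::remove_file(path);
    } catch (const std::system_error& e) {
        if (e.code().value() != ENOENT) {
            throw;
        }
        // Someone else already removed the file; nothing left to do.
    }
}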

piyushredpanda modified the milestones: v22.3.10, v22.3.x-next on Jan 9, 2023
jcsp merged commit efe7d8a into redpanda-data:dev on Jan 10, 2023
jcsp (Contributor) commented Jan 10, 2023

/backport v22.3.x

daisukebe (Contributor) commented:

/backport v22.2.x
