
storage: move ghost batch generation into log reader #16721

Merged 2 commits on Feb 28, 2024

Conversation

andrwng
Contributor

@andrwng andrwng commented Feb 27, 2024

Ghost batches are batches that contain no data; their purpose is to
ensure:

  • the caller (recovery_stm) sees a contiguous set of offsets
  • the terms of the returned batches match the original terms
    provided at append time, regardless of whether compactions have
    happened

These requirements are currently honored by the Raft layer at recovery
time, which generates batches in between the batches read from storage.
Since the storage layer owns term metadata and compactions, generating
these ghost batches in the Raft layer makes it trickier to evolve the
underlying storage implementation: the guarantees would need to be
tested and validated at multiple layers.

So, to support development and testing of future log implementations,
this patch pulls out ghost batch generation into the storage layer as an
option to the log_reader.
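To make the offset-contiguity requirement concrete, here is a minimal, self-contained C++ sketch of gap filling. This is not Redpanda's actual log_reader code: the `batch` struct, `fill_ghost_batches`, and the choice of borrowing the following batch's term are all illustrative simplifications.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of a batch: just its offset range and term.
struct batch {
    int64_t base_offset;
    int64_t last_offset;
    int64_t term;
    bool ghost; // true if generated to fill a gap rather than read from disk
};

// Return the batches read from the log, with empty "ghost" batches inserted
// so the caller sees a contiguous run of offsets starting at `start`.
// Simplification: each ghost batch borrows the term of the batch after the
// gap; the real implementation must report the term the missing offsets
// were originally appended at.
std::vector<batch>
fill_ghost_batches(int64_t start, const std::vector<batch>& read) {
    std::vector<batch> out;
    int64_t next = start;
    for (const auto& b : read) {
        if (b.base_offset > next) {
            // Gap detected: cover [next, b.base_offset - 1] with one ghost.
            out.push_back({next, b.base_offset - 1, b.term, true});
        }
        out.push_back(b);
        next = b.last_offset + 1;
    }
    return out;
}
```

The point of the patch is that this loop runs inside the storage reader as batches stream out, rather than as a second pass in the Raft layer after the read completes.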

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

I have an upcoming change that will iterate through a randomized
complete list (i.e. all numbers between A and B, in random order). This
adds a method to generate such a list.
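A minimal sketch of such a method, using std::iota to build the complete range and std::shuffle to randomize it (the name `random_complete_range` is hypothetical; the helper added by the commit may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Generate every integer in [a, b], inclusive, in random order.
std::vector<int64_t> random_complete_range(int64_t a, int64_t b) {
    std::vector<int64_t> v(static_cast<size_t>(b - a + 1));
    std::iota(v.begin(), v.end(), a); // fill with a, a+1, ..., b
    std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()});
    return v;
}
```

Because the list is complete, sorting the result recovers exactly the range [a, b], which is what makes it useful for exhaustive-but-order-randomized testing.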
@andrwng
Contributor Author

andrwng commented Feb 27, 2024

/ci-repeat

@andrwng andrwng marked this pull request as ready for review February 27, 2024 16:03
@vbotbuildovich
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/45411#018deb99-4dff-4dcf-8fb3-1e3c6ba0cf90:

"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_missing_partition.cloud_storage_type=CloudStorageType.ABS"


@andrwng
Contributor Author

andrwng commented Feb 28, 2024

CI failure: #16764

Member

@dotnwat dotnwat left a comment


this looks good, but there is something i don't understand yet.

the motivation here seems to be that storage contains all the term info, compaction info, etc., so that makes it better suited to deal with ghost batches as the storage implementation evolves. additionally, it seems like there is a simplification and potential performance benefit, since raft doesn't need to do the read and then go back and fill in the gaps as a second pass.

however, i don't see why storage needs to do this from the point of view of making it easier to evolve the implementation, because currently it seems like raft depends on information in batches that will be the same both before and after the storage evolution project.

maybe there is a specific change in the evolution project that fills in this knowledge gap (yep)?

@andrwng
Contributor Author

andrwng commented Feb 28, 2024

> this looks good, but there is something i don't understand yet.
>
> the motivation here seems to be that storage contains all the term info, compaction info, etc., so that makes it better suited to deal with ghost batches as the storage implementation evolves. additionally, it seems like there is a simplification and potential performance benefit, since raft doesn't need to do the read and then go back and fill in the gaps as a second pass.
>
> however, i don't see why storage needs to do this from the point of view of making it easier to evolve the implementation, because currently it seems like raft depends on information in batches that will be the same both before and after the storage evolution project.
>
> maybe there is a specific change in the evolution project that fills in this knowledge gap (yep)?

You're right that this isn't needed for the goal of an MVCC-versioned log. The aspect of the storage format evolution this helps with is implementing compaction that can remove entire terms' worth of batches -- once we get there, the owner of term metadata should be the one to create ghost batches.

@dotnwat dotnwat merged commit 5b8dc1e into redpanda-data:dev Feb 28, 2024
16 checks passed
4 participants