-
Notifications
You must be signed in to change notification settings - Fork 564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v23.1.x] cloud_storage: fix HWM on read replicas #9770
Conversation
CONFLICT: - ducktae test parameterization - missing last_segments() call We were previously doing no offset translation of the HWM in read replicas, meaning the returned offset would almost always be higher than it should've been. This commit updates to translate based on the `delta_offset_end` of the last segment.
CONFLICT: - no last_segment() call in manifest The field is currently not expressive enough to report the HWM, the Kafka offset just past the end of a partition. Currently each segment records delta_offset_end as the difference between Kafka and RP offset at the end of the log: c: config batch d: data batch delta: computed by counting the number of non-data batches prior to the given offset kafka = model - delta type: c c d d d c c c c +-----------------------------------------+ +-----------+ | | | | | | | | | | | model:| 0 1 2 ... 138 139 140 | | 141 142 | | | | | | | | | | | | +-----------------------------------------+ +-----------+ delta: 0 1 2 ... 2 2 3 4 5 6 +-----------------------------------------+ +-----------+ | | | | | | | kafka: 0 0 0 1.. 136 137 137 | | 137 137 | | | | | | | | +-----------------------------------------+ +-----------+ In the above example, consider when the first segment ([0, 140]) is uploaded, and the second ([141, 142]) isn't. A read replica would previously report the HWM as 138 (140 - 3 + 1), when really it should be 137, since the highest data batch is 136. To translate the correct HWM, we would need to translate 141 and record its delta as 4, giving us enough information in metadata to tell that 140 is a config batch. To remediate this, this commit changes the existing delta_offset_end field to translate the next offset, and expecting the next Kafka offset to be translated with the next offset. Regarding backwards compatibility, conditionally increasing the delta_offset_end value means older data will over-report by 1. This should be safe, given the value is currently only used for optimizations that skip segments with no data batches. Fixes redpanda-data#9513
@@ -263,12 +263,12 @@ const model::offset partition_manifest::get_last_offset() const { | |||
} | |||
|
|||
const std::optional<kafka::offset> | |||
partition_manifest::get_last_kafka_offset() const { | |||
partition_manifest::get_next_kafka_offset() const { | |||
auto last_seg = last_segment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partition_manifest::last_segment
doesn't exist on this version. Did you mean to add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice spot, this was unintentional. I just removed it.
bc39573
to
c4bf22f
Compare
CONFLICT: - ducktape headers - fast_uploads flag not in this branch A fix is included in this patch series that fixes read replica high watermark translation. To validate this, this commit adds a test that performs a version from a previously broken version, and validates equality with respect to the source cluster.
c4bf22f
to
6bc80e1
Compare
v22.3.x backport: #9727 |
Pre-emptive backport of #9493
We were previously doing no offset translation of the HWM in read replicas, meaning the returned offset would almost always be higher than it should've been.
This PR updates the behavior of delta_offset_end to return the delta one past the end of each segment, allowing translation to occur by translating the next offset after the end with delta_offset_end (vs translating the end and adding 1).
The caveat here is reconciling the updated semantics with data written in older versions. We need to be mindful that older versions may have a delta_offset_end that is 1 lower than what it should be, meaning the reported HWM and next_kafka_offset calls may be 1 higher than they should be.
Looking at their usages, it seems like this should be a benign change, but it would be good to be wary of this moving forward.
Fixes #9513
Backports Required
Release Notes
Bug Fixes