Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v23.1.x] cloud_storage: fix HWM on read replicas #9770

Merged
merged 3 commits into from
Apr 3, 2023

Conversation

andrwng
Copy link
Contributor

@andrwng andrwng commented Mar 31, 2023

Pre-emptive backport of #9493

We were previously doing no offset translation of the HWM in read replicas, meaning the returned offset would almost always be higher than it should've been.

This PR updates the behavior of delta_offset_end to return the delta one past the end of each segment, allowing translation to occur by translating the next offset after the end with delta_offset_end (vs translating the end and adding 1).

The caveat here is reconciling the updated semantics with data written in older versions. We need to be mindful that older versions may have a delta_offset_end that is 1 lower than what it should be, meaning the reported HWM and next_kafka_offset calls may be 1 higher than they should be.

Looking at their usages, it seems like this should be a benign change, but it would be good to be wary of this moving forward.

Fixes #9513

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Bug Fixes

  • Fixed a bug that would result in read replicas reporting a high watermark that was too high.

CONFLICT:
- ducktae test parameterization
- missing last_segments() call

We were previously doing no offset translation of the HWM in read
replicas, meaning the returned offset would almost always be higher than
it should've been.

This commit updates to translate based on the `delta_offset_end` of the
last segment.
CONFLICT:
- no last_segment() call in manifest

The field is currently not expressive enough to report the HWM, the
Kafka offset just past the end of a partition. Currently each segment
records delta_offset_end as the difference between Kafka and RP offset
at the end of the log:

c:     config batch
d:     data batch
delta: computed by counting the number of non-data batches prior to the given
       offset

kafka = model - delta

 type:   c     c     d     d     d     c     c         c     c
      +-----------------------------------------+   +-----------+
      |     |     |     |     |     |     |     |   |     |     |
model:|  0     1     2    ...   138   139   140 |   | 141   142 |
      |     |     |     |     |     |     |     |   |     |     |
      +-----------------------------------------+   +-----------+
delta:   0     1     2    ...    2     2     3         4     5     6
      +-----------------------------------------+   +-----------+
      |                 |     |     |           |   |           |
kafka:   0     0     0    1..   136   137   137 |   | 137   137 |
      |                 |     |     |           |   |           |
      +-----------------------------------------+   +-----------+

In the above example, consider when the first segment ([0, 140]) is uploaded,
and the second ([141, 142]) isn't. A read replica would previously report the
HWM as 138 (140 - 3 + 1), when really it should be 137, since the highest data
batch is 136. To translate the correct HWM, we would need to translate 141 and
record its delta as 4, giving us enough information in metadata to tell that
140 is a config batch.

To remediate this, this commit changes the existing delta_offset_end field to
translate the next offset, and expecting the next Kafka offset to be translated
with the next offset.

Regarding backwards compatibility, conditionally increasing the
delta_offset_end value means older data will over-report by 1. This should be
safe, given the value is currently only used for optimizations that skip
segments with no data batches.

Fixes redpanda-data#9513
@@ -263,12 +263,12 @@ const model::offset partition_manifest::get_last_offset() const {
}

const std::optional<kafka::offset>
partition_manifest::get_last_kafka_offset() const {
partition_manifest::get_next_kafka_offset() const {
auto last_seg = last_segment();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partition_manifest::last_segment doesn't exist on this version. Did you mean to add it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice spot, this was unintentional. I just removed it.

@andrwng andrwng force-pushed the v23.1.x-rrr-hwm branch 3 times, most recently from bc39573 to c4bf22f Compare March 31, 2023 18:52
CONFLICT:
- ducktape headers
- fast_uploads flag not in this branch

A fix is included in this patch series that fixes read replica high
watermark translation. To validate this, this commit adds a test that
performs a version from a previously broken version, and validates
equality with respect to the source cluster.
@andrwng andrwng merged commit 2215d7a into redpanda-data:v23.1.x Apr 3, 2023
@RafalKorepta RafalKorepta added this to the v23.1.5 milestone Apr 3, 2023
@vshtokman
Copy link
Contributor

v22.3.x backport: #9727

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants