Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r/offset_translator: remove unsafe bootstrap code #15919

Merged
merged 1 commit into from
Jan 8, 2024

Conversation

ztlpn
Copy link
Contributor

@ztlpn ztlpn commented Jan 2, 2024

When the corresponding kvstore state is not found, we should recover offset_translator state from the log, not use configuration_manager state - the latter is incorrect because there are other batch types contributing to offset delta, not just configuration batches. This code is a vestige from the time when the separate offset_translator component was just introduced and its kvstore state needed to be bootstrapped from configuration_manager state.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

@dotnwat
Copy link
Member

dotnwat commented Jan 3, 2024

recover offset_translator state from the log, not use configuration_manager state - the latter is incorrect because there are other batch types contributing to offset delta, not just configuration batches.

oh nice find. sounds like this could have (or was?) been a real headache!

This code is a vestige from the time when the separate offset_translator component was just introduced and its kvstore state needed to be bootstrapped from configuration_manager state.

Anything to do to remove the old bits or make them unlikely to be used incorrectly going forward?

@dotnwat
Copy link
Member

dotnwat commented Jan 3, 2024

Looks like a little build error

/__w/redpanda/redpanda/src/v/archival/tests/ntp_archiver_test.cc:960:56: error: too many arguments to function call, expected single argument 'reset', have 2 arguments
    tr.start(raft::offset_translator::must_reset::yes, {}).get();
    ~~~~~~~~                                           ^~
/__w/redpanda/redpanda/src/v/raft/offset_translator.h:64:18: note: 'start' declared here
    ss::future<> start(must_reset reset);

When the corresponding kvstore state is not found, we should recover offset_translator
state from the log, not use configuration_manager state - the latter is incorrect because
there are other batch types contributing to offset delta, not just configuration batches.
This code is a vestige from the time when the separate offset_translator component was
just introduced and its kvstore state needed to be bootstrapped from configuration_manager
state.
@ztlpn ztlpn force-pushed the fix-offset-translator-bootstrap branch from e5714ba to 8c38ccf Compare January 3, 2024 22:23
@ztlpn
Copy link
Contributor Author

ztlpn commented Jan 3, 2024

sounds like this could have (or was?) been a real headache!

Right, to my knowledge this issue hasn't occurred in real deployments - we are lucky that we checkpoint the state at the earliest possible opportunity, so that empty kvstore state is unlikely to occur for existing partitions. I was reminded of this piece of code when I was thinking about how to reset a buggy offset translator map that has been accidentally persisted - in theory just nuking kvstore state should be enough, but currently this is not the case.

a little build error

🤦‍♂️ fixed

@vbotbuildovich
Copy link
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/43406#018cd622-adf9-41c8-ad82-1564496e42db:

"rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable"

@vbotbuildovich
Copy link
Collaborator

@ztlpn ztlpn merged commit de546c8 into redpanda-data:dev Jan 8, 2024
19 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v23.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v23.1.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants