cluster: initialize node ID assignment counter #7821

Merged 2 commits into redpanda-data:dev on Dec 19, 2022

andrwng commented Dec 16, 2022

We previously weren't initializing the node ID assignment counter with the node IDs assigned to the cluster founders. This PR bumps the counter with the node IDs available at initial cluster bootstrap. This fix is adjacent to the one in #7789 -- it's just another layer of protection.

This PR also adds a test that restacks a cluster in random order, ensuring that no node IDs are reused, which would have caught this issue.

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

Release Notes

Bug Fixes

  • Node ID assignment will now use node IDs higher than those used by the initial seed servers, avoiding a potential node ID re-use in cases where a seed server is fully decommissioned before adding a new node.

The counter isn't currently initialized with any of the node IDs that
are assigned to seed servers. This can result in a seed server's node ID
being reused after it's been decommissioned.

This commit adds a layer of protection by initializing the node ID
counter to just past the highest assigned node ID.
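
As a rough illustration of the idea in this commit message (not the actual Redpanda code), here is a minimal sketch in Python. The `NodeIdAllocator` class, its `assign` method, and the choice to start from 0 when there are no seeds are all assumptions made for this example.

```python
class NodeIdAllocator:
    """Hypothetical stand-in for the cluster's node ID assignment counter."""

    def __init__(self, seed_node_ids=()):
        # Start just past the highest node ID already held by a seed server.
        # With no seed IDs, start from 0 (an assumption for this sketch).
        self._next_id = max(seed_node_ids, default=-1) + 1

    def assign(self):
        """Hand out the next node ID and advance the counter."""
        node_id = self._next_id
        self._next_id += 1
        return node_id


def demo():
    seeds = [0, 1, 2]
    uninitialized = NodeIdAllocator()      # counter ignores the seed IDs
    initialized = NodeIdAllocator(seeds)   # counter starts past the seed IDs
    return uninitialized.assign(), initialized.assign()
```

In this sketch the uninitialized counter hands out an ID that a seed server already holds (or held before being decommissioned), while the initialized counter starts above every seed ID.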

I updated some of the existing tests with a stricter check that would
have caught this issue.

Another approach to protect this case is to check the removed set of
nodes when assigning a node ID. A separate change will add that.
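
The alternative mentioned above, consulting the removed set during assignment, might look roughly like the following. This is only a sketch under assumed names (`next_free_node_id`, `active_ids`, `removed_ids`); the separate change may be structured quite differently.

```python
def next_free_node_id(counter, active_ids, removed_ids):
    """Return (node_id, new_counter), skipping any ID that is in use
    or that belonged to a node that has since been removed."""
    candidate = counter
    while candidate in active_ids or candidate in removed_ids:
        candidate += 1
    return candidate, candidate + 1


def demo_removed_set():
    # Seed node 0 was decommissioned; nodes 1 and 2 are still members.
    # The counter was never bumped past the seeds, so it is still 0.
    return next_free_node_id(counter=0, active_ids={1, 2}, removed_ids={0})
```

Even with a counter that was never initialized from the seeds, the check skips over the decommissioned ID as well as the IDs still in use.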
This commit adds a test that wipes and restarts a cluster one node at a
time in sequence with a workload running, leveraging the new features
that enable homogeneous node configuration.
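
The property such a test checks can be sketched with a small in-memory simulation. This is not the real test: it involves no cluster and no workload, and `restack`, `has_reuse`, and `random_order` are hypothetical helpers for illustration only.

```python
import random


def restack(seed_ids, order, init_counter_from_seeds=True):
    """Wipe and re-add each node in `order`; return every ID ever assigned."""
    active = set(seed_ids)
    ever_assigned = list(seed_ids)
    # With the fix, the counter starts just past the highest seed ID.
    next_id = max(seed_ids) + 1 if init_counter_from_seeds else 0
    for old_id in order:
        active.remove(old_id)   # decommission and wipe the node
        new_id = next_id        # the wiped node rejoins and gets a fresh ID
        next_id += 1
        active.add(new_id)
        ever_assigned.append(new_id)
    return ever_assigned


def has_reuse(ids):
    """True if any node ID appears more than once."""
    return len(ids) != len(set(ids))


def random_order(seed_ids, rng):
    """Return the seed IDs shuffled with the given random.Random instance."""
    order = list(seed_ids)
    rng.shuffle(order)
    return order
```

Running `restack` with `init_counter_from_seeds=False` reproduces the reuse for any restart order, which is the kind of failure a stricter check on assigned IDs is meant to catch.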

andrewhsu commented Dec 17, 2022

@andrwng i tried to rebuild the buildkite job but got an error on the attempt to check out a83dfce. i think the build job did not pick up your force-push. when you get the chance, can you push or force-push a new commit again to trigger a new build with a new sha?

nevermind. i was clicking on the Rebuild button for the a83dfce job instead of the 0f3438c job.

@mmaslankaprv mmaslankaprv added this to the v22.3.9 milestone Dec 19, 2022
@andrwng andrwng merged commit d399c7e into redpanda-data:dev Dec 19, 2022

andrwng commented Dec 19, 2022

/backport v22.3.x

@vbotbuildovich

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x c4fe53e7d9ed4372c539b540ee6d31aa97fbe1ae 0f3438c5422493ab9a1237766cc87bd7b520207d

Workflow run logs.

andrwng added a commit that referenced this pull request Dec 20, 2022