cluster: fix a case where version updates weren't generated #8902

Merged
jcsp merged 1 commit into redpanda-data:dev from issue-8758-feature-node-version on Feb 20, 2023

Conversation

jcsp
Contributor

@jcsp jcsp commented Feb 15, 2023

This could result in clusters failing to activate features after an upgrade, if the controller leadership changed at just the wrong moment.

Fixes: #8758

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

None

Release Notes

Bug Fixes/

  • Fixed an issue where a cluster might fail to activate new features after an upgrade if controller leadership changed at a particular point during the upgrade.

@jcsp jcsp added the kind/bug and area/controller labels Feb 15, 2023
Calculating deltas on health reports was not robust,
because after a leadership change the controller can come
up on a node that already has a health report showing
a newer version, i.e. no delta wrt the next report.

Instead, generate an update to _node_versions whenever
we see a version that is newer than that in the map, and
clear _node_versions on leadership/term changes.  This
guarantees that within each controller term, we will
pay attention to node versions from all nodes until we
have accumulated a version from each node, and thereafter
we will only submit updates if the version in a health
report is newer than the one in _node_versions.

Fixes: redpanda-data#8758
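
To illustrate the approach described in the commit message above, here is a minimal C++ sketch. It is not the code from this PR: `version_tracker`, `on_term_change`, `on_health_report`, `version_update` and the type aliases are hypothetical names chosen for the example, and only `_node_versions` comes from the commit message. The sketch also assumes the map is updated at the moment an update is generated, which the actual change may handle differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in types; the real controller types are not shown in this PR page.
using node_id = std::int32_t;
using logical_version = std::int64_t;
using term_id = std::int64_t;

struct version_update {
    node_id node;
    logical_version version;
};

class version_tracker {
public:
    // On a leadership/term change, forget all known versions so that every
    // node's version is treated as new within the new term.
    void on_term_change(term_id term) {
        assert(term >= _term); // assumption: terms never go backwards
        if (term != _term) {
            _term = term;
            _node_versions.clear();
        }
    }

    // Compare a health report (node -> version) against the map, not against
    // the previous report. Emit an update for each node that is unknown in
    // this term or reports a newer version than the one recorded.
    std::vector<version_update>
    on_health_report(const std::map<node_id, logical_version>& report) {
        std::vector<version_update> updates;
        for (const auto& entry : report) {
            auto it = _node_versions.find(entry.first);
            if (it == _node_versions.end() || entry.second > it->second) {
                _node_versions[entry.first] = entry.second;
                updates.push_back({entry.first, entry.second});
            }
        }
        return updates;
    }

private:
    term_id _term{-1};
    std::map<node_id, logical_version> _node_versions;
};
```

In this sketch, feeding the same report twice within one term produces updates only the first time. After `on_term_change` with a new term, the same report produces an update for every node again, which is the situation a report-to-report delta would miss.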
@jcsp jcsp force-pushed the issue-8758-feature-node-version branch from 21fab3b to f54c7c9 Compare February 15, 2023 14:47
@jcsp
Contributor Author

jcsp commented Feb 15, 2023

I did a first cut of this that explicitly tracked the controller term of each node's most recent update (jcsp@21fab3b) but that felt a bit over-complex. The approach in this PR should be easier to reason about.

@jcsp
Contributor Author

jcsp commented Feb 16, 2023

Test failures:

Member

@mmaslankaprv mmaslankaprv left a comment

lgtm

Member

@dotnwat dotnwat left a comment

thanks!

@jcsp jcsp merged commit da2de23 into redpanda-data:dev Feb 20, 2023
@jcsp jcsp deleted the issue-8758-feature-node-version branch February 20, 2023 10:47
@jcsp
Contributor Author

jcsp commented Feb 20, 2023

/backport v22.3.x

@jcsp
Contributor Author

jcsp commented Feb 20, 2023

/backport v22.2.x

3 participants