Using node_status_table to determine node unavailability in partition balancer #11146
d968578 to d281b15
lgtm.. probably good to have Alexey do another pass.
d281b15 to d31c137
for (const auto& follower : follower_metrics) {
    auto unavailable_dur = now - follower.last_heartbeat;
    auto node_status = _state.node_status().get_node_status(id);
    // node status is not yet available, wait for it to be updated
As discussed on Slack, we should cover the case when a remote node hasn't responded to pings at all since the current node started. I guess in this case we should use the time when we discovered the node through the members table as last_seen.
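The fallback suggested here could look roughly like the sketch below. It is a standalone illustration under stated assumptions, not the Redpanda implementation: `effective_last_seen`, `is_unavailable`, and the `discovered_at` parameter are hypothetical names, and a `std::optional` stands in for a status entry that may not exist yet.

```cpp
#include <chrono>
#include <optional>

using clock_type = std::chrono::steady_clock;

// Hypothetical helper: choose the timestamp to measure unavailability from.
// If the node has answered at least one ping, use that heartbeat time.
// Otherwise fall back to the time the local node first learned about it
// (assumed here to come from the members table).
inline clock_type::time_point effective_last_seen(
  std::optional<clock_type::time_point> last_heartbeat,
  clock_type::time_point discovered_at) {
    return last_heartbeat.value_or(discovered_at);
}

// A node counts as unavailable once it has been silent for longer than
// `timeout`, measured from the effective last-seen timestamp.
inline bool is_unavailable(
  std::optional<clock_type::time_point> last_heartbeat,
  clock_type::time_point discovered_at,
  clock_type::time_point now,
  clock_type::duration timeout) {
    return now - effective_last_seen(last_heartbeat, discovered_at) > timeout;
}
```

With this shape, a node that never replied is treated as unavailable once the timeout has elapsed since its discovery, instead of being skipped indefinitely while waiting for a first status update.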
I will open a follow-up PR with the changes in the node_status subsystem. It seems that we also do not clean the status table when a node is removed from the cluster.
Since `partition_balancer_backend` is going to use the `node_status_table` to recognize unavailable nodes, it has to have access to it. Signed-off-by: Michal Maslanka <michal@redpanda.com>
a4bed64 to b1edff0
Using `node_status_table` last heartbeat timestamps to determine if a node is unavailable. Previously `partition_balancer_planner` was using follower state coming from the controller raft group. Signed-off-by: Michal Maslanka <michal@redpanda.com>
b1edff0 to a66fccb
Using `node_status_table` last heartbeat timestamps to determine if a node is unavailable. Previously `partition_balancer_planner` was using follower state coming from the controller raft group.
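As a rough illustration of the idea, the sketch below derives the set of unavailable nodes from last-seen timestamps alone. Everything in it is an assumption made for the example: `status_map` is a plain map standing in for the status table, and `unavailable_nodes` is a hypothetical function, not part of the planner.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <unordered_map>
#include <vector>

using node_id = std::int32_t;
using status_clock = std::chrono::steady_clock;

// Hypothetical stand-in for a status table: node id -> last time it was seen.
using status_map = std::unordered_map<node_id, status_clock::time_point>;

// Return the ids of all nodes whose last-seen timestamp is older than
// `timeout` relative to `now`, sorted for deterministic output.
inline std::vector<node_id> unavailable_nodes(
  const status_map& statuses,
  status_clock::time_point now,
  status_clock::duration timeout) {
    std::vector<node_id> result;
    for (const auto& [id, last_seen] : statuses) {
        if (now - last_seen > timeout) {
            result.push_back(id);
        }
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

For example, with nodes last seen 5, 90, and 50 seconds ago and a 30 second timeout, only the latter two would be reported as unavailable.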
Fixes: #7218
Backports Required
Release Notes