Using `node_status_table` to determine node unavailability in partition balancer #11146

mmaslankaprv · 2023-06-01T15:16:44Z

Use `node_status_table` last-heartbeat timestamps to determine whether a
node is unavailable. Previously, `partition_balancer_planner` used the
follower state coming from the controller Raft group.
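A minimal sketch of the idea, not the actual Redpanda code: every type and function name below (`status_table`, `find_unavailable`, the `timeout` parameter) is a hypothetical stand-in. It assumes each node records when a peer was last seen, and that a peer with no recorded status is skipped until one arrives.

```cpp
#include <chrono>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the Redpanda types.
using node_id = int;
using clock_type = std::chrono::steady_clock;

struct node_status {
    clock_type::time_point last_seen;
};

class status_table {
public:
    // Record the time of the most recent reply from a peer.
    void update(node_id id, clock_type::time_point seen) {
        _statuses[id] = node_status{seen};
    }

    // Returns std::nullopt when no status has been recorded for the peer.
    std::optional<node_status> get(node_id id) const {
        auto it = _statuses.find(id);
        if (it == _statuses.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<node_id, node_status> _statuses;
};

// Collect members whose last recorded reply is older than `timeout`.
std::vector<node_id> find_unavailable(
  const status_table& table,
  const std::vector<node_id>& members,
  clock_type::time_point now,
  clock_type::duration timeout) {
    std::vector<node_id> unavailable;
    for (node_id id : members) {
        auto status = table.get(id);
        if (!status) {
            // Status is not yet available; wait for it to be updated.
            continue;
        }
        if (now - status->last_seen > timeout) {
            unavailable.push_back(id);
        }
    }
    return unavailable;
}
```

Unlike follower state from the controller Raft group, a table like this is maintained locally by each node, so the check does not depend on which node currently leads the controller group.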

Fixes: #7218

Backports Required

Release Notes

none

bharathv

lgtm.. probably good to have Alexey do another pass.

src/v/cluster/partition_balancer_backend.h

src/v/cluster/partition_balancer_state.h

ztlpn · 2023-06-02T11:16:19Z

src/v/cluster/partition_balancer_planner.cc

-    for (const auto& follower : follower_metrics) {
-        auto unavailable_dur = now - follower.last_heartbeat;
+        auto node_status = _state.node_status().get_node_status(id);
+        // node status is not yet available, wait for it to be updated


As discussed on slack, we should cover the case when a remote node didn't respond to pings at all since the current node startup. I guess we should use time when we discovered the node through the members table as last_seen in this case.
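The fallback suggested above could look roughly like the following sketch. The names `effective_last_seen`, `peer_unavailable`, and `discovered_at` are hypothetical and are not part of this PR; the sketch assumes the time at which the peer was first seen in the members table is available to the caller.

```cpp
#include <chrono>
#include <optional>

using clock_type = std::chrono::steady_clock;

// Hypothetical helper: if the peer never replied since this node started,
// fall back to the time the peer was discovered through the members table.
clock_type::time_point effective_last_seen(
  std::optional<clock_type::time_point> last_reply,
  clock_type::time_point discovered_at) {
    return last_reply.value_or(discovered_at);
}

// A peer counts as unavailable once the effective last-seen time is older
// than `timeout`, whether or not it ever replied.
bool peer_unavailable(
  std::optional<clock_type::time_point> last_reply,
  clock_type::time_point discovered_at,
  clock_type::time_point now,
  clock_type::duration timeout) {
    return now - effective_last_seen(last_reply, discovered_at) > timeout;
}
```

With this shape, a peer that has been silent since startup is eventually reported as unavailable instead of being skipped indefinitely.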

I will open a follow-up PR with the changes in the node_status subsystem. It seems that we also do not clean the status table when a node is removed from the cluster.
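For illustration only, the cleanup mentioned above might take a shape like this. The follow-up PR is not shown here, so `pruned_status_table` and `on_member_removed` are hypothetical names, and the way membership changes are delivered is an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <unordered_map>

using node_id = int;
using clock_type = std::chrono::steady_clock;

// Hypothetical status table that drops entries for removed nodes.
class pruned_status_table {
public:
    // Record the time of the most recent reply from a peer.
    void record_reply(node_id id, clock_type::time_point at) {
        _last_seen[id] = at;
    }

    // Assumed to be called when membership reports that a node was removed.
    // Returns true if an entry existed and was erased.
    bool on_member_removed(node_id id) { return _last_seen.erase(id) > 0; }

    bool contains(node_id id) const { return _last_seen.count(id) > 0; }

    std::size_t size() const { return _last_seen.size(); }

private:
    std::unordered_map<node_id, clock_type::time_point> _last_seen;
};
```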

Since `partition_balancer_backend` is going to use the `node_status_table` to recognize unavailable nodes, it has to have access to it.

Signed-off-by: Michal Maslanka <michal@redpanda.com>

Using `node_status_table` last heartbeat timestamps to determine if a node is unavailable. Previously `partition_balancer_planner` was using follower state coming from the controller raft group.

Signed-off-by: Michal Maslanka <michal@redpanda.com>

mmaslankaprv · 2023-06-05T15:59:47Z

unrelated ci failures:

github-actions bot added the area/redpanda label Jun 1, 2023

mmaslankaprv requested review from ztlpn and bharathv June 1, 2023 15:20

mmaslankaprv force-pushed the partition-balancer-node-status branch from d968578 to d281b15 Compare June 1, 2023 17:31

bharathv previously approved these changes Jun 1, 2023

mmaslankaprv dismissed bharathv’s stale review via d31c137 June 2, 2023 07:29

mmaslankaprv force-pushed the partition-balancer-node-status branch from d281b15 to d31c137 Compare June 2, 2023 07:29

mmaslankaprv requested a review from bharathv June 2, 2023 09:22

ztlpn reviewed Jun 2, 2023

c/balancer_state: wire node_status_table into balancer state

d437763

mmaslankaprv force-pushed the partition-balancer-node-status branch 2 times, most recently from a4bed64 to b1edff0 Compare June 2, 2023 12:48

mmaslankaprv force-pushed the partition-balancer-node-status branch from b1edff0 to a66fccb Compare June 2, 2023 12:57

ztlpn approved these changes Jun 2, 2023

mmaslankaprv merged commit a776aaa into redpanda-data:dev Jun 5, 2023
