Distribute health report collection #17158
The partition size reported in a node health report does not have to be equal on all of the nodes. Change the health monitor test to account for that. Signed-off-by: Michal Maslanka <michal@redpanda.com>
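One way a test could tolerate differing sizes is to drop the size field before comparing reports. The following is only a minimal sketch of that idea, not the actual test change: the report layout and the field names (`size_bytes`, `leader`) are hypothetical and chosen for illustration.

```python
# Hypothetical report shape: {partition_name: {"leader": int, "size_bytes": int}}
# The field names are assumptions for this sketch, not Redpanda's schema.

def without_sizes(partitions):
    """Return a copy of the per-partition statuses with the size field removed."""
    return {
        name: {k: v for k, v in status.items() if k != "size_bytes"}
        for name, status in partitions.items()
    }


def reports_match(a, b):
    """Compare two node reports while ignoring per-partition sizes."""
    return without_sizes(a) == without_sizes(b)


node_1 = {"kafka/topic/0": {"leader": 1, "size_bytes": 4096}}
node_2 = {"kafka/topic/0": {"leader": 1, "size_bytes": 8192}}
node_3 = {"kafka/topic/0": {"leader": 2, "size_bytes": 4096}}
```

Under these assumptions `reports_match(node_1, node_2)` holds even though the sizes differ, while a different leader in `node_3` is still detected as a mismatch.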
Change the health monitor to distribute health report collection. Previously, all the nodes queried the cluster health from the `redpanda/controller/0` partition leader. This put additional pressure on that node, as it had to serialize the node reports for every request. Health report collection now has every node query every other node directly for its health report. This way the overhead of serializing reports and handling health report requests is evenly distributed among all the nodes in the cluster. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46352#018e5112-4b55-4a31-8cac-a6589b54a3dc
new failures in https://buildkite.com/redpanda/redpanda/builds/46352#018e5123-8908-4be6-a8d9-65917ff8736a:
new failures in https://buildkite.com/redpanda/redpanda/builds/46397#018e5380-ef83-48ca-9be7-49e0e580eae6:
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
/ci-repeat 1
controller leader likes this change (rest of the nodes don't :P)
Nice, so this is even fully backwards compatible as we just reuse the APIs that the controller was already using.
/backport v23.3.x
Failed to create a backport PR to v23.3.x branch. I tried:
Change the health monitor backend to distribute health report collection. Previously, all the nodes queried the cluster health from the `redpanda/controller/0` partition leader. This put additional pressure on that node, as it had to serialize the node reports for every request. Health report collection now has every node query every other node directly for its health report. This way the overhead of serializing reports and handling health report requests is evenly distributed among all the nodes in the cluster.
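The load shift described above can be illustrated with a toy count of how many node reports each node serializes per refresh round. This is a simplified model, not Redpanda code: it assumes the leader previously gathered one report from each follower and answered each follower with a full cluster report of N node reports, and the function names are hypothetical.

```python
def centralized_load(nodes, leader):
    """Node reports serialized per node when everyone asks the controller leader."""
    serialized = {n: 0 for n in nodes}
    for n in nodes:
        if n != leader:
            serialized[n] += 1                # follower sends its own report to the leader
            serialized[leader] += len(nodes)  # leader answers with all N node reports
    return serialized


def distributed_load(nodes):
    """Node reports serialized per node when every node queries every other node."""
    serialized = {n: 0 for n in nodes}
    for requester in nodes:
        for peer in nodes:
            if peer != requester:
                serialized[peer] += 1         # peer sends only its own report
    return serialized


nodes = [0, 1, 2, 3, 4]
old = centralized_load(nodes, leader=0)
new = distributed_load(nodes)
```

In this model with five nodes the leader goes from 20 serialized reports per round to 4, while every other node goes from 1 to 4, which matches the review comment that the controller leader benefits and the rest of the nodes take on a little more work.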
Backports Required
Release Notes
Improvements