-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
c/health: reduced number of health_node_report copies #17863
Conversation
a762ea0
to
46e3b81
Compare
auto serde_fields() { | ||
return std::tie( | ||
raft0_leader, node_states, node_reports, bytes_in_cloud_storage); | ||
cluster_health_report copy() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these copy()
functions actually used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests only
@@ -114,8 +114,8 @@ class health_monitor_backend { | |||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So most of the methods like get_cluster_health
, collect_current_node_health
etc still return a copy. Is that expected? Aren't those the most used methods to access the health report?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed on slack:
get_cluster_health
returns the report struct which internall has the ptrs to the invidivual node reports.
get_current_node_health
still returns by value but those are not used on in the metadata update path.
46e3b81
to
062c807
Compare
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47792#018ee1df-5556-459f-a026-38304debb5e9 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47792#018ee1e7-0381-4848-8c36-cde1b1a5fdc9 |
062c807
to
cfae3ff
Compare
write(out, bytes_in_cloud_storage); | ||
} | ||
|
||
ss::future<> serde_async_read(iobuf_parser& in, const serde::header& h) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a lot going on in this one commit. is it all related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i know it is a lot of changes for one commit, it is indeed related as this one change (a type that we use to propagate cluster_health_reports
) is influencing a lot of other places like:
- serialization
- callers of health_monitor_frontend apis
- tests
Every time the `health_monitor_backend::get_cluster_health` was called we returned a copy of each node health report. The health report data structure maybe large as it contains all partition replica information. To reduce the number of copies while still not interrupting asynchronous iteration over the health report changed the API to return a list of shared pointers. Node health reports aren't modified in place by health monitor backend rather than that they are completely replaced. This makes it easy to share the underlying data without worrying about concurrent access. Signed-off-by: Michał Maślanka <michal@redpanda.com>
cfae3ff
to
226b9bf
Compare
/backport v23.3.x |
Failed to create a backport PR to v23.3.x branch. I tried:
|
Every time the
health_monitor_backend::get_cluster_health
was called we returned a copy of each node health report. The health report data structure maybe large as it contains all partition replica information.To reduce the number of copies while still not interrupting asynchronous iteration over the health report changed the API to return a list of shared pointers.
Node health reports aren't modified in place by health monitor backend rather than that they are completely replaced. This makes it easy to share the underlying data without worrying about concurrent access.
Backports Required
Release Notes
Improvements