-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored cluster::partition_leaders_table
to use hierarchical structure of metadata
#16512
Refactored cluster::partition_leaders_table
to use hierarchical structure of metadata
#16512
Conversation
af1fc75
to
55841f5
Compare
/dt |
55841f5
to
716f02a
Compare
/dt |
716f02a
to
8d1ccf6
Compare
/dt |
8d1ccf6
to
1ad7040
Compare
/dt |
1 similar comment
/dt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice, hope you don't mind the drive by review 😄
d7d76c5
to
51b2159
Compare
51b2159
to
cbc2821
Compare
@@ -109,6 +109,10 @@ class partition_leaders_table { | |||
|
|||
leaders_info_t get_leaders() const; | |||
|
|||
uint64_t leaderless_partition_count() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if there were unit tests for that we bookkeep this right, but I guess there are no existing tests for partition leaders table, so maybe we should file a ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was thinking about this as well, let me figure out something
cbc2821
to
5945d9c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, once we are close to merge I can get a flamegraph from a high partition load again.
ntp); | ||
return; | ||
} | ||
const model::ntp ntp(t_it->first.ns, t_it->first.tp, p_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This potentially voids the benefits that we are getting as it likely causes an alloc + a memcpy.
It seems to be used:
- In the trace logs: Can we just was pass in a ref to the ns_tp and then use that plus the p_id in the log statements?
- Later down in the call to
_watchers.notify
: Can we just construct it then or work around somehow else (passing ntp twice there seems weird). I expect that if statement to be rare anyway.
e8eca04
to
639aa60
Compare
Added tracking the number of leader less partitions in leaders table. This prevents iterating over the whole list of leaders when generating cluster metrics. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added tracking version of partition leaders table to be able to identify concurrent modification. This will allow yielding while iterating the leaders table. If a table is modified during operation an exception is thrown and operation can be retried. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Leveraging the hierarchical structure of node health report and internals of partition leaders table to minimize the number of lookups in leaders map. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Previously `get_leadership_reply` did not contain any information about the operation state, therefore it was impossible to propagate service error to the client. Added a field indicating if response is successful. The field allow us to explicitly handle errors like partition leaders table concurrent modification. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Using `ntp_callbacks` to wait for the leaders without additional promises map. When caller requests to wait for a leader we register the notification which sets the promise value when called. This way we do not need a separate mechanism to keep track of leadership notifications. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Replaced previously used `chunked_fifo` with dynamically sized fragmented vectors. Fragemented vector provides a random access iterator and automatically controls the size of allocated chunks Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added methods allowing `fragmented_vector::iter` to satisfy `std::random_random__iterator` concept. Signed-off-by: Michal Maslanka <michal@redpanda.com>
461ff36
to
707a58e
Compare
Using async algorithm will call `ss::coroutine::maybe_yield()` every 100 operations while still being lightweight while iterating over synchronously over a chunk. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
707a58e
to
8948fe5
Compare
/backport v23.3.x |
/backport v23.2.x |
Failed to create a backport PR to v23.3.x branch. I tried:
|
Failed to create a backport PR to v23.2.x branch. I tried:
|
Using a hierarchical data structure to store leader metadata in
cluster::partition_leaders_table
. Using a map of maps to store topic partition information as a value in top level topic only map. This way we do not need to keep a separate copy of topic name for each partition and can leverage the same hierarchical structure ofnode_health_report
to reduce number of hash table look ups.Fixes: https://github.com/redpanda-data/core-internal/issues/1061
Backports Required
Release Notes
Improvements