Refactored cluster::partition_leaders_table to use hierarchical structure of metadata #16512

Merged
merged 16 commits into from
Feb 26, 2024

Conversation

@mmaslankaprv mmaslankaprv commented Feb 7, 2024

Store leader metadata in `cluster::partition_leaders_table` in a hierarchical data structure: a top-level map keyed by topic, whose values are maps holding per-partition information. This way we no longer keep a separate copy of the topic name for each partition, and we can leverage the matching hierarchical structure of `node_health_report` to reduce the number of hash table lookups.
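A minimal sketch of the map-of-maps layout described above, using standard containers. All names here (`leaders_table`, `leader_meta`, the integer aliases) are hypothetical stand-ins, not the types used in `partition_leaders_table`, and the namespace part of the key is omitted for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for the real model types.
using partition_id = int32_t;
using node_id = int32_t;

struct leader_meta {
    std::optional<node_id> leader; // empty when the partition has no leader
    int64_t term = 0;
};

class leaders_table {
public:
    void update(const std::string& topic, partition_id p, leader_meta m) {
        _topics[topic][p] = m;
    }

    // The topic name is hashed once; the partition lookup only hashes an integer.
    std::optional<node_id>
    get_leader(const std::string& topic, partition_id p) const {
        auto t_it = _topics.find(topic);
        if (t_it == _topics.end()) {
            return std::nullopt;
        }
        auto p_it = t_it->second.find(p);
        if (p_it == t_it->second.end()) {
            return std::nullopt;
        }
        return p_it->second.leader;
    }

    std::size_t topic_count() const { return _topics.size(); }

private:
    // The topic name is stored once as the outer key, not once per partition.
    using partitions_t = std::unordered_map<partition_id, leader_meta>;
    std::unordered_map<std::string, partitions_t> _topics;
};
```

When the input is itself grouped by topic, as a health report is, a caller can resolve the outer entry once and then update every partition of that topic against the inner map.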

Fixes: https://github.com/redpanda-data/core-internal/issues/1061

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Improvements

  • optimized updating leadership metadata with health reports

@mmaslankaprv
Member Author

/dt

@rockwotj rockwotj left a comment

Looks nice, hope you don't mind the drive by review 😄

@rockwotj rockwotj mentioned this pull request Feb 8, 2024
7 tasks
@mmaslankaprv mmaslankaprv force-pushed the hierarchy-leaders-table branch 2 times, most recently from d7d76c5 to 51b2159 Compare February 9, 2024 07:14
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Feb 9, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Feb 9, 2024
@mmaslankaprv mmaslankaprv marked this pull request as ready for review February 9, 2024 10:34
@@ -109,6 +109,10 @@ class partition_leaders_table {

leaders_info_t get_leaders() const;

uint64_t leaderless_partition_count() const {
Contributor

It would be nice to have unit tests verifying that we bookkeep this correctly, but I guess there are no existing tests for the partition leaders table, so maybe we should file a ticket?

Member Author

I was thinking about this as well; let me figure something out.

@StephanDollberg StephanDollberg left a comment

Thanks for this. Once we are close to merge, I can get a flamegraph from a high partition load again.

ntp);
return;
}
const model::ntp ntp(t_it->first.ns, t_it->first.tp, p_id);
Member

This potentially voids the benefits that we are getting as it likely causes an alloc + a memcpy.

It seems to be used:

  • In the trace logs: can we just pass in a ref to the ns_tp and then use that plus the p_id in the log statements?
  • Later, in the call to _watchers.notify: can we construct it only at that point, or work around it some other way (passing ntp twice there seems weird)? I expect that if statement to be rarely taken anyway.
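A rough sketch of the suggestion above: log from a reference to the topic key plus the partition id, and build the full `ntp` only on the rare path that notifies watchers. The structs and the `describe` and `update_leader` helpers are hypothetical simplifications, not the actual Redpanda types or functions.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical simplified types standing in for the topic key and model::ntp.
struct topic_namespace {
    std::string ns;
    std::string tp;
};

struct ntp {
    std::string ns;
    std::string tp;
    int32_t partition;
};

// Formats a log line from the existing key and the partition id.
// No ntp object is built just to print it.
std::string describe(const topic_namespace& ns_tp, int32_t p_id) {
    std::ostringstream os;
    os << ns_tp.ns << '/' << ns_tp.tp << '/' << p_id;
    return os.str();
}

// Builds the ntp (two string copies) only when the leader actually changed.
template<typename Notify>
bool update_leader(
  const topic_namespace& ns_tp,
  int32_t p_id,
  bool leader_changed,
  Notify notify) {
    if (!leader_changed) {
        return false; // common path: no string copies
    }
    notify(ntp{ns_tp.ns, ns_tp.tp, p_id});
    return true;
}
```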

@mmaslankaprv mmaslankaprv force-pushed the hierarchy-leaders-table branch 3 times, most recently from e8eca04 to 639aa60 Compare February 10, 2024 13:46
Added tracking of the number of leaderless partitions in the leaders table.
This avoids iterating over the whole list of leaders when generating
cluster metrics.
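One way to keep such a counter consistent is to adjust it on every transition between "has leader" and "no leader". The sketch below is illustrative only: `leader_counter_table` and its flat integer key are assumptions, and only the `leaderless_partition_count()` name comes from the diff in this PR.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

using node_id = int32_t;

class leader_counter_table {
public:
    void update(int32_t partition, std::optional<node_id> leader) {
        auto [it, inserted] = _leaders.try_emplace(partition, leader);
        if (inserted) {
            // A new partition counts as leaderless until a leader is known.
            if (!leader) {
                ++_leaderless;
            }
            return;
        }
        const bool had_leader = it->second.has_value();
        const bool has_leader = leader.has_value();
        if (had_leader && !has_leader) {
            ++_leaderless;
        } else if (!had_leader && has_leader) {
            --_leaderless;
        }
        it->second = leader;
    }

    void remove(int32_t partition) {
        auto it = _leaders.find(partition);
        if (it == _leaders.end()) {
            return;
        }
        if (!it->second) {
            --_leaderless;
        }
        _leaders.erase(it);
    }

    // O(1): metrics read the counter and never scan the table.
    uint64_t leaderless_partition_count() const { return _leaderless; }

private:
    std::unordered_map<int32_t, std::optional<node_id>> _leaders;
    uint64_t _leaderless = 0;
};
```

A unit test for the bookkeeping can walk a partition through each transition (added without a leader, leader elected, leader lost, removed) and compare the counter against a full scan.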

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added version tracking to the partition leaders table to make it possible
to detect concurrent modification. This allows yielding while iterating
over the leaders table. If the table is modified during an operation, an
exception is thrown and the operation can be retried.
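The version check can be sketched as follows, under the assumption that every mutation bumps a counter and that the iteration compares it after each point where other work may run. `versioned_table`, `concurrent_modification_error`, and the `yield_point` hook are hypothetical names; in the real code the yield would be a Seastar scheduling point rather than a callback.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical error type; the real exception name is not shown in this thread.
struct concurrent_modification_error : std::runtime_error {
    concurrent_modification_error()
      : std::runtime_error("leaders table modified during iteration") {}
};

class versioned_table {
public:
    void put(int partition, int leader) {
        _leaders[partition] = leader;
        ++_version; // every mutation bumps the version
    }

    uint64_t version() const { return _version; }

    // `yield_point` stands in for a place where other tasks may run and
    // mutate the table. After it returns, the version is re-checked before
    // the iterator is touched again.
    template<typename Visitor, typename YieldPoint>
    void for_each(Visitor visit, YieldPoint yield_point) const {
        const uint64_t started_at = _version;
        for (auto it = _leaders.begin(); it != _leaders.end(); ++it) {
            visit(it->first, it->second);
            yield_point();
            if (_version != started_at) {
                throw concurrent_modification_error();
            }
        }
    }

private:
    std::map<int, int> _leaders;
    uint64_t _version = 0;
};
```

The caller is expected to catch the exception and restart the iteration from the beginning, since partial results may be stale.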

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Leveraging the hierarchical structure of node health report and
internals of partition leaders table to minimize the number of lookups
in leaders map.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Previously `get_leadership_reply` did not contain any information about
the operation state, so it was impossible to propagate a service error
to the client. Added a field indicating whether the response is successful.

The field allows us to explicitly handle errors such as concurrent
modification of the partition leaders table.
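A hedged illustration of how a success flag lets a client tell "no leaders" apart from "the request failed, retry". The `leadership_reply` struct, `handle_get_leadership`, and `get_with_retry` are invented for this sketch and do not mirror the real RPC types or their serialization.

```cpp
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical reply shape: a payload plus an explicit success flag.
struct leadership_reply {
    std::vector<int32_t> leaders;
    bool success = true;
};

// Service side: turn an exception (for example a concurrent modification
// error) into an explicit failure instead of an empty but valid reply.
template<typename Snapshot>
leadership_reply handle_get_leadership(Snapshot take_snapshot) {
    try {
        return leadership_reply{take_snapshot(), true};
    } catch (const std::exception&) {
        return leadership_reply{{}, false};
    }
}

// Client side: retry a bounded number of times while the reply reports failure.
template<typename Request>
leadership_reply get_with_retry(Request send, int max_attempts) {
    leadership_reply reply{{}, false};
    for (int i = 0; i < max_attempts && !reply.success; ++i) {
        reply = send();
    }
    return reply;
}
```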

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Using `ntp_callbacks` to wait for leaders without an additional promises
map. When a caller requests to wait for a leader, we register a
notification that sets the promise value when called. This way we do
not need a separate mechanism to keep track of leadership notifications.
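The idea can be sketched without Seastar by letting the waiter register a one-shot callback that fills in a shared result and then unregisters itself. Here a `std::shared_ptr<std::optional<int32_t>>` plays the role of the promise/future pair, and `leader_notifier` is a hypothetical stand-in for the `ntp_callbacks` registry, not its real interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

class leader_notifier {
public:
    using callback = std::function<void(int32_t leader)>;
    // Stand-in for a promise/future pair: empty until a leader is known.
    using pending_leader = std::shared_ptr<std::optional<int32_t>>;

    uint64_t register_notify(int32_t partition, callback cb) {
        const uint64_t id = _next_id++;
        _callbacks.emplace(id, entry{partition, std::move(cb)});
        return id;
    }

    void unregister_notify(uint64_t id) { _callbacks.erase(id); }

    void notify(int32_t partition, int32_t leader) {
        // Copy matching callbacks first so a callback may unregister itself.
        std::vector<callback> matching;
        for (const auto& kv : _callbacks) {
            if (kv.second.partition == partition) {
                matching.push_back(kv.second.cb);
            }
        }
        for (auto& cb : matching) {
            cb(leader);
        }
    }

    // The waiter is just another registered notification; there is no
    // separate map of promises to keep in sync with the callbacks.
    pending_leader wait_for_leader(int32_t partition) {
        auto result = std::make_shared<std::optional<int32_t>>();
        auto id = std::make_shared<uint64_t>(0);
        *id = register_notify(partition, [this, result, id](int32_t leader) {
            *result = leader;
            unregister_notify(*id); // one-shot
        });
        return result;
    }

    std::size_t registered() const { return _callbacks.size(); }

private:
    struct entry {
        int32_t partition;
        callback cb;
    };
    std::unordered_map<uint64_t, entry> _callbacks;
    uint64_t _next_id = 0;
};
```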

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Replaced the previously used `chunked_fifo` with dynamically sized
fragmented vectors. A fragmented vector provides a random access iterator
and automatically controls the size of allocated chunks.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added methods allowing `fragmented_vector::iter` to satisfy the
`std::random_access_iterator` concept.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Using the async algorithm calls `ss::coroutine::maybe_yield()` every 100
operations, while remaining lightweight when iterating synchronously
over a chunk.
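The batching pattern can be illustrated without Seastar: run a tight synchronous loop and hit a yield hook only once per fixed number of elements. `for_each_with_yield` and its `maybe_yield` callback are hypothetical; in a Seastar coroutine the hook would be `co_await ss::coroutine::maybe_yield()` instead.

```cpp
#include <cstddef>
#include <vector>

// Applies `fn` to every element and calls `maybe_yield` once per `interval`
// elements. Returns how many times the yield hook ran.
template<typename Range, typename Fn, typename YieldFn>
std::size_t for_each_with_yield(
  const Range& range, Fn fn, YieldFn maybe_yield, std::size_t interval = 100) {
    std::size_t since_last_yield = 0;
    std::size_t yields = 0;
    for (const auto& element : range) {
        fn(element); // plain synchronous work, no per-element overhead
        if (++since_last_yield == interval) {
            since_last_yield = 0;
            maybe_yield(); // give other tasks a chance to run
            ++yields;
        }
    }
    return yields;
}
```

The interval trades scheduling latency against per-element overhead: a smaller value yields more often but pays the check and the potential context switch more frequently.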

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
@mmaslankaprv mmaslankaprv merged commit 60d475e into redpanda-data:dev Feb 26, 2024
16 checks passed
@mmaslankaprv mmaslankaprv deleted the hierarchy-leaders-table branch February 26, 2024 07:45
@vbotbuildovich
Collaborator

/backport v23.3.x

@vbotbuildovich
Collaborator

/backport v23.2.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16512-v23.3.x-160 remotes/upstream/v23.3.x
git cherry-pick -x 2410d675c0228c00b8811f3aeb01d3472baa51ae 0d946e06cf9effedd2ec7097c4c0d4ba480bebde bd3cc30119409c54f768e713c65d5951e8ce373f b39795695f5b1a8fe2d0963ac47ac85454466965 abe6173b21cb6e86323c48b8b8c6dd5d27fe35b8 fdeb6ac37ab757a13a458c2351f15443525b86bc 9a255f011290dd52842ac7f1d1170521caba0dab aeab006f49f9c1daff9f7c3dfb9756c6277e34a7 2db09f95fd654eabc88f4c900fcbfa817ffeceeb cddc003d3daf75aa001ff7b0d402ffa5d1fa3a3b e37df0cb2f17915357b3622d1e96a9c635b77ec9 4eed31fd55411341997ceeb1af9e7a7f5eae5e5f 44e6cd7837e4e7060d1016f2f5f5f0f77a5d3db2 f4e6bce918c613707526223d9d21bb4a904b9a9d 9dbeb81913f63de12d5c4e0fd72f5c4b85e4d0d6 8948fe5c5214bca5e5011b0c45b4fa7df9ed3e60

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16512-v23.2.x-948 remotes/upstream/v23.2.x
git cherry-pick -x 2410d675c0228c00b8811f3aeb01d3472baa51ae 0d946e06cf9effedd2ec7097c4c0d4ba480bebde bd3cc30119409c54f768e713c65d5951e8ce373f b39795695f5b1a8fe2d0963ac47ac85454466965 abe6173b21cb6e86323c48b8b8c6dd5d27fe35b8 fdeb6ac37ab757a13a458c2351f15443525b86bc 9a255f011290dd52842ac7f1d1170521caba0dab aeab006f49f9c1daff9f7c3dfb9756c6277e34a7 2db09f95fd654eabc88f4c900fcbfa817ffeceeb cddc003d3daf75aa001ff7b0d402ffa5d1fa3a3b e37df0cb2f17915357b3622d1e96a9c635b77ec9 4eed31fd55411341997ceeb1af9e7a7f5eae5e5f 44e6cd7837e4e7060d1016f2f5f5f0f77a5d3db2 f4e6bce918c613707526223d9d21bb4a904b9a9d 9dbeb81913f63de12d5c4e0fd72f5c4b85e4d0d6 8948fe5c5214bca5e5011b0c45b4fa7df9ed3e60

Workflow run logs.
