c/topic_table: replaced partition metadata map with chunked_vector #16919

Merged
mmaslankaprv merged 3 commits into redpanda-data:dev from the fix-15610 branch on Mar 13, 2024

Conversation

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Mar 6, 2024

The set of partition ids in each topic is a monotonically increasing sequence of numbers without gaps. This allows us to use a vector instead of an associative container to keep the partition metadata in the topic table. Using chunked_vector prevents large memory allocations.
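
To illustrate why chunked storage avoids large allocations, here is a minimal sketch. `toy_chunked_vector` and its `ChunkElems` parameter are hypothetical names invented for this example and are not the `chunked_vector` from this repository; the sketch assumes default-constructible elements.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a chunked vector: elements live in
// fixed-size chunks, so no single element allocation grows with the total
// element count. The outer vector is still contiguous, but it holds only
// one pointer per chunk.
template<typename T, std::size_t ChunkElems = 1024>
class toy_chunked_vector {
public:
    void push_back(T value) {
        if (_size % ChunkElems == 0) {
            // Allocate one more bounded-size chunk instead of reallocating
            // a single large contiguous buffer.
            _chunks.push_back(std::make_unique<T[]>(ChunkElems));
        }
        _chunks[_size / ChunkElems][_size % ChunkElems] = std::move(value);
        ++_size;
    }

    // An index maps to (chunk, offset within chunk).
    T& operator[](std::size_t i) {
        assert(i < _size);
        return _chunks[i / ChunkElems][i % ChunkElems];
    }

    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }

private:
    std::vector<std::unique_ptr<T[]>> _chunks;
    std::size_t _size = 0;
};
```

With a chunk size of 4, pushing 10 elements allocates three small chunks rather than one buffer that has to be regrown and copied.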

Fixes: #15610

Benchmark results

std::map

test                             iterations      median         mad         min         max      allocs       tasks        inst
full_1000.SequentialFill           15959000    36.492ns     0.017ns    36.431ns    36.539ns       3.587       0.000       520.6
full_1000.RandomFill                8883000    70.880ns     0.036ns    70.844ns    70.952ns       3.587       0.000       448.6
full_1000.LookUp                    1622000    27.718ns     0.028ns    27.687ns    27.756ns       0.000       0.000       130.6
full_1000.Iterate                   1693000     3.894ns     0.002ns     3.886ns     3.897ns       0.000       0.000        16.2

full_100000.SequentialFill         13100000    43.760ns     0.140ns    43.620ns    44.167ns       3.629       0.000       634.1
full_100000.RandomFill              4300000   163.417ns     0.524ns   162.450ns   164.026ns       3.600       0.000       507.7
full_100000.LookUp                    17000   116.046ns     0.268ns   115.778ns   120.801ns       0.000       0.000       204.2
full_100000.Iterate                 1700000     7.345ns     0.039ns     7.306ns     7.674ns       0.000       0.000        16.0

half_full_1000.SequentialFill      13015500    39.139ns     0.011ns    39.129ns    39.162ns       3.620       0.000       504.2
half_full_1000.RandomFill           8263000    67.078ns     0.006ns    67.036ns    67.115ns       3.620       0.000       439.3
half_full_1000.LookUp               2993000    29.814ns     0.007ns    29.761ns    29.870ns       0.000       0.000       120.2
half_full_1000.Iterate              1642500     4.765ns     0.010ns     4.744ns     4.780ns       0.000       0.000        16.6

half_full_100000.SequentialFill    12550000    39.913ns     0.191ns    39.316ns    40.239ns       3.602       0.000       616.1
half_full_100000.RandomFill         4850000   137.204ns     0.510ns   135.549ns   139.806ns       3.556       0.000       494.3
half_full_100000.LookUp               33000    78.105ns     0.218ns    77.475ns    78.568ns       0.000       0.000       193.6
half_full_100000.Iterate            1650000     4.519ns     0.024ns     4.490ns     4.543ns       0.000       0.000        16.2

absl::btree_map

test                             iterations      median         mad         min         max      allocs       tasks        inst
full_1000.SequentialFill           13100000    54.911ns     0.020ns    54.878ns    54.936ns       2.846       0.000       680.1
full_1000.RandomFill                7910000    90.543ns     0.078ns    90.451ns    90.685ns       2.912       0.000       838.0
full_1000.LookUp                    1583000    37.979ns     0.039ns    37.940ns    38.048ns       0.000       0.000       138.5
full_1000.Iterate                   1692000     1.483ns     0.001ns     1.482ns     1.485ns       0.000       0.000        14.3

full_100000.SequentialFill         12000000    58.674ns     0.489ns    57.425ns    59.163ns       2.858       0.000       793.8
full_100000.RandomFill              4100000   191.376ns     0.681ns   190.658ns   193.918ns       2.882       0.000       925.3
full_100000.LookUp                    17000   137.310ns     1.853ns   135.036ns   140.049ns       0.000       0.000       216.5
full_100000.Iterate                 1700000     2.558ns     0.034ns     2.524ns     2.796ns       0.000       0.000        14.0

half_full_1000.SequentialFill      11447000    54.669ns     0.023ns    54.534ns    54.785ns       2.869       0.000       664.1
half_full_1000.RandomFill           7337000    86.871ns     0.025ns    86.816ns    86.928ns       2.940       0.000       827.9
half_full_1000.LookUp               2913000    35.055ns     0.025ns    35.026ns    35.096ns       0.000       0.000       120.8
half_full_1000.Iterate              1636500     1.794ns     0.002ns     1.792ns     1.801ns       0.000       0.000        14.6

half_full_100000.SequentialFill    11150000    55.771ns     0.317ns    54.961ns    56.307ns       2.821       0.000       779.1
half_full_100000.RandomFill         4500000   166.781ns     0.175ns   165.512ns   167.473ns       2.847       0.000       917.1
half_full_100000.LookUp               33000    93.257ns     0.231ns    93.026ns    94.207ns       0.000       0.000       198.6
half_full_100000.Iterate            1650000     1.505ns     0.001ns     1.497ns     1.506ns       0.000       0.000        14.1

contiguous_range_map

test                             iterations      median         mad         min         max      allocs       tasks        inst
full_1000.SequentialFill           22216000    26.186ns     0.010ns    26.176ns    26.234ns       2.600       0.000       446.1
full_1000.RandomFill               16853000    26.494ns     0.011ns    26.480ns    26.564ns       2.601       0.000       443.6
full_1000.LookUp                    1776000     1.553ns     0.001ns     1.549ns     1.557ns       0.000       0.000        34.2
full_1000.Iterate                   1788000     1.419ns     0.002ns     1.415ns     1.430ns       0.000       0.000        32.3

full_100000.SequentialFill         21600000    25.120ns     0.288ns    24.776ns    25.488ns       2.567       0.000       414.8
full_100000.RandomFill             10200000    55.801ns     0.389ns    53.163ns    56.190ns       2.606       0.000       435.5
full_100000.LookUp                    19000     9.691ns     0.096ns     9.595ns     9.894ns       0.000       0.000        34.3
full_100000.Iterate                 1800000     4.454ns     0.011ns     4.424ns     4.547ns       0.000       0.000        32.0

half_full_1000.SequentialFill      13759500    37.094ns     0.007ns    37.065ns    37.190ns       2.641       0.000       486.3
half_full_1000.RandomFill          12300000    31.107ns     0.012ns    31.047ns    31.121ns       2.632       0.000       473.2
half_full_1000.LookUp               3304000     5.142ns     0.001ns     5.136ns     5.144ns       0.000       0.000        28.3
half_full_1000.Iterate              1671000     9.897ns     0.007ns     9.889ns     9.909ns       0.000       0.000        46.7

half_full_100000.SequentialFill    14250000    31.808ns     0.138ns    31.669ns    32.137ns       2.603       0.000       445.7
half_full_100000.RandomFill         9350000    51.917ns     0.905ns    50.982ns    52.972ns       2.597       0.000       453.1
half_full_100000.LookUp               35000    10.103ns     0.133ns     9.937ns    10.262ns       0.000       0.000        28.3
half_full_100000.Iterate            1700000    11.780ns     0.071ns    11.693ns    11.929ns       0.000       0.000        46.2

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@mmaslankaprv
Member Author

/dt

@mmaslankaprv
Member Author

/dt

@mmaslankaprv
Member Author

/dt

@vbotbuildovich
Collaborator

@mmaslankaprv mmaslankaprv marked this pull request as ready for review March 11, 2024 11:06
@StephanDollberg
Member

This is cool. Instead of using chunked_vector<std::optional<T>>, did you by chance try something like a chunked_vector<T> plus a (roaring) bitmap indicating whether the element is there?

I guess this might not work with non-default-constructible elements.
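
A minimal sketch of the layout suggested above, under stated assumptions: `toy_bitmap_slots` is a hypothetical name, `std::vector` stands in for `chunked_vector`, and `std::vector<bool>` stands in for the roaring bitmap. It also shows where the default-constructibility requirement comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: values and presence flags are stored side by side.
template<typename T>
class toy_bitmap_slots {
public:
    void set(std::size_t key, T value) {
        if (key >= _values.size()) {
            // Growing default-constructs every new slot, which is why T
            // must be default-constructible in this layout.
            _values.resize(key + 1);
            _present.resize(key + 1, false);
        }
        _values[key] = std::move(value);
        _present[key] = true;
    }

    bool contains(std::size_t key) const {
        return key < _present.size() && _present[key];
    }

    // Returns nullptr for keys that are out of range or not set.
    const T* find(std::size_t key) const {
        return contains(key) ? &_values[key] : nullptr;
    }

private:
    std::vector<T> _values;
    std::vector<bool> _present; // stand-in for a bitmap
};
```

Compared with `std::optional<T>` per slot, the presence flags are packed separately from the values, at the cost of constructing a `T` in every slot, including the empty ones.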

Contributor

@rockwotj rockwotj left a comment

Really nice! I like this structure

I guess this might not work with non-default-constructible elements.

I think we'd have to drop down into using uninitialized memory, which is honestly a lot of work and tricky to get right. I would be OK with requiring default-constructible values in this map too. I don't know how performance-critical this structure's uses are.

src/v/container/include/container/fragmented_vector.h (outdated, resolved)
* or decrementing iterators.
*/
template<std::integral KeyT, typename ValueT>
class contiguous_range_map {
Contributor

(very optional) alternative name to consider: sorted_vector_map

Member Author

I am happy to change the name; I was struggling to find the proper one. Does anyone else have other ideas?

Contributor

I like contiguous_range_map more, although it has a shortcoming in that nothing indicates that the range is 0-based.

src/v/container/include/container/contiguous_range_map.h (outdated, resolved)
src/v/container/include/container/contiguous_range_map.h (outdated, resolved)
src/v/cluster/topic_table.cc (outdated, resolved)
static_assert(std::copy_constructible<int_range_map::iterator>);
static_assert(std::copy_constructible<int_range_map::const_iterator>);

struct verifier {
Contributor

:chefs_kiss:

src/v/container/include/container/contiguous_range_map.h (outdated, resolved)
using mapped_type = ValueT;

private:
using underlying_t = chunked_vector<std::optional<value_type>>;
Contributor

What is the benefit of making it chunked_vector<std::optional<value_type>> instead of chunked_vector<std::optional<ValueT>>? The iterator already has access to the index, so we could construct the pair on the fly.

Member Author

We would always need a copy then. I was considering that, but the API wasn't ergonomic and it was different from that of the standard map.

Member

Can you expand on this a bit more? I am wondering the same as Bharath. I don't think it being different to the standard map is a big issue but yeah if it's not ergonomic that's a different question.

we would always need a copy then

You mean in the iterator or somewhere different?
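
One possible reading of the "copy" concern, sketched under assumptions: the two free functions below are hypothetical and only model what an iterator's dereference could return for each storage layout, with `std::vector` standing in for `chunked_vector`. A proxy pair holding a reference to the value is another option and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Layout A: the key is stored next to the value, so dereferencing can hand
// out a reference to a pair object that already exists in the container,
// as a standard map iterator does.
using stored_pair = std::pair<std::size_t, int>;

stored_pair& deref_stored(
  std::vector<std::optional<stored_pair>>& slots, std::size_t i) {
    return *slots[i];
}

// Layout B: only values are stored. There is no pair object to refer to,
// so a pair is materialized on every dereference and returned by value,
// copying the mapped value in this simple variant.
std::pair<std::size_t, int> deref_built(
  std::vector<std::optional<int>>& slots, std::size_t i) {
    return {i, *slots[i]};
}
```

Writing through the result of `deref_stored` changes the element in the container, while writing to the pair returned by `deref_built` only changes the temporary copy.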

@mmaslankaprv mmaslankaprv force-pushed the fix-15610 branch 3 times, most recently from e6655ef to 772f538 Compare March 11, 2024 19:46
src/v/cluster/topic_table.cc (outdated, resolved)
rockwotj
rockwotj previously approved these changes Mar 12, 2024
Contributor

@rockwotj rockwotj left a comment

Looks great! I think there is a potential footgun we could avoid with decrement, but lgtm

src/v/container/tests/map_bench.cc (resolved)
@StephanDollberg
Member

Could you also share the output of the benchmark please?

rockwotj
rockwotj previously approved these changes Mar 12, 2024
src/v/container/include/container/contiguous_range_map.h (outdated, resolved)
Contiguous range map is an associative sorted container backed by
chunked_vector, designed to efficiently store objects indexed by a
contiguous and limited range of integers. The container provides a
wrapper around a basic chunked vector that resembles the API of a
standard map.

The wrapper allows random inserts of elements into the map even though
this is not supported by a standard vector.

The underlying container is resized every time its size is not
sufficient to account for an emplaced key.

The contiguous_range_map tolerates gaps in the range of keys; however,
the performance penalty when incrementing or decrementing iterators
grows with the number and size of the gaps.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
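
The container described in the commit message above can be modeled roughly as follows. This is a simplified sketch, not the code from this PR: `toy_range_map` and its members are hypothetical, `std::vector` stands in for `chunked_vector`, keys are fixed to `std::size_t`, and a `for_each` helper replaces real iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

template<typename ValueT>
class toy_range_map {
public:
    using value_type = std::pair<std::size_t, ValueT>;

    // Inserts at an arbitrary key, growing the storage when the key does
    // not fit yet. Returns false if the key was already occupied.
    bool emplace(std::size_t key, ValueT value) {
        if (key >= _slots.size()) {
            _slots.resize(key + 1); // new slots are empty optionals
        }
        if (_slots[key].has_value()) {
            return false;
        }
        _slots[key].emplace(key, std::move(value));
        ++_size;
        return true;
    }

    // Lookup is a bounds check plus one index operation.
    const value_type* find(std::size_t key) const {
        if (key >= _slots.size() || !_slots[key].has_value()) {
            return nullptr;
        }
        return &*_slots[key];
    }

    // Visits occupied slots in key order. Every gap costs one extra check,
    // which models the iteration penalty mentioned in the commit message.
    template<typename Fn>
    void for_each(Fn fn) const {
        for (const auto& slot : _slots) {
            if (slot.has_value()) {
                fn(*slot);
            }
        }
    }

    std::size_t size() const { return _size; }

private:
    std::vector<std::optional<value_type>> _slots;
    std::size_t _size = 0;
};
```

Emplacing key 5 into an empty map resizes the storage to six slots, five of which stay empty until their keys are inserted.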
Signed-off-by: Michal Maslanka <michal@redpanda.com>
To avoid large allocations and still provide good performance, replaced
the `absl::node_hash_map` containing per-partition metadata with
`contiguous_range_map`, which is backed by a chunked vector.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Contributor

@rockwotj rockwotj left a comment

It's kind of interesting that the btree iteration is faster than the chunked_vector

@mmaslankaprv mmaslankaprv merged commit 5afeaba into redpanda-data:dev Mar 13, 2024
16 checks passed
@vbotbuildovich
Collaborator

/backport v23.3.x

@vbotbuildovich
Collaborator

/backport v23.2.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16919-v23.2.x-70 remotes/upstream/v23.2.x
git cherry-pick -x 0a678855a2c5d90e98457c88958d4f20e0516efe 1d2e410054657bd14157904edb30d94f43299237 16fb27079095e4df94cb05c820933aacaf6e7b41

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16919-v23.3.x-626 remotes/upstream/v23.3.x
git cherry-pick -x 0a678855a2c5d90e98457c88958d4f20e0516efe 1d2e410054657bd14157904edb30d94f43299237 16fb27079095e4df94cb05c820933aacaf6e7b41

Workflow run logs.

Development

Successfully merging this pull request may close these issues.

Oversized 300K allocation during topic deletion
6 participants