Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

container: Introduce chunked_hash_map #17182

Merged
merged 4 commits into from
Mar 26, 2024

Conversation

StephanDollberg
Copy link
Member

@StephanDollberg StephanDollberg commented Mar 19, 2024

Introduces chunked_hash_map which is a type alias for the segmented
version of https://github.com/martinus/unordered_dense/ using our
chunked_vector as a backing array.

This allows us to use a hashmap in scenarios where the map may grow
large (i.e.: scale with partition/topic counts) similar to chunked_vector
and avoids us having to fall back to the suboptimal absl::btree_map.

Below are some numbers from our leader balanacer benchmark comparing the
different map types:

test iterations median mad min max allocs tasks inst
lb.random_generator_movement_absl_node 149 3.120ms 97.624ns 3.120ms 3.122ms 0.000 0.000 44663355.3
lb.random_generator_movement_absl_flat 152 3.090ms 342.546ns 3.089ms 3.090ms 0.000 0.000 44846754.1
lb.random_generator_movement_absl_btree 84 8.103ms 884.595ns 8.102ms 8.105ms 0.000 0.000 57163339.9
lb.random_generator_movement_std_map 117 5.028ms 676.256ns 5.027ms 5.030ms 0.000 0.000 53678485.9
lb.random_generator_movement_unordered_dense 156 2.850ms 576.237ns 2.849ms 2.851ms 0.000 0.000 43115077.3
lb.random_generator_movement_unordered_dense_segmented 149 3.111ms 360.074ns 3.111ms 3.115ms 0.000 0.000 46250454.0
lb.random_generator_movement_unordered_dense_segmented_frag 151 3.063ms 388.795ns 3.063ms 3.065ms 0.000 0.000 45697581.1
lb.random_generator_movement_unordered_dense_segmented_frag_absl_hash 139 3.643ms 444.705ns 3.642ms 3.643ms 0.000 0.000 46476708.7

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@travisdowns
Copy link
Member

Naming: highly recommend chunked over fragmented as we are trying to align on that naming after @rockwotj changes to introduce chunked_vector as well.

@StephanDollberg StephanDollberg force-pushed the stephan/lb-fragmented-hashmap branch 2 times, most recently from 46fd9b4 to 7b1794d Compare March 20, 2024 09:51
@StephanDollberg
Copy link
Member Author

@redpanda-data redpanda-data deleted a comment from vbotbuildovich Mar 20, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Mar 20, 2024
@StephanDollberg
Copy link
Member Author

Failure seems to be a dupe of: #17205

@emaxerrno
Copy link
Contributor

this is neat!

@emaxerrno
Copy link
Contributor

Given the intent is to use as widely as possible maybe the name of “fmap” is better. For commonly used classes, small tokens is better.

@StephanDollberg
Copy link
Member Author

Given the intent is to use as widely as possible maybe the name of “fmap” is better. For commonly used classes, small tokens is better.

While I generally agree with this idea (e.g.: dict in python) I don't think we follow that anywhere else in RP so I'd rather stay with what we do otherwise.

Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a question and FYI the cover letter needs updating from the new name

src/v/container/include/container/fragmented_vector.h Outdated Show resolved Hide resolved
@StephanDollberg StephanDollberg changed the title container: Introduce fragmented_hash_map container: Introduce chunked_hash_map Mar 21, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Mar 21, 2024
Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I think we need to fix the oss build

fetch_dep(unordered_dense
REPO https://github.com/redpanda-data/unordered_dense
TAG 9338f301522a965309ecec58ce61f54a52fb5c22
SOURCE_DIR redpanda_build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this directory doesn't exist in the repo copy paste error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you need to add this to MakeAvailable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah brilliant, yeah I was still experimenting with this (== guessing).

travisdowns
travisdowns previously approved these changes Mar 23, 2024
@@ -1,7 +1,7 @@
find_package(Roaring REQUIRED)
v_cc_library(
NAME container
DEPS Roaring::roaring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FAILED: lib/libv_v_io.so
: && /vectorized/llvm/bin/clang++ -fPIC -fPIC -march=westmere -fuse-ld=lld -Wno-unused-lambda-capture -Wno-unused-command-line-argument -stdlib=libc++ -fsanitize=undefined -fsanitize=address -fstack-protector-strong -fno-plt -fstack-clash-protection -O0 -g -gdwarf-4 -gsplit-dwarf  -Wl,--build-id=sha1 -L/vectorized/lib -L/vectorized/lib64 -stdlib=libc++ -fsanitize=undefined -fsanitize=address -Wl,-z,relro,-z,now -shared -Wl,-soname,libv_v_io.so -o lib/libv_v_io.so src/v/io/CMakeFiles/v_io.dir/persistence.cc.o src/v/io/CMakeFiles/v_io.dir/page.cc.o src/v/io/CMakeFiles/v_io.dir/page_set.cc.o src/v/io/CMakeFiles/v_io.dir/clang-tidy-helper.cc.o src/v/io/CMakeFiles/v_io.dir/logger.cc.o src/v/io/CMakeFiles/v_io.dir/io_queue.cc.o src/v/io/CMakeFiles/v_io.dir/scheduler.cc.o  -Wl,-rpath,/vectorized/lib64  /vectorized/lib64/libseastar.so  /vectorized/lib64/libboost_program_options.so.1.82.0  /vectorized/lib64/libboost_thread.so.1.82.0  /vectorized/lib64/libboost_atomic.so.1.82.0  /vectorized/lib64/libboost_chrono.so.1.82.0  /vectorized/lib64/libboost_date_time.so.1.82.0  /vectorized/lib64/libboost_container.so.1.82.0  /vectorized/lib64/libboost_exception.a  /vectorized/lib64/libcares.so  /vectorized/lib64/libcryptopp.so  /vectorized/lib64/libfmtd.so.8.1.1  -Wl,--as-needed  /vectorized/lib64/liblz4.so  -fsanitize=address  -fsanitize=undefined  -fno-sanitize=vptr  /vectorized/lib64/libabsl_cord.so.2308.0.0  /vectorized/lib64/libabsl_cordz_info.so.2308.0.0  /vectorized/lib64/libabsl_cord_internal.so.2308.0.0  /vectorized/lib64/libabsl_cordz_functions.so.2308.0.0  /vectorized/lib64/libabsl_cordz_handle.so.2308.0.0  /vectorized/lib64/libabsl_crc_cord_state.so.2308.0.0  /vectorized/lib64/libabsl_crc32c.so.2308.0.0  /vectorized/lib64/libabsl_crc_internal.so.2308.0.0  /vectorized/lib64/libabsl_crc_cpu_detect.so.2308.0.0  /vectorized/lib64/libabsl_str_format_internal.so.2308.0.0  /vectorized/lib64/libabsl_raw_hash_set.so.2308.0.0  /vectorized/lib64/libabsl_hash.so.2308.0.0  /vectorized/lib64/libabsl_bad_optional_access.so.2308.0.0  /vectorized/lib64/libabsl_city.so.2308.0.0  /vectorized/lib64/libabsl_bad_variant_access.so.2308.0.0  /vectorized/lib64/libabsl_low_level_hash.so.2308.0.0  /vectorized/lib64/libabsl_hashtablez_sampler.so.2308.0.0  /vectorized/lib64/libabsl_exponential_biased.so.2308.0.0  /vectorized/lib64/libabsl_synchronization.so.2308.0.0  /vectorized/lib64/libabsl_stacktrace.so.2308.0.0  /vectorized/lib64/libabsl_graphcycles_internal.so.2308.0.0  /vectorized/lib64/libabsl_kernel_timeout_internal.so.2308.0.0  /vectorized/lib64/libabsl_time.so.2308.0.0  /vectorized/lib64/libabsl_civil_time.so.2308.0.0  /vectorized/lib64/libabsl_time_zone.so.2308.0.0  /vectorized/lib64/libabsl_symbolize.so.2308.0.0  /vectorized/lib64/libabsl_strings.so.2308.0.0  /vectorized/lib64/libabsl_string_view.so.2308.0.0  /vectorized/lib64/libabsl_throw_delegate.so.2308.0.0  /vectorized/lib64/libabsl_strings_internal.so.2308.0.0  /vectorized/lib64/libabsl_int128.so.2308.0.0  /vectorized/lib64/libabsl_debugging_internal.so.2308.0.0  /vectorized/lib64/libabsl_malloc_internal.so.2308.0.0  /vectorized/lib64/libabsl_demangle_internal.so.2308.0.0  /vectorized/lib64/libabsl_base.so.2308.0.0  /vectorized/lib64/libabsl_raw_logging_internal.so.2308.0.0  /vectorized/lib64/libabsl_log_severity.so.2308.0.0  /vectorized/lib64/libabsl_spinlock_wait.so.2308.0.0  /vectorized/lib64/libroaring.so  -lunordered_dense && :
ld.lld: error: unable to find library -lunordered_dense

internal build is failing because cluster doesn't depend on v::container.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it try to link against unordered_dense though, it's a header-only library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh because you put it in deps, it will try and link against it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe trying adding unordered_dense::unordered_dense instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that will only work for the OSS build. There is no cmake target for this in the superbuild so cmake tries to link against it, I think thats why the :: alias pattern emerged. I am sort of out of my depth here, I am not a CMake wizard

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts: either need a conditional add library for oss or make a dummy target in vtools

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for those pointers, so I also have no idea what I am doing here but with: https://github.com/redpanda-data/vtools/compare/stephan/find-unordered-dense?expand=1

and

diff --git a/src/v/container/CMakeLists.txt b/src/v/container/CMakeLists.txt
index 7b260ee289..c524beffdf 100644
--- a/src/v/container/CMakeLists.txt
+++ b/src/v/container/CMakeLists.txt
@@ -1,7 +1,11 @@
 find_package(Roaring REQUIRED)
+find_package(unordered_dense REQUIRED)
+
 v_cc_library(
   NAME container
-  DEPS Roaring::roaring unordered_dense
+  DEPS
+    Roaring::roaring
+    unordered_dense::unordered_dense
 )

 add_subdirectory(tests)

it seems to work.

@dotnwat is this the right way to go?

We really need to get away from the dual build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks ok.

When it's not a header-only library we can usually get away with a little alias hack in the open-source build like this https://github.com/redpanda-data/redpanda/blob/dev/cmake/dependencies.cmake#L135-L137.

But given it's header-only I think FindUnorderedDense.cmake is the most reasonable thing. And that looks fine.

We really need to get away from the dual build.

100%

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduces `chunked_hash_map` which is a type alias for the segmented
version of https://github.com/martinus/unordered_dense/ using our
`chunked_vector` as a backing array.

This allows us to use a hashmap in scenarios where the map may grow
large (i.e.: scale with partition/topic counts) similar to `chunked_vector`
and avoids us having to fall back to the suboptimal `absl::btree_map`.

Below are some numbers from our leader balanacer benchmark comparing the
different map types:

| test                                                                  |  iterations |      median |         mad |         min |         max |      allocs |       tasks |        inst |
| -                                                                     |           - |           - |           - |           - |           - |           - |           - |           - |
| lb.random_generator_movement_absl_node                                |         149 |     3.120ms |    97.624ns |     3.120ms |     3.122ms |       0.000 |       0.000 |  44663355.3 |
| lb.random_generator_movement_absl_flat                                |         152 |     3.090ms |   342.546ns |     3.089ms |     3.090ms |       0.000 |       0.000 |  44846754.1 |
| lb.random_generator_movement_absl_btree                               |          84 |     8.103ms |   884.595ns |     8.102ms |     8.105ms |       0.000 |       0.000 |  57163339.9 |
| lb.random_generator_movement_std_map                                  |         117 |     5.028ms |   676.256ns |     5.027ms |     5.030ms |       0.000 |       0.000 |  53678485.9 |
| lb.random_generator_movement_unordered_dense                          |         156 |     2.850ms |   576.237ns |     2.849ms |     2.851ms |       0.000 |       0.000 |  43115077.3 |
| lb.random_generator_movement_unordered_dense_segmented                |         149 |     3.111ms |   360.074ns |     3.111ms |     3.115ms |       0.000 |       0.000 |  46250454.0 |
| lb.random_generator_movement_unordered_dense_segmented_frag           |         151 |     3.063ms |   388.795ns |     3.063ms |     3.065ms |       0.000 |       0.000 |  45697581.1 |
| lb.random_generator_movement_unordered_dense_segmented_frag_absl_hash |         139 |     3.643ms |   444.705ns |     3.642ms |     3.643ms |       0.000 |       0.000 |  46476708.7 |

This patch further adds some missing typedefs to `fragmented_vector`
that are needed for it to work as the backing array for
`unordered_dense`.
Use a conditional hash adapter that supports `AbslHashValue` if the
class provides it.
At large partition counts (50k+) the `absl::node_hash_map` backing index
array would cause oversized allocs otherwise.

Switch to `chunked_hash_map`.
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Mar 24, 2024

Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rockwotj
Copy link
Contributor

@StephanDollberg let's update the advice on the wiki with this: https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/318275653/Memory+Management+in+Redpanda

@StephanDollberg StephanDollberg merged commit bbbf14d into dev Mar 26, 2024
19 checks passed
@StephanDollberg StephanDollberg deleted the stephan/lb-fragmented-hashmap branch March 26, 2024 10:12
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants