
CORE-2365: storage: increase size of offset key map fragment size #17879

Merged

Conversation

dotnwat
Member

@dotnwat dotnwat commented Apr 16, 2024

With a fragmented_vector configured with 32KB fragments, the actual fragment will be the largest (given T) that doesn't exceed 32KB. For the offset key map the size of T=entry gave a fragment size of 20KB.

The default target size of the entire offset key map is roughly 128MB, which works out to an achieved capacity of 8192 fragments when adding entries to the fragmented vector. Given that a std::vector header is 24 bytes, the outer vector of fragments was an 8192 * 24 ≈ 196KB allocation, which triggered the large allocation warning.

This PR changes the fragmented vector used in the offset key map to a chunked_vector, which ends up with roughly 80KB fragments once the fragment size is computed, so the final outer allocation for the default configuration is around 45KB.

Of course, choosing a large map size would negate this solution, but we don't really expect large sizes to be configured. In any case, this should not pose a problem in practice because we perform the entire allocation at bootup, when memory is not fragmented. If we observe large configurations in the future we can reduce the size further (e.g. we could probably bit-pack some data in the entry to get a "better" fit of T=entry into the target fragment size).

Fixes: #17860

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

rockwotj
rockwotj previously approved these changes Apr 16, 2024
@@ -179,7 +179,8 @@ class hash_key_offset_map : public key_offset_map {
     hash_type::digest_type hash_key(const compaction_key&) const;
 
     mutable hash_type hasher_;
-    large_fragment_vector<entry> entries_;
+    fragmented_vector<entry, 64 * 1024> entries_;
Contributor

You could consider using chunked_vector that will automatically calculate the fragment size too

Member Author

chunked_vector provides a fragment size of 80KB for this case, so that would work. they both suffer the same issue when the offset map target size increases sufficiently. is there any advantage to switching? afaict chunked_vector helps when the first fragment may be small. this container will always have something like 128MB total.

Contributor

That is fair: the outer vector is so large in this case that it overallocates. One way to fix that would be to write our own inner vector type, as we could save 8 bytes by tracking the capacity separately. Honestly, maybe even the size could be dropped, since we track that in aggregate too and could compute each fragment's size when needed (which means we could just store an array of pointers).

The automatic size computation of chunked_vector seems superior to manually picking a fragment size, as I see no downsides to 80k vs 64k chunks (it should actually improve fragmentation slightly).

Member Author

superior to manually picking a fragment size

yeah, good point! switched to chunked_vector

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@vbotbuildovich
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/47837#018ee585-c6e1-4287-a5a4-c15086f9ce97:

"rptest.tests.write_caching_fi_test.WriteCachingFailureInjectionTest.test_unavoidable_data_loss"

@dotnwat
Member Author

dotnwat commented Apr 16, 2024

#17886

@dotnwat dotnwat merged commit 0fbb87f into redpanda-data:dev Apr 16, 2024
15 of 18 checks passed
@vbotbuildovich
Collaborator

/backport v23.3.x


Successfully merging this pull request may close these issues.

Oversized allocation: 196608 bytes 