CORE-2365: storage: increase size of offset key map fragment size #17879
Conversation
src/v/storage/key_offset_map.h (outdated)

@@ -179,7 +179,8 @@ class hash_key_offset_map : public key_offset_map {
     hash_type::digest_type hash_key(const compaction_key&) const;

     mutable hash_type hasher_;
-    large_fragment_vector<entry> entries_;
+    fragmented_vector<entry, 64 * 1024> entries_;
You could consider using chunked_vector, which will automatically calculate the fragment size too.
chunked_vector provides a fragment size of 80KB for this case, so that would work. They both suffer the same issue when the offset map target size increases sufficiently. Is there any advantage to switching? AFAICT chunked_vector helps when the first fragment may be small; this container will always hold something like 128MB total.
That is fair: the outer vector is so large in this case that it over-allocates. One way to fix that would be to write our own inner vector type, since we can save 8 bytes by tracking the capacity separately. Honestly, even the size could be dropped, since we track that in aggregate too and could compute the single fragment's size when needed (which means we could just store an array of pointers).
The automatic size computation of chunked_vector seems superior to manually picking a fragment size, as I see no downsides to 80K vs 64K chunks (it should actually improve fragmentation slightly).
> superior to manually picking a fragment size

Yeah, good point! Switched to chunked_vector.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
New failures in https://buildkite.com/redpanda/redpanda/builds/47837#018ee585-c6e1-4287-a5a4-c15086f9ce97:

/backport v23.3.x
With a fragmented_vector configured with 32KB fragments, the actual fragment will be the largest (given T) that doesn't exceed 32KB. For the offset key map, the size of T=entry gave a fragment size of 20KB.
The default target size of the entire offset key map is roughly 128MB, which works out to an achieved capacity of 8192 fragments when adding entries to the fragmented vector. Given that a vector header is 24 bytes, the outer allocation was 8192 * 24 ≈ 190K, and this triggered the large allocation warning.
This PR changes the fragmented vector used in the offset key map to be a
chunked_vector, which ends up with around 80KB allocations once the
chunked vector decides on the actual fragment size, so the final outer
allocation for the default configuration is around 45K.
Of course, choosing a large map size would negate this solution, but we don't really expect large sizes to be configured. In any case, this should not pose a problem in practice because we perform the entire allocation at bootup, when memory is not fragmented. If we observe large configurations in the future we can continue to reduce the size (e.g. we could probably bit-pack some data in the entry to get a "better" fit of T=entry into the target fragment size).
Fixes: #17860
Backports Required
Release Notes