Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v23.1.x] storage: use async serialization for large kvstore snapshot batches #11798

Merged
merged 8 commits into from
Jun 30, 2023

Conversation

BenPope
Copy link
Member

@BenPope BenPope commented Jun 30, 2023

Backports:

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • Reduced latency impact from storing metadata in certain scenarios where the number of partitions per shard is high

jcsp and others added 8 commits June 30, 2023 11:56
(cherry picked from commit 18dfe26)

Conflicts:
  src/v/storage/kvstore.cc

Due to:
* b980a5b
* ef7abec
This is useful in kvstore, which composes very large
batches for its snapshots, and can cause ~30ms reactor
stalls encoding them.

(cherry picked from commit e759243)
Direct equivalent of the regular from_iobuf, but
it returns a future and calls async_adl instead of adl.

(cherry picked from commit b697d9e)
kvstore creates very large batches when
the number of partitions per core is high, and
doing the encode/decode synchronously can
block the reactor for tens of milliseconds.

(cherry picked from commit 54c0a3f)
In tests we are seeing some cases where `last_offset` ends up being
larger than `base_offset`.

This results in trying to reserve a huge number of slots in the hashmap
which subsequently leads to crashes (see below).

Fix this by simply looking at the record_count in the header.

`last_offset` being smaller than `base_offset` is a result of
`.last_offset_delta` being negative. This looks valid to me as the
`record_batch_builder` initiliazes it to:

```
.last_offset_delta = static_cast<int32_t>(_records.size() - 1),
```

so it looks like a valid case for an empty batch. Please correct me if
this doesn't sound right.

Internally the absl hash map implementation rounds user given sizes up
to power of twos and also merges its control and element array. For
values close to `size_t` this doesn't work out anymore because of
overflow and then results in allocating a completely different number
(very small) compared to what it then tries to memset.

Fixes redpanda-data#10788

(cherry picked from commit 65024e6)
@BenPope BenPope added area/storage kind/backport PRs targeting a stable branch labels Jun 30, 2023
@BenPope BenPope added this to the v23.1.x-next milestone Jun 30, 2023
@BenPope BenPope requested review from jcsp and StephanDollberg June 30, 2023 11:48
@BenPope BenPope self-assigned this Jun 30, 2023
@BenPope BenPope requested a review from mmaslankaprv June 30, 2023 11:48
@StephanDollberg
Copy link
Member

Any of the conflicts one needs to pay attention to or was it just small stuff?

@BenPope
Copy link
Member Author

BenPope commented Jun 30, 2023

Any of the conflicts one needs to pay attention to or was it just small stuff?

Small stuff. Some argument reordering/tweaking and some new functionality for gc which I didn't add here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants