storage: lz4 compression buffer may be too small and prevent compaction #15024

Closed
dotnwat opened this issue Nov 18, 2023 · 0 comments · Fixed by #15059
Labels: area/storage, kind/bug
Milestone: v23.2.17

dotnwat commented Nov 18, 2023

Version & Environment

All versions.

What went wrong?

Observed this being triggered in the housekeeping loop, presumably during compaction.

log_manager.cc:416 - Error processing housekeeping(): std::runtime_error
(lz4f_compressupdate error:ERROR_dstMaxSize_tooSmall)

In this case compaction doesn't continue.

What should have happened instead?

No error.

How to reproduce the issue?

The output buffer that we pass to LZ4F_compressUpdate is sized as the minimum of the largest allowed allocation size (~128 KiB) and the worst-case bound computed by LZ4F_compressBound for the input size. It appears that in some cases 128 KiB was not large enough.

Well, it's not clear that 128 KiB wasn't large enough, because the available space is actually 128 KiB minus the output cursor: the usable buffer shrinks as we write, and at some point it is copied into the final output as an accumulation and then reset. We'll need to look closer at the lz4 requirements.

Worst case, we could allocate a compression scratch space at start-up (as we used to for zstd) and reuse that.
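
For illustration, a minimal sketch of the sizing rule described above; the names here are hypothetical, not the actual Redpanda code:

```
#include <lz4frame.h>
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: size the output buffer as the minimum of the
// allocator-friendly cap and the lz4 worst-case bound for this input.
size_t output_buffer_size(size_t input_size, const LZ4F_preferences_t& prefs) {
    // Largest allocation considered friendly to the seastar allocator (~128 KiB).
    constexpr size_t max_alloc_size = 128 * 1024;
    // Worst-case compressed size for this input, per lz4frame.
    size_t bound = LZ4F_compressBound(input_size, &prefs);
    return std::min(max_alloc_size, bound);
}
```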

@dotnwat dotnwat added kind/bug Something isn't working area/storage labels Nov 18, 2023
@dotnwat dotnwat added this to the v23.2.17 milestone Nov 18, 2023
nvartolomei added a commit to nvartolomei/redpanda that referenced this issue Nov 20, 2023
Our lz4 frame compressor needs to take a lot of care to avoid monolithic
allocations, which aren't friendly to the seastar memory allocator. To
facilitate this, the lz4 library provides a helpful method to compute the
worst-case compressed output size for a given input size,
`LZ4F_compressBound`.

Before this change, in a very rare scenario, our code throws the
following exception: `lz4f_compressupdate
error:ERROR_dstMaxSize_tooSmall`.

In hindsight, the bug is obvious if you look at the following code excerpt:

```
// Compress an input chunk and advance cursor.
...

// Compute the worst case compressed output size for the next chunk.
input_chunk_size = std::min(input_sz - input_cursor, max_input_chunk_size);
size_t next_output_size = LZ4F_compressBound(input_chunk_size, &prefs);

// Allocate a new output buffer and reset the cursor if the next output
// won't fit in the current buffer.
if (output_sz - output_cursor < next_output_size) {
  ...
}
```

Each time we compress the last chunk of a fragment, the recomputed
`input_chunk_size` equals 0. We then call `LZ4F_compressBound(0, &prefs)`,
which for our `prefs` always returns `65543`.

Side note/fun fact: the value is particularly large because lz4 does
internal data buffering of up to 64K, so this value always covers that
potentially non-empty buffer. This also means that `LZ4F_compressUpdate`
very often returns just 0: data is only buffered internally, not written
to the output buffer.
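
As a minimal probe of this behavior; the numbers in the comments are from this report and assume preferences with 64 KB blocks (which zero-initialized preferences default to):

```
#include <lz4frame.h>
#include <cstdio>

int main() {
    // Zero-initialized preferences default to 64 KB blocks, matching the
    // scenario described above; the exact bounds depend on these settings.
    LZ4F_preferences_t prefs{};
    std::printf("bound(0)     = %zu\n", LZ4F_compressBound(0, &prefs));
    std::printf("bound(1)     = %zu\n", LZ4F_compressBound(1, &prefs));
    std::printf("bound(65536) = %zu\n", LZ4F_compressBound(65536, &prefs));
    // Per the report above: bound(0) == 65543, bound(1..65536) == 65544.
    return 0;
}
```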

We then check whether a hypothetical output of `65543` bytes would fit into
our own buffer, go back to the beginning of the loop, move to the next
non-empty fragment, and call `LZ4F_compressUpdate` on its first chunk.

This is where the bug manifests itself. `LZ4F_compressUpdate` also calls
`LZ4F_compressBound` internally and fails if the output buffer is smaller
than the returned bound. In our case, calling `LZ4F_compressBound` with a
value between 1 <= x <= 65536 always returns `65544`, just 1 byte larger
than the bound for an input size of 0. If the output buffer has only 65543
bytes free, we get the `lz4f_compressupdate
error:ERROR_dstMaxSize_tooSmall` exception.
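
For context, the exception text presumably originates from a caller-side check along these lines; a hypothetical sketch, not Redpanda's actual code:

```
#include <lz4frame.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper illustrating where the exception comes from.
size_t compress_update_or_throw(
  LZ4F_cctx* ctx, void* dst, size_t dst_free, const void* src, size_t src_sz) {
    size_t n = LZ4F_compressUpdate(ctx, dst, dst_free, src, src_sz, nullptr);
    if (LZ4F_isError(n)) {
        // ERROR_dstMaxSize_tooSmall surfaces here when dst_free is below
        // the bound that LZ4F_compressUpdate computes internally.
        throw std::runtime_error(
          std::string("lz4f_compressupdate error:") + LZ4F_getErrorName(n));
    }
    return n; // number of bytes actually written to dst
}
```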

Note: I'm curious to learn why the value is 1 byte larger, but haven't
gotten to it yet.

This, however, is very rare in practice and almost impossible to catch in
our tests, which use randomized inputs. In particular, these inputs don't
compress, leading to outputs of size `65544`, and our maximum output
buffer size is set to `128 * 1024`. After writing one compressed output
to a buffer this large, only 65528 bytes are free, which is lower than
the `65543` returned by `LZ4F_compressBound` for an input size of 0.
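
(Sanity check on the arithmetic: 128 * 1024 = 131072, and 131072 - 65544 = 65528 < 65543, so in the tests the capacity check always triggers a fresh buffer before a stale bound can let an undersized buffer through.)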

For the bug to manifest the following conditions must be met:

* The input iobuf must consist of 2 or more fragments.
* The prefix (P) byte length must be >= (input_chunk_size = 64K),
  followed by at least one more fragment.
* The prefix (P), when passed to `LZ4F_compressUpdate`, must compress in
  such a way that the output buffer is left with exactly 65543 available
  bytes. This means finding a sequence of a specific length such that
  a flush is triggered at the right moment.

The fix simply moves the `LZ4F_compressBound` calculation right before
`LZ4F_compressUpdate`, guaranteeing that both methods are called with
the same value for `input_chunk_size`.
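
A minimal sketch of the fixed ordering; names mirror the excerpt above and are illustrative, not Redpanda's actual implementation, with output accumulation elided:

```
#include <lz4frame.h>
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch of the fixed loop: the bound is recomputed for the
// chunk actually about to be compressed, so LZ4F_compressBound and
// LZ4F_compressUpdate always see the same input_chunk_size.
void compress_all_chunks(LZ4F_cctx* ctx,
                         const char* input, size_t input_sz,
                         char* output, size_t output_sz,
                         size_t max_input_chunk_size,
                         const LZ4F_preferences_t& prefs) {
    size_t input_cursor = 0;
    size_t output_cursor = 0;
    while (input_cursor < input_sz) {
        size_t input_chunk_size
          = std::min(input_sz - input_cursor, max_input_chunk_size);
        // The fix: compute the worst-case bound immediately before
        // LZ4F_compressUpdate, for the exact chunk being compressed.
        size_t next_output_size = LZ4F_compressBound(input_chunk_size, &prefs);
        if (output_sz - output_cursor < next_output_size) {
            // ...accumulate the current output elsewhere and reset...
            output_cursor = 0;
        }
        size_t written = LZ4F_compressUpdate(
          ctx,
          output + output_cursor, output_sz - output_cursor,
          input + input_cursor, input_chunk_size,
          nullptr);
        if (LZ4F_isError(written)) {
            throw std::runtime_error(LZ4F_getErrorName(written));
        }
        input_cursor += input_chunk_size;
        output_cursor += written;
    }
}
```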

Fixes: redpanda-data#15024
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Nov 22, 2023
(cherry picked from commit b1e1576)