Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compaction index size overflow #7687

Merged
merged 7 commits into from
Dec 16, 2022

Conversation

ztlpn
Copy link
Contributor

@ztlpn ztlpn commented Dec 10, 2022

As by default max compacted segment size is 5 GiB, compacted index size for large segments can oveflow 32-bit ints. This PR substitutes size and the number of keys in the footer for 64-bit ints. Old 32-bit fields are left for backwards compatibility with the code that doesn't check version number.

Fixes #7647

Additionally a more stringent footer version check is introduced that requires version to be equal to the current one (it is better to rebuild even if the version is greater as the index is probably incompatible anyway).

Backports Required

  • v22.3.x
  • v22.2.x
  • v22.1.x

Release Notes

Bug Fixes

  • Fix integer overflow in compaction index footer that could lead to hangs when trying to compact large segments.

@ztlpn ztlpn force-pushed the fix-7647-compaction-index-overflow branch 2 times, most recently from e2af2cb to fd4bcc9 Compare December 12, 2022 11:26
If the index is of different version than the current one, request a
rebuild. It is easier than dealing with compatibility issues.
@ztlpn ztlpn force-pushed the fix-7647-compaction-index-overflow branch from fd4bcc9 to 3ad52b8 Compare December 12, 2022 23:14
@ztlpn ztlpn marked this pull request as ready for review December 12, 2022 23:20
Comment on lines 134 to 135
ss::temporary_buffer<char> tmp = co_await in.read_exactly(
compacted_index::footer_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use read_iobuf_exactly ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if (tmp.size() != compacted_index::footer_size) {
if (buf.size_bytes() != compacted_index::footer_size) {
throw std::runtime_error(fmt::format(
"could not read enough bytes to parse "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the case we hit when loading old cluster's compaction index? If so, maybe we should make the message a bit clear to indicate that this isn't totally unexpected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is kind of a catch-all, but adding a note about a likely cause is a good idea.

@emaxerrno
Copy link
Contributor

@ztlpn thanks so much for this. i did a quick git blame on it ... and the original thinking was that segments would be 4G max, at some point along the way i forgot to add the assert. At the time of writing this, we were actually capping segments at 2G, so I figured why not 2X. it's all more bits now :) thanks for the fix.

jcsp
jcsp previously approved these changes Dec 16, 2022
@jcsp
Copy link
Contributor

jcsp commented Dec 16, 2022

Just to check we're thinking the same thing -- the regeneration of v1 indices is expected to happen on-demand during compaction, right? So we won't be doing some kind of mass rewrite when we start up...

As by default max compacted segment size is 5 GiB, compacted index size
for large segments can oveflow 32-bit ints. This commit substitutes size
and the number of keys in the footer for 64-bit ints. Old 32-bit fields are
left for backwards compatibility with the code that doesn't check
version number.

Fixes redpanda-data#7647
@ztlpn
Copy link
Contributor Author

ztlpn commented Dec 16, 2022

Just to check we're thinking the same thing -- the regeneration of v1 indices is expected to happen on-demand during compaction, right? So we won't be doing some kind of mass rewrite when we start up...

Good point, I did a quick code audit, looks like we only read compaction index during actual compaction.

@ztlpn
Copy link
Contributor Author

ztlpn commented Dec 16, 2022

test failure is #7816

@ztlpn ztlpn merged commit 4c6e2d6 into redpanda-data:dev Dec 16, 2022
@jcsp
Copy link
Contributor

jcsp commented Dec 16, 2022

@ztlpn before we merge a backport + release this to the field, perhaps it is worth adding a ducktape test in a followup PR that ensures this works end to end across the upgrade? (including the rewrites that we expect to happen via exceptions)

@ztlpn
Copy link
Contributor Author

ztlpn commented Dec 16, 2022

sure, will do

ztlpn added a commit to ztlpn/redpanda that referenced this pull request Feb 8, 2023
Clamp max compacted segment size to 1.5GiB to avoid compaction index
size overflow.

Fixes redpanda-data#7647

This is a workaround for older versions, proper fix is
redpanda-data#7687
ztlpn added a commit to ztlpn/redpanda that referenced this pull request Feb 8, 2023
Clamp max compacted segment size to 1.5GiB to avoid compaction index
size overflow.

Fixes redpanda-data#7647

This is a workaround for older versions, proper fix is
redpanda-data#7687
@ztlpn ztlpn deleted the fix-7647-compaction-index-overflow branch November 27, 2023 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: integer overflow in compacted_index_chunk_reader.cc
4 participants