
Use async compression in kafka client #15920

Merged: 2 commits into redpanda-data:dev on Jan 5, 2024

Conversation

@michael-redpanda (Contributor):

Fixes: #15900

Utilize asynchronous compression in the kafka client to reduce the possibility of oversized allocations.
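
For context, a minimal usage sketch of the new path (the wrapper function and call site are hypothetical; `record_batch_builder::build_async()` is the rvalue-qualified method this PR adds, returning `ss::future<model::record_batch>`):

```cpp
// Illustrative only, not the PR's actual call site.
ss::future<model::record_batch> make_batch_example(record_batch_builder builder) {
    // build_async() lets compression yield to the reactor instead of
    // compressing the whole batch in one synchronous build() call.
    co_return co_await std::move(builder).build_async();
}
```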

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Improvements

  • Internal kafka client now uses asynchronous compression (when possible) to reduce the possibility of oversized allocations and reactor stalls

@michael-redpanda marked this pull request as ready for review January 2, 2024 21:35
@michael-redpanda self-assigned this Jan 2, 2024
dotnwat previously approved these changes Jan 3, 2024

@dotnwat (Member) left a comment:

lgtm

Three review threads on src/v/kafka/client/produce_partition.h (outdated, resolved).
@dotnwat dismissed their stale review January 3, 2024 20:48

Removing review: I realize after seeing Rob's review that I only reviewed the first commit.

graphcareful previously approved these changes Jan 3, 2024

@graphcareful (Contributor) left a comment:

LGTM, nice!

Two review threads on src/v/kafka/client/produce_partition.h (outdated, resolved).

graphcareful previously approved these changes Jan 3, 2024
@michael-redpanda (author):

CI failure: #15950

@michael-redpanda added this to the v23.3.2 milestone Jan 5, 2024
@BenPope (Member) left a comment:

Looks pretty good.

I wonder if it's worth raising a ticket to opportunistically replace uses of build() with build_async() where the call is already in an async scope.

```diff
@@ -31,6 +31,7 @@ class record_batch_builder {
       std::optional<iobuf>&& value,
       std::vector<model::record_header> headers);
     virtual model::record_batch build() &&;
+    virtual ss::future<model::record_batch> build_async() &&;
```
@BenPope:

It doesn't look like build() is ever overridden, can these both be made non-virtual to avoid confusion?
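
A minimal sketch of what that suggestion amounts to (assuming nothing in the tree actually overrides either method; the class body here is elided):

```cpp
// Hypothetical non-virtual declarations; behaviour is unchanged, the class
// simply stops advertising an override point that is never used.
class record_batch_builder {
public:
    model::record_batch build() &&;
    ss::future<model::record_batch> build_async() &&;
    // ... other members unchanged ...
};
```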

Comment on lines 38 to 39:

```cpp
ssx::spawn_with_gate(
  _gate, [this]() -> ss::future<> { return try_consume(); });
```
@BenPope:

nitpick:

Suggested change:

```diff
-ssx::spawn_with_gate(
-  _gate, [this]() -> ss::future<> { return try_consume(); });
+ssx::spawn_with_gate(_gate, [this]() { return try_consume(); });
```

Comment on lines 75 to 80:

```cpp
if (!consumer_can_run()) {
    co_return;
}

auto batch_record_count = _config.produce_batch_record_count();
auto batch_size_bytes = _config.produce_batch_size_bytes();
_consumer(co_await do_consume());
co_return;
```
@BenPope:

nitpick (also, what are you doing with this preview, github?):

Suggested change:

```diff
-if (!consumer_can_run()) {
-    co_return;
-}
-
-auto batch_record_count = _config.produce_batch_record_count();
-auto batch_size_bytes = _config.produce_batch_size_bytes();
-_consumer(co_await do_consume());
-co_return;
+if (consumer_can_run()) {
+    _consumer(co_await do_consume());
+}
```

```cpp
}

ss::future<> stop() {
    try_consume(true);
    co_await try_consume();
```
@BenPope:

I guess rearming it and then cancelling it isn't too bad. In the case where produce_batch_delay == 0, it may be possible to run another round of try_consume.

To get the original behaviour:

Suggested change:

```diff
-co_await try_consume();
+_timer.set_callback([]() {});
+co_await try_consume();
```

```cpp
}

/// \brief Validates that the size threshold has been met to trigger produce
inline bool threshold_met() const {
```
@BenPope:

nitpick: I don't think the inline helps you much here; it's implicitly inline. I suspect you're attempting to hint the compiler to inline the function body; there's a fair chance it will ignore you, as that's not what inline is for.

@michael-redpanda (author):

Yeah good point. I think this was muscle memory from previous work I've done.
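
For reference, a short illustration of the point (class and member names here are hypothetical): a member function defined inside the class body is already implicitly inline, so the keyword is redundant there, and it does not force the optimizer to inline the call.

```cpp
#include <cstddef>

class produce_partition_example {
public:
    // Already implicitly inline: the definition appears inside the class.
    bool threshold_met() const { return _bytes >= _threshold; }

    // The explicit keyword only restates the default; whether the call is
    // actually inlined is the optimizer's decision either way.
    inline bool consumer_can_run() const { return !_running; }

private:
    std::size_t _bytes{0};
    std::size_t _threshold{1024};
    bool _running{false};
};
```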

```cpp
///
/// Consumer can only run if one is not already running and there are
/// records available
inline bool consumer_can_run() const {
```
@BenPope:

nitpick: Same comment about inline as above.

@michael-redpanda (author):

> Looks pretty good.
>
> I wonder if it's worth raising a ticket to opportunistically replace uses of build() with build_async() where the call is already in an async scope.

#15964

Commits:

  • Added async builder for record_batch_builder that can be used to perform asynchronous compression. (Signed-off-by: Michael Boquard <michael@redpanda.com>)
  • Updated produce batcher to use async builder that utilizes asynchronous compression. (Signed-off-by: Michael Boquard <michael@redpanda.com>)
@michael-redpanda (author):

Force push d4fc4ac:

  • Fixups from Ben's review

@vbotbuildovich (Collaborator):

new failures in https://buildkite.com/redpanda/redpanda/builds/43489#018cdac2-fb91-4f6e-b3b8-df08aec97500:

"rptest.tests.recovery_mode_test.DisablingPartitionsTest.test_disable"

@dotnwat merged commit aa00c12 into redpanda-data:dev Jan 5, 2024
16 of 19 checks passed
@vbotbuildovich (Collaborator):

/backport v23.3.x

@vbotbuildovich (Collaborator):

/backport v23.2.x

@vbotbuildovich (Collaborator):

/backport v23.1.x

@vbotbuildovich (Collaborator):

Failed to create a backport PR to v23.1.x branch. I tried:

```sh
git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-15920-v23.1.x-521 remotes/upstream/v23.1.x
git cherry-pick -x 3a6a7ae6f811bf2237a412abd198c4f3c2871a86 d4fc4acb32d5caf3902cf397021536d7b59e6dc1
```

Workflow run logs.

@vbotbuildovich (Collaborator):

Failed to create a backport PR to v23.2.x branch. I tried:

```sh
git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-15920-v23.2.x-645 remotes/upstream/v23.2.x
git cherry-pick -x 3a6a7ae6f811bf2237a412abd198c4f3c2871a86 d4fc4acb32d5caf3902cf397021536d7b59e6dc1
```

Workflow run logs.

@piyushredpanda (Contributor):

Older branches will need manual cherrypicks, @michael-redpanda. Thanks for the fix, though!

@michael-redpanda (author):

> Older branches will need manual cherrypicks, @michael-redpanda. Thanks for the fix, though!

Yes, sorry, I was prioritizing v23.3. Will return to this soon.


Successfully merging this pull request may close these issues.

Oversized allocation: 323584 bytes in compression::stream_zstd::do_compress
6 participants