tests: Add produce microbenchmark #16386
Conversation
Also worth noting:

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44518#018d5fa3-8b22-4521-8042-440ad19cd19a

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44691#018d795d-0167-4333-9982-6b3135201b0b
```cpp
std::vector<ss::future<>> dispatched;
std::vector<ss::future<kafka::produce_response::partition>> produced;

perf_tests::start_measuring_time();
```
I am confused, this is now being run 10k times in different fibers for a single benchmark loop?
Yeah, good point, this isn't going to work correctly.
Since my original motivation for cranking up the concurrency was to get a useful flamegraph, we can just remove the concurrency and simplify things.
Force-pushed from f5c97ee to 1719c49.
new failures in https://buildkite.com/redpanda/redpanda/builds/44691#018d795d-0172-4938-bcef-8d5ec0e678bf:
new failures in https://buildkite.com/redpanda/redpanda/builds/45193#018dcdd0-f5c3-4ceb-8c88-3000864ecbb1:
Could you share a screenshot of the flamegraph with the produce bit zoomed in? Or share the whole file?
```cpp
    builder.add_raw_kv(iobuf{}, iobuf{});
}

auto batch = std::move(builder).build();
```
So will this be 100 batches or one? I thought one request can only contain a single batch per partition?
Sorry, this variable is poorly named -- it's one batch with 100 records.
Flamegraph without concurrency.
```cpp
produce_partition_fixture() {
    BOOST_TEST_CHECKPOINT("before leadership");

    wait_for_controller_leadership().get0();
```
nit: prefer get()
```cpp
constexpr size_t num_records = 100;
for (size_t i = 0; i < num_records; ++i) {
    builder.add_raw_kv(iobuf{}, iobuf{});
```
Can we put some data in the messages? Seems like something easy to make the test parameterizable on: the size of the payload.
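A minimal sketch of the parameterization being suggested, in plain standard C++ rather than the fixture's `iobuf`/`kafka::record_batch_builder` types (the `record_builder` and `build_batch` names here are hypothetical stand-ins, not the PR's code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the benchmark's batch builder: collects
// key/value byte buffers the way builder.add_raw_kv(...) collects iobufs.
struct record_builder {
    std::vector<std::pair<std::vector<char>, std::vector<char>>> records;
    void add_raw_kv(std::vector<char> k, std::vector<char> v) {
        records.emplace_back(std::move(k), std::move(v));
    }
};

// Instead of hard-coding empty iobuf{} payloads, take the payload size as a
// parameter so the benchmark can be run at several record sizes.
record_builder build_batch(std::size_t num_records, std::size_t payload_size) {
    record_builder builder;
    std::vector<char> payload(payload_size, 'x'); // filler bytes
    for (std::size_t i = 0; i < num_records; ++i) {
        builder.add_raw_kv({}, payload); // empty key, sized value
    }
    return builder;
}
```

The real fixture would presumably fill an `iobuf` of `payload_size` bytes rather than a `std::vector<char>`, but the shape of the change is the same.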
```cpp
dispatched.push_back(std::move(stages.dispatched));
produced.push_back(std::move(stages.produced));

co_await ss::when_all_succeed(dispatched.begin(), dispatched.end())
```
I don't understand the purpose of the `dispatched` vector: it only ever has one element in it? Removing it would simplify this a lot and probably also allow a pure coroutine style.
Good point. Same for `produced`. Removing.
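With the single-element vectors gone, the awaiting code collapses to direct `co_await`s on the two stage futures. A rough sketch of what the simplified body might look like (Seastar-dependent, not compilable standalone; the `stages` names are taken from the snippet above):

```cpp
// Sketch: no vectors, no when_all_succeed -- await each stage directly.
co_await std::move(stages.dispatched);               // request dispatched
auto resp = co_await std::move(stages.produced);     // partition response
```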
src/v/redpanda/tests/fixture.h (outdated)

```diff
@@ -789,7 +789,7 @@ class redpanda_thread_fixture {
         { r.encode(writer, version) } -> std::same_as<void>;
     }
     kafka::request_context make_request_context(
-      RequestType request, kafka::request_header& header, conn_ptr conn = {}) {
+      RequestType& request, kafka::request_header& header, conn_ptr conn = {}) {
```
Can you give some detail here? This doesn't strike me as a safe change without some additional information, since callers may now see the request they are passing in mutated, as opposed to having the copy mutated (leaving the caller none the wiser).
Yeah, you're right that this is a risk. I checked the code paths and didn't see any modifications to the request, so I thought this would be safe (though admittedly this doesn't guard against future changes).

Changing to a reference was necessary because `partition_produce_data` has a deleted default copy constructor due to its `records` field (`std::optional<kafka::produce_request_record_data> records{}`), and unlike `kafka::fetch_request` it's not possible to copy a `kafka::produce_request` as-is. Would you prefer to see a user-defined copy constructor for `partition_produce_data`?
If it is not modified, we could just make the parameter type `const RequestType&`?

However, I think the second part of your reply is suggesting it is modified: do we move out of `request` in order to avoid copying? "Moving out of" is a modification.

So I think if it is truly not modified, make it `const`. If it is modified, we should leave the signature alone and move at the call site.
Of course you're right, it is modified :)
Switched to doing a move at the call site.
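The move-at-the-call-site alternative can be sketched in plain standard C++ (the `request_t` type here is a hypothetical stand-in for a move-only request type like `kafka::produce_request`, whose copy is deleted by a move-only member):

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a move-only request type such as
// kafka::produce_request (copying is deleted via a move-only member).
struct request_t {
    std::unique_ptr<int> payload = std::make_unique<int>(42);
};

// The by-value signature is kept unchanged. Because the parameter type is
// move-only, callers must spell out std::move at the call site, which makes
// the "this request is consumed" behavior visible where the call happens.
int make_request_context(request_t request) {
    return *request.payload; // the function owns and consumes the request
}
```

A call then reads `make_request_context(std::move(req))`; passing `req` without the move fails to compile, so no caller is left surprised by a silent mutation.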
See comments.
Continued the thread about the parameter type change. It's the only outstanding issue as far as I'm aware.
This one is similar to the fetch plan benchmark. Right now it only handles a single partition with RF=1 and doesn't use transactional / idempotent batches.
I'm getting a timeout for
Sounds fine to me. Will need actioning from @piyushredpanda
Backports Required
Release Notes