PESDLC-886 Tests to create 10k+ topics and do checks with low throughput #16463
Conversation
Several parallel threads are used and timings collected. I will check how effective using several nodes would be.
32 threads
Force-pushed from 5a52dcd to 003b6f7
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44783#018d80c7-1164-4ffe-96ad-5655a2728fa1
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44853#018d862e-8274-4db4-bea0-5745c8eb4ea3
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/45166#018dc8e0-bdcc-4681-96cc-7a7497c85f76
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/45367#018de83f-ee42-4a68-aa70-910ba013438f
What do these timings mean? For example, p50=190 is the time to do what?
Can you clarify the purpose of this script? Is it for investigation purposes, or will we also use these methods to create topics during testing?
Those timings are for sending single requests in parallel. Based on RPK and the Kafka client tools, my understanding was that only one topic can be created at a time. That's why I created this ThreadPoolExecutor thing and tried to find out how many parallel requests is most efficient. But the morning after, I tried sending all topics in a single request, since python kafka.client supports it, and the creation timings are way faster.
- 4096 topics in a single batch: ~6.48 sec
- 8192 topics: <6.22 sec
- 11950 topics: ~12 sec
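The batched approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual script: the `swarm` prefix, suffix length, and batch size of 4096 are made-up values, and the kafka-python admin call shown in the comments is an assumed API shape that requires a running cluster.

```python
# Hedged sketch: create many topics by sending batches of topic specs in
# single requests, instead of one request per topic.
import random
import string


def random_topic_names(prefix: str, count: int, suffix_len: int = 8) -> list[str]:
    """Generate randomized topic names with a fixed prefix."""
    alphabet = string.ascii_lowercase + string.digits
    return [
        f"{prefix}-{''.join(random.choices(alphabet, k=suffix_len))}"
        for _ in range(count)
    ]


def batches(names: list[str], batch_size: int) -> list[list[str]]:
    """Split the full topic list into batches, each sent as one request."""
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]


if __name__ == "__main__":
    # With kafka-python (assumed usage; needs a reachable cluster):
    # from kafka.admin import KafkaAdminClient, NewTopic
    # admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
    # for batch in batches(random_topic_names("swarm", 11950), 4096):
    #     admin.create_topics([NewTopic(n, 1, 3) for n in batch])
    pass
```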
When creating topics near the cluster limit, BadLogLines are detected:
OK wow, so ~200 seconds to create a topic when we are creating many in parallel. I guess we do not get much speedup from that (compared to 2 seconds when sent alone: i.e., we have 128x more parallelism but it got ~100x slower, for little gain).
I am using this script to create topics in large numbers with randomized naming and controllable batch sizes (how many topics are sent to Redpanda in a single request). The goal of the test being created is to check that Redpanda can process topic creation in large numbers without failing, crashing, or producing bad allocations when approaching topic/partition limits.
@savex wrote:
Very nice: an oversized allocation. These are exactly the type of issues we want to suss out here. Do you know how to decode the backtrace?
No, I haven't had a chance to yet.
Here is the backtrace from recent logs on the oversized allocation.
Created issue: #16521
Force-pushed from df44807 to 80be6bc
# Free node that used to create topics
self.cluster.free_single(node)
Why do we allocate an entire node for creating the topics? It seems like the ducktape driver node can create them just as well.
So that we are not limited by the client node's resources and can send requests as fast as possible. The client node has significantly fewer resources compared to the worker nodes. When checking the 40k case, this would matter even more.
ioclass.flush()

class TopicSwarm():
fun!
The goal was to build an isolated tool, keeping complexity to a minimum.
Force-pushed from efd3ed7 to ee47a5a
Force-pushed from 0f11a77 to 45b4d0a
Force-pushed from 130ad97 to 9c90b3c
Force-pushed from 9c90b3c to 0bd9804
/ci-repeat
The script uses confluent_kafka to send requests to Redpanda. It can skip topic name randomization, select the topic name prefix, and skip randomization-related checks if they are not needed.
The test accounts for a minimal node configuration of i3en.xlarge with 2 vCPUs.
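The confluent_kafka-based creation path with optional name randomization might look roughly like this. It is a sketch, not the script itself: the `randomize` toggle, the `swarm` prefix, and the partition/replication values are illustrative assumptions, and the AdminClient calls in the comments reflect my understanding of the confluent_kafka admin API.

```python
# Hedged sketch: topic naming with a selectable prefix and an option to
# skip randomization, as described in the comment above.
import uuid


def topic_name(prefix: str, index: int, randomize: bool) -> str:
    """Build a topic name; randomization can be skipped for fixed names."""
    if randomize:
        return f"{prefix}-{uuid.uuid4().hex[:8]}"
    return f"{prefix}-{index:05d}"


if __name__ == "__main__":
    # With confluent_kafka (assumed usage; needs a reachable cluster):
    # from confluent_kafka.admin import AdminClient, NewTopic
    # admin = AdminClient({"bootstrap.servers": "localhost:9092"})
    # futures = admin.create_topics([
    #     NewTopic(topic_name("swarm", i, randomize=False),
    #              num_partitions=1, replication_factor=3)
    #     for i in range(1000)
    # ])
    # for name, fut in futures.items():
    #     fut.result()  # raises if creating this topic failed
    pass
```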
Force-pushed from 0bd9804 to ba5776c
Single topic creation takes at least 1 sec. This PR uses several parallel threads and several nodes to create topics when using the single-topic creation method.
Also, it appears that kafka.client supports sending many topic specs at the same time, and it is a lot faster:
8192 topics in a single request are created in <9 sec.
Both methods are retained in the code and are switchable via the
use_kafka_batching
option. The test has a produce/consume stage that produces X messages to all topics and selectively checks one topic for the proper message count and proper data consumed.
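The switch between the two retained methods could be sketched as below. This is a minimal illustration under assumptions: the dispatcher signature, the callback names, and the thread count of 32 are made up for the example; only the `use_kafka_batching` option name comes from the PR description.

```python
# Hedged sketch: dispatch topic creation either as one batched request or
# as many parallel single-topic requests, selected by use_kafka_batching.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def create_topics(names: Iterable[str],
                  create_one: Callable[[str], None],
                  create_batch: Callable[[list[str]], None],
                  use_kafka_batching: bool = True,
                  threads: int = 32) -> None:
    names = list(names)
    if use_kafka_batching:
        # All topic specs go out in a single admin request.
        create_batch(names)
    else:
        # One request per topic, fanned out across worker threads.
        with ThreadPoolExecutor(max_workers=threads) as pool:
            list(pool.map(create_one, names))
```

In a test, `create_one` and `create_batch` would wrap the actual admin-client calls; the dispatcher itself stays client-agnostic.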
Backports Required
Release Notes