PESDLC-901 Tests to create 10k+ topics and do checks with low throughput #16750
Conversation
The script uses confluent_kafka to send requests to Redpanda. It can skip topic name randomization, select the topic name prefix, and skip the randomization-related checks when they are not needed.
The test accounts for a minimal node configuration of i3en.xlarge with 2 vCPUs.
Force-pushed ba5776c to d334b83
write_json(sys.stdout, {'timings': timings})
# Exit on threshold > 5 min
# I.e. single topic creation takes more than 5 min
if timings["creation-time-max"] > 300:
nit: It would be nicer to proactively time out after 5 min, rather than waiting however long topic creation (for a batch or a single topic) takes and then erroring out if it took longer than 5 min. It's something that can be done in a follow-up PR, though.
Agreed. Will update.
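The proactive timeout suggested above could be sketched as follows. This is a minimal illustration, not the test's actual implementation: `create_topic_stub` and the worker count are hypothetical stand-ins, and the 300 s deadline matches the threshold in the snippet above.

```python
import concurrent.futures

CREATION_TIMEOUT_SEC = 300  # 5 min, matching the threshold in the test


def create_topic_stub(name):
    # Stand-in for the real per-topic creation call (hypothetical).
    return name


def create_with_deadline(topic_names, timeout_sec=CREATION_TIMEOUT_SEC):
    """Fail fast if topic creation exceeds the deadline, instead of
    measuring the elapsed time only after the fact."""
    created = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(create_topic_stub, n): n for n in topic_names}
        # as_completed raises TimeoutError once the deadline passes,
        # aborting the wait instead of blocking indefinitely.
        for fut in concurrent.futures.as_completed(futures, timeout=timeout_sec):
            created.append(fut.result())
    return created
```

Note that `as_completed`'s timeout is a deadline over the whole batch; a strict per-topic timeout would need per-future accounting.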
new failures in https://buildkite.com/redpanda/redpanda/builds/45416#018debaf-8194-48ed-9f59-b7b826781e9f:
new failures in https://buildkite.com/redpanda/redpanda/builds/45416#018dec80-3f72-4e34-a0a4-0b6c58d770ff:
)
    return None

def _write_and_random_read_many_topics(self, num_topics, topic_names):
nit: It would be nice to split these utility methods into separate classes to make them reusable in other tests. Not something we should focus on doing right now though.
return (sum(numpy.diff(sorted(numbers_list)) == 1) >= n)

# Prepare librdkafka python client
kclient = PythonLibrdkafka(self.redpanda)
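The numpy one-liner above can be unpacked into an equivalent pure-Python check, which may make its intent clearer: after sorting, it counts adjacent pairs that differ by exactly 1 and requires at least `n` of them. A sketch (function name hypothetical):

```python
def has_n_consecutive(numbers_list, n):
    # Pure-Python equivalent of:
    #   sum(numpy.diff(sorted(numbers_list)) == 1) >= n
    # i.e. at least n adjacent pairs in the sorted list differ by exactly 1.
    s = sorted(numbers_list)
    pairs = sum(1 for a, b in zip(s, s[1:]) if b - a == 1)
    return pairs >= n
```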
Should probably look this up myself, but does kclient create a new producer/consumer every time you call kclient.get_producer() and kclient.get_consumer()? Also, are these methods thread-safe?
From what I found, they are thread-safe; the client is actually a Python interface to librdkafka (the confluent_kafka module for Python).
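If get_producer() did create a fresh client on each call, one way to avoid that is to memoize the instance behind a lock. This is only a sketch of that pattern, with a stub standing in for the real producer class; it is not the PythonLibrdkafka implementation.

```python
import threading


class StubProducer:
    # Stand-in for the real producer class (hypothetical stub).
    pass


class CachedClient:
    """Sketch: reuse one producer per client instance instead of
    constructing a fresh one on every get_producer() call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._producer = None

    def get_producer(self):
        # The lock makes lazy initialization safe to call from
        # multiple threads.
        with self._lock:
            if self._producer is None:
                self._producer = StubProducer()
            return self._producer
```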
numbers = []
# Message consuming loop
try:
    consumer.subscribe([target_topic])
nit: it's possible to subscribe to all topics within a single consumer group. It may be nice to do that here and ensure it's possible to consume all numbers from all the topics we produced to. Or maybe create multiple consumers within a single consumer group covering all the topics we produced to. It'll likely complicate the validation logic, though.
The goal was to make this as simple as possible and to keep the runtime under 10 min.
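The validation complication the comment above mentions comes from messages of many topics arriving interleaved through one subscription. A small sketch of the bookkeeping that would be needed, grouping consumed records by topic before per-topic checks (record shape and function name are hypothetical):

```python
from collections import defaultdict


def group_by_topic(messages):
    """Group (topic, value) records from one multi-topic subscription
    so each topic's numbers can be validated separately."""
    by_topic = defaultdict(list)
    for topic, value in messages:
        by_topic[topic].append(value)
    return dict(by_topic)
```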
topic_count = 11950
batch_size = 2048
topic_name_length = 200
num_partitions = 1
num_replicas = 3
use_kafka_batching = True
topic_name_prefix = \
    f"topic-swarm-create-p{num_partitions}-r{num_replicas}"
Let's write a Python dataclass for the config parameters that this test and test_many_topics_throughput share, and create a few default config profiles for the tests: one for the values above and another for 40k topics.
Based on this, created a series of tickets to work on: https://redpandadata.atlassian.net/browse/PESDLC-886
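A hedged sketch of what the suggested dataclass-based config profiles could look like. The class and profile names are hypothetical; the defaults mirror the values from the snippet above, and the second profile only swaps in the 40k topic count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicSwarmConfig:
    # Hypothetical shape for the shared test config.
    topic_count: int = 11950
    batch_size: int = 2048
    topic_name_length: int = 200
    num_partitions: int = 1
    num_replicas: int = 3
    use_kafka_batching: bool = True

    @property
    def topic_name_prefix(self):
        # Derived, so it stays consistent with the partition/replica counts.
        return f"topic-swarm-create-p{self.num_partitions}-r{self.num_replicas}"


DEFAULT_PROFILE = TopicSwarmConfig()
LARGE_PROFILE = TopicSwarmConfig(topic_count=40000)
```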
topic_prefixes.append(topic_name_prefix)
# Free node that used to create topics
self.cluster.free_single(node)
We may want to wait until Redpanda.healthy returns True here, i.e. until all the topics/partitions we've created have their replicas and leaders assigned.
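The wait suggested above is a standard poll-until-true loop. A generic sketch, not tied to the Redpanda.healthy API (the helper name and the timeout/backoff values are assumptions):

```python
import time


def wait_until_healthy(is_healthy, timeout_sec=120, backoff_sec=2):
    """Poll the given predicate until it returns True or the
    deadline passes (sketch; names hypothetical)."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if is_healthy():
            return True
        time.sleep(backoff_sec)
    return False
```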
LGTM, just a few comments.
Single topic creation takes at least 1 sec, so this PR uses several parallel threads and several nodes to create topics when using the single-topic creation method.
Also, it appears that kafka.client supports sending a lot of topic specs in one request, and doing so is a lot faster:
8192 topics in a single request were created in <9 sec.
Both methods are retained in the code and are switchable via the use_kafka_batching option.
The test has a produce/consume stage that produces X messages to all topics, then selectively checks one topic for the proper message count and proper data consumed.
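The batched method described above needs the full topic list split into request-sized chunks (batch_size is 2048 in the config shown earlier). A minimal sketch of that chunking (function name hypothetical):

```python
def chunk_topics(topic_names, batch_size):
    """Split the full topic list into batches so one request can
    carry many topic specs at once."""
    return [topic_names[i:i + batch_size]
            for i in range(0, len(topic_names), batch_size)]
```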
Migrated from #16463
Backports Required
Release Notes