
PESDLC-901 Tests to create 10k+ topics and do checks with low throughput #16750

Merged — 5 commits merged into dev on Feb 28, 2024

Conversation

savex (Contributor) commented on Feb 27, 2024

Single topic creation takes at least 1 sec. This PR uses several parallel threads across several nodes to create topics when the single-topic creation method is used.

Also, it appears that kafka.client supports sending many topic specs in a single request, and it is a lot faster:
8192 topics in a single request were created in under 9 sec.

Both methods are retained in the code and are switchable via the use_kafka_batching option.
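For illustration, here is a minimal sketch of the batched path using the confluent_kafka admin API directly (the test itself goes through the PythonLibrdkafka wrapper; the broker address, topic names, and counts below are placeholders):

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder address
specs = [
    NewTopic(f"topic-swarm-create-p1-r3-{i:05}", num_partitions=1, replication_factor=3)
    for i in range(8192)
]
# One request carrying all topic specs; returns a dict of {topic: future}
futures = admin.create_topics(specs, operation_timeout=60)
for topic, fut in futures.items():
    fut.result()  # raises if that topic failed to create
```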

The test has a produce/consume stage that produces a fixed number of messages to all topics and selectively checks one topic for the proper consumed message count and the proper consumed data.
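A rough sketch of that produce/verify idea with confluent_kafka (illustrative only; helper names, message format, and counts are assumptions, and the real test uses its own consume loop and sequence check):

```python
from confluent_kafka import Producer

def produce_sequence(brokers, topics, messages_per_topic):
    # Produce a sequence of integers to every topic so the consume side
    # can verify both the count and the content of what it reads back.
    producer = Producer({"bootstrap.servers": brokers})
    for topic in topics:
        for i in range(messages_per_topic):
            producer.produce(topic, value=str(i))
        producer.poll(0)  # serve delivery callbacks as we go
    producer.flush()

def verify_topic(consumed_numbers, expected_count):
    # proper message count consumed
    assert len(consumed_numbers) == expected_count
    # proper data consumed: the produced sequence 0..N-1 comes back intact
    assert sorted(consumed_numbers) == list(range(expected_count))
```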

Migrated from #16463

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

    Script uses confluent_kafka to send requests to Redpanda.
    It has the ability to skip topic name randomization, along with
    the ability to select the topic prefix in the script and to skip
    randomization-related checks if not needed.

    Test accounts for the minimal node configuration of i3en.xlarge
    with 2 vCPUs.
@savex savex force-pushed the PESDLC-886-large-topics-numbers branch from ba5776c to d334b83 on February 27, 2024 16:34
write_json(sys.stdout, {'timings': timings})
# Exit on threshold > 5 min
# I.e. single topic creation takes more than 5 min
if timings["creation-time-max"] > 300:
Contributor:
nit: It would be nicer to proactively time out after 5 min, rather than this approach of waiting however long topic creation (for a batch or a single topic) takes and then erroring out if it took longer than 5 min. It's something that can be done in a follow-up PR, though.

savex (Contributor Author):
Agreed. Will update.
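For reference, one possible shape of that proactive timeout (a sketch only, not the follow-up implementation; `create_fn` and `batches` are hypothetical names, and 300 s matches the threshold in the snippet above):

```python
import concurrent.futures

CREATION_TIMEOUT_SEC = 300  # 5 min, same threshold as the post-hoc check

def create_with_deadline(create_fn, batches):
    # Submit each creation batch to an executor and bound the wait,
    # instead of measuring elapsed time after the fact.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(create_fn, batch) for batch in batches]
        done, not_done = concurrent.futures.wait(futures,
                                                 timeout=CREATION_TIMEOUT_SEC)
        if not_done:
            raise TimeoutError(
                f"topic creation exceeded {CREATION_TIMEOUT_SEC}s "
                f"({len(not_done)} batches still pending)")
        return [f.result() for f in done]
```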

vbotbuildovich (Collaborator) commented on Feb 27, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/45416#018debaf-8194-48ed-9f59-b7b826781e9f:

"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type=CloudStorageType.ABS.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True.SpilloverManifestUploaded==True"

new failures in https://buildkite.com/redpanda/redpanda/builds/45416#018dec80-3f72-4e34-a0a4-0b6c58d770ff:

"rptest.tests.cpu_profiler_admin_api_test.CPUProfilerAdminAPITest.test_get_cpu_profile_with_override"

)
return None

def _write_and_random_read_many_topics(self, num_topics, topic_names):
ballard26 (Contributor) commented on Feb 27, 2024:
nit: It would be nice to split these utility methods into separate classes to make them reusable in other tests. Not something we should focus on doing right now though.

return (sum(numpy.diff(sorted(numbers_list)) == 1) >= n)

# Prepare librdkafka python client
kclient = PythonLibrdkafka(self.redpanda)
Contributor:
Should probably look this up myself, but does kclient create a new producer/consumer every time you call kclient.get_producer() and kclient.get_consumer()? Also are these methods thread-safe?

savex (Contributor Author):
From what I found, they are thread-safe and are actually a Python interface to librdkafka (the confluent_kafka module for Python).
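Illustrative only: if each call to get_producer() returns a fresh client (as assumed in the discussion above), one way to sidestep any sharing concern is to give every worker thread its own producer. `kclient` and `topic_chunks` are assumed to come from the surrounding test code:

```python
import threading

def produce_chunk(kclient, topics, messages_per_topic):
    producer = kclient.get_producer()  # per-thread producer instance
    for topic in topics:
        for i in range(messages_per_topic):
            producer.produce(topic, value=str(i))
    producer.flush()

threads = [threading.Thread(target=produce_chunk, args=(kclient, chunk, 10))
           for chunk in topic_chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
```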

numbers = []
# Message consuming loop
try:
consumer.subscribe([target_topic])
Contributor:
nit: it's possible to subscribe to all topics within a single consumer group. It may be nice to do that here and ensure it's possible to consume all numbers from all the topics we produced to. Or maybe create multiple consumers within a single consumer group that covers all the topics we produced to. It'll likely complicate the validation logic, though.

savex (Contributor Author):
The goal was to make this as simple as possible and to keep the runtime under 10 min.
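A rough sketch of the reviewer's suggestion (not what the test currently does): one consumer group subscribed to every produced topic, so messages from all topics can be collected and validated together. `brokers`, `topic_names`, and `messages_per_topic` are placeholders:

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": brokers,        # placeholder
    "group.id": "topic-swarm-verify",    # one group covering every topic
    "auto.offset.reset": "earliest",
})
consumer.subscribe(topic_names)          # all produced topics, not just one

per_topic = {t: [] for t in topic_names}
expected_total = len(topic_names) * messages_per_topic
seen = 0
while seen < expected_total:
    msg = consumer.poll(timeout=10.0)
    if msg is None:
        break                            # ran dry before reading everything
    if msg.error():
        continue
    per_topic[msg.topic()].append(int(msg.value()))
    seen += 1
consumer.close()
```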

Comment on lines +1253 to +1260
topic_count = 11950
batch_size = 2048
topic_name_length = 200
num_partitions = 1
num_replicas = 3
use_kafka_batching = True
topic_name_prefix = \
f"topic-swarm-create-p{num_partitions}-r{num_replicas}"
Contributor:
Let's write a Python dataclass for the config parameters that this test and test_many_topics_throughput share, and create a few default config profiles for the tests: one for the values above and another for 40k topics.

savex (Contributor Author):
Based on this, I created a series of tickets to work on: https://redpandadata.atlassian.net/browse/PESDLC-886
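One possible shape of that shared dataclass, using the values from the snippet above and the 40k figure from the comment (a sketch of the suggestion, not the follow-up implementation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TopicSwarmConfig:
    topic_count: int
    batch_size: int = 2048
    topic_name_length: int = 200
    num_partitions: int = 1
    num_replicas: int = 3
    use_kafka_batching: bool = True

    @property
    def topic_name_prefix(self) -> str:
        return f"topic-swarm-create-p{self.num_partitions}-r{self.num_replicas}"

# Two default profiles, as suggested:
PROFILE_12K = TopicSwarmConfig(topic_count=11950)
PROFILE_40K = TopicSwarmConfig(topic_count=40000)
```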

topic_prefixes.append(topic_name_prefix)
# Free node that used to create topics
self.cluster.free_single(node)

Contributor:
We may want to wait until Redpanda.healthy returns True here, i.e., until all the topics/partitions we've created have their replicas and leaders assigned.
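A minimal sketch of that suggestion, assuming the usual ducktape wait_until helper and the Redpanda.healthy() check mentioned above:

```python
from ducktape.utils.util import wait_until

# Block until the cluster reports healthy before freeing the node.
wait_until(lambda: self.redpanda.healthy(),
           timeout_sec=600,
           backoff_sec=5,
           err_msg="cluster did not become healthy after topic creation")
self.cluster.free_single(node)
```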

ballard26 (Contributor) left a comment:
LGTM, just a few comments.

@savex savex changed the title PESDLC-886 Tests to create 10k+ topics and do checks with low throughput PESDLC-901 Tests to create 10k+ topics and do checks with low throughput Feb 27, 2024
@savex savex merged commit a06aa46 into dev Feb 28, 2024
17 checks passed
@savex savex deleted the PESDLC-886-large-topics-numbers branch February 28, 2024 17:41