PESDLC-901 Tests to create 10k+ topics and do checks with low throughput #16750
Conversation
The script uses confluent_kafka to send requests to Redpanda. It can skip topic name randomization, select the topic name prefix, and skip the randomization-related checks when they are not needed.
The test accounts for a minimal node configuration of i3en.xlarge with 2 vCPUs.
Force-pushed ba5776c to d334b83
write_json(sys.stdout, {'timings': timings})
# Exit on threshold > 5 min
# I.e. single topic creation takes more than 5 min
if timings["creation-time-max"] > 300:
nit: It would be nicer to proactively time out after 5 min, rather than waiting however long topic creation (for a batch or a single topic) takes and then erroring out if it took longer than 5 min. It's something that can be done in a follow-up PR, though.
Agreed. Will update.
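The proactive timeout suggested above could be sketched as follows. This is a minimal illustration, not the test's actual implementation: `create_topic_stub` and the worker count are hypothetical stand-ins, and the 300 s deadline matches the threshold in the snippet above.

```python
import concurrent.futures

CREATION_TIMEOUT_SEC = 300  # 5 min, matching the threshold in the test


def create_topic_stub(name):
    # Stand-in for the real per-topic creation call (hypothetical).
    return name


def create_with_deadline(topic_names, timeout_sec=CREATION_TIMEOUT_SEC):
    """Fail fast if topic creation exceeds the deadline, instead of
    measuring the elapsed time only after the fact."""
    created = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(create_topic_stub, n): n for n in topic_names}
        # as_completed raises TimeoutError once the deadline passes,
        # aborting the wait instead of blocking indefinitely.
        for fut in concurrent.futures.as_completed(futures, timeout=timeout_sec):
            created.append(fut.result())
    return created
```

Note that `as_completed`'s timeout is a deadline over the whole batch; a strict per-topic timeout would need per-future accounting.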
new failures in https://buildkite.com/redpanda/redpanda/builds/45416#018debaf-8194-48ed-9f59-b7b826781e9f:
new failures in https://buildkite.com/redpanda/redpanda/builds/45416#018dec80-3f72-4e34-a0a4-0b6c58d770ff:
)
    return None

def _write_and_random_read_many_topics(self, num_topics, topic_names):
nit: It would be nice to split these utility methods into separate classes to make them reusable in other tests. Not something we should focus on doing right now though.
return (sum(numpy.diff(sorted(numbers_list)) == 1) >= n)

# Prepare librdkafka python client
kclient = PythonLibrdkafka(self.redpanda)
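The numpy one-liner above can be unpacked into an equivalent pure-Python check, which may make its intent clearer: after sorting, it counts adjacent pairs that differ by exactly 1 and requires at least `n` of them. A sketch (function name hypothetical):

```python
def has_n_consecutive(numbers_list, n):
    # Pure-Python equivalent of:
    #   sum(numpy.diff(sorted(numbers_list)) == 1) >= n
    # i.e. at least n adjacent pairs in the sorted list differ by exactly 1.
    s = sorted(numbers_list)
    pairs = sum(1 for a, b in zip(s, s[1:]) if b - a == 1)
    return pairs >= n
```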
Should probably look this up myself, but does kclient create a new producer/consumer every time you call kclient.get_producer() and kclient.get_consumer()? Also, are these methods thread-safe?
From what I found, they are thread-safe; the client is actually a Python interface to librdkafka (the confluent_kafka module for Python).
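If get_producer() did create a fresh client on each call, one way to avoid that is to memoize the instance behind a lock. This is only a sketch of that pattern, with a stub standing in for the real producer class; it is not the PythonLibrdkafka implementation.

```python
import threading


class StubProducer:
    # Stand-in for the real producer class (hypothetical stub).
    pass


class CachedClient:
    """Sketch: reuse one producer per client instance instead of
    constructing a fresh one on every get_producer() call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._producer = None

    def get_producer(self):
        # The lock makes lazy initialization safe to call from
        # multiple threads.
        with self._lock:
            if self._producer is None:
                self._producer = StubProducer()
            return self._producer
```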
numbers = []
# Message consuming loop
try:
    consumer.subscribe([target_topic])
nit: it's possible to subscribe to all topics within a single consumer group. It may be nice to do that here and ensure it's possible to consume all numbers from all the topics we produced to. Or maybe create multiple consumers within a single consumer group covering all the topics we produced to. It'll likely complicate the validation logic, though.
The goal was to make this as simple as possible and to keep the runtime under 10 min.
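The validation complication the comment above mentions comes from messages of many topics arriving interleaved through one subscription. A small sketch of the bookkeeping that would be needed, grouping consumed records by topic before per-topic checks (record shape and function name are hypothetical):

```python
from collections import defaultdict


def group_by_topic(messages):
    """Group (topic, value) records from one multi-topic subscription
    so each topic's numbers can be validated separately."""
    by_topic = defaultdict(list)
    for topic, value in messages:
        by_topic[topic].append(value)
    return dict(by_topic)
```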
topic_count = 11950
batch_size = 2048
topic_name_length = 200
num_partitions = 1
num_replicas = 3
use_kafka_batching = True
topic_name_prefix = \
    f"topic-swarm-create-p{num_partitions}-r{num_replicas}"
Let's write a Python dataclass for the config parameters that this test and test_many_topics_throughput share, and create a few default config profiles for the tests: one for the values above and another for 40k topics.
Based on this, created a series of tickets to work on: https://redpandadata.atlassian.net/browse/PESDLC-886
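A hedged sketch of what the suggested dataclass-based config profiles could look like. The class and profile names are hypothetical; the defaults mirror the values from the snippet above, and the second profile only swaps in the 40k topic count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicSwarmConfig:
    # Hypothetical shape for the shared test config.
    topic_count: int = 11950
    batch_size: int = 2048
    topic_name_length: int = 200
    num_partitions: int = 1
    num_replicas: int = 3
    use_kafka_batching: bool = True

    @property
    def topic_name_prefix(self):
        # Derived, so it stays consistent with the partition/replica counts.
        return f"topic-swarm-create-p{self.num_partitions}-r{self.num_replicas}"


DEFAULT_PROFILE = TopicSwarmConfig()
LARGE_PROFILE = TopicSwarmConfig(topic_count=40000)
```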
topic_prefixes.append(topic_name_prefix)
# Free node that used to create topics
self.cluster.free_single(node)
We may want to wait until Redpanda.healthy returns True here, i.e. until all the topics/partitions we've created have their replicas and leaders assigned.
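The wait suggested above is a standard poll-until-true loop. A generic sketch, not tied to the Redpanda.healthy API (the helper name and the timeout/backoff values are assumptions):

```python
import time


def wait_until_healthy(is_healthy, timeout_sec=120, backoff_sec=2):
    """Poll the given predicate until it returns True or the
    deadline passes (sketch; names hypothetical)."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if is_healthy():
            return True
        time.sleep(backoff_sec)
    return False
```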
LGTM, just a few comments.
Single topic creation takes at least 1 sec, so this PR uses several parallel threads and several nodes to create topics when using the single-topic creation method.
Also, it appears that kafka.client supports sending a lot of topic specs in one request, and doing so is a lot faster:
8192 topics in a single request were created in <9 sec.
Both methods are retained in the code and are switchable via the use_kafka_batching option.
The test has a produce/consume stage that produces X messages to all topics, then selectively checks one topic for the proper message count and proper data consumed.
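The batched method described above needs the full topic list split into request-sized chunks (batch_size is 2048 in the config shown earlier). A minimal sketch of that chunking (function name hypothetical):

```python
def chunk_topics(topic_names, batch_size):
    """Split the full topic list into batches so one request can
    carry many topic specs at once."""
    return [topic_names[i:i + batch_size]
            for i in range(0, len(topic_names), batch_size)]
```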
Migrated from #16463
Backports Required
Release Notes