
tests: Use multiple topics in OMBValidationTest.test_max_partitions #16000

Merged: 1 commit into dev on Jan 10, 2024

Conversation

StephanDollberg (Member)

Many Kafka client libraries (Java, librdkafka) perform poorly when
producing to topics with very high partition counts. A single producer
thread can't keep up and ends up producing at an unreasonably large
batch size.

In a tier 6 test we saw the batch size slowly grow all the way to 85K.
This isn't great: it means we are not running the test in a stable
state, and the test becomes more of a client benchmark. Further, lower
batch sizes are actually more demanding for Redpanda itself.

To avoid this we split the total partition count across multiple
topics, capping each topic at 5000 partitions. Each producer then only
produces to a single topic, which is easier on the clients.

This is similar to a pattern we have already seen customers employ at
similarly high partition counts.

With this change the batch size stays stable at 10K from the beginning
of the test. The total producer/consumer count might differ slightly
because of rounding, but that shouldn't be an issue as the calculation
for the optimal producer count is already fairly conservative.
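For illustration only, a minimal Python sketch of the splitting described above. The 5000-per-topic cap matches the diff further down; the names split_workload and producer_count, and the even producer spread, are hypothetical simplifications rather than the actual OMB workload code.

import math

MAX_PARTITIONS_PER_TOPIC = 5000

def split_workload(max_partition_count: int, producer_count: int):
    """Split a tier's partition budget across topics capped at 5000 partitions each."""
    topics = math.ceil(max_partition_count / MAX_PARTITIONS_PER_TOPIC)
    partitions_per_topic = math.ceil(max_partition_count / topics)
    # Hypothetical simplification: spread producers evenly so that each
    # producer only ever writes to a single topic.
    producers_per_topic = math.ceil(producer_count / topics)
    return topics, partitions_per_topic, producers_per_topic

# Made-up example: a tier with 22500 partitions and 96 producers gives
# split_workload(22500, 96) == (5, 4500, 20), i.e. 5 topics of 4500
# partitions each and 20 producers per topic (100 producers in total,
# slightly above 96 because of rounding).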

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

MAX_PARTITIONS_PER_TOPIC = 5000
topics = math.ceil(tier_limits.max_partition_count / MAX_PARTITIONS_PER_TOPIC)

partitions_per_topic = tier_limits.max_partition_count // topics
Member

nit: I think you could do a ceil(... / topics) here, as above, instead of //, so we are conservative with the rounding: if the split is not exact we will test slightly more than the advertised number rather than slightly less.
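To make the rounding concern concrete, a small numeric illustration with a made-up tier limit of 10001 partitions (not a real tier value):

import math

max_partition_count = 10001  # hypothetical value, for illustration only
topics = math.ceil(max_partition_count / 5000)        # 3 topics

floor_split = max_partition_count // topics           # 3333 -> 3 * 3333 = 9999, below the advertised count
ceil_split = math.ceil(max_partition_count / topics)  # 3334 -> 3 * 3334 = 10002, at or above it

assert floor_split * topics < max_partition_count
assert ceil_split * topics >= max_partition_count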

Member Author

Yep, makes sense; amended.

@travisdowns (Member) left a comment

one nit

@StephanDollberg force-pushed the stephan/omb-validation-max-part-multiple-topics branch from 1b61a13 to 9e6fe57 on January 9, 2024 at 13:55
@vbotbuildovich (Collaborator) commented Jan 9, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/43594#018ceebf-09c3-49b2-90d6-cbb84c463174:

"rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage.cleanup_policy=delete"

new failures in https://buildkite.com/redpanda/redpanda/builds/43603#018cef30-8c94-4307-bb82-119903587126:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_upgrade"

new failures in https://buildkite.com/redpanda/redpanda/builds/43603#018cef30-8c9e-429b-95e6-a22815756a73:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_rollback"

new failures in https://buildkite.com/redpanda/redpanda/builds/43603#018cef30-8c98-4137-876a-9c810e8276b1:

"rptest.tests.cluster_features_test.FeaturesSingleNodeUpgradeTest.test_upgrade"
"rptest.tests.cluster_features_test.FeaturesNodeJoinTest.test_old_node_join"

new failures in https://buildkite.com/redpanda/redpanda/builds/43603#018cef59-05b6-4c6e-be89-6010edea1167:

"rptest.tests.cluster_features_test.FeaturesSingleNodeUpgradeTest.test_upgrade"
"rptest.tests.cluster_features_test.FeaturesNodeJoinTest.test_old_node_join"

new failures in https://buildkite.com/redpanda/redpanda/builds/43603#018cef59-05b3-4c89-81d2-0a09216476c1:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_upgrade"

new failures in https://buildkite.com/redpanda/redpanda/builds/43603#018cef59-05b0-44c6-bf8f-6864f88acb21:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_rollback"

@StephanDollberg force-pushed the stephan/omb-validation-max-part-multiple-topics branch from 9e6fe57 to e89fea7 on January 9, 2024 at 16:16
@StephanDollberg (Member Author)

Linted

@StephanDollberg (Member Author)

Those failures are all unrelated to this test (caused by adding the 24.1 tag).

@piyushredpanda merged commit f3eb08b into dev on Jan 10, 2024
14 of 17 checks passed
@piyushredpanda deleted the stephan/omb-validation-max-part-multiple-topics branch on January 10, 2024 at 13:20