rptest: use 10 nodes in OMB validation tests #16028

Merged

travisdowns
Member

Drop the assumed number of nodes from 12 to 10 in OMB validation tests as that's the default number of nodes in duck.py and also the number we want to standardize on for all HTT tests.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

Contributor

@ballard26 ballard26 left a comment

Assuming there are still enough swarm workers to reach the desired connection count for T5+, this LGTM.

@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 9, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/43609#018cefe3-f837-465d-b05b-e63c4f633639:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_rollback"

new failures in https://buildkite.com/redpanda/redpanda/builds/43609#018cefe3-f82d-4d82-b86b-a89ab72fbdcd:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_upgrade"

new failures in https://buildkite.com/redpanda/redpanda/builds/43609#018cefe3-f831-45a0-863c-224246594d66:

"rptest.tests.cluster_features_test.FeaturesSingleNodeUpgradeTest.test_upgrade"
"rptest.tests.cluster_features_test.FeaturesNodeJoinTest.test_old_node_join"

new failures in https://buildkite.com/redpanda/redpanda/builds/43609#018ceff4-fb26-4ae8-a327-7a343371e19a:

"rptest.tests.cluster_features_test.FeaturesSingleNodeUpgradeTest.test_upgrade"
"rptest.tests.cluster_features_test.FeaturesNodeJoinTest.test_old_node_join"

new failures in https://buildkite.com/redpanda/redpanda/builds/43609#018ceff4-fb23-4aa9-a514-ab050ad6f0b3:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_upgrade"

new failures in https://buildkite.com/redpanda/redpanda/builds/43609#018ceff4-fb20-4ab9-9792-3b713d725002:

"rptest.tests.cluster_features_test.FeaturesMultiNodeUpgradeTest.test_rollback"

@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 9, 2024

@@ -320,14 +325,14 @@ def test_max_partitions(self):
 self.redpanda,
 "ACK_ALL_GROUP_LINGER_1MS_IDEM_MAX_IN_FLIGHT",
 (workload, validator),
-num_workers=10,
+num_workers=self.CLUSTER_NODES - 1,
Member

Does OMB make use of uneven nodes in "ensemble" mode?

Member Author

Yes, it assigns any "odd" worker to the consumer side. I guess that's good for us since we use 3:1 and 2:1 ratios, so consumers are probably more likely to be overloaded (though not in the many-partitions case, I guess, due to the issue with high CPU in the drain loop there).

	int numberOfProducerWorkers = extraConsumerWorkers ? (workers.size() + 2) / 3 : workers.size() / 2;
	List<List<String>> partitions = Lists.partition(Lists.reverse(workers), workers.size() - numberOfProducerWorkers);
	this.producerWorkers = partitions.get(1);
	this.consumerWorkers = partitions.get(0);

(AFAIK we don't set extraConsumerWorkers)
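
For illustration, here is a rough Python port of the Java snippet above, to make the split concrete for the 9 workers implied by `num_workers=self.CLUSTER_NODES - 1`. It assumes `CLUSTER_NODES` is 10, as the PR description suggests. The function name `split_workers` and the `worker-N` names are hypothetical; this is a sketch of the partitioning arithmetic only, not OMB's or rptest's actual code.

```python
from typing import List, Tuple


def split_workers(
    workers: List[str], extra_consumer_workers: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (producer_workers, consumer_workers), mirroring the Java snippet.

    Hypothetical helper for illustration only.
    """
    n = len(workers)
    if n < 2:
        raise ValueError("need at least two workers to split")

    # Integer division, as in Java: a third (rounded up) of the workers
    # produce when extra consumers are requested, otherwise half (rounded down).
    num_producers = (n + 2) // 3 if extra_consumer_workers else n // 2

    # Equivalent of Lists.partition(Lists.reverse(workers), n - num_producers):
    # consecutive chunks of the reversed list, each of size n - num_producers.
    chunk = n - num_producers
    reversed_workers = list(reversed(workers))
    partitions = [reversed_workers[i:i + chunk] for i in range(0, n, chunk)]

    # The first chunk goes to consumers, the second to producers.
    return partitions[1], partitions[0]


if __name__ == "__main__":
    cluster_nodes = 10  # assumed value of CLUSTER_NODES after this PR
    workers = [f"worker-{i}" for i in range(1, cluster_nodes)]  # 9 workers
    producers, consumers = split_workers(workers)
    print(len(producers), len(consumers))  # prints "4 5"
```

With 9 workers and `extraConsumerWorkers` unset, the odd worker lands on the consumer side: 4 producers and 5 consumers.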

@travisdowns
Member Author

> Assuming there are still enough swarm workers to reach the desired connection count for T5+, this LGTM.

There should be: I tested with 10 nodes when I was doing T5, though not necessarily with the exact machine type we are suggesting changing to.

Overall, since we are changing the machine type and count, and all the tests are currently disabled in CI, it seems we will have to run all the various tiers again at some point (through BK I guess, not manually). For now we can stuff in as many possibly-invalidating changes as possible so that we only have to do that once at the end.

@travisdowns
Member Author

/ci-repeat

Member

@andrewhsu andrewhsu left a comment

LGTM

@travisdowns
Member Author

travisdowns commented Jan 12, 2024

@travisdowns
Member Author

/ci-repeat

@travisdowns travisdowns merged commit 5e38dec into redpanda-data:dev Jan 12, 2024
18 checks passed
@vbotbuildovich
Collaborator

/backport v23.3.x
