[performance] - Topic Operator Impact of Batch Size and Linger Settings on Kafka Topic Operations #10138

Open
wants to merge 22 commits into base: main

Conversation

see-quick
Member

@see-quick see-quick commented May 21, 2024

Type of change

  • Enhancement / new feature

Description

This PR focuses on exploring the impact of different configurations on the efficiency of creating, modifying, and deleting Kafka topics. I've experimented with a range of batch sizes and linger durations to see how they affect performance at different topic counts.

Based on this graph (KRaft):

image

One can see that I have tried multiple configurations, with batch sizes and linger settings ranging from 1 ms to 2000 ms. Moreover, I tested topic counts from 50 to 1000 to see whether a given configuration scales well or runs into problems (visible on each curve). This should help us understand the capabilities of the UTO with various settings and pick the configuration that scales best.

I have also changed the way we create the events. Currently we do it sequentially; I have modified it to use an ExecutorService that manages and processes batches concurrently. More on that in the Javadoc...
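
To illustrate the concurrent approach, here is a minimal, self-contained sketch (not the actual PR code; the processBatch helper, pool size, and batch size are assumptions):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentTopicEventsSketch {

    // hypothetical stand-in for the real per-batch work (creating/modifying KafkaTopics)
    static void processBatch(int start, int end) {
        System.out.printf("processing topics %d..%d%n", start, end);
    }

    public static void main(String[] args) throws Exception {
        final int totalTopics = 1000;   // assumed scale, matching the tested range
        final int batchSize = 100;      // assumed batch size

        ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // submit one task per batch instead of creating topics sequentially
            List<Future<?>> futures = IntStream.iterate(0, start -> start + batchSize)
                .limit((totalTopics + batchSize - 1) / batchSize)
                .mapToObj(start -> executor.submit(() -> processBatch(start, Math.min(start + batchSize, totalTopics))))
                .collect(Collectors.toList());

            // wait for all batches to complete before collecting measurements
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            executor.shutdown();
        }
    }
}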

[1] - #10050 (review)

Checklist

  • Write tests
  • Make sure all tests pass

@see-quick see-quick self-assigned this May 21, 2024
@see-quick
Member Author

/packit test --labels performance-topic-operator-capacity

@see-quick see-quick added this to the 0.42.0 milestone May 21, 2024
@see-quick
Member Author

@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testCapacityCreateAndUpdateTopics --env=STRIMZI_USE_KRAFT_IN_TESTS=true

@strimzi-ci

▶️ Build started - check Jenkins for more info. ▶️

@strimzi-ci

❌ Test Summary ❌

TEST_PROFILE: performance
GROUPS:
TEST_CASE: TopicOperatorPerformance#testCapacityCreateAndUpdateTopics
TOTAL: 6
PASS: 0
FAIL: 6
SKIP: 0
BUILD_NUMBER: 79
OCP_VERSION: 4.15
BUILD_IMAGES: false
FIPS_ENABLED: false
PARALLEL_COUNT: 5
EXCLUDED_GROUPS: loadbalancer,nodeport,olm
ENV_VARIABLES: STRIMZI_USE_KRAFT_IN_TESTS=true

❗ Test Failures ❗

  • testCapacityCreateAndUpdateTopics[6] 1000, 2000 in io.strimzi.systemtest.performance.TopicOperatorPerformance

Re-run command:
@strimzi-ci run tests --profile=performance --testcase=io.strimzi.systemtest.performance.TopicOperatorPerformance#testCapacityCreateAndUpdateTopics[6] 1000, 2000

@see-quick
Member Author

@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testCapacityCreateAndUpdateTopics --env=STRIMZI_USE_KRAFT_IN_TESTS=true

@strimzi-ci

▶️ Build started - check Jenkins for more info. ▶️

@strimzi-ci

❌ Test Summary ❌

TEST_PROFILE: performance
GROUPS:
TEST_CASE: TopicOperatorPerformance#testCapacityCreateAndUpdateTopics
TOTAL: 6
PASS: 0
FAIL: 6
SKIP: 0
BUILD_NUMBER: 80
OCP_VERSION: 4.15
BUILD_IMAGES: false
FIPS_ENABLED: false
PARALLEL_COUNT: 5
EXCLUDED_GROUPS: loadbalancer,nodeport,olm
ENV_VARIABLES: STRIMZI_USE_KRAFT_IN_TESTS=true

❗ Test Failures ❗

  • testCapacityCreateAndUpdateTopics[6] 1000, 2000 in io.strimzi.systemtest.performance.TopicOperatorPerformance

Re-run command:
@strimzi-ci run tests --profile=performance --testcase=io.strimzi.systemtest.performance.TopicOperatorPerformance#testCapacityCreateAndUpdateTopics[6] 1000, 2000

Contributor

@fvaleri fvaleri left a comment

Hi @see-quick, thanks for working on this.

In order to simulate a busy shared cluster and possibly catch some edge cases, I think we should try to include all 3 kinds of topic events (creations, updates, and deletes) and run them in parallel.

In my custom test, I take the number of events I want to test as input, then divide it by 3 to get the number of tasks I have to run in parallel (you may have 1 or 2 spare events that you can simply consume as a no-op, that's fine). Each task executes topic creation, update (partition increase and config change), and deletion serially. WDYT?
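
A minimal sketch of the task splitting described above, assuming hypothetical topic helpers and pool size (not the code used in the test):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelTopicEventTasksSketch {

    // hypothetical placeholders for the three event kinds
    static void createTopic(String name) { }
    static void updateTopic(String name) { }   // e.g. partition increase + config change
    static void deleteTopic(String name) { }

    public static void main(String[] args) throws InterruptedException {
        final int numberOfEvents = 999;                         // assumed input
        final int eventsPerTask = 3;                            // create + update + delete
        final int numberOfTasks = numberOfEvents / eventsPerTask;
        final int spareEvents = numberOfEvents % eventsPerTask; // 0..2 leftovers, consumed as no-ops

        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < numberOfTasks; i++) {
            final String topicName = "perf-topic-" + i;
            tasks.add(() -> {
                // each task runs its three events serially...
                createTopic(topicName);
                updateTopic(topicName);
                deleteTopic(topicName);
                return null;
            });
        }

        // ...while the tasks themselves run in parallel
        ExecutorService executor = Executors.newFixedThreadPool(8); // assumed pool size
        try {
            executor.invokeAll(tasks);
        } finally {
            executor.shutdown();
        }
        System.out.println("tasks: " + numberOfTasks + ", spare no-op events: " + spareEvents);
    }
}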

@see-quick
Member Author

see-quick commented May 27, 2024

Hi @see-quick, thanks for working on this.

In order to simulate a busy shared cluster and possibly catch some edge cases, I think we should try to include all 3 kinds of topic events (creations, updates, and deletes) and run them in parallel.

In my custom test, I take the number of events I want to test as input, then divide it by 3 to get the number of tasks I have to run in parallel (you may have 1 or 2 spare events that you can simply consume as a no-op, that's fine). Each task executes topic creation, update (partition increase and config change), and deletion serially. WDYT?

Okay, but that way we would not be able to see the upper bound (i.e., how many KafkaTopics the TO is able to handle for creation and modification). Maybe such information is not so important... And if we run all three operations, what is our termination condition? Do we want to create a specific number of topics (e.g., 1000) and see how the TO performs with different configurations? What are then the most important output metrics to check? Also, should we execute these tasks incrementally, divided into batches (e.g., every 100 KafkaTopics), as we do for capacity, or should we run all 1000 topics at once?

@fvaleri
Contributor

fvaleri commented May 27, 2024

we would not be able to see the upper bound (i.e., how many KafkaTopics the TO is able to handle for creation and modification)

I think the objective here is not to see the upper bound, but to assess performance with a fixed number of events. For example, I'm running the test with the following batches of events: 50, 100, 150, ..., 1000. That way you see how it scales, simply by putting the end-to-end reconciliation time (the only one we care about here) on a line graph, and you can compare it with a previous implementation on the very same graph.

By e2e reconciliation time in seconds I mean the time from creation/update to ready, or the deletion duration. This is how an example graph looks (note: we only need the numbers; then you can generate the graph with whatever tool you prefer):

Screenshot from 2024-05-27 11-01-57
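
For reference, a minimal sketch of how such an e2e reconciliation time could be measured; the helper methods are hypothetical placeholders for the real test-framework calls:

import java.time.Duration;
import java.time.Instant;

public class E2eReconciliationTimeSketch {

    // hypothetical placeholders for the real Strimzi test-framework calls
    static void createKafkaTopic(String name) { /* apply the KafkaTopic custom resource */ }
    static void waitForKafkaTopicReady(String name) { /* block until the Ready condition is reported */ }

    // e2e reconciliation time: from applying the KafkaTopic until the operator marks it Ready
    static Duration timeCreation(String topicName) {
        Instant start = Instant.now();
        createKafkaTopic(topicName);
        waitForKafkaTopicReady(topicName);
        return Duration.between(start, Instant.now());
    }

    public static void main(String[] args) {
        System.out.println("e2e creation time: " + timeCreation("perf-topic-0").toMillis() + " ms");
    }
}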

@see-quick
Member Author

@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testPerformanceInFixedSizeOfEvents --env=STRIMZI_USE_KRAFT_IN_TESTS=true

1 similar comment
@see-quick
Member Author

@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testPerformanceInFixedSizeOfEvents --env=STRIMZI_USE_KRAFT_IN_TESTS=true

@strimzi-ci

▶️ Build started - check Jenkins for more info. ▶️

@strimzi-ci

Systemtests Failed (no tests results are present)

@see-quick
Member Author

@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testPerformanceInFixedSizeOfEvents --env=STRIMZI_USE_KRAFT_IN_TESTS=true

@strimzi-ci

▶️ Build started - check Jenkins for more info. ▶️

@strimzi-ci

✔️ Test Summary ✔️

TEST_PROFILE: null
GROUPS: null
TEST_CASE: null
TOTAL: 120
PASS: 120
FAIL: 0
SKIP: 0
BUILD_NUMBER: 82
OCP_VERSION: null
BUILD_IMAGES: false
FIPS_ENABLED: null
PARALLEL_COUNT: null
ENV_VARIABLES: STRIMZI_USE_KRAFT_IN_TESTS=true

@see-quick
Member Author

see-quick commented May 30, 2024

we would not be able to see the upper bound (i.e., how many KafkaTopics the TO is able to handle for creation and modification)

I think the objective here is not to see the upper bound, but to assess performance with a fixed number of events. For example, I'm running the test with the following batches of events: 50, 100, 150, ..., 1000. That way you see how it scales, simply by putting the end-to-end reconciliation time (the only one we care about here) on a line graph, and you can compare it with a previous implementation on the very same graph.

By e2e reconciliation time in seconds I mean the time from creation/update to ready, or the deletion duration. This is how an example graph looks (note: we only need the numbers; then you can generate the graph with whatever tool you prefer):

Screenshot from 2024-05-27 11-01-57

So I have tried 6 configurations here:

a) with internal metric - strimzi max reconciliation

image

b) with external metric - duration of all operations (i.e., create, modify and delete + readiness)

image

@see-quick see-quick changed the title from "[performance] - TopicOperator capacity create & modify" to "[performance] - Topic Operator Impact of Batch Size and Linger Settings on Kafka Topic Operations" May 30, 2024
@see-quick see-quick marked this pull request as ready for review May 30, 2024 09:10
@see-quick see-quick requested a review from a team May 30, 2024 21:26
Contributor

@fvaleri fvaleri left a comment

@see-quick nice work.

I left some improvement suggestions, but the base logic is there.

I would also try with BS 100 and LMS 10.

@see-quick
Member Author

@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testPerformanceInFixedSizeOfEvents --env=STRIMZI_USE_KRAFT_IN_TESTS=true

@strimzi-ci

▶️ Build started - check Jenkins for more info. ▶️

Contributor

@fvaleri fvaleri left a comment

LGTM. Thanks.

@see-quick see-quick requested a review from a team June 19, 2024 08:20
Contributor

@henryZrncik henryZrncik left a comment

LGTM thanks!

Comment on lines +53 to +58
LOGGER.info("Resolved performance log directory: {} for use case '{}'. Max batch size: {}, Max linger time: {}, " +
"Clients enabled: {}, Number of topics: {}, Process type: {}",
topicOperatorUseCasePathDir, useCaseName, maxBatchSize, maxBatchLingerMs, clientsEnabled,
numberOfTopics.equals("0") ? "not included" : numberOfTopics, processType == null ? "not included" : processType);
Member

I don't know, should it be at INFO level?
Also, the name of the directory is self-explanatory, so maybe printing the full path would be sufficient? That way you would not need the processType there, nor the check for whether it's null.

Member Author

I don't know, should it be at INFO level?

I can change it to DEBUG... but I don't see any problem with having it at INFO because it's only printed once... :)

}

private static void performCreationWithWait(int start, int end, ExtensionContext currentContext, TestStorage testStorage) {
ResourceManager.setTestContext(currentContext);
Member

Why is the test context set here? Isn't that something the RM should handle by itself?

Member Author

Because this method is invoked in a new Thread. Therefore, if you do not set the context, you end up with an NPE, because a new Thread does not hold the state of the ExtensionContext, so you need to set it.

Member

Okay, I would add a comment there explaining why it's needed. (I know that it makes sense to you, but when someone else goes through the code, it will be as confusing for them as it was for me.)

Comment on lines +261 to +197
if (warmUpTasksToProcess != 0) {
    // boundary between tests => less likelihood that tests would influence each other
    LOGGER.info("Cooling down");
    try {
        Thread.sleep(30_000);
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}
Member

It's fine that you are giving some space between the tests, but I'm not sure I understand how they can influence each other. You have deployed one Kafka cluster and you are just deploying the KafkaTopics with different configurations, so would the UTO have some kind of issue with that? The magic number there is not helping much either.
And "Cooling down" doesn't say much, TBH.

Member Author

So, as @fvaleri suggested, would a smaller interval make sense? I am also not sure how much it helps, because my setup is a little bit different from yours (i.e., you have a shared Kafka cluster where you do creation/modification and deletion), but in this case we deploy a brand new Kafka cluster. If it still makes sense, I can reduce it to 10 or 5 seconds.

Member

If it's needed, fine, let's use a shorter sleep. But I would definitely change the log message.

Member

You named it TopicOperatorPerformanceUtils, but it seems that you are working a lot with threading. Should it be some kind of object or a dedicated handler rather than "utils"?

Member Author

@see-quick see-quick Jun 24, 2024

I am not sure... there's not so much going on in that class, only starting/stopping the executor :) If you think it needs a separate handling class, I can create one.

Member

I think that "utils" makes it a bit weird, because of the stuff you are doing in there. But maybe it's just me. WDYT @strimzi/system-test-contributors?

Comment on lines +744 to +764
private static Stream<Arguments> provideConfigurationsForTestSystemScalability() {
    final int seed = 50;
    final int limit = 20; // up to 1000 topics to create

    // this means we would invoke create/update/delete operations x times in one iteration
    int[] batchEventSizes = IntStream.iterate(seed, n -> n + seed).limit(limit).toArray();
    return Stream.of(
            new Object[][]{
                {"100", "100"}, // Default configuration
                {"100", "10"}   // Default batch size, with lower linger time
            }
        )
        .flatMap(config -> Arrays.stream(batchEventSizes)
            .mapToObj(numberOfTopics -> {
                int eventPerTask = 3;
                int numberOfTasks = numberOfTopics / eventPerTask;
                int spareEvents = numberOfTopics % eventPerTask;

                return Arguments.of(config[0], config[1], numberOfTasks, spareEvents);
            }));
}
Member

As I mentioned offline, it's fine that you are doing these things, but what about having some smaller configuration that runs in every pipeline, and then giving the users who run the tests (IMHO it will be mainly @fvaleri) the possibility to configure what the tests should run with?
As it is, it will run for a long time, and in case the test fails you will again have to re-run all the tests. That doesn't apply just to this test, but to all the other perf tests.

Contributor

That's a good idea. IMO, the full test is something to run on demand, mainly to check some big change to the Topic Operator. This is something every maintainer or contributor can do while developing or reviewing, not just me :)

Member

Yeah well, I agree that every maintainer or contributor should do this. But I know how it usually ends. I think it would be more user-friendly if these kinds of tests were more configurable. It's also better for debugging issues, where you would just pass the config to the test and would not have to exclude test cases or wait a long time before you get to the point that interests you.

That way you would also be able to remove the cool-downs and other stuff that are present in the suite.

Contributor

I think the cool-down is not bad, but we can reduce it to 10 or 5 seconds.

Member

I agree with Lukas. Why do we have this pipeline that seems to be quite complex? AFAIU it is not easy to trigger just a specific configuration; you have to run all of the tests and wait... It doesn't sound great. We have trouble executing acceptance tests or regression on TF, so having one huge pipeline with this execution time is not very usable from my POV.

Why don't we have something a little bit shorter? Like a scenario where you create 500 topics and check that it finishes in some reasonable time? Or do we already have this kind of baseline that could be executed as part of regression? Kafka has its benchmark tests and AFAIR those don't take 10+ hours.

Contributor

The objective here is to test TO scalability, so you need to scale with an increasing number of topic events. This test doesn't take that long when running locally, but I see your concerns about running it on a CI pipeline with fewer resources. To make it faster, we could maybe run a shorter sequence of batches (e.g., 200, 400, 800, 1000). That takes around 2 minutes on my laptop. Would that work? I'm also not sure on which pipeline this would run and how it can be triggered.

Member

I guess even 1-2 hours is fine, but when I saw the pipeline running in Jenkins last week, it timed out after 22h, which is crazy. The timeout on TF is now set to 30h, which is crazy as well. If it takes only 2 minutes, why do we need a 30h timeout? Regarding resources... 30GB RAM and 8 CPUs is not so small, I would say.

Contributor

If it takes only 2 minutes, why do we need a 30h timeout?

It's around two minutes with the changes I proposed in my previous comment.

I actually didn't review the pipeline config as I don't know it very well, but I guess it was tuned for the initial implementation and needs to be revised now.

@see-quick knows more for sure.

Member Author

In the previous commits we had 7 different configurations, and each configuration went from 50 to 1000 events (increasing by 50). Now we have only 2 configurations. :) So execution time is no longer a problem.

As I mentioned offline, it's fine that you are doing these things, but what about having some smaller configuration that runs in every pipeline, and then giving the users who run the tests (IMHO it will be mainly @fvaleri) the possibility to configure what the tests should run with?
As it is, it will run for a long time, and in case the test fails you will again have to re-run all the tests. That doesn't apply just to this test, but to all the other perf tests.

Okay, so we can reduce everything to only two configurations 👍 now that we have this scalability test. WDYT?

systemtest/tmt/tests/strimzi/main.fmf (outdated; resolved)
@see-quick see-quick force-pushed the topic-oeprator-perf-capacity branch from 0fba37c to 003f5ff June 24, 2024 09:09
Signed-off-by: see-quick <maros.orsak159@gmail.com>
@see-quick see-quick force-pushed the topic-oeprator-perf-capacity branch from 3b5b10f to 0a40e7d June 26, 2024 07:11
@scholzj scholzj modified the milestones: 0.42.0, 0.43.0 Jul 1, 2024
7 participants