feat: (re-)enable parallel processing in the work applier #234

michaelawyu · 2025-08-29T16:51:53Z

Description of your changes

This PR (re-)enables parallel processing in the work applier.

I have:

Run make reviewable to ensure this PR is ready for review.

How has this code been tested

See notes.

Special notes for your reviewer

Signed-off-by: michaelawyu <chenyu1@microsoft.com>

michaelawyu · 2025-08-29T16:52:41Z

~~Due to a test complication; no tests are submitted in this PR at the moment. Will move additional tests to this PR soon.~~

codecov · 2025-08-29T17:20:24Z

Codecov Report

❌ Patch coverage is 94.06780% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/controllers/workapplier/process.go | 80.55% | 6 Missing and 1 partial ⚠️ |

michaelawyu · 2025-09-01T14:21:20Z

Note: to control PR size and avoid conflicts with other ongoing changes in the work applier, only unit tests are added to this PR. Additional integration tests and E2E tests will be added later.

zhiying-lin · 2025-09-03T09:05:36Z

pkg/controllers/workapplier/waves.go

+	// Note (chenyu1): the waves below are based on the Helm resource installation
+	// order (see also the Helm source code). Similar objects are grouped together
+	// to achieve best performance.
+	defaultWaveNumberByResourceType = map[string]waveNumber{


We also define the apply order in https://github.com/kubefleet-dev/kubefleet/blob/main/pkg/controllers/placement/resource_selector.go#L47

The two orders are not consistent; see ingressclasses, for example.

Can we try to merge them so that they are always consistent?

Hi Zhiying! Yeah, I was a bit concerned about this too when composing the PR. The reason we did it separately in this PR is that, if we do not group the resource types but process them individually, there's a high chance that each batch will contain only 1-2 objects, which kind of defeats the purpose of the parallelization.

This PR also kind of renders the apply order sorting we do on the hub cluster side redundant (we still need to sort the resources for stability reasons, but they do not have to be in a specific order -> similar to how the work generator sorts enveloped objects).

At this moment I am leaning toward grouping the resource types for larger batch sizes, but I do not have a particularly strong opinion on the matter -> happy to discuss the topic further.

As for the apply order part on the hub cluster, I will submit another PR to keep things clean after this one is merged.

(A side note: another reason I am a bit reluctant to change the apply order on the hub cluster side is that it might trigger a rollout on existing workloads, as it will register as a new resource snapshot IIRC? -> we might need to be a bit careful about this 😞)

zhiying-lin

discussed offline, will update the applyOrder comment in a separate PR.

michaelawyu · 2025-09-05T13:52:59Z

> discussed offline, will update the applyOrder comment in a separate PR.

Will do 👌

ryanzhang-oss · 2025-09-08T20:57:48Z

pkg/controllers/workapplier/waves.go

+	// Note (chenyu1): the waves below are based on the Helm resource installation
+	// order (see also the Helm source code). Similar objects are grouped together
+	// to achieve best performance.
+	defaultWaveNumberByResourceType = map[string]waveNumber{


These two orders serve different purposes. The hub one was the apply order before this PR, but now it serves more to keep the list of resources unique (or we don't think there is any change), while the member-side one is the real apply order. We should probably rename the hub-side list to reflect the fact that it is no longer the order in which the member applies resources.
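As a rough illustration of why a deterministic order still matters on the hub side even when it is no longer the apply order, the sketch below sorts resource identifiers before hashing them, so the same set of resources produces the same digest regardless of listing order. The `resourceID` type and `stableDigest` function are hypothetical; this is not how KubeFleet computes resource snapshots, only an assumption-laden example of the stability concern raised in the thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// resourceID is a hypothetical identifier for a selected resource.
type resourceID struct {
	kind, namespace, name string
}

// key builds a sortable string form of the identifier.
func (r resourceID) key() string {
	return r.kind + "/" + r.namespace + "/" + r.name
}

// stableDigest sorts the keys before hashing, so the digest depends only on
// the set of resources and not on the order in which they were listed.
func stableDigest(ids []resourceID) string {
	keys := make([]string, 0, len(ids))
	for _, id := range ids {
		keys = append(keys, id.key())
	}
	sort.Strings(keys)
	sum := sha256.Sum256([]byte(strings.Join(keys, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := []resourceID{
		{kind: "ConfigMap", namespace: "demo", name: "cfg"},
		{kind: "Deployment", namespace: "demo", name: "app"},
	}
	b := []resourceID{a[1], a[0]} // same resources, different listing order
	fmt.Println(stableDigest(a) == stableDigest(b)) // prints "true"
}
```

Any consistent sort key would do for this purpose; the specific order only matters if something downstream treats it as the apply order.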

ryanzhang-oss · 2025-09-08T21:42:09Z

pkg/controllers/workapplier/waves.go

+	// Pre-allocate the map; 7 is the total count of default wave numbers, though
+	// not all wave numbers might be used.
+	waveByNum := make(map[waveNumber]*bundleProcessingWave, 7)
+


nit: shouldn't this be a bundleProcessingWave array instead? One can pre-create all the bundleProcessingWave entries so that there is no need for the getOrAddWave function anymore (though we can keep it to save memory). I am not sure why we need to sort the map to produce an array at the end.
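One way to read this suggestion, as a sketch only: index a fixed-size slice by wave number, create each wave lazily on first use, and compact the slice at the end. Because slice positions are already in wave order, no sorting step is needed. The type names `waveNumber` and `bundleProcessingWave` and the count of 7 come from the diff above; the struct fields, the `bundle` type, and `organizeIntoWaves` are assumptions made for illustration.

```go
package main

import "fmt"

type waveNumber int

// totalWaves matches the count mentioned in the diff comment.
const totalWaves = 7

// bundleProcessingWave mirrors the type name in the diff; its fields are hypothetical.
type bundleProcessingWave struct {
	num     waveNumber
	bundles []string
}

// bundle is a hypothetical pairing of a manifest name with its wave number.
type bundle struct {
	name string
	wave waveNumber
}

// organizeIntoWaves fills a slice indexed by wave number and then drops the
// unused slots, so the result is ordered by wave number without sorting.
func organizeIntoWaves(bundles []bundle) []*bundleProcessingWave {
	slots := make([]*bundleProcessingWave, totalWaves)
	for _, b := range bundles {
		if b.wave < 0 || int(b.wave) >= totalWaves {
			continue // out-of-range wave numbers are skipped in this sketch
		}
		if slots[b.wave] == nil {
			// Lazily create the wave, so unused waves cost only a nil pointer.
			slots[b.wave] = &bundleProcessingWave{num: b.wave}
		}
		slots[b.wave].bundles = append(slots[b.wave].bundles, b.name)
	}
	waves := make([]*bundleProcessingWave, 0, totalWaves)
	for _, w := range slots {
		if w != nil {
			waves = append(waves, w)
		}
	}
	return waves
}

func main() {
	waves := organizeIntoWaves([]bundle{
		{name: "deployment/app", wave: 5},
		{name: "namespace/demo", wave: 0},
		{name: "statefulset/db", wave: 5},
	})
	for _, w := range waves {
		fmt.Printf("wave %d: %v\n", w.num, w.bundles)
	}
}
```

The trade-off is that this approach assumes wave numbers are small, dense integers known ahead of time; a map remains more flexible if wave numbers can be sparse or user-defined.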

michaelawyu added 2 commits August 29, 2025 16:51

Minor fixes

b487f81

Signed-off-by: michaelawyu <chenyu1@microsoft.com>

(Re-)enabled parallel processing in the work applier

588e5a9

Signed-off-by: michaelawyu <chenyu1@microsoft.com>

Added unit tests

504bf47

Signed-off-by: michaelawyu <chenyu1@microsoft.com>

michaelawyu changed the title ~~feat: (re-)enable parallel processing in the work applier [WIP]~~ feat: (re-)enable parallel processing in the work applier Sep 1, 2025

zhiying-lin and others added 6 commits September 2, 2025 00:19

test: add RP apply strategy tests (kubefleet-dev#226)

04b072c

--------- Signed-off-by: Zhiying Lin <zhiyingl456@gmail.com> Signed-off-by: michaelawyu <chenyu1@microsoft.com>

test: add taint and toleration tests for RP (kubefleet-dev#228)

afed79a

Signed-off-by: michaelawyu <chenyu1@microsoft.com>

fix: validate if the resource should be placed (kubefleet-dev#229)

9239b27

Signed-off-by: Zhiying Lin <zhiyingl456@gmail.com> Signed-off-by: michaelawyu <chenyu1@microsoft.com>

test: add RP test for join and leave flow (kubefleet-dev#233)

da7297d

Signed-off-by: michaelawyu <chenyu1@microsoft.com>

test: add e2e tests for RP for resource selection (kubefleet-dev#236)

bd289f1

Signed-off-by: Wantong Jiang <wantjian@microsoft.com> Signed-off-by: michaelawyu <chenyu1@microsoft.com>

michaelawyu marked this pull request as ready for review September 1, 2025 14:20

Merge branch 'main' into fix/envelope-e2e-naming-error

6674090

zhiying-lin reviewed Sep 3, 2025

zhiying-lin approved these changes Sep 5, 2025

michaelawyu merged commit 683a7b0 into kubefleet-dev:main Sep 5, 2025
12 checks passed

michaelawyu mentioned this pull request Sep 8, 2025

feat: update the comments regarding resource sort orders in CRP/RP controller and work generator #243

Merged

1 task

ryanzhang-oss reviewed Sep 8, 2025

michaelawyu mentioned this pull request Sep 9, 2025

test: add additional integration tests for waved work applier manifest processing #244

Merged

1 task

4 participants
