@michaelawyu
Collaborator

Description of your changes

This PR (re-)enables parallel processing in the work applier.

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

See notes.

Special notes for your reviewer

Signed-off-by: michaelawyu <chenyu1@microsoft.com>
@michaelawyu
Collaborator Author

michaelawyu commented Aug 29, 2025

Due to a test complication, no tests are submitted in this PR at the moment. I will move additional tests to this PR soon.

@codecov

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 94.06780% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/controllers/workapplier/process.go | 80.55% | 6 Missing and 1 partial ⚠️ |


Signed-off-by: michaelawyu <chenyu1@microsoft.com>
@michaelawyu michaelawyu changed the title feat: (re-)enable parallel processing in the work applier [WIP] feat: (re-)enable parallel processing in the work applier Sep 1, 2025
zhiying-lin and others added 6 commits September 2, 2025 00:19
…sync (kubefleet-dev#231)

* Minor fixes
* Minor fixes
* Revert the timeout change in kubefleet-dev#99
* Use eventually block
* Experimental
* Revert experimental changes

Signed-off-by: Zhiying Lin <zhiyingl456@gmail.com>
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Signed-off-by: Wantong Jiang <wantjian@microsoft.com>
@michaelawyu michaelawyu marked this pull request as ready for review September 1, 2025 14:20
@michaelawyu
Collaborator Author

Note: to control PR size and avoid conflicts with other ongoing changes in the work applier, only unit tests are added to this PR. Additional integration tests and E2E tests will be added later.

// Note (chenyu1): the waves below are based on the Helm resource installation
// order (see also the Helm source code). Similar objects are grouped together
// to achieve best performance.
defaultWaveNumberByResourceType = map[string]waveNumber{
Collaborator

We also defined the apply order in https://github.com/kubefleet-dev/kubefleet/blob/main/pkg/controllers/placement/resource_selector.go#L47

The two orders are not consistent; see, for example, ingressclasses.

Can we try to merge them so that the order is always consistent?

Collaborator Author

@michaelawyu michaelawyu Sep 3, 2025

Hi Zhiying! Yeah, I was a bit concerned about this too when composing the PR. The reason we did it separately in this PR is that, if we do not group the resource types but process them individually, there's a high chance that each batch holds only 1-2 objects, which kind of defeats the purpose of the parallelization.
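
To illustrate the batching argument above, here is a minimal sketch, not the PR's actual code. Only the `waveNumber` type name comes from the snippet under review; the wave assignments, the `object` type, and the `groupIntoWaves` and `processWaves` helpers are hypothetical. Similar resource types share a wave, each wave is applied concurrently, and the next wave starts only after the current one finishes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// waveNumber mirrors the type name in the snippet under review; everything
// else in this sketch is hypothetical.
type waveNumber int

// Hypothetical wave assignments: similar resource types share a wave so that
// each wave yields a larger batch.
var waveByResourceType = map[string]waveNumber{
	"namespaces":  0,
	"configmaps":  1,
	"secrets":     1,
	"services":    2,
	"deployments": 3,
}

// lastWave is used for resource types that are not listed above.
const lastWave waveNumber = 4

type object struct {
	ResourceType string
	Name         string
}

// groupIntoWaves buckets objects by wave number and returns the non-empty
// buckets in ascending wave order.
func groupIntoWaves(objs []object) [][]object {
	byWave := make(map[waveNumber][]object)
	for _, o := range objs {
		w, ok := waveByResourceType[o.ResourceType]
		if !ok {
			w = lastWave
		}
		byWave[w] = append(byWave[w], o)
	}
	nums := make([]int, 0, len(byWave))
	for w := range byWave {
		nums = append(nums, int(w))
	}
	sort.Ints(nums)
	waves := make([][]object, 0, len(nums))
	for _, n := range nums {
		waves = append(waves, byWave[waveNumber(n)])
	}
	return waves
}

// processWaves applies every object in a wave concurrently and waits for the
// wave to finish before starting the next one.
func processWaves(waves [][]object, apply func(object)) {
	for _, wave := range waves {
		var wg sync.WaitGroup
		for _, o := range wave {
			wg.Add(1)
			go func(o object) {
				defer wg.Done()
				apply(o)
			}(o)
		}
		wg.Wait()
	}
}

func main() {
	objs := []object{
		{"deployments", "web"},
		{"configmaps", "web-config"},
		{"namespaces", "app"},
		{"secrets", "web-tls"},
	}
	waves := groupIntoWaves(objs)
	for i, w := range waves {
		fmt.Printf("wave %d: %d object(s)\n", i, len(w))
	}
	processWaves(waves, func(o object) {
		// A real applier would talk to the API server here.
	})
}
```

With one wave per resource type, the config map and the secret in this sample would each sit in a batch of one; sharing a wave lets them be applied together.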

Collaborator Author

The PR also kind of renders the apply order sorting we did on the hub cluster side redundant (we still need to sort the resources for stability reasons, but the sorting does not have to follow a specific order -> similar to how the work generator sorts enveloped objects).
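
As a rough illustration of sorting "for stability" only, the sketch below orders resources by an arbitrary but deterministic key, so the same set of resources always serializes in the same sequence regardless of input order. The `resourceID` type and `sortForStability` helper are hypothetical and are not taken from the hub-side code.

```go
package main

import (
	"fmt"
	"sort"
)

// resourceID is a hypothetical identifier for a selected resource.
type resourceID struct {
	Kind      string
	Namespace string
	Name      string
}

// sortForStability orders resources by kind, namespace, and name. The order
// carries no apply-time meaning; it only makes the output deterministic.
func sortForStability(ids []resourceID) {
	sort.Slice(ids, func(i, j int) bool {
		a, b := ids[i], ids[j]
		if a.Kind != b.Kind {
			return a.Kind < b.Kind
		}
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		return a.Name < b.Name
	})
}

func main() {
	first := []resourceID{{"Service", "app", "web"}, {"ConfigMap", "app", "cfg"}}
	second := []resourceID{{"ConfigMap", "app", "cfg"}, {"Service", "app", "web"}}
	sortForStability(first)
	sortForStability(second)
	// Both inputs now hold the same sequence.
	fmt.Println(first[0] == second[0] && first[1] == second[1])
}
```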

Collaborator Author

At this moment I am leaning toward grouping the resource types for larger batch sizes, but I do not have a particularly strong opinion on the matter -> happy to discuss the topic further.

Collaborator Author

As for the apply order part on the hub cluster, will submit another PR to keep things clean after this one is merged.

Collaborator Author

@michaelawyu michaelawyu Sep 3, 2025

(A side note: another reason I am a bit reluctant to change the apply order on the hub cluster side is that it might trigger a rollout on existing workloads, as it will register as a new resource snapshot IIRC? -> we might need to be a bit careful about this 😞)
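
The concern above can be pictured with a small sketch. It assumes, purely for illustration, that a snapshot's identity is derived from a hash over the serialized, ordered resource list; the thread does not confirm how snapshots are actually compared, and `orderSensitiveHash` is a hypothetical helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// orderSensitiveHash hashes the serialized list as-is, so the same items in a
// different order yield a different digest. Hypothetical helper.
func orderSensitiveHash(items []string) (string, error) {
	raw, err := json.Marshal(items)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256(raw)), nil
}

func main() {
	oldOrder := []string{"Namespace/app", "IngressClass/nginx", "Service/web"}
	newOrder := []string{"Namespace/app", "Service/web", "IngressClass/nginx"}
	h1, _ := orderSensitiveHash(oldOrder)
	h2, _ := orderSensitiveHash(newOrder)
	// Same resources, different order: under this assumption the content
	// would look changed even though nothing was added or removed.
	fmt.Println("looks changed:", h1 != h2)
}
```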

Contributor

These two orders serve different purposes. The hub one was the apply order before this PR, but now it serves more to keep the uniqueness for a list of resources (or we don't think there is any change), while the member side is the real apply order. We should probably rename the hub-side list to reflect the fact that it is no longer the order in which the member will apply resources.

Collaborator

@zhiying-lin zhiying-lin left a comment

discussed offline, will update the applyOrder comment in a separate PR.

@michaelawyu
Collaborator Author

> discussed offline, will update the applyOrder comment in a separate PR.

Will do 👌

@michaelawyu michaelawyu merged commit 683a7b0 into kubefleet-dev:main Sep 5, 2025
12 checks passed

Comment on lines +118 to +121
// Pre-allocate the map; 7 is the total count of default wave numbers, though
// not all wave numbers might be used.
waveByNum := make(map[waveNumber]*bundleProcessingWave, 7)

Contributor

nit: shouldn't this be a bundleProcessingWave array instead? One could pre-create all the bundleProcessingWave entries so there is no need for the getOrAddWave function anymore (though we could keep it to save memory). I am not sure why we need to sort the map to produce an array at the end.
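
A minimal sketch of the array-based alternative suggested here, under the assumption that wave numbers are contiguous integers in the range 0 to 6 (the snippet only states that there are 7 default wave numbers). The fields of `bundleProcessingWave` and the `groupBySlice` helper are hypothetical stand-ins. Indexing a slice by wave number keeps the waves in order by construction, so no sort is needed at the end, and allocating entries lazily preserves the memory saving of the original approach.

```go
package main

import "fmt"

type waveNumber int

// totalWaves follows the comment in the snippet under review (7 default wave
// numbers); contiguous numbering from 0 is an assumption.
const totalWaves = 7

// bundleProcessingWave is named after the type in the snippet; its fields
// here are hypothetical.
type bundleProcessingWave struct {
	num     waveNumber
	bundles []string
}

// groupBySlice places each bundle into the slot for its wave number and
// returns the non-empty waves in ascending order without sorting.
func groupBySlice(bundles []string, waveOf func(string) waveNumber) []*bundleProcessingWave {
	slots := make([]*bundleProcessingWave, totalWaves)
	for _, b := range bundles {
		n := waveOf(b)
		if slots[n] == nil {
			// Allocate lazily so unused waves cost only a nil pointer.
			slots[n] = &bundleProcessingWave{num: n}
		}
		slots[n].bundles = append(slots[n].bundles, b)
	}
	waves := make([]*bundleProcessingWave, 0, totalWaves)
	for _, w := range slots {
		if w != nil {
			waves = append(waves, w)
		}
	}
	return waves
}

func main() {
	// Hypothetical classifier: shorter names land in earlier waves.
	waveOf := func(name string) waveNumber { return waveNumber(len(name) % totalWaves) }
	for _, w := range groupBySlice([]string{"deployment", "ns", "cm", "service"}, waveOf) {
		fmt.Printf("wave %d: %v\n", w.num, w.bundles)
	}
}
```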
