feat: fleet agent deployment configures tolerations from cluster CR #1154

rajiteh · 2022-12-03T06:39:33Z

Adding capability for tolerations in the agent #703
EPIC: Managed Fleet Agents should support cluster-specific overrides #712
fleet-agent deployment does not have minimum availability after taints are applied to dedicated workers rancher#35834

This PR lets Fleet admins configure, per cluster, the tolerations that must be included in the fleet-agent deployment by setting them in that cluster's Cluster CR spec.

A new configuration field, .agentTolerations, provides a way to define a list of corev1.Toleration objects.
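As a rough illustration of the field's shape, the Go sketch below decodes a cluster spec that carries agentTolerations. ClusterSpec and parseClusterSpec are hypothetical, and Toleration is a trimmed stand-in for corev1.Toleration that keeps the same JSON field names. The real Fleet API type is not reproduced here.

```go
package main

import "encoding/json"

// Toleration is a simplified stand-in for k8s.io/api/core/v1.Toleration,
// keeping the same JSON field names.
type Toleration struct {
	Key               string `json:"key,omitempty"`
	Operator          string `json:"operator,omitempty"`
	Value             string `json:"value,omitempty"`
	Effect            string `json:"effect,omitempty"`
	TolerationSeconds *int64 `json:"tolerationSeconds,omitempty"`
}

// ClusterSpec is a hypothetical, trimmed-down view of a Fleet Cluster spec
// that shows only the new agentTolerations list.
type ClusterSpec struct {
	AgentTolerations []Toleration `json:"agentTolerations,omitempty"`
}

// parseClusterSpec decodes a JSON-encoded cluster spec into ClusterSpec.
func parseClusterSpec(data []byte) (ClusterSpec, error) {
	var spec ClusterSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}
```

For example, a spec such as {"agentTolerations":[{"key":"dedicated","operator":"Equal","value":"workers","effect":"NoSchedule"}]} (a hypothetical dedicated=workers taint) decodes to a single toleration whose key is dedicated.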

This solves the problem of not being able to use Fleet in clusters in which all nodes are tainted (leaving no schedulable nodes for the agent deployment).
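The sketch below shows why fully tainted clusters block the agent. It implements the taint/toleration matching rules documented by Kubernetes, using simplified stand-ins for corev1.Taint and corev1.Toleration. It considers taints only and ignores everything else the scheduler checks (node selectors, affinity, resources). The helper names tolerates and schedulableNodes are hypothetical.

```go
package main

import "sort"

// Taint and Toleration are simplified stand-ins for the corev1 types.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates applies the Kubernetes matching rules: an empty effect matches
// every effect, an empty key matches every key, "Exists" ignores the value,
// and "Equal" (or an empty operator) requires the values to be equal.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "", "Equal":
		return t.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// schedulableNodes returns, sorted by name, the nodes whose hard taints
// (NoSchedule, NoExecute) are all tolerated. PreferNoSchedule is only a
// soft preference, so it is skipped.
func schedulableNodes(nodes map[string][]Taint, tols []Toleration) []string {
	var out []string
	for name, taints := range nodes {
		ok := true
		for _, taint := range taints {
			if taint.Effect == "PreferNoSchedule" {
				continue
			}
			tolerated := false
			for _, t := range tols {
				if tolerates(t, taint) {
					tolerated = true
					break
				}
			}
			if !tolerated {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```

With every node carrying a NoSchedule taint and no tolerations, schedulableNodes returns an empty list, which corresponds to the agent pod staying Pending. Adding a matching toleration makes the nodes eligible again.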

Additional Information

This solution lets the user or system that manages Fleet Cluster resources configure the tolerations for that particular cluster's agent deployment.

Tradeoff

In the agent-initiated registration flow, the initial agent deployment will need to replicate the same set of tolerations.
For example, when installing with Helm, both the .kubectl.tolerations and fleetAgent.tolerations values would have to be set.

However, it can be argued that one could simply set a catch-all toleration (operator: Exists) in the agent-initiated flow, which ensures that taints cannot prevent the agent from being scheduled in the downstream cluster. Once the agent has communicated with the Fleet controller, the new bundle pushed out by the controller will contain the proper tolerations, as defined by the Cluster CR, which schedule the pods on the appropriate nodes.
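The two phases described above can be modelled roughly as follows. This is a hypothetical sketch, not Fleet's code: agentTolerations and toleratesEverything are illustrative names, Toleration is a simplified stand-in for corev1.Toleration, and the default tolerations that agentDeployment adds are left out. Per the Kubernetes documentation, a toleration with an empty key, operator Exists and an empty effect tolerates every taint.

```go
package main

// Toleration is a simplified stand-in for corev1.Toleration.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// catchAll is the "operator: Exists" toleration from the discussion. With an
// empty key and an empty effect it matches any key, value and effect.
var catchAll = Toleration{Operator: "Exists"}

// toleratesEverything reports whether t is such a catch-all toleration.
func toleratesEverything(t Toleration) bool {
	return t.Key == "" && t.Operator == "Exists" && t.Effect == ""
}

// agentTolerations is a hypothetical helper for the two phases. Before
// registration, the bootstrap deployment (e.g. installed via Helm) only
// carries the catch-all toleration. After registration, the
// controller-rendered deployment uses the Cluster CR's agentTolerations.
func agentTolerations(registered bool, fromCluster []Toleration) []Toleration {
	if !registered {
		return []Toleration{catchAll}
	}
	out := make([]Toleration, len(fromCluster))
	copy(out, fromCluster)
	return out
}
```

Using a broad toleration only for bootstrapping keeps the Helm values simple. The narrower, per-cluster tolerations then take effect once the controller redeploys the agent.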

Potential improvement

manno · 2023-01-10T14:34:22Z

Hey, thanks. I assume this will not conflict with the existing tolerations in https://github.com/rancher/fleet/blob/master/pkg/agent/manifest.go#L234-L244?

rajiteh · 2023-01-16T18:01:40Z

pkg/agent/manifest.go

@@ -93,6 +94,10 @@ func Manifest(namespace string, agentScope string, opts ManifestOptions) []runti
 	propagateDebug, _ := strconv.ParseBool(os.Getenv("FLEET_PROPAGATE_DEBUG_SETTINGS_TO_AGENTS"))
 	debug := logrus.IsLevelEnabled(logrus.DebugLevel) && propagateDebug
 	dep := agentDeployment(namespace, DefaultName, image, opts.AgentImagePullPolicy, DefaultName, false, debug)
+
+	// additional tolerations
+	dep.Spec.Template.Spec.Tolerations = append(dep.Spec.Template.Spec.Tolerations, opts.AgentTolerations...)


@manno It will preserve the tolerations defined by the agentDeployment func and append anything the user sends. However, it is possible for user-defined tolerations to override the default ones if they have the same key, but I don't see why someone would do that intentionally, as the default ones are Rancher-namespaced.
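The append in the diff can be sketched on its own. This is a minimal sketch assuming a simplified stand-in for corev1.Toleration. withAgentTolerations mirrors the append, and sharedKeys is a hypothetical check for the same-key case mentioned in the reply. Unlike a bare append onto an existing slice, the sketch copies into a fresh slice, so the defaults' backing array is never modified.

```go
package main

// Toleration is a simplified stand-in for corev1.Toleration.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// withAgentTolerations returns the default tolerations followed by the
// user-supplied ones, without mutating either input slice.
func withAgentTolerations(defaults, extra []Toleration) []Toleration {
	out := make([]Toleration, 0, len(defaults)+len(extra))
	out = append(out, defaults...)
	return append(out, extra...)
}

// sharedKeys lists user tolerations whose key is already used by a default
// toleration (a hypothetical helper; not part of the PR).
func sharedKeys(defaults, extra []Toleration) []string {
	seen := make(map[string]bool, len(defaults))
	for _, d := range defaults {
		seen[d.Key] = true
	}
	var keys []string
	for _, e := range extra {
		if seen[e.Key] {
			keys = append(keys, e.Key)
		}
	}
	return keys
}
```

In this sketch a same-key user toleration is added next to the default rather than replacing it. Because Kubernetes treats a taint as tolerated when any entry matches, the extra entry can only widen what the pod tolerates.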

thardeck · 2023-01-19T09:26:43Z

@rajiteh Could you provide CI tests for your PR? We are trying to improve our code base, and automated tests for new, non-trivial code changes help a lot.
We have e2e (/e2e/) and integration tests (/integrationtests/) in place, so you can see how we do it. If you have questions, let us know.

Thanks in advance.

rajiteh · 2023-01-26T04:09:00Z

@thardeck @manno It was a little too tough for me to grok how the e2e setup works in this repo. I can't figure out whether any existing code tests agent deployment behaviour, and I'm too lost to come up with my own, so I added some unit tests to verify the functionality instead. Let me know if that's enough for now. Thanks!

manno · 2023-01-27T14:33:37Z

@rajiteh I don't think we have anything that really checks agent deployment. Our E2E tests mostly apply GitRepos and use kubectl to check the outcome.
The only test that comes close is our (basic) installation test: https://github.com/rancher/fleet/blob/master/.github/workflows/fleet-upgrade.yml#L103

feat: fleet agent configures tolerations from cluster

226b867

rajiteh requested a review from a team as a code owner December 3, 2022 06:39

kkaempf added area/toleration kind/enhancement labels Dec 6, 2022

kkaempf added the kind/good-first-issue label Dec 16, 2022

manno previously approved these changes Jan 16, 2023


Mario Manno and others added 2 commits January 16, 2023 11:48

Merge branch 'master' into agent_tolerations

5cf0f4f

Merge branch 'master' into agent_tolerations

90c721a

rajiteh commented Jan 16, 2023


rajiteh dismissed manno’s stale review via f5fd0a3 January 26, 2023 03:54

test: test agent tolerations

e8a7f9b

rajiteh force-pushed the agent_tolerations branch from f5fd0a3 to e8a7f9b January 26, 2023 04:05

manno approved these changes Jan 27, 2023


manno merged commit 9ad958d into rancher:master Jan 27, 2023

manno mentioned this pull request Mar 8, 2023

Adding capability for tolerations in the agent #703

Closed
