Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fleet agent deployment configures tolerations from cluster CR #1154

Merged
merged 4 commits into from Jan 27, 2023

Conversation

rajiteh
Copy link
Contributor

@rajiteh rajiteh commented Dec 3, 2022

Related:

This PR introduces the ability for fleet admins to configure tolerations that must be included in the fleet agent deployment by setting them in each cluster CR's spec.

A new configuration field .agentTolerations provide a way to define a list of corev1.Toleration objects.

This solves the problem of not being able to use Fleet in clusters which all the nodes are tainted (no schedulable nodes for agent deployment)

Additional Information

This solution let's the user/system that manages fleet Cluster resources to configure the tolerations for that particular cluster's agent deployment.

Tradeoff

In agent initiated registration flow, the initial agent deployment will need to replicate the same set of tolerations.
ie: if using helm then we would have to set .kubectl.tolerations and fleetAgent.tolerations fields

However, it can be argued that one could simply set a catch all toleration - operator: Exists in the agent initiated flow which guarantees that the agent will get scheduled in the downstream cluster. Once it has communicated with the fleet controller, the new bundle pushed out by the controller will contain the proper tolerations as defined by the Cluster CR which schedule the pods in the appropriate nodes.

Potential improvement

@manno
Copy link
Member

manno commented Jan 10, 2023

Hey thanks. I guess this will not conflict with the existing tolerations in https://github.com/rancher/fleet/blob/master/pkg/agent/manifest.go#L234-L244?

manno
manno previously approved these changes Jan 16, 2023
@@ -93,6 +94,10 @@ func Manifest(namespace string, agentScope string, opts ManifestOptions) []runti
propagateDebug, _ := strconv.ParseBool(os.Getenv("FLEET_PROPAGATE_DEBUG_SETTINGS_TO_AGENTS"))
debug := logrus.IsLevelEnabled(logrus.DebugLevel) && propagateDebug
dep := agentDeployment(namespace, DefaultName, image, opts.AgentImagePullPolicy, DefaultName, false, debug)

// additional tolerations
dep.Spec.Template.Spec.Tolerations = append(dep.Spec.Template.Spec.Tolerations, opts.AgentTolerations...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manno it will preserve the tolerations defined by the agentDeployment func and append anything that the user sent. However, it is possible for the user defined tolerations to override the default ones if they have the same key, but I don't see why someone would do that intentionally as the default ones are rancher namespaced.

@thardeck
Copy link
Contributor

thardeck commented Jan 19, 2023

@rajiteh could you provide ci tests for your pr? We try to improve our code base and having automated tests for new non trivial code changes helps a lot.
We have e2e (/e2e/) and integration tests (/integrationtests/) in place so you can see how we do it. If you have questions let us know.

Thanks in advance.

@rajiteh
Copy link
Contributor Author

rajiteh commented Jan 26, 2023

@thardeck @manno it was a little too tough for me to grok how the e2e stuff worked in this repo, I can't seem to figure out if there is any code that tests agent deployment behaviour and I'm too lost to come up with my own. I decided to add some unit tests to verify the functionality instead. Let me know if that's enough for now. Thanks!

@manno manno merged commit 9ad958d into rancher:master Jan 27, 2023
@manno
Copy link
Member

manno commented Jan 27, 2023

@rajiteh I don't think we have anything to really check agent deployment. Our E2E tests mostly apply Gitrepos and use kubectl to check the outcome.
The only test that comes close is our (basic) "installation test: https://github.com/rancher/fleet/blob/master/.github/workflows/fleet-upgrade.yml#L103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

4 participants