
Adding capability for tolerations in the agent #703

Closed
wants to merge 1 commit

Conversation

MbolotSuse

Related to rancher/rancher#34159. When the fleet controller is deployed, it deploys the fleet agent (both in the cluster where it is deployed and in downstream clusters that use fleet). Currently, neither of these deployments allows tolerations to be specified, making them difficult or impossible to use in a tainted cluster.

Goals of this PR:

  • For the agent deployed in the same cluster as the controller, use the same tolerations which were specified for the controller
  • For any agents deployed in the downstream cluster, allow custom specification of the tolerations. This custom specification is stored, like the agent env vars, on the spec of the imported fleet cluster.
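
To make the second goal concrete, a downstream Cluster object might look like this. This is a sketch only: the agentTolerations field name follows this PR's changes, but the metadata, the agentEnvVars entry, and the toleration values are illustrative, not taken from the actual schema.

```yaml
# Illustrative sketch: a fleet Cluster carrying agent tolerations on its
# spec, alongside the existing agent env vars. Field values are examples.
apiVersion: fleet.cattle.io/v1alpha1
kind: Cluster
metadata:
  name: downstream-1        # hypothetical cluster name
  namespace: clusters       # hypothetical namespace
spec:
  agentEnvVars:
  - name: HTTP_PROXY
    value: http://proxy.example.com:8080
  agentTolerations:         # standard Kubernetes Tolerations, per this PR
  - key: priority
    operator: Equal
    value: high
    effect: NoSchedule
```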

@MbolotSuse MbolotSuse changed the title [WIP] Adding capability for tolerations in the agent Adding capability for tolerations in the agent Jan 25, 2022
@MbolotSuse

Update: Was able to test this and fix a few bugs. I would now consider this PR ready for review.

Tests:

  • Build fleet-controller and push to test registry.
  • Deploy a 3 node cluster and taint each node (node 1: priority=low:NoSchedule, node 2: priority=mid:NoSchedule, node 3: priority=high:NoSchedule)
  • Add a key: priority, operator: Equal, value: high, effect: NoSchedule toleration to the chart for both fleet and the gitjob job (matching the taint on node 3; taint keys and values are case-sensitive)
  • Install fleet-crd by running helm -n fleet-system install --create-namespace --wait fleet-crd . from the charts/fleet-crd directory.
  • Install fleet by running helm -n fleet-system install --create-namespace --wait fleet . from the charts/fleet directory.
  • Ensure that all pods scheduled successfully using kubectl get pods -A
  • Create a GitRepo using the following description:
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: test
  namespace: fleet-local
spec:
  branch: master
  paths:
  - simple
  repo: https://github.com/rancher/fleet-examples
  • Ensure that the import job ran successfully. One note: the resources deployed (from fleet-examples) had no tolerations and could not be scheduled. I'm not concerned about this, since tolerations were never defined in fleet-examples, so I consider this behavior acceptable.
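
For context on why the tolerated pods land only on node 3 in the test above, the NoSchedule matching rule can be sketched in Python. This is a simplified model of Kubernetes taint/toleration semantics, not Fleet code; it covers only the Equal and Exists operators and the NoSchedule effect.

```python
# Simplified model of Kubernetes taint/toleration matching (NoSchedule only).
# Not Fleet code; illustrates the test setup from this PR comment.

def tolerates(toleration: dict, taint: dict) -> bool:
    """True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    op = toleration.get("operator", "Equal")
    if op == "Exists":
        # An empty key with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal: both key and value must match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint["value"])

def schedulable(pod_tolerations: list, node_taints: list) -> bool:
    """A pod can schedule only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(t, taint) for t in pod_tolerations)
        for taint in node_taints
        if taint["effect"] == "NoSchedule"
    )

# The three-node test setup described above, one taint per node.
nodes = {
    "node1": [{"key": "priority", "value": "low", "effect": "NoSchedule"}],
    "node2": [{"key": "priority", "value": "mid", "effect": "NoSchedule"}],
    "node3": [{"key": "priority", "value": "high", "effect": "NoSchedule"}],
}
fleet_tolerations = [
    {"key": "priority", "operator": "Equal", "value": "high",
     "effect": "NoSchedule"},
]
print([n for n, taints in nodes.items()
       if schedulable(fleet_tolerations, taints)])
# → ['node3']
```

This also explains the note about the fleet-examples workloads: with no tolerations at all, schedulable returns False for every tainted node, so those pods stay pending.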

@MbolotSuse

Overview of Changes:

  • Changed the definition of a fleet Cluster (CRD) to include a field for agentTolerations (values are Tolerations as defined by the K8s api)
  • Added a value to the fleet-controller config map defining the tolerations which were set on the fleet controller. This was done so that the various parts of the code which need these tolerations don't have to attempt to look up the deployment
  • Added a parameter to the basic.Deployment function that allows callers to pass in additional tolerations (there is one preset toleration that always needs to be used).
  • Removed references to the cattle.io/os toleration, since that should now be passed in by the values set in the chart.
  • For the local cluster, set the AgentTolerations to what was defined for the fleet-controller.
  • Passed no additional tolerations to the controller deployment created by the operator (modules/cli/controllermanifest/template.go); I'm unsure whether this functionality is currently in use.
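
The second bullet might look roughly like this. This is a sketch: the config map name, namespace, and key layout are assumptions for illustration, not the exact schema from this PR.

```yaml
# Illustrative sketch: the fleet-controller config map carrying the
# controller's own tolerations, so other components can read them here
# instead of looking up the Deployment. Names and layout are assumed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fleet-controller
  namespace: fleet-system
data:
  config: |
    agentTolerations:
    - key: priority
      operator: Equal
      value: high
      effect: NoSchedule
```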

Commit message:

Changing agent deployments to use tolerations defined on the fleet cluster, and changing the local cluster creation to use the tolerations from the controller deployment.
@aiyengar2

re:

For the agent deployed in the same cluster as the controller, use the same tolerations which were specified for the controller

I'm not sure whether there might be use cases where a fleet-agent needs different tolerations than fleet itself, but I'm not too concerned about this.

re:

For any agents deployed in the downstream cluster, allow custom specification of the tolerations. This custom specification is stored, like the agent env vars, on the spec of the imported fleet cluster.

This is where my primary concern with the approach of this PR lies: I think tolerations should be specified in the downstream cluster, not the local cluster (where the fleet Cluster object lives), since the current approach only works well with Manager-Initiated registration (which is what Rancher uses).

In Agent-Initiated registration, where the Cluster object is auto-created upon seeing a ClusterRegistrationToken, the behavior is undefined: the Fleet Agent is deployed and managed via a Helm chart in the downstream cluster, so these fields on the spec of the Cluster in the management cluster would effectively need to be ignored, right?

@manno

manno commented Mar 8, 2023

Support for tolerations was added in PR #1154

@manno manno closed this Mar 8, 2023