New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use node-controller cluster role for node-lifecycle and cloud-node-lifecycle controller #72764
Use node-controller cluster role for node-lifecycle and cloud-node-lifecycle controller #72764
Conversation
Rules: []rbacv1.PolicyRule{ | ||
rbacv1helpers.NewRule("get", "list", "update", "delete", "patch").Groups(legacyGroup).Resources("nodes").RuleOrDie(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the non-cloud node controller never deletes node objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It used to, but in #70344 we split the node deletion logic into a separate controller so it can be easily ported into the cloud controller manager.
@@ -206,16 +206,31 @@ func buildControllerRoles() ([]rbacv1.ClusterRole, []rbacv1.ClusterRoleBinding) | |||
}, | |||
}) | |||
addControllerRole(&controllerRoles, &controllerRoleBindings, rbacv1.ClusterRole{ | |||
ObjectMeta: metav1.ObjectMeta{Name: saRolePrefix + "node-controller"}, | |||
ObjectMeta: metav1.ObjectMeta{Name: saRolePrefix + "node-lifecycle-controller"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will orphan the existing clusterrole on upgraded clusters. what is the reason for the rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, was wondering what happens there. So my justification for this is:
- this is confusing cause the controller was always called
node-lifecycle
and not justnode
. node-controller
has access to delete nodes, so leaving this would give it access to delete nodes even though it no longer needs it.
I'm not sure what the implications of orphaned cluster roles are so I'll defer to your judgment here (worse case we can keep the node-controller
role). Any reason why we can't support a way to clean up orphaned controllers? Would that not work because of version skew between components?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we can't support a way to clean up orphaned controllers? Would that not work because of version skew between components?
We take an extremely conservative position on permissions management and upgrades. Roles and permissions are never automatically removed or tightened by an upgrade, to ensure that no component that is currently working is ever broken by an upgrade removing its permission.
New clusters would only have the current set of default roles/bindings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know! If orphaning the node-controller
role is still an option I'd like to go down that path. Otherwise, we can continue to use the node-controller
role here unchanged for the node lifecycle controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt do you have a strong preference here? Options are:
- orphan
node-controller
clusterrole and replace with new rolenode-lifecycle-controller
- keep
node-controller
but gives delete access to node lifecycle controller even though it doesn't need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of these, I'd favor Option 2 so we don't leave an unused role floating around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @cheftako |
Would it really be the end of the world to just use the |
It's already reused across controllers, e.g. see the ipam controller. |
126f9b3
to
ea6a0fd
Compare
@mtaufen no objections from me, updated PR accordingly :) |
/retest |
@@ -126,6 +126,7 @@ func startNodeLifecycleController(ctx ControllerContext) (http.Handler, bool, er | |||
ctx.InformerFactory.Core().V1().Pods(), | |||
ctx.InformerFactory.Core().V1().Nodes(), | |||
ctx.InformerFactory.Apps().V1().DaemonSets(), | |||
// cloud node lifecycle controller uses existing cluster role from node-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment correct? is this the cloud node lifecycle controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks!
ea6a0fd
to
8fd10d4
Compare
…lifecycle controller
8fd10d4
to
426714c
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andrewsykim, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
In #70344 we split
node-lifecycle-controller
into two separate controllers:node-lifecycle-controller
andcloud-node-lifecycle
controller (see the PR for reasons). As a follow up we should have separate controller roles. This also fixes a bug (#72499) where nodes that do not exist are not deleted in the API server if--use-service-account-credentials
is set.Which issue(s) this PR fixes:
Fixes # #72499
Special notes for your reviewer:
This change also removes the need for the
node-controller
service account & role. Is there a deprecation process for bootstrap policies or a mechanism to clean them up?Does this PR introduce a user-facing change?:
/assign @mtaufen @liggitt