Use node-controller cluster role for node-lifecycle and cloud-node-lifecycle controller #72764

andrewsykim
Member

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
In #70344 we split node-lifecycle-controller into two separate controllers: node-lifecycle-controller and cloud-node-lifecycle-controller (see the PR for reasons). As a follow-up, we should have separate controller roles. This also fixes a bug (#72499) where nodes that no longer exist are not deleted from the API server if --use-service-account-credentials is set.
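
To make the failure mode concrete, here is a minimal, self-contained Go sketch of why a controller can lose node-delete access once per-controller credentials are enabled. It is a simplified model, not the kube-controller-manager code: the `boundRoles` map and the `canOnNodes` helper are hypothetical, the value of `saRolePrefix` is an assumption, and the shared identity used when the flag is off is modeled as "always allowed".

```go
package main

import "fmt"

// saRolePrefix mirrors the identifier used in the bootstrap policy code
// touched by this PR; the value here is an assumption.
const saRolePrefix = "system:controller:"

// boundRoles is a hypothetical stand-in for the bootstrap role bindings:
// role name -> verbs allowed on nodes.
var boundRoles = map[string][]string{
	saRolePrefix + "node-controller": {"get", "list", "update", "delete", "patch"},
}

// canOnNodes reports whether a controller whose client is built under
// clientName may perform verb on nodes. With perControllerCreds false,
// all controllers share one identity, modeled here as always allowed.
func canOnNodes(clientName, verb string, perControllerCreds bool) bool {
	if !perControllerCreds {
		return true
	}
	for _, v := range boundRoles[saRolePrefix+clientName] {
		if v == verb {
			return true
		}
	}
	return false
}

func main() {
	// A client name with no bound role is denied; reusing the
	// node-controller name picks up the existing role.
	for _, name := range []string{"cloud-node-lifecycle-controller", "node-controller"} {
		fmt.Printf("%s can delete nodes: %v\n", name, canOnNodes(name, "delete", true))
	}
}
```

Under these assumptions, a controller that builds its client under a name with no matching role cannot delete nodes, which is consistent with the symptom described in #72499.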

Which issue(s) this PR fixes:
Fixes #72499

Special notes for your reviewer:
This change also removes the need for the node-controller service account & role. Is there a deprecation process for bootstrap policies or a mechanism to clean them up?

Does this PR introduce a user-facing change?:

Add bootstrap service account & cluster roles for node-lifecycle-controller, cloud-node-lifecycle-controller, and cloud-node-controller.  

/assign @mtaufen @liggitt

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jan 10, 2019
@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 10, 2019
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 10, 2019
Rules: []rbacv1.PolicyRule{
rbacv1helpers.NewRule("get", "list", "update", "delete", "patch").Groups(legacyGroup).Resources("nodes").RuleOrDie(),
Member

the non-cloud node controller never deletes node objects?

Member Author

It used to, but in #70344 we split the node deletion logic into a separate controller so it can be easily ported into the cloud controller manager.

@@ -206,16 +206,31 @@ func buildControllerRoles() ([]rbacv1.ClusterRole, []rbacv1.ClusterRoleBinding)
},
})
addControllerRole(&controllerRoles, &controllerRoleBindings, rbacv1.ClusterRole{
-ObjectMeta: metav1.ObjectMeta{Name: saRolePrefix + "node-controller"},
+ObjectMeta: metav1.ObjectMeta{Name: saRolePrefix + "node-lifecycle-controller"},
Member

@liggitt liggitt Jan 10, 2019

this will orphan the existing clusterrole on upgraded clusters. what is the reason for the rename?

Member Author

Ah, was wondering what happens there. So my justification for this is:

  1. this is confusing cause the controller was always called node-lifecycle and not just node.
  2. node-controller has access to delete nodes, so leaving this would give it access to delete nodes even though it no longer needs it.

I'm not sure what the implications of orphaned cluster roles are, so I'll defer to your judgment here (worst case, we can keep the node-controller role). Any reason why we can't support a way to clean up orphaned controllers? Would that not work because of version skew between components?

Member

> Any reason why we can't support a way to clean up orphaned controllers? Would that not work because of version skew between components?

We take an extremely conservative position on permissions management and upgrades. Roles and permissions are never automatically removed or tightened by an upgrade, to ensure that no component that is currently working is ever broken by an upgrade removing its permission.

New clusters would only have the current set of default roles/bindings

Member Author

Good to know! If orphaning the node-controller role is still an option I'd like to go down that path. Otherwise, we can continue to use the node-controller role here unchanged for the node lifecycle controller.

Member Author

@liggitt do you have a strong preference here? Options are:

  1. orphan node-controller clusterrole and replace with new role node-lifecycle-controller
  2. keep the node-controller role, which gives the node lifecycle controller delete access even though it doesn't need it.

Contributor

Of these, I'd favor Option 2 so we don't leave an unused role floating around.

Contributor

@mtaufen mtaufen Jan 10, 2019

> this is confusing cause the controller was always called node-lifecycle and not just node

For history's sake, it was actually called the node controller until @cheftako split things up in #57492

@fedebongio
Contributor

/cc @cheftako

@mtaufen
Contributor

mtaufen commented Jan 10, 2019

Would it really be the end of the world to just use the node-controller role for both node-lifecycle-controller and cloud-node-lifecycle-controller?

@mtaufen
Contributor

mtaufen commented Jan 10, 2019

It's already reused across controllers, e.g. see the ipam controller.

@andrewsykim andrewsykim force-pushed the cloud-node-lifecycle-controller-rbac branch from 126f9b3 to ea6a0fd Compare January 11, 2019 00:44
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 11, 2019
@andrewsykim
Member Author

@mtaufen no objections from me, updated PR accordingly :)

@andrewsykim andrewsykim changed the title Create separate controller roles for node controllers Use node-controller cluster role for node-lifecycle and cloud-node-lifecycle controller Jan 11, 2019
@liggitt
Member

liggitt commented Jan 14, 2019

/retest

@@ -126,6 +126,7 @@ func startNodeLifecycleController(ctx ControllerContext) (http.Handler, bool, er
ctx.InformerFactory.Core().V1().Pods(),
ctx.InformerFactory.Core().V1().Nodes(),
ctx.InformerFactory.Apps().V1().DaemonSets(),
+// cloud node lifecycle controller uses existing cluster role from node-controller
Member

is this comment correct? is this the cloud node lifecycle controller?

Member Author

fixed, thanks!

@andrewsykim andrewsykim force-pushed the cloud-node-lifecycle-controller-rbac branch from ea6a0fd to 8fd10d4 Compare January 14, 2019 19:59
@andrewsykim andrewsykim force-pushed the cloud-node-lifecycle-controller-rbac branch from 8fd10d4 to 426714c Compare January 14, 2019 20:00
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 14, 2019
@liggitt
Member

liggitt commented Jan 14, 2019

/lgtm
/approve
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 14, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 14, 2019
@k8s-ci-robot k8s-ci-robot merged commit 3b0b74f into kubernetes:master Jan 15, 2019