
Create system:cluster-autoscaler account & role and introduce it to C… #64503

Merged: 4 commits merged into kubernetes:master from kgolab:kg-ca-rbac on Jun 12, 2018

Conversation

kgolab (Contributor) commented May 30, 2018

What this PR does / why we need it:

This PR adds cluster-autoscaler ClusterRole & binding, to be used by the Cluster Autoscaler (kubernetes/autoscaler repository).
It also updates GCE scripts to make CA use the cluster-autoscaler user account.

A user account rather than a service account is used, to be more in line with kube-scheduler.
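For illustration, the role & binding pair has roughly the following shape. This is a minimal sketch assuming the role name from this PR's title, with the rule list abridged (the full list is reviewed below):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:cluster-autoscaler
rules:
# abridged; only one of the rules discussed in the review below
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cluster-autoscaler
subjects:
# a User subject (not a ServiceAccount), matching the kube-scheduler pattern
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: cluster-autoscaler
```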

Which issue(s) this PR fixes:

Fixes issue 383 from kubernetes/autoscaler.

Special notes for your reviewer:

This PR might be treated as a security fix since, prior to it, CA on GCE was using the system:cluster-admin account, assumed due to the default handling of unsecured & unauthenticated traffic over plain HTTP.

Release note:

A cluster-autoscaler ClusterRole is added to cover only the functionality required by Cluster Autoscaler and to avoid abusing the system:cluster-admin role.

action required: Cloud providers other than GCE might want to update their deployments or sample YAML files to reuse the role created via the add-on.
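For a provider running CA as an in-cluster Deployment, reusing the role could look roughly like this; a hypothetical sketch, where the ServiceAccount name and namespace are illustrative and the roleRef name follows this PR's title (the add-on may use a different name):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cluster-autoscaler
subjects:
# bind the role to the ServiceAccount the CA Deployment runs as
- kind: ServiceAccount
  name: cluster-autoscaler
  namespace: kube-system
```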
aleksandra-malinowska (Contributor) commented May 30, 2018

/ok-to-test

kgolab (Contributor, Author) commented May 30, 2018

/retest

@@ -440,6 +440,31 @@ func ClusterRoles() []rbacv1.ClusterRole {
			rbacv1helpers.NewRule(Read...).Groups(legacyGroup).Resources("persistentvolumeclaims", "persistentvolumes").RuleOrDie(),
		},
	},
	{
		// a role for the Cluster Autoscaler
		ObjectMeta: metav1.ObjectMeta{Name: "system:cluster-autoscaler"},

liggitt (Member) commented May 30, 2018

where does the cluster autoscaler component live? this seems like it would be better defined in a manifest defined in https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

kgolab (Contributor, Author) commented May 31, 2018

At least in GCE & GKE, the CA lives on the master node and is started from the manifest file that resides in the kubernetes/kubernetes repository and is part of this PR - please have a look at cluster-autoscaler.manifest.
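For readers unfamiliar with this setup: a static-pod manifest in /etc/kubernetes/manifests/ is run directly by the kubelet on the master. A heavily abridged, hypothetical sketch of what such a manifest could look like; the image tag, flags, and volume layout here are illustrative assumptions, not taken from this PR:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.2.2   # illustrative tag
    command:
    - ./cluster-autoscaler
    - --cloud-provider=gce
    # the kubeconfig added by this PR, pointing CA at the cluster-autoscaler user
    - --kubeconfig=/etc/srv/kubernetes/cluster-autoscaler/kubeconfig
    volumeMounts:
    - name: srvkube
      mountPath: /etc/srv/kubernetes
      readOnly: true
  volumes:
  - name: srvkube
    hostPath:
      path: /etc/srv/kubernetes
```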

liggitt (Member) commented May 31, 2018

since the component is deployment-specific, the permissions for the autoscaler should also be part of the deployment. for an example of what this could look like for GCE/GKE, see the permissions GCE/GKE grants to kubelets:

# prep additional kube-up specific rbac objects
setup-addon-manifests "addons" "rbac/kubelet-api-auth"
setup-addon-manifests "addons" "rbac/kubelet-cert-rotation"

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gce:beta:kubelet-certificate-bootstrap
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gce:beta:kubelet-certificate-bootstrap
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubelet
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gce:beta:kubelet-certificate-rotation
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gce:beta:kubelet-certificate-rotation
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gce:beta:kubelet-certificate-bootstrap
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - "certificates.k8s.io"
  resources:
  - certificatesigningrequests/nodeclient
  verbs:
  - "create"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gce:beta:kubelet-certificate-rotation
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - "certificates.k8s.io"
  resources:
  - certificatesigningrequests/selfnodeclient
  - certificatesigningrequests/selfnodeserver
  verbs:
  - "create"

kgolab (Contributor, Author) commented Jun 7, 2018

I've moved the ClusterRole & ClusterRoleBinding creation to an add-on YAML, as requested by @liggitt.

I have some doubts, though, about the mixed life cycle of this solution: CA is created from a manifest located in the /etc/kubernetes/manifests/ directory, while the role is added independently via kube-addon-manager.
This usually results in 2 (sometimes 3) extra iterations of CrashLoopBackOff compared to the previous solution and delays the CA start on a freshly created cluster by about a minute.
@mwielgus, do we need to do something about it, or can we live with this change in start-up behaviour?

kgolab (Contributor, Author) commented Jun 7, 2018

/assign @zmerlynn

kgolab (Contributor, Author) commented Jun 7, 2018

/test pull-kubernetes-node-e2e

kgolab (Contributor, Author) commented Jun 8, 2018

Since this change is basically a security fix for Cluster Autoscaler and is limited to the GCE/GKE startup scripts, could we still merge it into 1.11? As far as I can see, the current implementation should not affect the global Kubernetes release.

From the initial description:
Fixes issue 383 from kubernetes/autoscaler.

I've tested this PR by setting up GCE & GKE clusters, going through a few basic scale-up and scale-down scenarios, and verifying in the CA logs that there are no "access forbidden" messages.
The only "access forbidden" messages are logged during the initial start-up, when CA tries to list the nodes before the addon manager creates the ClusterRole - see also the previous comment.

/cc @jberkus

dims (Member) commented Jun 8, 2018

- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
# read-only access to cluster state, mostly around services & controllers

wojtek-t (Member) commented Jun 8, 2018

nit: remove "mostly around services & controllers" - there are so many objects here...

kgolab (Contributor, Author) commented Jun 8, 2018

Done.

- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]

wojtek-t (Member) commented Jun 8, 2018

merge with the first entry in this group - both are for the same verbs and apiGroup

kgolab (Contributor, Author) commented Jun 8, 2018

Merged, thanks for noticing.
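For readers following along, merging two RBAC rules that share apiGroups and verbs just means concatenating their resources lists; a hypothetical sketch, with illustrative resource names:

```yaml
# before: two rules with the same apiGroup ("") and the same read verbs
rules:
- apiGroups: [""]
  resources: ["services", "replicationcontrollers"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes", "pods"]
  verbs: ["get", "list", "watch"]
---
# after: one rule with the resource lists concatenated; semantics are unchanged
rules:
- apiGroups: [""]
  resources: ["services", "replicationcontrollers", "nodes", "pods"]
  verbs: ["get", "list", "watch"]
```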

- apiGroups: [""]
  resources: ["services", "replicationcontrollers"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps", "extensions"]

wojtek-t (Member) commented Jun 8, 2018

Do we really need "extensions"? Didn't we already migrate everything to the "apps" API group?

liggitt (Member) commented Jun 8, 2018

No, the extensions group is very much alive :(

wojtek-t (Member) commented Jun 8, 2018

Well - I know that the group itself is alive and available to users.
I'm just asking whether cluster-autoscaler/scheduler is still using it (I thought we had already migrated the scheduler and cluster-autoscaler usages to the apps API group - the underlying resource is exactly the same in the end).

kgolab (Contributor, Author) commented Jun 8, 2018

I can see code like Extensions().V1beta1().ReplicaSets() or ExtensionsV1beta1().DaemonSets in Cluster Autoscaler, so I guess we're still using extensions.
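So as long as CA keeps reading workloads through the extensions clients, the role has to grant both groups. A sketch of what such an entry could look like, with illustrative resource names (not taken from the final add-on file):

```yaml
- apiGroups: ["apps", "extensions"]
  resources: ["daemonsets", "replicasets", "statefulsets"]
  verbs: ["get", "list", "watch"]
```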

wojtek-t (Member) commented Jun 8, 2018

@kgolab - please squash commits before I approve.

kgolab force-pushed the kgolab:kg-ca-rbac branch from 1c9084d to c70b554 on Jun 8, 2018

kgolab (Contributor, Author) commented Jun 8, 2018

Squashed.

wojtek-t self-assigned this on Jun 8, 2018

wojtek-t (Member) commented Jun 8, 2018

/lgtm
/approve

k8s-ci-robot (Contributor) commented Jun 11, 2018

New changes are detected. LGTM label has been removed.

k8s-ci-robot removed the lgtm label on Jun 11, 2018

kgolab (Contributor, Author) commented Jun 11, 2018

/test pull-kubernetes-e2e-gce

kgolab (Contributor, Author) commented Jun 11, 2018

/test pull-kubernetes-node-e2e

@@ -2006,6 +2006,7 @@ function start-cluster-autoscaler {

  local params="${AUTOSCALER_MIG_CONFIG} ${CLOUD_CONFIG_OPT} ${AUTOSCALER_EXPANDER_CONFIG:---expander=price}"
  params+=" --kubeconfig=/etc/srv/kubernetes/cluster-autoscaler/kubeconfig"
  sed -i -e "s@{{srv_kube_path}}@/etc/srv/kubernetes@g" "${src_file}"

mikedanese (Member) commented Jun 11, 2018

There's no need to parameterize this.

kgolab (Contributor, Author) commented Jun 11, 2018

OK, reverted to the hard-coded value.
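For context, the --kubeconfig flag above points CA at a credentials file whose user name is what the RBAC binding keys on. A hypothetical sketch of what /etc/srv/kubernetes/cluster-autoscaler/kubeconfig could contain; every value here is an assumption for illustration, not copied from the GCE scripts:

```yaml
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    # illustrative; on GCE masters the component talks to the local apiserver
    server: https://localhost:443
    insecure-skip-tls-verify: true
users:
- name: cluster-autoscaler
  user:
    token: <token generated at cluster bring-up>
contexts:
- name: cluster-autoscaler
  context:
    cluster: local
    # this user name is what the system:cluster-autoscaler binding matches on
    user: cluster-autoscaler
current-context: cluster-autoscaler
```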

kgolab force-pushed the kgolab:kg-ca-rbac branch from 0291eaf to 9e2fa69 on Jun 11, 2018

kgolab (Contributor, Author) commented Jun 11, 2018

/test pull-kubernetes-node-e2e

mikedanese added the lgtm label and removed the do-not-merge/hold label on Jun 11, 2018

jberkus commented Jun 11, 2018

escalating priority so this can merge

/priority critical-urgent
/remove-priority important-soon

k8s-github-robot (Contributor) commented Jun 11, 2018

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@kgolab @mikedanese @wojtek-t @zmerlynn

Pull Request Labels
  • sig/autoscaling: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
k8s-github-robot (Contributor) commented Jun 11, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

k8s-github-robot (Contributor) commented Jun 12, 2018

Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions here.

k8s-github-robot merged commit ec43466 into kubernetes:master on Jun 12, 2018

16 of 18 checks passed

  • Submit Queue: Required GitHub CI test is not green: pull-kubernetes-kubemark-e2e-gce-big
  • pull-kubernetes-kubemark-e2e-gce-big: Job triggered.
  • cla/linuxfoundation: kgolab authorized
  • pull-kubernetes-bazel-build: Job succeeded.
  • pull-kubernetes-bazel-test: Job succeeded.
  • pull-kubernetes-cross: Skipped
  • pull-kubernetes-e2e-gce: Job succeeded.
  • pull-kubernetes-e2e-gce-100-performance: Job succeeded.
  • pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
  • pull-kubernetes-e2e-gke: Job succeeded.
  • pull-kubernetes-e2e-kops-aws: Job succeeded.
  • pull-kubernetes-integration: Job succeeded.
  • pull-kubernetes-kubemark-e2e-gce: Job succeeded.
  • pull-kubernetes-local-e2e: Skipped
  • pull-kubernetes-local-e2e-containerized: Skipped
  • pull-kubernetes-node-e2e: Job succeeded.
  • pull-kubernetes-typecheck: Job succeeded.
  • pull-kubernetes-verify: Job succeeded.
wojtek-t (Member) commented Jun 12, 2018

@kgolab - next time please squash commits before merging.

zparnold (Member) commented Jun 13, 2018

Hello there! @kgolab, I'm Zach Arnold, working on docs for the 1.11 release. This PR was identified as one needing some documentation in the https://github.com/kubernetes/website repo around your contributions (thanks, by the way!). When you have some time, could you please modify/add/remove the relevant content that needs changing in our documentation repo? Thanks! Please let me or my colleague Misty know (@zparnold/@Misty on K8s Slack) if you need any assistance with the documentation.

k8s-github-robot pushed a commit that referenced this pull request Jun 13, 2018

Kubernetes Submit Queue
Merge pull request #65014 from kgolab/automated-cherry-pick-of-#64503-upstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #64503 (Create system:cluster-autoscaler account & role and…) on release-1.10.

k8s-github-robot pushed a commit that referenced this pull request Jun 15, 2018

Kubernetes Submit Queue
Merge pull request #65015 from kgolab/automated-cherry-pick-of-#64503-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #64503 (Create system:cluster-autoscaler account & role and…) on release-1.9.
kgolab (Contributor, Author) commented Jun 19, 2018

Hi Zach,
@zparnold, as far as I understand, the documentation update was required only for the initial version of this PR, in which a system-level ClusterRole was created in bootstrapping code. For that there was PR 8814.
Since then the solution has been reworked to be based on add-ons, and as such it does not fit into docs/reference/access-authn-authz/rbac/.

Please let me know if you meant something different or know of another place that should be updated. Thanks!

damemi pushed a commit to damemi/kubernetes that referenced this pull request Jun 27, 2018

Kubernetes Submit Queue
Merge pull request kubernetes#65308 from kgolab/kg-cleanup-kubepath

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Remove unused srv_kube_path variable

**What this PR does / why we need it**:

Clean-up of an unused script variable, as discussed with @mikedanese after a comment in PR 64503 (kubernetes#64503).

**Release note**:

```release-note
NONE
```