Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create KEP to harden the default RBAC discovery clusterrolebindings #720

Merged
merged 1 commit into from Feb 4, 2019

Conversation

dekkagaijin
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/pm size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 23, 2019
Copy link
Member

@tallclair tallclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the KEP. I think this is very likely to break any clients out there that are depending on access to these APIs, and the rollout needs very careful consideration.

Can you include more details about how these APIs might have been used before, and how to reconcile those use cases?

### Risks and Mitigations

1. If a user upgrades a cluster which has already limited access to the `GET /healthz`, `/version` or `/version/` APIs, auto-reconciliation will re-enable universal access via `system:public-info-viewer`
1. We'll explicitly call this out as an `ACTION REQUIRED` with a sample `kubectl apply` which can be used to effectively disable the `system:public-info-viewer` ClusterRoleBinding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would those roles be reset everytime the apiserver restarts though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the bootstrap code (https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/rbac/rest/storage_rbac.go), I think it does reconciliation which would mean that modifications would be clobbered on restart. @liggitt can probably give a more definitive answer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if the rbac.authorization.kubernetes.io/autoupdate is set to true

1. If a user upgrades a cluster which has already limited access to the `GET /healthz`, `/version` or `/version/` APIs, auto-reconciliation will re-enable universal access via `system:public-info-viewer`
1. We'll explicitly call this out as an `ACTION REQUIRED` with a sample `kubectl apply` which can be used to effectively disable the `system:public-info-viewer` ClusterRoleBinding
1. Some use-cases might require the existing permissions to be preserved for unauthenticated calls
1. In the release notes, we can include an easy 'escape hatch' to re-enable unauthenticated access to the APIs, e.g. `kubectl create clusterrolebinding public-discovery --clusterrole=system:discovery --group=system:unauthenticated`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need to follow the deprecation policy, and roll out over multiple releases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guidance accepted...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the proposed approach lets upgraded clusters work as-is without any behavior change, I think this can be implemented immediately.


### Risks and Mitigations

1. If a user upgrades a cluster which has already limited access to the `GET /healthz`, `/version` or `/version/` APIs, auto-reconciliation will re-enable universal access via `system:public-info-viewer`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about downgrades? Any concerns there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downgrades would result in a redundant binding (i.e. system:public-info-viewer being a strict subset of the permissions granted by system:discovery). This can also be called out, and remedied with a one-line kubectl delete

keps/NEXT_KEP_NUMBER Outdated Show resolved Hide resolved
1. If a user upgrades a cluster which has already limited access to the `GET /healthz`, `/version` or `/version/` APIs, auto-reconciliation will re-enable universal access via `system:public-info-viewer`
1. We'll explicitly call this out as an `ACTION REQUIRED` with a sample `kubectl apply` which can be used to effectively disable the `system:public-info-viewer` ClusterRoleBinding
1. Some use-cases might require the existing permissions to be preserved for unauthenticated calls
1. In the release notes, we can include an easy 'escape hatch' to re-enable unauthenticated access to the APIs, e.g. `kubectl create clusterrolebinding public-discovery --clusterrole=system:discovery --group=system:unauthenticated`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the proposed approach lets upgraded clusters work as-is without any behavior change, I think this can be implemented immediately.


### Non-Goals

Disabling unauthenticated and/or anonymous access to all APIs by default. There are several non-sensitive and legitimate use-cases for unauthenticated calls, such as loadbalancer or kubelet liveliness checks and `kubectl version` that we don't wish to break.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or kubelet liveliness checks

What is this use case?

kubectl version

If you're using kubectl, you're probably authenticated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have an apiserver Pod managed by a kubelet, running an https liveness check on :443/healthz, I don't think there's a way to make that authenticated... right?

I can't remember why there was a push to keep /version widely open. @liggitt?

@dekkagaijin
Copy link
Contributor Author

In the interest of getting a review from non-Googlers, and since you have context with the CVE, PTAL @deads2k

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 29, 2019
@deads2k
Copy link
Contributor

deads2k commented Jan 29, 2019

At a first read, this seems like it would produce severe usability issues. Instead of trying to limit access to discovery out of fear, let's fix the discovery call itself. The idea of caching it instead of proxying it has merit. The permissions changes being proposed here do not address our underlying causes.

Any distribution that wishes to do this can do it already. I'd rather see progress on our underlying issues than trying to segregate a class of user when we know that clusters today are multi-tenant and have multiple classes of authenticated users. If we fix discovery proxying, we can still do something like this, but we'll have made progress on the actual problem.

@liggitt
Copy link
Member

liggitt commented Jan 29, 2019

At a first read, this seems like it would produce severe usability issues.

Can you give examples?

Instead of trying to limit access to discovery out of fear, let's fix the discovery call itself. The idea of caching it instead of proxying it has merit. The permissions changes being proposed here do not address our underlying causes.

I'm in favor of doing both. An unauthenticated escalation is categorically worse than an authenticated one, and is simpler to protect against.

If we fix discovery proxying, we can still do something like this, but we'll have made progress on the actual problem.

I don't understand the "we can still do something like this" statement. Are you proposing doing this, but after changing discovery proxying?

@deads2k
Copy link
Contributor

deads2k commented Jan 29, 2019

If we break this KEP into pieces, I see two assertions

  1. We had a CVE and discovery was a problem
  2. We should have a server that can effectively deny unauthenticated access.

To the first I would say that there are better solutions for managing potential discovery problems that don't require changing permission. The notion of caching discovery for instance.

To the second, I would say that if we want to explore the idea of having a server with more controlled unauthenticated access, let's create a KEP about that. This appears to be starting at the conclusion of such a KEP.

I see three different levels of unauthenticated access restriction.

  1. Unauthenticated controlled via permissions. This is what we have today.
  2. Unauthenticated not allowed to perform any API actions. This is easy, but probably impractical.
  3. Unauthenticated can be bound, but isn't bound to any permissions by default. In this case, we probably want to control who can bind unauthenticated. OpenShift rolebindingrestrictions is one possible implementation of this.

This KEP appears to be a middle ground that I'm not sure anyone is at. It remains very easy for an unprivileged user to grant unauthenticated access, but that same user cannot make it possible for unauthenticated to actually use the permission that is granted. it's possible, just not easy. Rather than introduce this asymmetry without consideration of how to build out a usable system, let's consider the problem as a whole and figure out how to create a usable system that satisfies the need.

@liggitt I think I answered your question

@dekkagaijin
Copy link
Contributor Author

Current guidance explicitly discourages users from modifying the default ClusterRoleBindings, which allow unrestricted access to these APIs.

If I removed language re: the CVE and expounded further on why we considered removing anonymous auth, unbinding system:unauthenticated by default, or otherwise semantically forking the behavior of discovery to be more problematic than the proposed solution, would that be acceptable?

@deads2k
Copy link
Contributor

deads2k commented Jan 29, 2019

Current guidance explicitly discourages users from modifying the default ClusterRoleBindings, which allow unrestricted access to these APIs.

If I removed language re: the CVE and expounded further on why we considered removing anonymous auth, unbinding system:unauthenticated by default, or otherwise semantically forking the behavior of discovery to be more problematic than the proposed solution, would that be acceptable?

I'm still looking for something that explains the considering of the different levels of unauthenticated restriction. The core of this proposal creates a situation where unauthenticated permissions can still be granted by namespace level admins, creating a case where someone has permission, but can't easily use it. How about considering the problem of unauthenticated access more completely (like I described above)? I don't see case as a good outcome, let's give them the tools to make effective use of changes to default policy before we changing it in a way that promotes weird/bad states.

@liggitt
Copy link
Member

liggitt commented Jan 29, 2019

To the first I would say that there are better solutions for managing potential discovery problems that don't require changing permission. The notion of caching discovery for instance.

I'm not opposed to also pursuing caching discovery, but I consider the default exposure of discovery to unauthenticated users problematic. When the current policy was determined, custom resources did not exist, so knowing types in a cluster was not considered confidential. It now can be. This addresses that issue, and limits the blast radius for discovery-related handler errors to authenticated users. I agree more can (and should) be done there, but I think this is an important and worthwhile step.

It remains very easy for an unprivileged user to grant unauthenticated access

s/unprivileged/low privilege/?

I see three different levels of unauthenticated access restriction.

  • Unauthenticated controlled via permissions. This is what we have today.

Yes, and we're reconsidering what those permissions should be. The project's posture has always been that the default permissions granted are ones we consider safe (both in terms of escalation and in terms of information disclosure), and individual clusters were free to open up those permissions wider if they wanted. Recent events proved the discovery APIs could be an escalation path, and they (by design) expose information that custom resources make more and more cluster-specific and potentially confidential.

This KEP appears to be a middle ground that I'm not sure anyone is at.

I disagree. The most common question I was asked over the last two months was "how do I limit anonymous access safely"? My only answer was to tell them to override the default configuration and detach that aspect of their cluster policy from auto-reconciliation, which seems backwards to me. Clusters that want to open themselves up to anonymous API users are certainly free to do so, but should not be the default.

@deads2k
Copy link
Contributor

deads2k commented Jan 29, 2019

The idea that the presence of CRDs is sensitive suggests levels of discovery access and this proposal doesn't address that. It creates two tiers, but there's no reason to believe there are only two. Claiming this solves that is spurious.

Is it so hard to consider how to control unauthenticated access that we're trying to avoid it? There are existent examples of how to control exactly that. This really appears to address a single narrow case that produces the potential for conflict desires.

@deads2k
Copy link
Contributor

deads2k commented Jan 29, 2019

spoke via slack. If the intro to this KEP can clearly indicate what is and is not solved here like

  1. this does not protect the cluster from unauthenticated access - namespace admins can still grant that access
  2. this does not protect the definition of CRDs beyond "you're authenticated" - if your CRDs are confidential, this doesn't give you a way to partition access
  3. this does not address discovery bugs - the CVE would still have happened. This would help with some default actions, but it's nothing special. The mitigations can do the same thing. To fix discovery, we should fix discovery (maybe cache and serve instead of proxying).
  4. this does create the potential to have access and have difficulty using it - since namespace admins can grant access, you can end up in states where unauthenticated has access, but can't use normal client tools.
  5. this does prevent previously working configurations from working on new installs - if you install a side-by-side cluster and move workloads instead of updating, this can break you.

Then I'd feel better about taking this approach. RBAC was built for additive permissions, but we need to be sure that people aren't confused about what this actually protects them from and the challenges with usability.

@dekkagaijin
Copy link
Contributor Author

I clarified & elaborated on the goals, non-goals, risks, and mitigations

Copy link

@mayakacz mayakacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally makes sense to me. Would be good to hear of anyone relying on those permissions today, e.g., for permission checks for system:basic-user.

@dekkagaijin
Copy link
Contributor Author

@liggitt done!

@liggitt
Copy link
Member

liggitt commented Feb 1, 2019

/lgtm
/approve

/hold
for @deads2k

edit: manually tagging, the sig-auth-leads alias in this repo was outdated... updated in #801

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Feb 1, 2019
@liggitt liggitt added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 1, 2019
@deads2k
Copy link
Contributor

deads2k commented Feb 4, 2019

I still find the distinction between the "ok" endpoints and "not-ok" endpoints highly artificial (really, why is access to the /version any less concerning that /), but at least the things it doesn't solve are listed near the top.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 4, 2019
@liggitt liggitt added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2019
@liggitt
Copy link
Member

liggitt commented Feb 4, 2019

rebase
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2019
@liggitt liggitt removed the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Feb 4, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dekkagaijin, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants