
Dynamic Authentication Config #1689

Closed (wants to merge 11 commits)

Conversation

enj
Member

@enj enj commented Apr 15, 2020

Signed-off-by: Monis Khan <mok@vmware.com>

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 15, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 15, 2020
@enj
Member Author

enj commented Apr 15, 2020

/hold

To prevent accidental merge.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 15, 2020
Signed-off-by: Monis Khan <mok@vmware.com>
keps/sig-auth/1688-dynamic-authentication-config/README.md Outdated

Authentication in Kubernetes is quite flexible and unopinionated. There is no
requirement for the API server to understand where an identity originates from.
The API server supports a variety of command line flags that enable authentication
Member

For my benefit, has this idea been discussed before? What were the objections? It seems like webhook AuthN would have been a natural fit to start with an API instead of an API server flag, but wasn't.

Member Author

@liggitt and @mikedanese may be able to provide some history.

Looking at OIDC as an example, it was added in 2015 via kubernetes/kubernetes#10957

My guess is that it was probably just easier to start with in-tree functionality that was wired through a CLI flag.

@mikedanese mikedanese self-assigned this Apr 15, 2020
@mikedanese
Member

Could this functionality be implemented entirely out of tree using a front-proxy configured by CRDs?

@enj
Member Author

enj commented Apr 21, 2020

Could this functionality be implemented entirely out of tree using a front-proxy configured by CRDs?

Meaningfully? No.

By front-proxy I assume you mean impersonating proxy (and not like an aggregated API proxy).

The goal here is to allow end users that have cluster-admin access to Kubernetes API endpoints (possibly across different infrastructure providers) but not direct host access to the API server to have meaningful control over the authentication that the system uses (as a purely additive extension on whatever authn is already configured).

Some thoughts on the proxy approach:

  1. You must abuse the impersonation API
  2. You must run the proxy as cluster-admin
  3. Nothing is aware of this proxy because it is unrelated to the Kubernetes API. Thus you have to build some discovery mechanism. Ingress implementations vary across providers, so you have to handle the dynamic hostname or use some DNS configuration elsewhere.
  4. You must configure and provision TLS certs for this proxy
  5. Network traffic for some subset of requests must now flow through the proxy
  6. It is unclear how to handle requests incoming to the proxy that attempt to use impersonation
  7. You have to somehow tell some subset of your users to use the proxy (instead of the Kube API directly)
  8. Proxies exist [1][2] but they are not meaningful solutions (anything that uses impersonation is just a hack IMO)

With a built-in API:

  1. Network traffic continues to go to a single place (the Kubernetes API)
  2. No extra components to run unless you need a custom webhook or want to run your own OIDC server (instead of using something like gitlab.com)
  3. For x509 and oidc, the verification is 100% local to the API server (high performance, high trust)
  4. It always works across any Kubernetes cluster regardless of infrastructure provider
  5. It encourages the implementation of opinionated Kube authn stacks that can be used across infrastructure providers (i.e. folks could build apps that rely on some assumptions about the underlying authn stack because that authn stack is actually portable)

While I am all for implementations outside of core, this functionality is akin to RBAC and admission webhooks. RBAC is built-in and fully configurable via the Kube API. This has led to its adoption and common usage. It is also a relatively limited API. It is effectively the equivalent of x509 and oidc in this proposal (i.e. super easy to use and rely upon but not very flexible). Admission webhooks allow the creation of arbitrary policy extension points. They are super flexible but put a high burden on the end user. They are the equivalent of webhook in this proposal.

Just as RBAC and admission webhooks would not be meaningful if built out of core, the same is true for this proposal. If one could not rely on RBAC and admission webhooks being consistently present across all infrastructure providers, they would be useless.

@ritazh
Member

ritazh commented Apr 21, 2020

cc @weinong for review

@seh

seh commented Apr 21, 2020

It's mentioned with some subtlety in the "Motivation" section, but this capability would unlock use of many hosted Kubernetes offerings by organizations that need their own authentication system. As it stands today, operators must choose between owning the full cluster provisioning and operation in order to retain control of the API server command-line flags, or giving up on their custom authentication system to buy into a hosted offering. Azure Kubernetes Engine is one counterexample, where it looks at least possible to jam in the right API server command-line flags, perhaps to the surprise of the maintainers.

I had heard in the past (from @liggitt, if I recall correctly) that we couldn't offer a subsystem like this, because it's not possible to govern who can configure it, because authentication is a prerequisite to determine who is even asking to configure it. Does the design mandate that any user with "cluster admin" permission can manipulate these AuthenticationConfig objects?

@enj
Member Author

enj commented Apr 22, 2020

this capability would unlock use of many hosted Kubernetes offerings by organizations that need their own authentication system. As it stands today, operators must choose between owning the full cluster provisioning and operation in order to retain control of the API server command-line flags, or giving up on their custom authentication system to buy into a hosted offering.

I will try to distill this down into a sentence and add as an explicit goal.

it's not possible to govern who can configure it, because authentication is a prerequisite to determine who is even asking to configure it.

There is an inherent assumption that you have an existing identity that has cluster-admin access. This is trivial to create with a cert based user in the system:masters group (this handles both authn and authz). Every Kube cluster has to provide you with some baseline way to authenticate to it, otherwise you would never be able to do anything. Note that RBAC is the same way - something has to give you cluster-admin level access so that you can, for example, create new roles for custom resources.

Does the design mandate that any user with "cluster admin" permission can manipulate these AuthenticationConfig objects?

Any user with cluster-admin access can basically do anything. This API is not materially different from the cluster-admin "creating" identities using the impersonation API. Unlike impersonation, we can create stronger restrictions similar to the RBAC API. For example, write access to this API would require:

  1. A user in the system:masters group, or
  2. escalate verb on authenticationconfigs.authentication.k8s.io, or
  3. * verb on *.*

A cluster-admin can always run the following command:

kubectl create clusterrolebinding --group=system:unauthenticated --clusterrole=cluster-admin turn-off-security

Just as there is no meaningful way for the RBAC API to guard against a cluster-admin, the same is true for the AuthenticationConfig API.

@seh

seh commented Apr 22, 2020

Thank you for clarifying. I now see that @micahhausler had asked a similar question earlier (#1689 (comment)). I missed that yesterday, and should have read more carefully to avoid repeating the same point.

I hope the Webhook support makes it in before too long. (It's commented out for now in the AuthenticationConfigSpec struct.)

Member

@anguslees anguslees left a comment

(fwiw, I'm mostly ok with this - with clarifications around failure recovery)

High level: "what would CIS say?" If we expect the CIS/other security audit recommendation to be "ensure your apiserver has the --disable-dynamic-auth flag", then there isn't much point going down this path. (I don't know the answer to this question.)

To prevent confusion with identities that are controlled by Kubernetes, the
`system:` prefix will be disallowed in the username and groups contained in the
`user.Info` object. A disallowed username will cause authentication to fail.
All disallowed groups will be filtered out.
Member

So I can't use this to assign users to the system:masters group (i.e. cluster-admin)?
(sad face)

Member Author

The assumption is that you would create your own my-admins group and assign it cluster-admin via RBAC. There is no need to abuse system: identities.
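
For illustration only (the group name my-admins is the hypothetical example from the comment above), such a binding can be created with the same kind of command shown earlier in this thread:

kubectl create clusterrolebinding my-admins-are-cluster-admins --group=my-admins --clusterrole=cluster-admin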


You can already achieve (abuse) this via the CertificateSigningRequest API and there is no such restriction built into it. Why would the Dynamic Authentication Config API be different in this regard?

Member Author

You can already achieve (abuse) this via the CertificateSigningRequest API and there is no such restriction built into it. Why would the Dynamic Authentication Config API be different in this regard?

The latest version of the CSR API actively tries to prevent this kind of abuse.

In general, I believe it was a mistake for us to not tightly control what can assert a system: identity. Some things need to be reserved for Kube.

Member

You can already achieve (abuse) this via the CertificateSigningRequest API and there is no such restriction built into it.

The CSR signer, under the control of the cluster operator, can choose not to sign a requested client cert.
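
As a rough illustration of the rule quoted at the top of this thread (the function below is a sketch, not code from the KEP): a system:-prefixed username fails authentication outright, while system:-prefixed groups are silently dropped.

package main

import (
	"fmt"
	"strings"
)

// filterSystemIdentity sketches the KEP rule: a username carrying the reserved
// "system:" prefix causes authentication to fail, and any "system:" groups
// asserted by a dynamic authenticator are filtered out.
func filterSystemIdentity(username string, groups []string) (string, []string, error) {
	const reserved = "system:"
	if strings.HasPrefix(username, reserved) {
		return "", nil, fmt.Errorf("username %q uses the reserved %q prefix", username, reserved)
	}
	kept := make([]string, 0, len(groups))
	for _, g := range groups {
		if strings.HasPrefix(g, reserved) {
			continue // reserved for identities controlled by Kubernetes
		}
		kept = append(kept, g)
	}
	return username, kept, nil
}

func main() {
	user, groups, err := filterSystemIdentity("alice@example.com", []string{"my-admins", "system:masters"})
	fmt.Println(user, groups, err) // alice@example.com [my-admins] <nil>
}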

// caBundle is a PEM encoded CA bundle used for client auth (x509.ExtKeyUsageClientAuth).
// +listType=atomic
// Required
CABundle []byte `json:"caBundle" protobuf:"bytes,1,opt,name=caBundle"`
Member

Please, please avoid the mistake of the aggregated apiserver api and make this caBundle a pointer to a separate Secret (or ConfigMap) and not inline. The rest of the AuthenticationConfig is likely to be static (eg: helm) manifest, but the caBundle contents will be unique for every install. This is way easier to generate (and rotate) if it is a separate object (eg: you could just use cert-manager to maintain a self-signed CA).


I agree: This turned out to be the most annoying part of deploying an admission Webhook.

I vaguely recall hearing the reason why the CA certificate was defined statically, but I can't find the discussion now in Slack. I think it had something to do with not wanting the API server to have to watch the Secrets to reconfigure its Webhook client, though of course the API server already has to watch the Webhook configuration itself.

Member Author

I do not plan to diverge from the approach used by existing APIs here. If they expand to allow references, I will mirror that here. Overall I am not too worried about the x509 case. It is not really an extension point I expect to see used in helm charts.

For webhook I agree that this adds pain, but again, I do not plan on diverging from the existing APIs here. Long term I would like Kube to provide first class support for a short-lived, auto rotated CA for the service network (and mint serving certs for all services that ask for one). Then the common case of "I just want to run this webhook on the cluster please make the TLS nonsense go away" would be painless.

cc @munnerz since cert-manager is mentioned.

Member

To clarify, this is the config for the x509 authentication type (i.e. authenticating the user request), not for authenticating the apiserver request to the authenticator, right?

I believe the concerns raised here are with regards to the latter situation.

Member Author

To clarify, this is the config for the x509 authentication type (i.e. authenticating the user request)

Correct. This CA bundle allows validation of client certs.

I believe the concerns raised here are with regards to the latter situation.

IIUC, the concern was "please let me separate my static config from dynamic config." For example, when configuring an OPA admission webhook that enforces policy across all resources, all ValidatingWebhookConfiguration across all clusters will basically look the same minus the CA bundle.

One could imagine a similar situation where most of the AuthenticationConfig object is static minus the CA bundle. I think it is far less likely and I much prefer being able to validate the CA bundle inline. I think there could be a case for allowing an empty CA bundle which would be ignored until filled in by some controller.

@enj
Member Author

enj commented Apr 23, 2020

cc @JoshVanL @simonswine @mhrabovcin @mattbates @phanama @wallrj I see that you all contributed to jetstack/kube-oidc-proxy and thus may be interested in this KEP.

cluster-admin and impersonate the desired user [kube-oidc-proxy] [teleport].
Other than being a gross abuse of the impersonation API, this opens the API
server to escalation bugs caused by the proxy (such as improper handling of
incoming request headers). This proxy also intercepts network traffic which
Member

this opens the API server to escalation bugs caused by the proxy (such as improper handling of incoming request headers)

Isn't this true with any authentication handler? I suppose the proposed Prefix requirement mitigates it to some degree.

Member Author

The concern I was raising here is that with an impersonation proxy your network diagram looks like:

[ user agent ] -- user creds over TLS --> [ impersonation reverse proxy ]  -- proxy credentials over TLS with impersonation headers --> [ API server ]

The reverse proxy has to handle the incoming request, authenticate it, sanitize it (ex: fail if incoming request has impersonation headers), and then pass it through to the API server with its own credentials. What happens if Kube ever adds new headers that have special meaning? Some headers may be benign from a security perspective and the proxy should pass them through. Others could be like impersonation and change the security parameters of the request. Having a proxy that is cluster-admin act as a confused deputy is something I would like to avoid.

This network is far easier to protect IMO:

[ user agent ] -- user creds over TLS --> [ API server ]  -- possible webhook over TLS with user creds --> [ webhook ]


To prevent confusion with identities that are controlled by Kubernetes, the
`system:` prefix will be disallowed in the username and groups contained in the
`user.Info` object. A disallowed username will cause authentication to fail.
Member

It would be nice if the credentials for a disallowed username weren't even forwarded to the dynamic authenticator. The only way of making this work that I can think of is if a static authenticator were able to recognize the format of the credential but see that it was the wrong credential; then it could short-circuit-deny the request. WDYT?

EDIT: alternatively, would it be possible to require clients authenticating with a dynamic authenticator to provide a username hint? (e.g. as a request header, only relevant for OIDC & webhook types). This way, the authentication could be skipped if the requested username doesn't match the prefix, and the request would be auto-denied if the authenticated username didn't match.

Member Author

@enj enj May 5, 2020

I believe this is the point that @mikedanese brought up in the sig-auth call around "I do not want to forward credentials that were meant for my static webhook to these dynamic webhooks when my static webhook has a transient failure and authentication fell through to the later authenticators." This is a valid concern and we should try to address it.

It would be nice if the credentials for a disallowed username weren't even forwarded to the dynamic authenticator.

This does assume that the API server knows in advance what the username will be. I think we can only know this for sure with certs and SA tokens.

The only way of making this work that I can think of is if a static authenticator were able to recognize the format of the credential but see that it was the wrong credential; then it could short-circuit-deny the request. WDYT?

The idea of adding short circuit deny logic into authentication makes me very nervous. I think the authn stack is much easier to reason about because we do not have to worry about "which authenticator owns a credential." I could also imagine cases where a user has a webhook today that validates tokens that look like service account JWTs. I do not want to accidentally break them.

would it be possible to require clients authenticating with a dynamic authenticator to provide a username hint? (e.g. as a request header, only relevant for OIDC & webhook types). This way, the authentication could be skipped if the requested username doesn't match the prefix, and the request would be auto-denied if the authenticated username didn't match.

I think there are some problems with such an approach:

  1. The client has to know the username it is trying to authenticate as. Some environments use UUIDs which make this type of flow very painful.
  2. How would the user tell kubectl to pass such a header? Would we extend the exec credential plugin mechanism?
  3. If you have more than one webhook configured and they both use opaque tokens, how do you know which webhook to send the token to?

Here are my current thoughts on how we could make this safer:

  1. We require every AuthenticationConfig object to have a name with the format of domain:path_segment. Something like company.com:v1. We reserve [*.]k8s.io:* for any future Kube usage we come up with.
  2. We make no other changes to x509 or oidc because the validation for those authenticators is in-memory of the API server.
  3. For webhook, we only pass a token to a webhook for validation if it has the prefix <name_of_authentication_config>/. For example, if we see a token company.com:v1/D2tIR6OugyAi70do2K90TRL5A and we have a webhook configured with the name company.com:v1, we would strip the prefix (so that the webhook does not need to know its name) and pass D2tIR6OugyAi70do2K90TRL5A to the webhook. Note that this guarantees that we never pass a standard OIDC token to a webhook.

The above flow requires that a client be given the name of the webhook out of band in some manner and that it concatenate the prefix to the token (this could be done by an exec credential provider though we could add direct support to kubectl to help facilitate this).

I cannot think of any approach that does not involve the client providing us extra information to make the webhook case not leak tokens meant for one webhook to another webhook. Encoding it directly into the token as a prefix seems like the simplest approach that could be done today via exec plugins. Having to know the (probably static, maybe even well-known like github.com:kubernetes) name of the webhook authenticator seems okay to me.
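
A minimal sketch of the routing idea above (names and shapes are illustrative assumptions, not the KEP's API): the API server only hands a token to the dynamic webhook whose configured name matches the token's prefix, stripping the prefix first so the webhook never sees it.

package main

import (
	"fmt"
	"strings"
)

// routeToken returns the webhook that should validate the token and the inner
// token to send it, or ok=false if the token carries no known prefix (in which
// case it is never forwarded to any dynamic webhook).
func routeToken(token string, configuredWebhooks map[string]bool) (name, innerToken string, ok bool) {
	idx := strings.Index(token, "/")
	if idx < 0 {
		return "", "", false // no prefix at all
	}
	name, innerToken = token[:idx], token[idx+1:]
	if !configuredWebhooks[name] {
		return "", "", false // unknown prefix: do not leak the token anywhere
	}
	return name, innerToken, true
}

func main() {
	webhooks := map[string]bool{"company.com:v1": true}
	fmt.Println(routeToken("company.com:v1/D2tIR6OugyAi70do2K90TRL5A", webhooks))
	// company.com:v1 D2tIR6OugyAi70do2K90TRL5A true
}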

Thoughts?

Member Author

We could also support a simple form of wildcard by allowing the client to send the token as company.com:*/D2tIR6OugyAi70do2K90TRL5A as a way of indicating "I am okay with sending this token to all company.com authenticators."

Not sure if that is a good idea, but it would be easy to implement.

Member

@tallclair tallclair May 12, 2020

This proposal makes sense to me. It requires tokens to be provisioned with the prefix though (unless the client adds them). Is that acceptable?

Member Author

@enj enj May 12, 2020

It requires tokens to be provisioned with the prefix though (unless the client adds them). Is that acceptable?

Overall, I think this is acceptable because it could be made transparent to the end-user.

Authentication webhooks and exec plugins work together quite nicely to give a seamless experience. We also do not need to concern ourselves with oidc tokens (they just continue to work as they always have and are never sent to the dynamic webhooks). I see a few ways the token provisioner could handle the prefix requirement:

  1. The provisioner is aware of the prefix it needs to add (i.e. the prefix is static or the provisioner knows how the webhook was configured on a given cluster) and does so automatically; no client-side changes are required
  2. The provisioner is unaware of the prefix (it could be an "old" provisioner or the name of the webhook is not stable across clusters) but the client adds it automatically via:
    1. Automatic config by the process that creates the kubeconfig file for the end-user
    2. Probing the cluster somehow for the information (i.e. the exec plugin knows how to access some public datastore that tells it what prefix to use on a given cluster)
    3. Manual config by the end-user (the user is informed out of band of the prefix information)

Only the last option leaks this implementation detail to end-users.
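
A minimal sketch of option 2 (everything here is assumed for illustration: the CLUSTER_TOKEN_PREFIX and RAW_TOKEN_FILE environment variables are hypothetical, and a real plugin would obtain the raw token however the provisioner hands it out): an exec credential plugin that prepends the per-cluster prefix before printing an ExecCredential for kubectl.

package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// Prepend the AuthenticationConfig name configured for this cluster to an
// opaque token from the provisioner, then emit an ExecCredential so kubectl
// sends the prefixed value as the bearer token.
func main() {
	prefix := os.Getenv("CLUSTER_TOKEN_PREFIX") // e.g. "company.com:v1"
	raw, err := os.ReadFile(os.Getenv("RAW_TOKEN_FILE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading raw token:", err)
		os.Exit(1)
	}
	token := strings.TrimSpace(string(raw))
	if prefix != "" {
		token = prefix + "/" + token
	}
	cred := map[string]interface{}{
		"apiVersion": "client.authentication.k8s.io/v1beta1",
		"kind":       "ExecCredential",
		"status":     map[string]string{"token": token},
	}
	json.NewEncoder(os.Stdout).Encode(cred)
}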


GroupsClaim string `json:"groupsClaim" protobuf:"bytes,4,opt,name=groupsClaim"`
}

type PrefixConfig struct {
Member

Rather than a prefix, maybe use a glob pattern (please not a regex)? E.g. if I want to authenticate *@example.com or *@*.example.com

Member Author

This sounds like a different use case than PrefixConfig which is meant to disambiguate the same username/group from different authenticators.

I think the use case is valid because it makes it significantly easier to use something like gitlab.com as your oidc IDP (i.e. maybe you do not want every user with a GitLab account to be able to authenticate to your cluster even if they are not authorized to do anything).

The approach I have been thinking around such a use case is:

// the asserted user must be a member of at least one of these groups to authenticate.
// set to ["*"] to disable this check.
// required (cannot be empty)
requiredGroups []string

Basically, a simple gate on group membership for the x509 and oidc authenticators. webhook can internally do anything to limit authentication based on other parameters (i.e. some form of authz check) so it matters a lot less in that case.
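
A minimal sketch of that gate (requiredGroups comes from the snippet above; the function name and wiring are assumptions): authentication succeeds only if the asserted user is in at least one required group, and the single entry "*" disables the check.

package main

import "fmt"

// passesRequiredGroups applies the gate described above: the asserted user must
// belong to at least one of requiredGroups, and ["*"] disables the check.
func passesRequiredGroups(userGroups, requiredGroups []string) bool {
	if len(requiredGroups) == 1 && requiredGroups[0] == "*" {
		return true // check explicitly disabled
	}
	required := make(map[string]bool, len(requiredGroups))
	for _, g := range requiredGroups {
		required[g] = true
	}
	for _, g := range userGroups {
		if required[g] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(passesRequiredGroups([]string{"gitlab:my-team"}, []string{"gitlab:my-team"})) // true
	fmt.Println(passesRequiredGroups([]string{"gitlab:other"}, []string{"gitlab:my-team"}))   // false
}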

@mikedanese
Member

Incomplete list of concerns:

  • Dynamic authentication webhook configuration builtin to kube-apiserver means that operators of traditional statically configured TokenReview webhooks need to consider that an outage may cause credentials presented to the kube-apiserver to redirect to $anywhere. This includes both credentials of end users and service accounts used by the operator for cluster management. This API means that no hosting provider can make a guarantee around the confidentiality of tokens presented to the kube-apiservers that they operate.
  • The extensibility of dynamic authentication webhook if built into the kube-apiserver is very limited. Any experimentation beyond the basic x509/bearer token/oidc mechanisms will require modification of the kube-apiserver or implementation in a front proxy. We have no path to support SAML, Kerberos, token binding, request signing, SPIFFE, v4, ALTS. We really need to focus our investment on extensibility in core.
  • Dynamic authentication built into the kube-apiserver substantially increases the bar for any pre-authorization DoS protection layer. The layer must now keep pace with changes to dynamic authentication configuration in the cluster, and keep pace with feature additions and API changes between versions of k8s.

If this API were used to configure an authenticating proxy, these issues would largely be avoided (in that they would be the responsibility of the proxy operator, not the cluster operator). The front proxy architecture is what we've been recommending to users for the past N years. If there are challenges with developing and operating an authentication proxy, we should identify and fix them.

@enj
Member Author

enj commented May 13, 2020

Incomplete list of concerns

Can I get the complete list 😝

On a more serious note, please take the time to enumerate all concerns that you have. I cannot address them if I do not know what they are.

an outage may cause credentials presented to the kube-apiserver to redirect to $anywhere.

This is handled via the proposed change in #1689 (comment). PTAL.

The extensibility of dynamic authentication webhook if built into the kube-apiserver is very limited

I disagree. Authentication's extra field (set by a webhook) combined with webhook authorization and webhook admission give you a lot of control over the auth stack if you can set the CLI flags on the API server. All this KEP does is make it easier to use some of that functionality generically without CLI flag access.

Any experimentation beyond the basic x509/bearer token/oidc mechanisms will require modification of the kube-apiserver or implementation in a front proxy.

This is simply not true. The only requirement is that the final credential used for authentication be x509 / oidc / token. How you get that credential is completely open ended and well supported via exec plugins. For example, an exec plugin could perform a Kerberos challenge flow with some component and upon success be issued a short lived token for use against the Kube API. I think this is a very reasonable and desired approach. This is similar to how OpenShift supports a broad range of IDPs - no matter what IDPs you configure, the credential presented to the API server is always a bearer token. What this design avoids is forcing that "Kerberos component" to be a proxy for all requests.

We have no path to support

I would argue that these are orthogonal concerns to this KEP. If we did decide to support more authentication methods, we would have to configure them somehow. I am willing to bet that the structure of this API is far better suited to handle new authentication methods than a CLI flags based approach.

SAML, Kerberos

OpenShift, PKS, Dex, etc support these types of IDPs by giving the user a token upon successful authentication.

token binding, request signing, SPIFFE, v4, ALTS.

This KEP does not hinder any efforts in that space.

We really need to focus our investment on extensibility in core.

I would argue that this KEP does exactly that. If you could kubectl apply a complete auth stack, people would be far more willing to invest in this space, IMO. I am also personally signing up for all implementation, testing and maintenance of this feature for as long as I work on Kube, which is the best any single actor can do.

increases the bar for any pre-authorization DoS protection layer.

I do not see how this is any different from a front proxy approach. DoS via the proxy is just as serious as DoS directly against the API. Saying that this kind of DoS is "not your problem because it's not your proxy" is hardly helpful. I would argue this is the realm of the on-going priority and fairness efforts.

The layer must now keep pace with changes to dynamic authentication configuration in the cluster, and keep pace with feature additions and API changes between versions of k8s.

I do not see this as a real problem. This KEP covers every "good" authn API we have today that has been developed over 5+ years. No one is actively trying to add net new authentication protocols to Kube. It is simply easier to issue short lived certs / tokens that are obtained using a user's desired mode of authentication. In terms of API velocity, auth in Kube is incredibly slow moving.

If this API were used to configure an authenticating proxy, these issues would largely be avoided (in that they would be the responsibility of the proxy operator, not the cluster operator).

This is not a solution. All that does is push the responsibility elsewhere and it introduces a whole set of new problems.

The front proxy architecture is what we've been recommending to users for the past N years.

And this was a mistake. It was a convenient response that allowed us to do nothing and push the burden onto end-users. This is a real problem that end-users face. The response to this proposed API change from the community has been positive, even from managed Kubernetes offerings. Dynamic control over authn is the third most requested feature in EKS per @micahhausler.

All of authentication, authorization, admission, etc could be implemented with a proxy. But no one would be satisfied with that approach. Why do we think that is okay for authentication? Would projects like OPA be nearly as successful in Kube if admission plugins could only be configured via the CLI? Note that I mentioned "an ecosystem" on the last SIG Auth call. I was not referring to an ecosystem of auth proxies. I was referring to an ecosystem of apps built on top of an opinionated Kubernetes native auth stack. If auth stacks were generically portable across Kube in the way admission plugins are, it would actually make sense to develop on top of them. Auth is just another extension point.

If there are challenges with developing and operating an authentication proxy, we should identify and fix them.

I have listed quite a few in #1689 (comment). They cannot be fixed because they are a by-product of a proxy-based architecture. It is also simply not possible to protect a CRD-based API with the same level of control as a built-in API.

keps/sig-auth/1688-dynamic-authentication-config/README.md Outdated
keps/sig-auth/1688-dynamic-authentication-config/README.md Outdated
-->

This change aims to add a new Kubernetes REST API called
`AuthenticationConfig`. It is similar to the `ValidatingWebhookConfiguration`
Member

Bike-shedding: AuthenticationConfiguration for consistency?

Member Author

¯\_(ツ)_/¯

keps/sig-auth/1688-dynamic-authentication-config/README.md Outdated
enj added 10 commits May 15, 2020 00:30
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Monis Khan <mok@vmware.com>
@k8s-ci-robot k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label May 15, 2020
@enj
Member Author

enj commented May 15, 2020

@mikedanese @tallclair the KEP is up to date with all comments now.

@dims
Member

dims commented May 16, 2020

cc @liggitt

occur. Performance will also be increased by limiting the number of webhooks
that can be invoked on a given request.

It is the responsibility of the token issuer, the webhook, and the client to
Contributor

From a purely practical point of view, how would the token issuer even know the name of the AuthenticationConfig driving it? This model seems like it might work if you deployed a per-cluster authenticator and then wired the configuration across, but that seems antithetical to the previous notion of authenticator webhooks as things that can exist outside of the cluster.

Member Author

Even if a webhook exists outside of the cluster, it still needs to know what cluster is calling it. Otherwise you risk getting into scenarios where the token meant for one cluster can be replayed against another (and if the webhook and cluster have a 1:1 mapping they could just be hard coded with the name).

A simple approach would be to encode the name and cluster info into the request path. An exec plugin could also perform this by discovering what name it needs to use based on some public pre-auth metadata hosted on the cluster itself.

@deads2k
Contributor

deads2k commented May 18, 2020

I'm generally opposed to this KEP.

  1. Authentication is already extensible via command line flags without recompiling a kube-apiserver binary. This gives cluster providers (the actor setting the flags) the ability to extend the authentication external to the kube-apiserver binary.
  2. This configuration is rarely changed, so there is no need to introduce additional risk by refactoring easy-to-reason-about static initialization into dynamically reloaded configuration. This also means it doesn’t warrant introducing risk by allowing remote configuration of request interception.
  3. This configuration is highly privileged, which is another reason to avoid introducing risk by allowing remote configuration of request interception.
  4. To directly reply to "Dynamic control over authn is the third most requested feature in EKS" - there is no reason that EKS cannot expose an API to do exactly that. My understanding is that EKS (and all other major hosted services) already curate the list of configuration options that they want to support.

@seh

seh commented May 18, 2020

there is no reason that EKS cannot expose an API to do exactly that. My understanding is that EKS (and all other major hosted services) already curate the list of configuration options that they want to support.

They can, but so far they won't. I know this from having tried for a long while.

What I expected you to say is that one could possibly build out this feature on top of the existing configuration approach, by implementing a Webhook server that in turn delegates to any number of other configured Webhook servers. Is that what @mikedanese was suggesting with the front proxy (#1689 (comment))?

Member

@liggitt liggitt left a comment

I am also opposed to accepting the proposal in its current form.

My recommendation would be to narrow this to improving the OIDC provider config, and for clusters that wanted to allow control of that config via a REST API, to use a CRD-based API paired with a config file writer.

Comment on lines +178 to +184
Configuring authentication via command line flags has some limitations. Some
forms of config are simply easier to specify and understand in a structured
manner such as via a Kubernetes API or file. The existing authentication flags
for OIDC and token webhook limit the user to only one of each of these types of
authentication modes as there is no way to specify the set of flags multiple
times [#71162]. These flag based configs also require an API server restart to
take effect.
Member

I agree these are current limitations, but they are resolvable without exposing authentication configuration via a REST API (structured config and config file reload are used elsewhere in the API server where justified)

Member

@liggitt could the config file be loaded via a configmap + restart/SIGHUP to yield similar dynamic behavior?

Member

the configmap bit would depend on whether the API server process was running in a pod that could have configmaps mounted to it (static pods can't), but the reloadable bit seems plausible... that's the alternative I described in #1689 (comment)
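
A minimal sketch of the reloadable-config-file idea under discussion (the path, poll interval, and parse callback are illustrative assumptions; the kube-apiserver has its own dynamic file-content machinery for things like CA bundles): poll the file and swap in the new config only when it loads successfully.

package main

import (
	"log"
	"os"
	"time"
)

// watchAuthnConfig polls an on-disk authentication config file and calls reload
// whenever the file's modification time changes; on a bad config it keeps
// serving the last good one, which matters for the failure-recovery concerns
// raised elsewhere in this thread.
func watchAuthnConfig(path string, interval time.Duration, reload func([]byte) error) {
	var lastMod time.Time
	for range time.Tick(interval) {
		info, err := os.Stat(path)
		if err != nil {
			log.Printf("stat %s: %v", path, err)
			continue
		}
		if !info.ModTime().After(lastMod) {
			continue // unchanged since the last successful load
		}
		data, err := os.ReadFile(path)
		if err != nil {
			log.Printf("read %s: %v", path, err)
			continue
		}
		if err := reload(data); err != nil {
			log.Printf("keeping previous config, reload failed: %v", err)
			continue
		}
		lastMod = info.ModTime()
	}
}

func main() {
	watchAuthnConfig("/etc/kubernetes/authn-config.yaml", 30*time.Second, func(b []byte) error {
		log.Printf("loaded %d bytes of authentication config", len(b))
		return nil // a real implementation would parse and validate here
	})
}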

Comment on lines +268 to +272
#### Story 2

Bob creates an `AuthenticationConfig` object with `spec.type` set to `x509`. He
is then able to create a custom signer for use with the CSR API. It can issue
client certificates that are valid for authentication against the Kube API.
Member

this seems to overlap significantly with the kubernetes.io/kube-apiserver-client signer

Comment on lines +285 to +291
There is a service running in Charlie's cluster: `metrics.cluster.svc`. This
service exposes some metrics about the cluster. The service is assigned the
`system:auth-delegator` role and uses the `tokenreviews` API to limit access to
the data (any bearer token that can be validated via the `tokenreviews` API is
sufficient to be granted access). Charlie uses his GitHub token to authenticate
to the service. The API server calls the dynamic authentication webhook and is
able to validate the GitHub token. Charlie is able to access the service.
Member

This seems to encourage audienceless tokens like the legacy serviceaccount tokens we're trying to phase out. For auth to in-cluster services, credentials that cannot be replayed as API server credentials seem like a better thing to work toward.
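
For context on the mechanism in this user story, a minimal sketch (using client-go; the token value is just the example string from elsewhere in this thread) of how an in-cluster service bound to system:auth-delegator typically validates a bearer token via the tokenreviews API:

package main

import (
	"context"
	"fmt"

	authv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// validateToken asks the API server whether a bearer token presented to the
// in-cluster service is valid; whatever static or dynamic authenticators the
// server has configured ultimately decide the outcome.
func validateToken(ctx context.Context, cs kubernetes.Interface, token string) (string, error) {
	tr, err := cs.AuthenticationV1().TokenReviews().Create(ctx, &authv1.TokenReview{
		Spec: authv1.TokenReviewSpec{Token: token},
	}, metav1.CreateOptions{})
	if err != nil {
		return "", err
	}
	if !tr.Status.Authenticated {
		return "", fmt.Errorf("token rejected: %s", tr.Status.Error)
	}
	return tr.Status.User.Username, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	user, err := validateToken(context.Background(), cs, "D2tIR6OugyAi70do2K90TRL5A")
	fmt.Println(user, err)
}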

Comment on lines +309 to +314
Frank is exploring different options for authentication in Kubernetes. He browses
various repos on GitHub. He finds a few projects that are of interest to him.
He is able to try out the functionality using `kubectl apply` to configure his
cluster to use the custom authentication stacks. He finds a solution that he
likes. He uses the appropriate `kubectl apply` command to update his existing
clusters to the new authentication stack.
Member

@liggitt liggitt May 19, 2020

I wouldn't personally find the ability to casually replace configured authenticators via kubectl apply a comforting thought, but maybe that's because I'm a paranoid admin :)


-->

1. Create a new Kubernetes REST API that allows configuration of authentication
- x509
Member

why is this needed in addition to the CSR mechanism for obtaining kube-apiserver client certificates?

- Token Webhook
2. Changes made via the REST API should be active in the order of minutes without
requiring a restart of the API server
3. Allow the use of a custom authentication stack in hosted Kubernetes offerings
Member

Is there evidence this is a thing most hosted Kubernetes offerings want and plan to enable? If this would not be enabled by default, or made part of conformance, or opted into broadly, I question whether it should exist as a built-in API at all.

@dcberg dcberg May 20, 2020

I agree with the general problem this is trying to solve, but I am not in favor of the approach being proposed to achieve the goal. I would request that, if this proposal is accepted, the feature be provided as an “opt-in” feature.

Contributor

Many many customers I talk to are looking to use k8s clusters from multiple providers and have consistent auth among them. This is something that comes up again and again with enterprise customers.

I think we should be looking at this from the point of view of the end users of the clusters (both cluster admins and app teams) and not hosting providers. After all, they are the real drivers of the success of Kubernetes.

@dcberg -- if this is something that users want and you don't enable it then you'll be at a competitive disadvantage. That is how this should work. Create the capability and then let the real users vote with their feet.


Yes, Joe, thank you for putting it so well. In several organizations I've seen so far, the cloud providers' authentication systems integrated into their Kubernetes offerings are of no use to us. They are merely an impediment, used only begrudgingly and thereafter ignored.

The cost is that these clusters wind up using very few principals with coarsely assigned permissions ("Fine, then every developer is a cluster administrator!"), for lack of finer-grained control that could be employed consistently across all the clusters hosted with different providers.

Audit logging is no longer accurate. RBAC is less useful. In all, the system becomes less secure and less trustworthy for important workloads.

@dcberg dcberg May 21, 2020

@jbeda I don’t disagree that there is a need to have common auth across clusters. While this proposal would not break a hosted solution, it may limit the use of other integrations with the provider.
My ask is that the feature has an “opt-in” mechanism to allow providers flexibility to expose the feature in such a way as to limit impact and/or to point users to an alternative approach.

2020-04-15: Initial KEP draft created
2020-05-15: KEP updated to address various comments

## Drawbacks
Member

I expected a discussion of the increased security exposure of kubectl apply gaining the ability to add additional authentication sources

The proposed functionality will be gated behind a new feature flag called
`DynamicAuthenticationConfig`.

The proposed API will reside in the `authentication.k8s.io` group at version
Member

This would need to be a distinct group if it was going to be opt-in (which I would expect) or be CRD-based (as described in my comment on the alternatives section).


TBD.

## Alternatives
Member

@liggitt liggitt May 19, 2020

An alternative is to focus on the OIDC provider options.

That could look like the following:

  1. allow the kube-apiserver to configure multiple oidc providers via a structured, reloadable config file
  2. Separately, create a CRD for custom resources that map to the items in the oidc config in a straightforward way
  3. Providers that want to expose the config via a REST API can install the CRD and the custom-resource -> OIDC config file process
  4. Providers that do not want to expose the config still benefit from multiple oidc provider capability and have no additional exposure and no new API surface to lock down

Member Author

Responding to Jordan's overall comment about "make OIDC better."

Overall, I do not understand how a REST API (I discuss two below) ends up being different than the impersonation API or the CSR API from a security perspective. If we give the cluster operator the same level of control, why does it matter?

As a thought experiment, if I were to propose a new UserTokenRequest API that worked like the CSR API's kubernetes.io/kube-apiserver-client signer, would that be okay? This would be a way for some privileged component to ask for a token for a particular user info that the Kube API would honor. The token itself would be opaque to the client. Whatever component issued the token could choose not to issue it, based on whatever requirements the cluster operator specifies. This handles the "the identity is rooted in a trust owned by the API server / cluster operator" concern. Such an API adds one layer of indirection, but that could be hidden from the end user.


My recommendation would be to narrow this to improving the OIDC provider config, and for clusters that wanted to allow control of that config via a REST API, to use a CRD-based API paired with a config file writer.

1. allow the kube-apiserver to configure multiple oidc providers via a structured, reloadable config file

2. Separately, create a CRD for custom resources that map to the items in the oidc config in a straightforward way

3. Providers that want to expose the config via a REST API can install the CRD and the custom-resource -> OIDC config file process

4. Providers that do not want to expose the config still benefit from multiple oidc provider capability and have no additional exposure and no new API surface to lock down

So we would need to in k/k:

  1. Make a new file based API
  2. Wire that file based config through
  3. Reload that config

Outside of k/k:

  1. Create a new CRD
  2. Create new controllers to handle the CRD
  3. Create new admission plugins to restrict access to the CRD, but with far weaker security (they can be removed if you can delete the webhook)

For every single provider:

  1. Ask them to run all of those components co-located with their API servers
  2. Configure credentials and the correct authz for these components
  3. Set the right flags on the API server

This sounds like an incredible amount of work in-tree, out-of-tree, and for every single provider. I understand the desire to build things out of core, but I do not think this moves us forward in any meaningful way.

Why would we instead not:

In k/k:

  1. Create a new dynamic OIDC config API, in a new group, disabled by default
  2. Wire the new API through
  3. Reload the API config

Outside of k/k:

No work

For every single provider:

  1. Ask them to pass a single flag to enable the API if they wish to support it

This is the same amount of work and changes in k/k. There is no work outside of k/k and the burden on the providers is as small as we can make it. Since it's off by default, no security concerns are added. The provider would opt in in the same way they would for the CRD-based flow. They just would not have to do a large amount of provider-specific work.


Is there evidence this is a thing most hosted Kubernetes offerings want and plan to enable? If this would not be enabled by default, or made part of conformance, or opted into broadly, I question whether it should exist as a built-in API at all.

From what I can tell, EKS and AKS would. GKE and OpenShift would not (though OpenShift already lets you control IDPs via a Kube REST API). Not sure about others. I suspect Linode, Digital Ocean, Rancher might. It would be enabled in VMware's products (duh).

If every hosted offering had a Kube API that let you configure auth in some way that they were comfortable with, then we would be fine as-is. The reality is that even when providers want to allow this, it is a lot of work to build all the tooling to wire this stuff through. They also tend to use non-Kube APIs, which makes them hard to integrate with.

@lavalamp
Member

We make extensibility features dynamic when it is expected that they will change over the life of a cluster.

I personally can't see ever wanting to change an authentication system once the cluster was doing something. (Also, an unwanted authn system change sounds like the one of the most thorough pwnings I can imagine.)

So, I think this is a candidate for static config file configuration, not dynamic API based configuration.

Access to the static config files is a higher privilege than "cluster admin". I think requiring the highest privilege to change this is good.

If the problem is just getting this right once, at cluster initialization, as I suspect, I'd look at making a config file + support in Cluster API.

If the problem is not that, and people want to change this dynamically, during the life of their cluster, more than once... I guess I'd want to see a long list of people who want that, it's really hard for me to believe.

@jbeda
Contributor

jbeda commented May 21, 2020

@lavalamp I think you are conflating things. There is the need to have end users configure it in a standard way regardless of hosting provider; that is a different question from when and how often it changes.

This is something that end users have hit their head against.

Allowing the apis/mechanisms for doing this in a provider-neutral way is a first step. It gives end users something to tool against and allows them to work with their providers to enable the capability.

Cluster API is great but it doesn't get to that need to work against managed clusters. In fact, cluster API is a great option for those customers (and there are many) that opt to run their own clusters so that they can get access to configure auth.

@lavalamp
Member

@jbeda

There is the need to have end users configure it in a standard way regardless of hosting provider; that is a different question from when and how often it changes. This is something that end users have hit their head against.

I've asked the people in my org who would know these things if they agree with this assertion or not. If you can provide clearer evidence for this assertion that would be helpful.

Note that static config is also a "standard way". The debate is actually about whether the config must be end-user accessible or not.

Allowing the apis/mechanisms for doing this in a provider-neutral way is a first step.

It is pointless if the providers wouldn't enable it. If this is to be an end user extensibility feature, it needs to be universal. And therefore non-optional, required by conformance. Otherwise you won't actually get more portability, you get less. So, I wouldn't want to start down this road unless there's very clear demand from users. Right now it's not clear that there is (maybe it's clear to you but that doesn't really help the rest of us).

@seh

seh commented May 21, 2020

I wouldn't want to start down this road unless there's very clear demand from users. Right now it's not clear that there is (maybe it's clear to you but that doesn't really help the rest of us).

The audience here is the administrator or operator deploying Kubernetes clusters. It may be a small group with a big impact. In a given organization, there may be one or two such people setting up dozens of clusters to be used by hundreds of people to run applications serving millions of people. Does that count as one user (the organization), or two (the administrators), or hundreds (the developers)?

When I've talked to others interested in Kubernetes about authentication, I've heard several times, "Oh, I didn't think you were allowed to configure things like that." In other words, increasingly, this configuration being inaccessible turns people off from thinking about it. They just assume that "Kubernetes does not have a way to add users."

These are all anecdotes. What counts as proof? What counts as enough?

@sftim
Contributor

sftim commented May 21, 2020

Mostly an aside, could be something to bear in mind.

Can I create an AuthenticationConfig that lets me implement authorisation with the client identity masked / erased?

For example, using SAML it's relatively easy for an IdP to make an assertion about the bearer: they're a staff member, they have a role in Project 23 and another role in project 42, and they have the following 7 entitlements [entitlements go here].

Current ways of working with Kubernetes seem to me to assume that all authorization decisions are based around a known user identity, whereas that's not always the case. Sometimes you can, e.g., take the ARN of an AWS IAM role, and put that in place of the username, or have a mapping.

It's good to retain the ability to authorize something even if you don't have a unique identifier for the caller, and never will.

Contributor

@sftim sftim left a comment

I think I'm picturing the same symptoms as https://github.com/kubernetes/enhancements/pull/1689/files#r428871572 and wondering how Kubernetes authnz looks a good few minor releases from now.

(and I hope this feedback is useful)

environments have no consistent way to expose this functionality [#166]. To get
around this limitation, users install authentication proxies that run as
cluster-admin and impersonate the desired user [kube-oidc-proxy] [teleport].
Other than being a gross abuse of the impersonation API, this opens the API
Contributor

I think it'd be useful to restate the intent for the impersonation API (which I think is to provide a model to allow specific identity A to request the ability to act as different identity B in some scope), so that the reader can draw their own conclusion about suitability, alternatives, etc.

@liggitt
Member

liggitt commented May 21, 2020

My comment at #1689 (review) still reflects my current thinking, and I confirmed with David #1689 (comment) still reflects his.

Of the authentication mechanisms this KEP discusses, supporting multiple OIDC providers seems the most reasonable goal. However, I would characterize what this KEP proposes as cluster configuration API, not something that should be built into the kube-apiserver.

I considered the following questions:

  1. Are there gaps in Kubernetes components that prevent this from being built on top? In this case, there are, and I would welcome proposals addressing these gaps:

    • expand the kube-apiserver OIDC support to allow multiple OIDC providers (useful for a variety of reasons)
    • add support for updating the kube-apiserver OIDC config file without restarting the server (we already do similar things for files containing things like CA bundles and TLS certificates)
  2. With those gaps addressed, could this proposal to expose OIDC config as a REST API be built on top of the kube-apiserver outside kubernetes/kubernetes?

    • An OIDCProvider (strawman) CRD could be defined that maps easily to items in the kube-apiserver OIDC config file
    • OIDCProvider instances could be created via the REST API and transformed to the kube-apiserver OIDC config file

Some benefits of that approach:

  • the OIDCProvider API can be iterated on quickly
  • cluster operators that want to expose an API are not limited to exposing it via the kube-apiserver being configured (they could expose it in an underlay/config API)
  • cluster operators that don't want to expose an API can still benefit from multi-oidc-provider support

Since we are in agreement that this proposal is not acceptable, we will close this PR and look forward to reviewing proposals to fill in the capability gaps identified in the kube-apiserver.

@liggitt, @deads2k, and @mikedanese
