
Conversation

@pbarker (Contributor) commented Dec 20, 2018

Adds proposal for outgoing webhook auth from the apiserver and its aggregates

@k8s-ci-robot added the cncf-cla: yes label Dec 20, 2018
@k8s-ci-robot added the kind/kep, sig/architecture, sig/auth, size/L, and sig/pm labels Dec 20, 2018
@pbarker (Contributor, Author) commented Dec 20, 2018

/sig api-machinery
/kind api-change

@k8s-ci-robot added the sig/api-machinery and kind/api-change labels Dec 20, 2018

```go
type AuthInfo struct {
    TokenRequest *authenticationv1.TokenRequestSpec
}
```

Review comment from a Member:

I'm not sure an entire token request spec makes sense, for a few reasons:

  1. the target probably doesn't care about expiration or bound object, just audience

  2. it seems safer to make the audience an inherent part of the way the webhook is called (e.g. tokens obtained for calling https://mywebhook.com would have an audience of https://mywebhook.com, and tokens obtained for calling myservice in mynamespace would have an audience of https://myservice.mynamespace.svc, similar to how TLS validation for webhook calls is currently done)

@pbarker (Contributor, Author) replied Dec 21, 2018

  1. I was thinking the bound object would possibly be used to limit the exposure of a token, but probably isn't necessary

  2. Being prescriptive with the audience makes sense.

So, from the API perspective, we probably just want a boolean switch. I'm thinking it still makes sense to wrap it in an AuthInfo struct so we can potentially add other methods down the road.

```go
type AuthInfo struct {
    ProvisionToken bool
}

type ClientConfig struct {
    Name     string
    URL      string
    CABundle []byte
    AuthInfo *AuthInfo
    Service  *ClientConfigService
}
```

WDYT?
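For illustration, a rough sketch (not part of the KEP text) of the token request the apiserver could build when `ProvisionToken` is set, with the audience derived from how the webhook is addressed, as suggested above; all names and values are illustrative:

```go
package main

import (
	"fmt"

	authenticationv1 "k8s.io/api/authentication/v1"
)

// tokenRequestFor sketches the TokenRequestSpec the apiserver could construct
// when ProvisionToken is set, deriving the audience from the webhook address.
func tokenRequestFor(webhookAddress string) authenticationv1.TokenRequestSpec {
	expiry := int64(600) // short-lived token; the value is illustrative
	return authenticationv1.TokenRequestSpec{
		// e.g. "https://mywebhook.example.com" for URL-based webhooks, or
		// "https://myservice.mynamespace.svc" for service-based webhooks.
		Audiences:         []string{webhookAddress},
		ExpirationSeconds: &expiry,
	}
}

func main() {
	fmt.Printf("%+v\n", tokenRequestFor("https://mywebhook.example.com"))
}
```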

@pbarker changed the title from "Outgoing Webhook Auth" to "API server authentication to webhooks" Dec 21, 2018
@kfox1111 commented:

Trying to understand the proposal. Each call to a webhook would require the webhook to call back to the api server for a token validation? Would this be a significant overhead in a chain of validators?

@pbarker (Contributor, Author) commented Jan 17, 2019

> Would this be a significant overhead in a chain of validators?

Could you explain this a bit more?

The general idea is to just provide a simple way to authenticate the apiserver; the immediate use case is the audit webhooks. These would only be a couple of requests a minute.

@mbohlool (Contributor) commented:

The connection between the API server and webhooks is HTTPS. Currently, webhooks provide a CA in their config so the API server can authenticate them; we could do the reverse and have API servers expose their CA so that webhooks can use it for the HTTPS connection. I guess the problem that needs to be solved for that approach is multiple API servers, but that can be fixed by the alternative suggestion of this KEP: there could be a secret with the ID of each API server, and we could provide the API server's ID in the AdmissionReview. The CAs could potentially be exposed using discovery APIs, but that could be a little tricky.

@tsandall commented:

👋 Dropping in with two questions. Apologies if the answers are obvious; I'm just trying to understand the implications of the proposal.

  1. Why does the proposal say that dynamic admission control and dynamic audit webhooks suffer from a lack of authentication? My understanding was that you could configure the apiserver with authentication credentials: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#authenticate-apiservers. What am I missing?

  2. What does it mean to perform a TokenReview? Does this mean the webhook server has to call back into the apiserver (or some central service) to authenticate the webhook caller?

@caesarxuchao commented:

> My understanding was that you could configure the apiserver with authentication credentials:

@tsandall that's right. I think the missing features are:

  • webhook servers cannot discover the apiserver's CA.
  • users can't change the content of the kubeconfig file via an API.

@tsandall commented:

@caesarxuchao is there a reason that bearer tokens or client certificates could not be managed via an API like (or as part of) webhook configurations? From my perspective, it's unclear why requiring webhook servers to call back to the apiserver to authenticate (requests which the apiserver itself made) is desirable. Perhaps I'm misunderstanding the proposal.

@kfox1111 commented:

@tsandall related to your question one, I think the issue is that you have to manage all the secrets in each api-server, aggregated together into one config file passed through from the host into the kube-apiserver container. This is inconvenient. It would be nice if the secret could be stored as part of the dynamic registration process, or if there were some other way to automatically manage trust establishment.

I think TokenReview is an API call back to the apiserver. So that means a validating webhook would go client -> apiserver -> validating webhook -> apiserver, which feels a bit heavy to me.

What about a simpler option where a secret could be specified in the clientConfig section of the ValidatingWebhookConfiguration, and that secret is passed to the hook? The hook server could read the same secret and validate it was passed. It should be very easy to implement, should probably be good enough security, and would avoid the performance cost of the extra hop.

@tsandall commented:

> @tsandall related to your question one, I think the issue is that you have to manage all the secrets in each api-server, aggregated together into one config file passed through from the host into the kube-apiserver container. This is inconvenient. It would be nice if the secret could be stored as part of the dynamic registration process, or if there were some other way to automatically manage trust establishment.

Agreed. Moving the "API" from a config file/command line argument into a first-class API seems natural.

> I think TokenReview is an API call back to the apiserver. So that means a validating webhook would go client -> apiserver -> validating webhook -> apiserver, which feels a bit heavy to me.

Agreed.

> What about a simpler option where a secret could be specified in the clientConfig section of the ValidatingWebhookConfiguration, and that secret is passed to the hook?

I'm not sure what's best here. Embedding secrets inside the webhook configuration seems like it might be a problem? Alternatively, the webhook configuration could refer to a Secret?

@kfox1111 commented:

Yeah, that's what I was thinking. Just a reference to a secret (name, namespace, key) rather than the secret data itself.
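A minimal sketch of what such a reference might look like in the webhook clientConfig; the field and type names here are hypothetical, and this shared-secret alternative was not what the KEP ultimately proposed:

```go
// Hypothetical shape for pointing a webhook clientConfig at a shared secret
// instead of embedding the secret data in the configuration itself.
type SecretKeyReference struct {
	Namespace string // namespace of the Secret holding the shared credential
	Name      string // name of the Secret
	Key       string // key within the Secret's data map
}

type WebhookClientConfig struct {
	URL       string
	CABundle  []byte
	SecretRef *SecretKeyReference // credential the apiserver would present to the hook
}
```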

@kfox1111 commented:

It is not universal yet, I think, but you can get the CA out of the cluster-info ConfigMap in the kube-public namespace.
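For illustration, a minimal client-go sketch of reading that ConfigMap, assuming a kubeadm-style cluster where the `kubeconfig` key of cluster-info embeds the cluster CA; error handling is reduced to a panic for brevity:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes this code runs inside the cluster.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// On kubeadm clusters the cluster-info ConfigMap in kube-public has a
	// "kubeconfig" key holding a stub kubeconfig whose
	// certificate-authority-data field carries the cluster CA.
	cm, err := client.CoreV1().ConfigMaps("kube-public").Get(context.TODO(), "cluster-info", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println(cm.Data["kubeconfig"])
}
```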

@pbarker (Contributor, Author) commented Jan 18, 2019

So @kfox1111 @tsandall, we have definitely explored the secret route here, but there were a couple of issues we couldn't get past yet:

  • No way to distinguish between apiservers
  • Assumes uniform permissions to all apiservers
  • Provisioning the credentials is nontrivial. Apiserver aggregates often live in separate namespaces and cannot share a secret, and asking to provision secrets for every api aggregate was seen as too much overhead.

Check out Jordan's talk from the comment above (#658 (comment)); he does a good job of laying out the space.

The solution presented is just intended to provide a simple means for cluster-aware webhooks to authenticate the apiserver. In our evaluation that was sufficient to solve the majority of use cases. We would like to solve the secret problem eventually, but it's looking like it's going to be a complicated and sticky issue that may take a while to resolve.

@k8s-ci-robot added the lifecycle/stale label Oct 28, 2019
@maxsmythe (Contributor) commented:

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Oct 29, 2019
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 27, 2020
@palnabarun (Member) commented:

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jan 27, 2020
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 19, 2020
@maxsmythe (Contributor) commented:

/remove-lifecycle stale

Per the above comment, letting webhooks authenticate the API server would be very helpful for hardening the K8s security model.

@k8s-ci-robot removed the lifecycle/stale label Jul 20, 2020

> The `AuthInfo` struct will also be added to the [ClientConfig](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/auditregistration/v1alpha1/types.go#L134) in the auditregistration API.

> The token audience would be that of the webhook name. The receiving server could verify that it is the intended
Review comment:

For audience, please use the base address of the URL - not the webhook name. AFAIK this is the recommended use of audience.


> The client will check and refresh the token when necessary. The server can now check the token using a [TokenReview](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/authentication/types.go#L45).
Review comment:

Would be nice to also allow the tokens to use proper OIDC, if the K8s server is integrated with such an IDP.

The audience-based tokens mounted in Pods seem to also support OIDC in some cases; it would be great to be consistent.

Review comment:

I agree with this suggestion.

This can also solve the case where the webhook/audit-endpoint is not kubernetes-aware. From a quick glance, this seems to be a problem that can eventually be solved by Istio and SPIFFE. I'm guessing the author is trying to provide a better user experience that can quickly establish auth with a simple flag/config.
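For reference, a minimal sketch of the TokenReview call a cluster-aware webhook server could make to validate the bearer token presented by the apiserver, following the flow described in the KEP excerpt above; the audience value is illustrative and error handling is trimmed:

```go
package main

import (
	"context"
	"fmt"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// validateToken submits the bearer token received from the apiserver to the
// TokenReview API and reports whether it is valid for the expected audience.
func validateToken(client kubernetes.Interface, token, audience string) (bool, error) {
	review := &authenticationv1.TokenReview{
		Spec: authenticationv1.TokenReviewSpec{
			Token:     token,
			Audiences: []string{audience}, // e.g. the webhook's base URL
		},
	}
	result, err := client.AuthenticationV1().TokenReviews().Create(context.TODO(), review, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return result.Status.Authenticated, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ok, err := validateToken(client, "<bearer token from the incoming request>", "https://mywebhook.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("authenticated:", ok)
}
```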

@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 9, 2020
@fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Dec 9, 2020
@maxsmythe (Contributor) commented:

/remove-lifecycle rotten

Still interested in this

@k8s-ci-robot removed the lifecycle/rotten label Dec 9, 2020
@benhxy commented Feb 2, 2021

I think the workflow can be broken down into two parts: 1) token issuing/secret provisioning, and 2) token validation.

Part 1 should be the core of this proposal.
The webhook side should be free to choose how to implement part 2. One way is to use TokenReview. Another way is to get a CA cert from the kube-apiserver and verify the token locally (not sure if this path exists).
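On the local-verification path: service account tokens are OIDC-compatible JWTs, and clusters with service account issuer discovery enabled publish the signing keys, so a webhook could in principle verify tokens offline with a standard OIDC library. A rough sketch using github.com/coreos/go-oidc; the issuer URL and audience are illustrative, and this assumes the webhook can reach the issuer's discovery and JWKS endpoints:

```go
package main

import (
	"context"
	"fmt"

	"github.com/coreos/go-oidc/v3/oidc"
)

func main() {
	ctx := context.TODO()

	// Issuer configured on the apiserver (--service-account-issuer); discovery
	// metadata is served at <issuer>/.well-known/openid-configuration.
	provider, err := oidc.NewProvider(ctx, "https://kubernetes.default.svc.cluster.local")
	if err != nil {
		panic(err)
	}

	// Verify signature, expiry, and that the token was minted for this webhook's audience.
	verifier := provider.Verifier(&oidc.Config{ClientID: "https://mywebhook.example.com"})

	token, err := verifier.Verify(ctx, "<bearer token from the incoming request>")
	if err != nil {
		panic(err)
	}
	fmt.Println("token subject:", token.Subject)
}
```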

@yliaog (Contributor) commented Mar 20, 2021

/cc

@k8s-ci-robot requested a review from yliaog March 20, 2021 00:56
@liggitt (Member) commented Apr 30, 2021

should this be closed in favor of #2510?

@ritazh (Member) commented Jun 14, 2021

Closing in favor of #2510
/close

@k8s-ci-robot (Contributor) commented:

@ritazh: Closed this PR.

In response to this:

Closing in favor of #2510
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

