FR: streaming audit endpoint #53455

Closed
tallclair opened this issue Oct 4, 2017 · 21 comments
Labels
area/apiserver area/audit area/logging kind/feature Categorizes issue or PR as related to a new feature. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth.

Comments

@tallclair
Member

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

Description:

Provide a pull-based stream for audit logging. This would look similar to a watch request to the API server, and provide a stream of audit events. I would not expect the stream to return any PREVIOUS events (i.e. no associated storage), it would only stream events that occurred after the stream was opened. This would be a new audit backend plugin.

The stream should also have some options for filtering events. Minimally, there should be a namespaced version that only includes audit events tied to that namespace, so that RBAC can be used to grant subdivided access. Secondarily, it might be nice to be able to request only events tied to a given user or resource.
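As an illustration of the filtering options described above, here is a minimal sketch that assumes events arrive as decoded `audit.k8s.io/v1` `Event` objects (plain dicts). The `StreamFilter` class and its fields are hypothetical; only the `user.username`, `objectRef.namespace`, and `objectRef.resource` field names come from the audit Event schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamFilter:
    """Hypothetical filter a client could attach when opening an audit stream."""
    namespace: Optional[str] = None  # None means all namespaces (cluster-wide access)
    user: Optional[str] = None
    resource: Optional[str] = None

    def matches(self, event: dict) -> bool:
        # objectRef is absent for non-resource requests (e.g. /healthz),
        # so treat a missing reference as empty.
        ref = event.get("objectRef") or {}
        username = (event.get("user") or {}).get("username")
        if self.namespace is not None and ref.get("namespace") != self.namespace:
            return False
        if self.user is not None and username != self.user:
            return False
        if self.resource is not None and ref.get("resource") != self.resource:
            return False
        return True


events = [
    {"verb": "get", "user": {"username": "alice"},
     "objectRef": {"resource": "pods", "namespace": "team-a", "name": "web"}},
    {"verb": "list", "user": {"username": "bob"},
     "objectRef": {"resource": "secrets", "namespace": "team-b"}},
    {"verb": "get", "user": {"username": "system:anonymous"}},  # non-resource request
]

team_a = StreamFilter(namespace="team-a")
print([e["user"]["username"] for e in events if team_a.matches(e)])
```

In this sketch the namespace field is the natural thing for RBAC to gate on: a request for the `team-a` stream would be authorized against `team-a`, while an unfiltered stream would require cluster-wide permission.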

Motivation:

The current audit backends (log & webhook) require very high privileges (file access to the master) or prior configuration (webhook). There are a number of use cases for dynamic and unprivileged access to the audit logs. Examples include:

  • audit2rbac - generate RBAC rules from audit logs. It would be nice to simply point this tool at a running cluster. With the namespace subdivision, an admin could grant developers working in a namespace access to their audit logs to use this tool, without granting the full cluster audit logs.
  • Debugging - related to the above point, API audit logs can be very useful for debugging, and this would provide an option to surface a subset of the logs to developers.
  • 3rd party audit plugins - This isn't a strong use case as the stream would be less reliable than the push-based webhook, but there might be other use cases for plugging in logging tools to the stream.

Challenges:

Audit logs can be very noisy, so there is a risk that providing this stream (and potentially multiple connections) could have a performance impact. Subdividing the audit logs (namespaces, resources, etc.) would help. We could also potentially restrict this to metadata-level, though that would negatively impact the debugging use case (maybe a request parameter?). Also, the stream would not have the same reliability guarantees as the webhook and log backends: if the stream is congested, events will be dropped.
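The best-effort behavior described above (never slow down the request path; drop events for a congested stream) can be sketched as a fan-out with one bounded buffer per subscriber. This is a minimal illustration, not the API server's audit backend interface; `AuditBroadcaster`, `subscribe`, `publish`, and `drain` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Dict, List


class _Subscriber:
    def __init__(self, accept: Callable[[dict], bool], buffer_size: int):
        self.accept = accept
        self.events: "queue.Queue[dict]" = queue.Queue(maxsize=buffer_size)
        self.dropped = 0  # approximate if publish() is called from several threads


class AuditBroadcaster:
    """Hypothetical best-effort fan-out: each stream gets its own bounded queue."""

    def __init__(self, buffer_size: int = 100):
        self._buffer_size = buffer_size
        self._subs: Dict[int, _Subscriber] = {}
        self._next_id = 0
        self._lock = threading.Lock()

    def subscribe(self, accept: Callable[[dict], bool] = lambda event: True) -> int:
        with self._lock:
            sub_id = self._next_id
            self._next_id += 1
            self._subs[sub_id] = _Subscriber(accept, self._buffer_size)
            return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        with self._lock:
            self._subs.pop(sub_id, None)

    def publish(self, event: dict) -> None:
        # Snapshot subscribers under the lock, then deliver without holding it.
        with self._lock:
            subs = list(self._subs.values())
        for sub in subs:
            if not sub.accept(event):
                continue
            try:
                sub.events.put_nowait(event)  # never blocks the audit pipeline
            except queue.Full:
                sub.dropped += 1  # congested stream: this subscriber loses the event

    def dropped(self, sub_id: int) -> int:
        with self._lock:
            return self._subs[sub_id].dropped

    def drain(self, sub_id: int) -> List[dict]:
        with self._lock:
            sub = self._subs[sub_id]
        out = []
        while True:
            try:
                out.append(sub.events.get_nowait())
            except queue.Empty:
                return out


b = AuditBroadcaster(buffer_size=2)
slow = b.subscribe()
for i in range(5):
    b.publish({"auditID": str(i)})
print(len(b.drain(slow)), b.dropped(slow))  # 2 3
```

Because each subscriber has its own queue, one slow client only loses its own events; other streams and the log and webhook backends are unaffected.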

/cc @crassirostris @liggitt @destijl @sttts @kubernetes/sig-auth-feature-requests

@tallclair tallclair added area/apiserver area/logging sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Oct 4, 2017
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 4, 2017
@mbohlool
Contributor

mbohlool commented Oct 5, 2017

cc @caesarxuchao

@crassirostris

crassirostris commented Dec 4, 2017

Another important use case (xref #56683):

In the current e2e tests, audit events are read from the log file. However, there are a couple of problems:

  • In load tests, a lot of audit events are produced. Network problems when transferring GBs of data cause the tests to fail
  • The log file can be rotated during the tests, which also causes failures
  • The current implementation cannot be used in arbitrary environments; it's tightly coupled with the configuration in k8s.io/cluster/gce

loburm added a commit to loburm/kubernetes that referenced this issue Dec 4, 2017
Remove this tag once functionality from feature request kubernetes#53455 is implemented.
k8s-github-robot pushed a commit that referenced this issue Dec 4, 2017
Automatic merge from submit-queue (batch tested with PRs 55360, 56444, 56687, 56791, 56802). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add DisabledForLargeClusters tag to audit tests.

Remove this tag once functionality from feature request #53455 is implemented.

Fixes #56683.

```release-note
NONE
```
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2018
@shyamjvs
Member

@crassirostris @loburm Can we turn back on the audit tests on large-scale now?

@crassirostris

@shyamjvs Nope, this feature is a prerequisite for enabling audit tests on large scale

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 28, 2018
@CaoShuFeng
Contributor

/assign

@CaoShuFeng
Contributor

CaoShuFeng commented Apr 17, 2018

Hi, @tallclair
I have some questions about the implementation details:

Provide a pull-based stream for audit logging. This would look similar to a watch request to the API server, and provide a stream of audit events.

Is this a plugin that works as a server and needs to bind a port?

The should also have some options to filter events. Minimally, there should be a namespaced version that only includes audit events tied to that namespace, so that RBAC can be used to grant subdivided access.

This requires an authentication process.
But I think authentication does not need to be as complex as the apiserver's. Should we choose a single authentication method at first?

it might be nice to be able request only events tied to a given user or resource.

Totally agree!!!

@CaoShuFeng
Contributor

Hi, I'd be happy to work on this if no one has started it.

@CaoShuFeng
Contributor

it might be nice to be able request only events tied to a given user or resource.

Since audit events are non-namespaced objects, how can RBAC be used to implement access control for this stream?

@CaoShuFeng
Contributor

Hi.
#64494

CaoShuFeng added a commit to CaoShuFeng/community that referenced this issue Jun 11, 2018
@CaoShuFeng
Contributor

CaoShuFeng commented Jun 11, 2018

@tallclair @loburm
I made a KEP for this feature request. Please take a look.
Thanks.
kubernetes/community#2241

@tallclair
Member Author

awesome, thanks!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 9, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 9, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

@tallclair
Member Author

/remove-lifecycle rotten

@tallclair tallclair reopened this Jan 3, 2019
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 3, 2019
@tallclair tallclair added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Mar 6, 2019
@liggitt
Member

liggitt commented Apr 3, 2019

Given the HA implications, the inverse of this (dynamic audit webhook registration) seems preferable.

@tallclair
Member Author

Agreed. What I would actually like to see is an out-of-tree implementation of this. An application that is registered as a dynamic audit webhook and serves the audit stream (maybe with an in-memory history cache). It could either serve a custom API, or register itself as an extension API server.
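The out-of-tree approach described above could look roughly like the sketch below: an application receives audit batches the way a webhook backend delivers them (an `audit.k8s.io` `EventList` whose `items` are events), keeps a bounded in-memory history, and lets clients ask for everything after a sequence number. `AuditHistory`, `AuditWebhookHandler`, and the `?since=N` query parameter are hypothetical; a real deployment would also need TLS and authorization (for example, per-namespace filtering), both omitted here.

```python
import json
import threading
from collections import deque
from http.server import BaseHTTPRequestHandler
from typing import List, Tuple
from urllib.parse import parse_qs, urlparse


class AuditHistory:
    """Hypothetical ring buffer of recent audit events.

    Each event gets a monotonically increasing sequence number, so a client can
    resume with "everything after N", similar in spirit to a watch resourceVersion.
    Once maxlen is reached, the oldest events are evicted.
    """

    def __init__(self, maxlen: int = 10_000):
        self._events: deque = deque(maxlen=maxlen)  # (seq, event) pairs
        self._seq = 0
        self._lock = threading.Lock()

    def ingest_event_list(self, body: bytes) -> int:
        """Store the items of a POSTed EventList; returns how many were received."""
        items = json.loads(body).get("items") or []
        with self._lock:
            for event in items:
                self._seq += 1
                self._events.append((self._seq, event))
        return len(items)

    def since(self, seq: int) -> List[Tuple[int, dict]]:
        with self._lock:
            return [(s, e) for s, e in self._events if s > seq]


HISTORY = AuditHistory()


class AuditWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Webhook side: the API server pushes batches of audit events here.
        length = int(self.headers.get("Content-Length", 0))
        HISTORY.ingest_event_list(self.rfile.read(length))
        self.send_response(200)
        self.end_headers()

    def do_GET(self):
        # Client side: GET ?since=N returns cached events newer than N as JSON.
        query = parse_qs(urlparse(self.path).query)
        since = int(query.get("since", ["0"])[0])
        payload = json.dumps(
            [{"seq": s, "event": e} for s, e in HISTORY.since(since)]
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


h = AuditHistory(maxlen=2)
batch = {"kind": "EventList", "apiVersion": "audit.k8s.io/v1",
         "items": [{"auditID": "a"}, {"auditID": "b"}, {"auditID": "c"}]}
received = h.ingest_event_list(json.dumps(batch).encode())
print(received, [e["auditID"] for _, e in h.since(0)])  # 3 ['b', 'c']
```

To serve it, the handler could be passed to `http.server.HTTPServer(("", port), AuditWebhookHandler)` and run with `serve_forever()`. A client that falls behind by more than `maxlen` events silently misses the evicted ones, which matches the best-effort semantics discussed earlier in the thread.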

@liggitt
Member

liggitt commented Apr 19, 2019

Sounds good. Closing this here then.

/close

@k8s-ci-robot
Contributor

@liggitt: Closing this issue.

9 participants