KEP: pull based audit event stream #2241

CaoShuFeng · 2018-06-11T11:48:30Z

This tries to fix: kubernetes/kubernetes#53455

tallclair · 2018-06-11T18:23:17Z

/assign

tallclair · 2018-06-11T18:23:26Z

/sig auth

tallclair

Thanks, nice start!

tallclair · 2018-06-11T18:37:21Z

keps/0013-pull-based-audit-events-stream.md

+
+## Summary
+
+This proposal aims at providing a pull-based stream for audit logging. This would look similar to a watch request to the API server, and provide stream of audit events. The stream does not return any previous events (i.e. no associated storage), it would only stream events that occurred after the stream was opened. Two non resource-url endpoints are installed to provide the stream. '/audits/' for all audit events generated in the API server, and '/audits/{namespace}' for audits events in the specified namespace. And query parameters could be used to get subset of audit events.


Two non resource-url endpoints

I think these should be resource endpoints, even though they're "pseudo resources"

When implementing pseudo resources, there are dragons I heard.

What about implementing "real resources"?

For example, audit events in previous 10 minutes would be saved in etcd.
A new controller is introduced to delete old audit events(e.g. audit events 10 minutes ago)

This would have 2 benefits:

Aggregated API server could post it's audits to this endpoint.

Could work in HA environment. If audit events are not saved in etcd storage, an API server could only return audit events happened in it's own place.

Doubling the load for each request.

Doubling the load for each request.

😢 😢 😢
Yes, but I haven't figure out another way to implement this in HA environment.

I did some research about aggregation server. kube-aggregator could also aggregate audit events if the aggregated api-server has the same endpoint exposed.

But if the kube-apiserver is behind a load balancer, audit events in another kube-apiserver could not be returned, because no storage is involved.
Is that OK?

It is a major constraint, but might be acceptable depending on the environment. It just means that your auditing system has to know and query all HA instances of kube-apiserver.

tallclair · 2018-06-11T18:41:40Z

keps/0013-pull-based-audit-events-stream.md

+This proposal aims at providing a pull-based stream for audit logging. This would look similar to a watch request to the API server, and provide stream of audit events. The stream does not return any previous events (i.e. no associated storage), it would only stream events that occurred after the stream was opened. Two non resource-url endpoints are installed to provide the stream. '/audits/' for all audit events generated in the API server, and '/audits/{namespace}' for audits events in the specified namespace. And query parameters could be used to get subset of audit events.
+
+## Motivation
+The current audit backends (log & webhook) require very high privileges (file access to the master) or prior configuration (webhook). There are a number of use cases for dynamic and unprivileged access to the audit logs. Examples include:


Most of these are addressed by dynamic audit configuration. I think it's worth highlighting the different use cases for each here. In particular, I see the streaming endpoint as useful for short-term debugging usecases, e.g. where I run a client-side tool to pull events for a short period, and the push model is better for long-running usecases, but requires more setup.

Will modify it in the next commit.

tallclair · 2018-06-11T18:43:14Z

keps/0013-pull-based-audit-events-stream.md

+- Bandwidth limit in server scope or per-namespace.
+
+### Non-Goals
+- Access control at user/group level(e.g. allow user A to only get audits events from users A,B,C)


I would generalize this to include any granularity beyond namespaces (including users, resources, etc.)

tallclair · 2018-06-11T18:43:46Z

keps/0013-pull-based-audit-events-stream.md

+- Access control. Allow unprivileged users to get audits in their own namespace.
+- Bandwidth limit in server scope or per-namespace.
+
+### Non-Goals


Also custom per-stream policies should be out of scope (at least for the initial proposal). The audit stream should be based on the cluster-wide audit policy.

Will add it in the next commit

tallclair · 2018-06-11T18:44:43Z

keps/0013-pull-based-audit-events-stream.md

+
+## Proposal
+
+### Make audit backends dynamicly registerable


This seems out-of-scope for this proposal. #2188 covers this case, and isn't required for a streaming backend.

Will remove it.

tallclair · 2018-06-11T18:50:49Z

keps/0013-pull-based-audit-events-stream.md

+       username string
+       // groups in the audit event, if all specified groups exist in the audit event, then it will be sent to user.
+       // optional
+       groups []string


I thought you said username & groups were out of scope under "non goals"?

Will remove it.

tallclair · 2018-06-11T18:51:15Z

keps/0013-pull-based-audit-events-stream.md

+|/audits             |all audit events generated in the API server|username, group, namespace, apiGroup, resource, verbs|
+|/audits/{namespaces}|audits events in the specified namespace    |username, group, apiGroup, resource, verbs           |
+
+definition of query parameters


I don't think we should use any of these, at least not initially.

Will remove it.

tallclair · 2018-06-11T18:52:16Z

keps/0013-pull-based-audit-events-stream.md

+When the sent bytes exceeds the bandwidth, API server will try to truncate the audit event, and check the limit again. If bandwidth limit is still exceeded, the event would be droped.
+
+
+### User Stories


nit: move user-stories to before the proposal.

tallclair · 2018-06-11T18:53:31Z

keps/0013-pull-based-audit-events-stream.md

+#### Story 2
+As a developer, I will easily be able to get a subset of audit events. This will save a lot of time for debugging. I don't need to grep audit events from a couple of audit files.
+
+### Risks and Mitigations


Would be good to talk about SLO. In particular, I think these endpoints should be treated as debugging & development endpoints, not for actual secure audit logging.

Will add it.

tallclair · 2018-06-11T18:55:44Z

keps/0013-pull-based-audit-events-stream.md

+- Access control at user/group level(e.g. allow user A to only get audits events from users A,B,C)
+
+
+## Proposal


Can you add more implementation details? In particular, I'd like to see:

Is this a new audit backend?

How are events routed to the watchers?

How are the watch connections maintained? (Does apimachinery solve this?)

What about the aggregated API server case?

will the kube-aggregator aggregate audit events?

will the kube-aggregator aggregate audit events?

Old binaries will not support this.

I think new compiled aggregated apiserver could also support this endpoint in different api groups. If they support this, they can register this to kube-aggregator.

If we turn this into a resource endpoints (as @tallclair suggests above), it will be part of discovery. And in theory the aggregator could pick it up. But those api groups will conflict. So the same resource can only be used for one API server at a time. So this topic needs some more thought, to be at least extensible in the future.

We could store audit events into etcd storage, aggregated API server could post their audit events to kube-apiserver.

We only trace audit events in aggregator

Misread @tallclair suggestion. Do you have a WIP PR? Sounds reasonably easy to do, if there are no surprises.

Do you have a WIP PR? Sounds reasonably easy to do, if there are no surprises.

Yes there is. But it's really work in progress， especially the authorizer part.

kubernetes/kubernetes#64494

Hmm, I forgot about HA masters. I think this still doesn't address that. I guess this would need some pubsub/message solution - does anything else in kubernetes solve a similar problem, or does everything just rely on etcd?

Actually, this seems pretty similar to (non-audit) Events.

sttts · 2018-06-12T11:08:21Z

keps/0013-pull-based-audit-events-stream.md

+- Bandwidth limit in server scope or per-namespace.
+
+### Non-Goals
+- Access control at user/group level(e.g. allow user A to only get audits events from users A,B,C)


Can RBAC rules express this?

sttts · 2018-06-12T11:15:40Z

keps/0013-pull-based-audit-events-stream.md

+### Goals
+- Provide a new pull based audit event stream, which returns events that occurred after the stream was opened.
+- Allow users go get a subset of audit events.
+- Access control. Allow unprivileged users to get audits in their own namespace.


How do we make sure that the audit events do not leak information the user cannot access via "normal" resource queries otherwise (i.e. which are restricted via RBAC) ?

We can have such level restriction:
Allow user A to read audit events in namespace-1.

We can't have this:
Allow user A to read audit events about pods,services in namespace-1.
Or:
Allow user A to read audit events of user A, user B,user C.

Since we treat audit events as pseudo-resource. I think we can restrict the access control of audit events as other normal resources.

But we don't support resourceNames, since audit events don't have a name.

They could have a pseudo-name.

I was wondering whether we really want to invent another authz system for audit events or whether we can piggy pack on RBAC rules. I.e. when querying (via /audit or a real resource URL) I see all events of objects I could also read via normal APIs. Or we add another verb to RBAC called "AUDIT". Just thinking out loud.

/cc @kubernetes/sig-auth-misc @liggitt

. I.e. when querying (via /audit or a real resource URL) I see all events of objects I could also read via normal APIs.

We need to call Authorize function for each audit events, because the RBAC rules could be changed dynamically. Looks like this could cause more performance effect.

I think for an initial implementation it's acceptable to just authorize it at the audit resource level (i.e. I need read permissions on the audit/namespaces/foo resource), and document this as a something to watch out for. Then it's up to the admin / namespace admin to restrict audit permissions to those who should actually be able to read everything in the cluster/namespace.

Also, sensitive information (secrets) should be filtered from audit logs anyway.

Also, sensitive information (secrets) should be filtered from audit logs anyway.

if we add a mechanism for filtering object content for specific resources (e.g. not allowing or ignoring an audit policy that tries to include that info), let's ensure it doesn't hard-code the Secret resource in the internals of the audit handler... each API server should be able to determine what resources it considers sensitive.

I also don't think we should get in the game of partially sensitive info... apply has shown that data from one part of an object is likely to bleed into other parts.

I mean this should just be handled in the policy. e.g. https://github.com/kubernetes/kubernetes/blob/56d9d73f38489f09d55ccee442ed77169484b146/cluster/gce/gci/configure-helper.sh#L837-L839. The whole object body is excluded in this case.

sttts · 2018-06-12T11:16:08Z

/cc @soltysh

tallclair · 2018-06-19T23:45:41Z

@kubernetes/sig-api-machinery-feature-requests

lavalamp · 2018-06-20T01:30:54Z

On Tue, Jun 19, 2018, 5:24 PM Jordan Liggitt ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In keps/0013-pull-based-audit-events-stream.md <#2241 (comment)>: > +[Tools for generating]: https://github.com/ekalinin/github-markdown-toc + +## Summary + +This proposal aims at providing a pull-based stream for audit logging. This would look similar to a watch request to the API server, and provide stream of audit events. The stream does not return any previous events (i.e. no associated storage), it would only stream events that occurred after the stream was opened. Two non resource-url endpoints are installed to provide the stream. '/audits/' for all audit events generated in the API server, and '/audits/{namespace}' for audits events in the specified namespace. And query parameters could be used to get subset of audit events. + +## Motivation +The current audit backends (log & webhook) require very high privileges (file access to the master) or prior configuration (webhook). There are a number of use cases for dynamic and unprivileged access to the audit logs. Examples include: +- allow tenants to get audit events in their own namespace +- provide a subset of audit events, so it will be easier to query and analyze (e.g. kubernetes/kubernetes#56683) + + +### Goals +- Provide a new pull based audit event stream, which returns events that occurred after the stream was opened. +- Allow users go get a subset of audit events. +- Access control. Allow unprivileged users to get audits in their own namespace. Also, sensitive information (secrets) should be filtered from audit logs anyway. if we add a mechanism for filtering object content for specific resources (e.g. not allowing or ignoring an audit policy that tries to include that info), let's ensure it doesn't hard-code the Secret resource

I think we should mark fields as sensitive in the openAPI spec. Many places could use that info. in the internals of the audit handler... each API server should be able to

…

determine what resources it considers sensitive. I also don't think we should get in the game of partially sensitive info... apply has shown that data from one part of an object is likely to bleed into other parts. — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#2241 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAnglu5UUJTcODa7sxJhybcuILQ8cVwRks5t-ZYfgaJpZM4UijJ2> .

k8s-ci-robot · 2018-07-18T07:27:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: idvoretskyi

If they are not already assigned, you can assign the PR to them by writing /assign @idvoretskyi in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

keps/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

CaoShuFeng · 2018-07-19T01:21:07Z

@tallclair @sttts
I updated it, please take a look again.

liggitt · 2018-07-19T02:51:29Z

keps/0013-pull-based-audit-events-stream.md

+Two new resource endpoint '/apis/audit.k8s.io/v1beta1/events' and '/apis/audit.k8s.io/v1alpha1/namespaces/{namespace}/namespacedevents' are installed.
+Unlike other resource endpoints which usually serve resource objects from key-value storage, these endpoints read events from a audit backend and only support the WATCH verb. '/apis/audit.k8s.io/v1beta1/events' provides a stream of all audit events in the cluster. '/apis/audit.k8s.io/v1alpha1/namespaces/{namespace}/namespacedevents' provides a stream of all audit events in the namespace. A new API object NamespacedEvent is introduced for the namespaced endpoint. NamespacedEvent has the same definition with the audit Event object which we already have, but `metadata.namespace` element is required for it.
+
+The audit events returned to users are the events generated according to the audit policy file. If an event is omitted by the audit policy file, the event would not be sent to user by these endpoints. For aggregated API servers, those endpoints only supports metadata-level, because kube-apiserver only record these events at metadata level.


can you outline the data flow of audit information for a set of HA api servers, and how audit information from the various servers makes it to this handler, and then to the user?

Sorry, for a HA environment, users have to know and query all HA instances of kube-apiserver.
I think this would work like a headless services.

That doesn't work if the HA instances are behind a loadbalancer. I think it should be OK to document this as a known issue, and move forward with the current proposal to alpha. I do think HA support should be a blocker before going to beta. @liggitt @sttts WDYT?

As for solutions, I think either the audit events need to be put into a etcd circular buffer, or the apiserver that ends up serving the request makes proxy requests to the other apiservers. That assumes apiserver <-> apiserver connectivity though, which might not be viable.

I think we should at least have a sketch of how this would work both with multiple instances of the same apiserver and with multiple distinct servers for different API groups to decide if it seems feasible before proceeding.

For example, would satisfying this API require all apiservers to be forwarding their audit events to a collating sink, would the filters/verbosity of that forwarding be determined by active watchers on the audit pull API, etc

Having a pull mechanism with a collating sink sounds odd. Actually, if one really wants that, why not use the existing webhook push mechanism to send events to the sink and then pull from there?

In other words, I think having to pull all apiservers "manually" (and knowing about the server topology) is a fair compromise with a pull mechamism in HA setups.

why not use the existing webhook push mechanism to send events to the sink and then pull from there?

This could be a Aggregated API Server

Maybe we should try prototyping this out-of-tree using the push-based dynamic webhook?

prototyping this out-of-tree

This is what Aggregated API Server do, I think.

justaugustus · 2018-10-13T03:36:26Z

/kind kep

k8s-ci-robot · 2018-10-13T03:36:28Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: helpdesk@rt.linuxfoundation.org

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

justaugustus · 2018-11-20T04:42:41Z

REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.

justaugustus · 2018-12-01T08:05:58Z

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

k8s-ci-robot · 2018-12-01T08:05:59Z

@justaugustus: Closed this PR.

In response to this:

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. labels Jun 11, 2018

k8s-ci-robot requested review from idvoretskyi and jbeda June 11, 2018 11:48

CaoShuFeng changed the title ~~pull based audit event stream~~ KEP: pull based audit event stream Jun 11, 2018

CaoShuFeng mentioned this pull request Jun 11, 2018

FR: streaming audit endpoint kubernetes/kubernetes#53455

Closed

pull based audit event stream

2ec9076

CaoShuFeng force-pushed the pull_based_audit_event_stream branch from 13cbc1c to 2ec9076 Compare June 11, 2018 12:05

k8s-ci-robot assigned tallclair Jun 11, 2018

k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Jun 11, 2018

tallclair reviewed Jun 11, 2018

View reviewed changes

sttts reviewed Jun 12, 2018

View reviewed changes

k8s-ci-robot requested a review from soltysh June 12, 2018 11:16

k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 19, 2018

jdumars mentioned this pull request Jul 12, 2018

KEP #13: pull based audit event stream kubernetes-retired/architecture-tracking#9

Closed

update

b8901d2

liggitt reviewed Jul 19, 2018

View reviewed changes

k8s-ci-robot added the kind/kep label Oct 13, 2018

k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 13, 2018

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 20, 2018

k8s-ci-robot closed this Dec 1, 2018


		## Summary

		This proposal aims at providing a pull-based stream for audit logging. This would look similar to a watch request to the API server, and provide stream of audit events. The stream does not return any previous events (i.e. no associated storage), it would only stream events that occurred after the stream was opened. Two non resource-url endpoints are installed to provide the stream. '/audits/' for all audit events generated in the API server, and '/audits/{namespace}' for audits events in the specified namespace. And query parameters could be used to get subset of audit events.

		When the sent bytes exceeds the bandwidth, API server will try to truncate the audit event, and check the limit again. If bandwidth limit is still exceeded, the event would be droped.


		### User Stories

		- Access control at user/group level(e.g. allow user A to only get audits events from users A,B,C)


		## Proposal

KEP: pull based audit event stream #2241

KEP: pull based audit event stream #2241

Conversation

CaoShuFeng commented Jun 11, 2018

tallclair commented Jun 11, 2018

tallclair commented Jun 11, 2018

tallclair left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CaoShuFeng Jun 12, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CaoShuFeng Jun 12, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sttts commented Jun 12, 2018

tallclair commented Jun 19, 2018

lavalamp commented Jun 20, 2018 via email

k8s-ci-robot commented Jul 18, 2018

CaoShuFeng commented Jul 19, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CaoShuFeng Jul 31, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

justaugustus commented Oct 13, 2018

k8s-ci-robot commented Oct 13, 2018

justaugustus commented Nov 20, 2018

justaugustus commented Dec 1, 2018

k8s-ci-robot commented Dec 1, 2018

CaoShuFeng Jun 12, 2018 •

edited

Loading

CaoShuFeng Jun 12, 2018 •

edited

Loading

CaoShuFeng Jul 31, 2018 •

edited

Loading