
Advanced Auditing 1.10 umbrella bug #58083

Closed
11 tasks done
crassirostris opened this issue Jan 10, 2018 · 28 comments
Labels
  • area/audit
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • milestone/needs-attention
  • priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
  • sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
  • sig/auth: Categorizes an issue or PR as relevant to SIG Auth.
Milestone

Comments

@crassirostris

crassirostris commented Jan 10, 2018

This is a continuation of the work on the Advanced Auditing feature

Umbrella issue for 1.11 release: #60392

API-related changes

Bugfixes and improvements

Policy changes

Documentation

@crassirostris is working on this in kubernetes/website#7679

To discuss

  • [Optional] Auditing federation setups
    • Moved to 1.11
  • Auditing multi-apiserver setups
    • Already supported to some degree. Additional features to follow after v1

/cc @sttts @soltysh @tallclair @ericchiang @CaoShuFeng @hzxuzhonghu @piosz

Feel free to add what's missing

@crassirostris crassirostris added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/feature Categorizes issue or PR as related to a new feature. sig/auth Categorizes an issue or PR as relevant to SIG Auth. area/audit labels Jan 10, 2018
@crassirostris crassirostris added this to the v1.10 milestone Jan 10, 2018
@tallclair
Member

RE: integration with admission. I think it would be useful to see something similar for authn/z as well (i.e. which plugin authorized / authenticated the request, with the ability for the plugin to add arbitrary annotations (e.g. RBAC adds the role & binding used)). Not a requirement for 1.10 or GA, but nice to have.

@jennybuckley

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 11, 2018
@crassirostris
Author

Audit logging is a shared effort between sig-auth and sig-api-machinery

@crassirostris crassirostris added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 12, 2018
@tallclair
Member

In another thread, we were discussing audit logging in very high QPS clusters (e.g. very large clusters). That feels like something that should be addressed, or at least well defined (e.g. audit logging is only supported up to XX QPS).

A suggestion I threw out is to implement graceful degradation of the audit infrastructure (probably in the open source). One way of doing this would be to introduce priority to the audit policy, and then, when the buffers are full (or too many requests start erroring), drop the lower-priority events before the higher-priority ones. We already have a coarse priority defined through the metadata read/write and data read/write CAL classification. We could also define some hard audit QPS limits (although it might be possible by adjusting the current parameters), and start dropping lower-priority events to stay under those QPS limits.

I'm not sure the full priority mechanism should block GA, but I think we should have some more reliable or well-defined behavior in this area.
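A minimal sketch of that eviction idea (all names and types here are hypothetical, not the actual k8s.io/apiserver audit buffer; a real backend would derive priority from the policy level):

```go
package main

import "fmt"

// Event stands in for an audit event carrying a coarse priority
// derived from its policy level (e.g. RequestResponse > Metadata).
type Event struct {
	Level   int // higher = more important
	Message string
}

// PriorityBuffer holds at most cap events and, once full, evicts the
// lowest-priority buffered event to admit a higher-priority one.
type PriorityBuffer struct {
	cap    int
	events []Event
}

// Add reports whether the event was admitted.
func (b *PriorityBuffer) Add(e Event) bool {
	if len(b.events) < b.cap {
		b.events = append(b.events, e)
		return true
	}
	// Buffer is full: find the lowest-priority buffered event.
	low := 0
	for i, ev := range b.events {
		if ev.Level < b.events[low].Level {
			low = i
		}
	}
	if b.events[low].Level < e.Level {
		b.events[low] = e // drop the lower-priority event instead
		return true
	}
	return false // incoming event is the lowest priority: drop it
}

func main() {
	b := &PriorityBuffer{cap: 2}
	b.Add(Event{Level: 1, Message: "metadata read"})
	b.Add(Event{Level: 3, Message: "data write"})
	admitted := b.Add(Event{Level: 2, Message: "metadata write"})
	fmt.Println(admitted) // the level-1 event was evicted to make room
}
```

The point of the sketch is only the ordering guarantee: under pressure, metadata-level events give way before request/response-level ones.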

@crassirostris
Author

(e.g. audit logging is only supported up to XX QPS).

It actually depends on the audit logging configuration. You can always support more QPS by bumping the audit logging throughput (e.g. the number of allowed QPS issued by the webhook). It's a matter of policy in each cluster, not of the implementation per se
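That tuning happens through kube-apiserver flags. A sketch of a batching webhook backend with raised limits (flag names as of the 1.9/1.10 batching work, values purely illustrative; check them against your release's kube-apiserver reference):

```shell
kube-apiserver \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-webhook-config-file=/etc/kubernetes/audit-webhook.kubeconfig \
  --audit-webhook-mode=batch \
  --audit-webhook-batch-buffer-size=10000 \
  --audit-webhook-batch-max-size=400 \
  --audit-webhook-batch-throttle-qps=10 \
  --audit-webhook-batch-throttle-burst=15
```

Here the buffer size bounds how many events can be held before dropping, and the throttle QPS/burst bound how many batches per second the webhook will emit, so the sustainable audit QPS is roughly batch-max-size × throttle-qps.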

@tallclair
Member

Fair point... do we have any idea how cluster size maps to the defaults though? I.e. at what point should the cluster admin expect to need to raise those defaults or tighten the policy?

k8s-github-robot pushed a commit that referenced this issue Jan 20, 2018
Automatic merge from submit-queue (batch tested with PRs 53895, 58013, 58466, 58531, 58535). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

return reason for allowed rbac authorizations

includes the binding, role, and subject that allowed a request so audit can make use of it

xref #56209 #58083 

### example reasons

> allowed by ClusterRoleBinding "system:controller:cronjob-controller" of ClusterRole "system:controller:cronjob-controller" to ServiceAccount "cronjob-controller/kube-system"

> allowed by RoleBinding "bob-viewer/default" of ClusterRole "view" to User "bob"

### perf impact
```shell
go test ./plugin/pkg/auth/authorizer/rbac/ -run foo -bench . -benchmem
```
on master:
```
BenchmarkAuthorize/allow_list_pods-8         	  500000	      2674 ns/op	    1632 B/op	      27 allocs/op
BenchmarkAuthorize/allow_update_pods/status-8         	  500000	      2858 ns/op	    1632 B/op	      27 allocs/op
BenchmarkAuthorize/forbid_educate_dolphins-8          	  500000	      2654 ns/op	    1632 B/op	      27 allocs/op
```

with this PR:
```
BenchmarkAuthorize/allow_list_pods-8         	  500000	      2697 ns/op	    1664 B/op	      28 allocs/op
BenchmarkAuthorize/allow_update_pods/status-8         	  500000	      2873 ns/op	    1680 B/op	      29 allocs/op
BenchmarkAuthorize/forbid_educate_dolphins-8          	  500000	      2687 ns/op	    1664 B/op	      28 allocs/op
```


```release-note
NONE
```
@sttts
Contributor

sttts commented Jan 22, 2018

What is missing for Auditing multi-apiserver setups?

As far as I see, we have the audit options in the generic apiserver library and have enabled them through the RecommendedOptions. So the basic support exists.

We do not have

  • custom user-agent strings or a Component field that can either be kube-apiserver or e.g. service-catalog-apiserver. Or does this work through the kubeconfig passed to the webhook?
  • central configuration. For that we would need some kind of ComponentConfig for user provided API servers, or use a well-known ConfigMap in the kube-public namespace.

/cc @deads2k

@tallclair
Member

With advanced audit logging going to GA, we should also begin the deprecation process for legacy audit logging. I think that is probably just adding a deprecation warning for now, and making sure the removal plan is documented along with the legacy audit docs.

@tallclair
Member

From sig-auth discussion:

Graduating to V1 should require:

  1. Restore audit in the scalability tests (already planned for 1.10)
  2. A story for how other components are expected to integrate into the audit system.

For (2), the idea is that going to V1 puts the burden of integration on new features (as opposed to audit developers), but first they need the necessary hooks to integrate. I think the annotations (#58143) are the last piece of this story.

W.r.t. multi-apiserver setups, I think that the 2 issues @sttts mentioned should not be blockers for audit going to v1. Unless we want to move the policy & configuration to a component config, and deprecate the flags approach. However, there is a bootstrapping problem that needs to be solved to configure audit logging through component config (so we might still want flags?).

@smurfralf

Please also include fix for Issue #59331

@crassirostris
Author

I added points about documentation & deprecation to the list of items to do

Fair point... do we have any idea how cluster size maps to the defaults though? I.e. at what point should the cluster admin expect to need to raise those defaults or tighten the policy?

It depends on the expected QPS to the apiserver, which may be pretty irregular depending on the cluster setup and workloads. It seems it would be useful to add a paragraph to the documentation that would explain how these values should be set

custom user-agent strings or a Component field that can either be kube-apiserver or e.g. service-catalog-apiserver. Or does this work through the kubeconfig passed to the webhook?

AFAIU, it's possible to pass user agent through kubeconfig: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/rest/config.go#L83. However, it seems this behavior should be better documented
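As a rough illustration of that distinction (the helper below is hypothetical; the real knob is the `UserAgent` field on client-go's `rest.Config`, linked above):

```go
package main

import "fmt"

// buildUserAgent composes a component-identifying agent string of the
// kind one could set on client-go's rest.Config.UserAgent, so an audit
// webhook backend can tell apart events coming from kube-apiserver vs.
// an aggregated apiserver. The helper and format are illustrative,
// not part of client-go.
func buildUserAgent(component, version string) string {
	return fmt.Sprintf("%s/%s", component, version)
}

func main() {
	fmt.Println(buildUserAgent("service-catalog-apiserver", "v1.10.0"))
}
```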

W.r.t. multi-apiserver setups, I think that the 2 issues @sttts mentioned should not be blockers for audit going to v1

Agree. I mentioned this in the issue text. To everyone: feel free to counter if you don't think this is the right approach

@tallclair
Member

Is the intent still for this to reach stable in 1.10? Code freeze is coming up (02-26). There are a lot of issues still open (including 2 I just filed: #60108 and #60110).

At this point I'm inclined to suggest we bump GA out to 1.11 unless there are strong objections.

@tpepper
Member

tpepper commented Feb 21, 2018

@crassirostris and @tallclair pending your decision on 1.10 or not, please make sure to scrub the set of issues referenced off this one and either add the approved and approved-for-milestone labels or remove the v1.10 milestone target.

@crassirostris
Author

I agree, it doesn't make sense to try to reach stable in 1.10; too much work, too little done already

I created a new umbrella issue for audit logging in 1.11 and moved items we're not pursuing this release there

The rest I believe is doable, most of the things left are either in review or docs-related

@crassirostris
Author

@CaoShuFeng @tallclair @sttts @ericchiang What's your take on the annotations effort? Do you think it makes sense to push for 1.10, or is it better to postpone it until 1.11?

@CaoShuFeng
Contributor

@CaoShuFeng @tallclair @sttts @ericchiang What's your take on the annotations effort? Do you think it makes sense to push for 1.10, or is it better to postpone it until 1.11?

I am OK either way.

@crassirostris
Author

Thanks!

@CaoShuFeng I think #58143 definitely should go to 1.11; there's too much work left. I'm moving it to the 1.11 umbrella issue

@tallclair @sttts WDYT about #58807? Is it close enough to completeness that we can ask for a couple of days for an exception?

@crassirostris
Author

It seems we don't have enough capacity to review #58807 and it has to be pushed to 1.11

@jberkus

jberkus commented Feb 28, 2018

Should this issue be closed/moved to 1.11 then?

@k8s-github-robot k8s-github-robot removed this from the v1.10 milestone Mar 2, 2018
@jberkus

jberkus commented Mar 2, 2018

oops, missing priority was my fault, moving back into milestone:

/priority critical-urgent
/remove-priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 2, 2018
@jberkus jberkus added this to the v1.10 milestone Mar 2, 2018
@liggitt
Member

liggitt commented Mar 6, 2018

@tallclair has the work planned for 1.10 been completed at this point?

@tallclair
Member

Remaining work is resolving the test failures, #60719

@crassirostris
Author

@liggitt Besides test failures, docs only. I will create a PR shortly, my fault I didn't do this earlier

@jberkus

jberkus commented Mar 8, 2018

Since the tests are now fixed, what remains with this tracker?

@crassirostris
Author

@jberkus Only docs: kubernetes/website#7679

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue Needs Attention

@crassirostris @kubernetes/sig-api-machinery-misc @kubernetes/sig-auth-misc

Action required: During code freeze, issues in the milestone should be in progress.
If this issue is not being actively worked on, please remove it from the milestone.
If it is being worked on, please add the status/in-progress label so it can be tracked with other in-flight issues.

Action Required: This issue has not been updated since Mar 8. Please provide an update.

Note: This issue is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Issue Labels
  • sig/api-machinery sig/auth: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move issue out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/feature: New functionality.
Help

@liggitt
Member

liggitt commented Mar 13, 2018

doc PR is open and tagged for the 1.10 milestone. moving this issue to 1.11

@liggitt liggitt modified the milestones: v1.10, v1.11 Mar 13, 2018
@crassirostris
Author

Well, there's a separate issue for 1.11, so to avoid constant pinging from the bot, and considering that the docs PR is almost ready, I'll move this issue back to 1.10 and close it

@crassirostris crassirostris modified the milestones: v1.11, v1.10 Mar 13, 2018