
Advanced Auditing 1.10 umbrella bug #58083

Closed
11 tasks done
crassirostris opened this issue Jan 10, 2018 · 28 comments
Labels
  • area/audit
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • milestone/needs-attention
  • priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
  • sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
  • sig/auth: Categorizes an issue or PR as relevant to SIG Auth.
Milestone

Comments

@crassirostris

crassirostris commented Jan 10, 2018

This is a continuation of the work on the Advanced Auditing feature

Umbrella issue for 1.11 release: #60392

API-related changes

Bugfixes and improvements

Policy changes

Documentation

@crassirostris is working on this in kubernetes/website#7679

To discuss

  • [Optional] Auditing federation setups
    • Moved to 1.11
  • Auditing multi-apiserver setups
    • Already supported to some degree. Additional features to follow after v1

/cc @sttts @soltysh @tallclair @ericchiang @CaoShuFeng @hzxuzhonghu @piosz

Feel free to add what's missing

@crassirostris crassirostris added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/feature Categorizes issue or PR as related to a new feature. sig/auth Categorizes an issue or PR as relevant to SIG Auth. area/audit labels Jan 10, 2018
@crassirostris crassirostris added this to the v1.10 milestone Jan 10, 2018
@tallclair
Member

RE: integration with admission. I think it would be useful to see something similar for authn/z as well (i.e. which plugin authorized / authenticated the request, with the ability for the plugin to add arbitrary annotations (e.g. RBAC adds the role & binding used)). Not a requirement for 1.10 or GA, but nice to have.

@jennybuckley

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 11, 2018
@crassirostris
Author

Audit logging is a shared effort between sig-auth and sig-api-machinery

@crassirostris crassirostris added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 12, 2018
@tallclair
Member

In another thread, we were discussing audit logging in very high QPS clusters (e.g. very large clusters). That feels like something that should be addressed, or at least well defined (e.g. audit logging is only supported up to XX QPS).

A suggestion I threw out is to implement graceful degradation of the audit infrastructure (probably in the open source). One way of doing this would be to introduce priority to the audit policy, and then, when the buffers are full (or too many requests start erroring), drop the lower-priority events before the higher-priority ones. We already have a coarse priority defined through the metadata read/write and data read/write CAL classification. We could also define some hard audit QPS limits (although it might be possible by adjusting the current parameters), and start dropping lower-priority events to stay under those QPS limits.

I'm not sure the full priority mechanism should block GA, but I think we should have some more reliable or well-defined behavior in this area.
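A minimal sketch of that eviction idea (all names and types here are hypothetical, not the actual k8s.io/apiserver audit buffer; a real backend would derive priority from the policy level):

```go
package main

import "fmt"

// Event stands in for an audit event carrying a coarse priority
// derived from its policy level (e.g. RequestResponse > Metadata).
type Event struct {
	Level   int // higher = more important
	Message string
}

// PriorityBuffer holds at most cap events and, once full, evicts the
// lowest-priority buffered event to admit a higher-priority one.
type PriorityBuffer struct {
	cap    int
	events []Event
}

// Add reports whether the event was admitted.
func (b *PriorityBuffer) Add(e Event) bool {
	if len(b.events) < b.cap {
		b.events = append(b.events, e)
		return true
	}
	// Buffer is full: find the lowest-priority buffered event.
	low := 0
	for i, ev := range b.events {
		if ev.Level < b.events[low].Level {
			low = i
		}
	}
	if b.events[low].Level < e.Level {
		b.events[low] = e // drop the lower-priority event instead
		return true
	}
	return false // incoming event is the lowest priority: drop it
}

func main() {
	b := &PriorityBuffer{cap: 2}
	b.Add(Event{Level: 1, Message: "metadata read"})
	b.Add(Event{Level: 3, Message: "data write"})
	admitted := b.Add(Event{Level: 2, Message: "metadata write"})
	fmt.Println(admitted) // the level-1 event was evicted to make room
}
```

The point of the sketch is only the ordering guarantee: under pressure, metadata-level events give way before request/response-level ones.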

@crassirostris
Author

(e.g. audit logging is only supported up to XX QPS).

It actually depends on the audit logging configuration. You can always support more QPS by bumping the audit logging throughput (e.g. the number of allowed QPS issued by the webhook). It's a matter of policy in each cluster, not of the implementation per se
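That tuning happens through kube-apiserver flags. A sketch of a batching webhook backend with raised limits (flag names as of the 1.9/1.10 batching work, values purely illustrative; check them against your release's kube-apiserver reference):

```shell
kube-apiserver \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-webhook-config-file=/etc/kubernetes/audit-webhook.kubeconfig \
  --audit-webhook-mode=batch \
  --audit-webhook-batch-buffer-size=10000 \
  --audit-webhook-batch-max-size=400 \
  --audit-webhook-batch-throttle-qps=10 \
  --audit-webhook-batch-throttle-burst=15
```

Here the buffer size bounds how many events can be held before dropping, and the throttle QPS/burst bound how many batches per second the webhook will emit, so the sustainable audit QPS is roughly batch-max-size × throttle-qps.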

@tallclair
Member

Fair point... do we have any idea how cluster size maps to the defaults though? I.e. at what point should the cluster admin expect to need to raise those defaults or tighten the policy?

k8s-github-robot pushed a commit that referenced this issue Jan 20, 2018
Automatic merge from submit-queue (batch tested with PRs 53895, 58013, 58466, 58531, 58535). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

return reason for allowed rbac authorizations

includes the binding, role, and subject that allowed a request so audit can make use of it

xref #56209 #58083 

### example reasons

> allowed by ClusterRoleBinding "system:controller:cronjob-controller" of ClusterRole "system:controller:cronjob-controller" to ServiceAccount "cronjob-controller/kube-system"

> allowed by RoleBinding "bob-viewer/default" of ClusterRole "view" to User "bob"

### perf impact
```shell
go test ./plugin/pkg/auth/authorizer/rbac/ -run foo -bench . -benchmem
```
on master:
```
BenchmarkAuthorize/allow_list_pods-8         	  500000	      2674 ns/op	    1632 B/op	      27 allocs/op
BenchmarkAuthorize/allow_update_pods/status-8         	  500000	      2858 ns/op	    1632 B/op	      27 allocs/op
BenchmarkAuthorize/forbid_educate_dolphins-8          	  500000	      2654 ns/op	    1632 B/op	      27 allocs/op
```

with this PR:
```
BenchmarkAuthorize/allow_list_pods-8         	  500000	      2697 ns/op	    1664 B/op	      28 allocs/op
BenchmarkAuthorize/allow_update_pods/status-8         	  500000	      2873 ns/op	    1680 B/op	      29 allocs/op
BenchmarkAuthorize/forbid_educate_dolphins-8          	  500000	      2687 ns/op	    1664 B/op	      28 allocs/op
```


```release-note
NONE
```
@sttts
Contributor

sttts commented Jan 22, 2018

What is missing for Auditing multi-apiserver setups?

As far as I see, we have the audit options in the generic apiserver library and have enabled them through the RecommendedOptions. So the basic support exists.

We do not have

  • custom user-agent strings or a Component field that can either be kube-apiserver or e.g. service-catalog-apiserver. Or does this work through the kubeconfig passed to the webhook?
  • central configuration. For that we would need some kind of ComponentConfig for user provided API servers, or use a well-known ConfigMap in the kube-public namespace.

/cc @deads2k

@tallclair
Member

With advanced audit logging going to GA, we should also begin the deprecation process for legacy audit logging. I think that is probably just adding a deprecation warning for now, and making sure the removal plan is documented along with the legacy audit docs.

@tallclair
Member

From sig-auth discussion:

Graduating to V1 should require:

  1. Restore audit in the scalability tests (already planned for 1.10)
  2. A story for how other components are expected to integrate into the audit system.

For (2), the idea is that going to V1 puts the burden of integration on new features (as opposed to audit developers), but first they need the necessary hooks to integrate. I think the annotations (#58143) are the last piece of this story.

W.r.t. multi-apiserver setups, I think that the 2 issues @sttts mentioned should not be blockers for audit going to v1. Unless we want to move the policy & configuration to a component config, and deprecate the flags approach. However, there is a bootstrapping problem that needs to be solved to configure audit logging through component config (so we might still want flags?).

@smurfralf

Please also include fix for Issue #59331

@crassirostris
Author

I added points about documentation & deprecation to the list of items to do

Fair point... do we have any idea how cluster size maps to the defaults though? I.e. at what point should the cluster admin expect to need to raise those defaults or tighten the policy?

It depends on the expected QPS to the apiserver, which may be pretty irregular depending on the cluster setup and workloads. It seems it would be useful to add a paragraph to the documentation that would explain how these values should be set

custom user-agent strings or a Component field that can either be kube-apiserver or e.g. service-catalog-apiserver. Or does this work through the kubeconfig passed to the webhook?

AFAIU, it's possible to pass user agent through kubeconfig: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/rest/config.go#L83. However, it seems this behavior should be better documented
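As a rough illustration of that distinction (the helper below is hypothetical; the real knob is the `UserAgent` field on client-go's `rest.Config`, linked above):

```go
package main

import "fmt"

// buildUserAgent composes a component-identifying agent string of the
// kind one could set on client-go's rest.Config.UserAgent, so an audit
// webhook backend can tell apart events coming from kube-apiserver vs.
// an aggregated apiserver. The helper and format are illustrative,
// not part of client-go.
func buildUserAgent(component, version string) string {
	return fmt.Sprintf("%s/%s", component, version)
}

func main() {
	fmt.Println(buildUserAgent("service-catalog-apiserver", "v1.10.0"))
}
```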

W.r.t. multi-apiserver setups, I think that the 2 issues @sttts mentioned should not be blockers for audit going to v1

Agree. I mentioned this in the issue text. To everyone: feel free to counter if you don't think this is the right approach

@tallclair
Member

Is the intent still for this to reach stable in 1.10? Code freeze is coming up (02-26). There are a lot of issues still open (including 2 I just filed: #60108 and #60110).

At this point I'm inclined to suggest we bump GA out to 1.11 unless there are strong objections.

@tpepper
Member

tpepper commented Feb 21, 2018

@crassirostris and @tallclair pending your decision on 1.10 or not, please make sure to scrub the set of issues referenced off this one and either add the approved and approved-for-milestone labels or remove the v1.10 milestone target.

@crassirostris
Author

I agree, it doesn't make sense to try to reach stable in 1.10; too much work, too little done already

I created a new umbrella issue for audit logging in 1.11 and moved items we're not pursuing this release there

The rest I believe is doable, most of the things left are either in review or docs-related

@crassirostris
Author

@CaoShuFeng @tallclair @sttts @ericchiang What's your take on the annotations effort? Do you think it makes sense to push for 1.10, or is it better to postpone it until 1.11?

@CaoShuFeng
Contributor

@CaoShuFeng @tallclair @sttts @ericchiang What's your take on the annotations effort? Do you think it makes sense to push for 1.10, or is it better to postpone it until 1.11?

I am OK either way.

@crassirostris
Author

Thanks!

@CaoShuFeng I think #58143 definitely should go to 1.11; there's too much work left. I'm moving it to the 1.11 umbrella issue

@tallclair @sttts WDYT about #58807? Is it close enough to completeness that we can ask for a couple of days for an exception?

@crassirostris
Author

It seems we don't have enough capacity to review #58807 and it has to be pushed to 1.11

@jberkus

jberkus commented Feb 28, 2018

Should this issue be closed/moved to 1.11 then?

@k8s-github-robot k8s-github-robot removed this from the v1.10 milestone Mar 2, 2018
@jberkus

jberkus commented Mar 2, 2018

oops, missing priority was my fault, moving back into milestone:

/priority critical-urgent
/remove-priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 2, 2018
@jberkus jberkus added this to the v1.10 milestone Mar 2, 2018
@liggitt
Member

liggitt commented Mar 6, 2018

@tallclair has the work planned for 1.10 been completed at this point?

@tallclair
Member

Remaining work is resolving the test failures, #60719

@crassirostris
Author

@liggitt Besides test failures, docs only. I will create a PR shortly, my fault I didn't do this earlier

@jberkus

jberkus commented Mar 8, 2018

Since the tests are now fixed, what remains with this tracker?

@crassirostris
Author

@jberkus Only docs: kubernetes/website#7679

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue Needs Attention

@crassirostris @kubernetes/sig-api-machinery-misc @kubernetes/sig-auth-misc

Action required: During code freeze, issues in the milestone should be in progress.
If this issue is not being actively worked on, please remove it from the milestone.
If it is being worked on, please add the status/in-progress label so it can be tracked with other in-flight issues.

Action Required: This issue has not been updated since Mar 8. Please provide an update.

Note: This issue is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Issue Labels
  • sig/api-machinery sig/auth: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move issue out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/feature: New functionality.
Help

@liggitt
Member

liggitt commented Mar 13, 2018

doc PR is open and tagged for the 1.10 milestone. moving this issue to 1.11

@liggitt liggitt modified the milestones: v1.10, v1.11 Mar 13, 2018
@crassirostris
Author

Well, there's a separate issue for 1.11, so to avoid constant pinging from the bot, and considering that the docs PR is almost ready, I'll move this issue back to 1.10 and close it

@crassirostris crassirostris modified the milestones: v1.11, v1.10 Mar 13, 2018