New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Advanced Auditing 1.10 umbrella bug #58083
Comments
RE: integration with admission. I think it would be useful to see something similar for authn/z as well (i.e. which plugin authorized / authenticated the request, with the ability for the plugin to add arbitrary annotations (e.g. RBAC adds the role & binding used)). Not a requirement for 1.10 or GA, but nice to have. |
/remove-sig api-machinery |
Audit logging is a shared effort between sig-auth and sig-api-machinery |
In another thread, we were discussing audit logging in very high QPS clusters (e.g. very large clusters). That feels like something that should be addressed, or at least well defined (e.g. audit logging is only supported up to XX QPS). A suggestion I threw out is to implement graceful degradation of the audit infrastructure (probably in the open source). One way of doing this would be to introduce priority to the audit policy, and then when the buffers are full (or too many requests start erroring) drop the lower priority events before the higher priority ones. We already have a course priority defined through the metadata read/write and data read/write CAL classification. We could also define some hard audit QPS limits (although it might be possible by adjusting the current parameters), and start dropping lower priority events to stay under those QPS limits. I'm not sure the full priority mechanism should block GA, but I think we should have some more reliable or well-defined behavior in this area. |
It actually depends on the audit logging configuration. You can always support more QPS by bumping the audit logging throughput (e.g. the number of allowed QPS issued by webhook). It's a matter of policy in each cluster, not implementation per se |
Fair point... do we have any idea how cluster size maps to the defaults though? I.e. at what point should the cluster admin expect to need to raise those defaults or tighten the policy? |
Automatic merge from submit-queue (batch tested with PRs 53895, 58013, 58466, 58531, 58535). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. return reason for allowed rbac authorizations includes the binding, role, and subject that allowed a request so audit can make use of it xref #56209 #58083 ### example reasons > allowed by ClusterRoleBinding "system:controller:cronjob-controller" of ClusterRole "system:controller:cronjob-controller" to ServiceAccount "cronjob-controller/kube-system" > allowed by RoleBinding "bob-viewer/default" of ClusterRole "view" to User "bob" ### perf impact ```shell go test ./plugin/pkg/auth/authorizer/rbac/ -run foo -bench . -benchmem ``` on master: ``` BenchmarkAuthorize/allow_list_pods-8 500000 2674 ns/op 1632 B/op 27 allocs/op BenchmarkAuthorize/allow_update_pods/status-8 500000 2858 ns/op 1632 B/op 27 allocs/op BenchmarkAuthorize/forbid_educate_dolphins-8 500000 2654 ns/op 1632 B/op 27 allocs/op ``` with this PR: ``` BenchmarkAuthorize/allow_list_pods-8 500000 2697 ns/op 1664 B/op 28 allocs/op BenchmarkAuthorize/allow_update_pods/status-8 500000 2873 ns/op 1680 B/op 29 allocs/op BenchmarkAuthorize/forbid_educate_dolphins-8 500000 2687 ns/op 1664 B/op 28 allocs/op ``` ```release-note NONE ```
What is missing for As far as I see, we have the audit options in the generic apiserver library and have enabled it through
RecommendedOptions . So the basic support exists.
We do not have
/cc @deads2k |
With advanced audit logging going to GA, we should also begin the deprecation process of legacy audit logging. I think that is probably just adding a deprecation warning here for now, and make sure the removal plan is documented along with the legacy audit docs. |
From sig-auth discussion: Graduating to V1 should require:
For (2), the idea is that going to V1 puts the burden of integration on new features (as opposed to audit developers), but first they need the necessary hooks to integrate. I think the annotations (#58143) is the last piece of this story. W.r.t. multi-apiserver setups, I think that the 2 issues @sttts mentioned should not be blockers for audit going to v1. Unless we want to move the policy & configuration to a component config, and deprecate the flags approach. However, there is a bootstrapping problem that needs to be solved to configure audit logging through component config (so we might still want flags?). |
Please also include fix for Issue #59331 |
I added points about documentation & deprecation to the list of items to do
It depends on the expected QPS to the apiserver, which may be pretty irregular depending on the cluster setup and workloads. It seems it would be useful to add a paragraph to the documentation that would explain how these values should be set
AFAIU, it's possible to pass user agent through kubeconfig: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/rest/config.go#L83. However, it seems this behavior should be better documented
Agree. I mentioned this in the issue text. 2everyone: feel free to counter if you don't think this is the right approach |
@crassirostris and @tallclair pending your decision on 1.10 or not please make sure to scrub the set of issues referenced off this one and either add the approved and approved-for-milestone labels or remove the v1.10 milestone target. |
I agree, it doesn't make sense to try ask reach stable in 1.10, too much work, too little done already I created a new umbrella issue for audit logging in 1.11 and moved items we're not pursuing this release there The rest I believe is doable, most of the things left are either in review or docs-related |
@CaoShuFeng @tallclair @sttts @ericchiang What's your take on the annotations effort? Do you think it makes sense to push for 1.10, or it's better to postpone it until 1.11? |
I am OK in either way. |
Thanks! @CaoShuFeng I think #58143 defintely should go to 1.11, there's too much work left. I'm moving it to the 1.11 umbrella issue @tallclair @sttts WDYT about #58807? Is it close enough to completeness that we can ask for a couple of days for an exception? |
It seems we don't have enough capacity to review #58807 and it has to be pushed to 1.11 |
Should this issue be closed/moved to 1.11 then? |
oops, missing priority was my fault, moving back into milestone: /priority critical-urgent |
@tallclair has the work planned for 1.10 been completed at this point? |
Remaining work is resolving the test failures, #60719 |
@liggitt Besides test failures, docs only. I will create a PR shortly, my fault I didn't do this earlier |
Since the tests are now fixed, what remains with this tracker? |
@jberkus Only docs: kubernetes/website#7679 |
[MILESTONENOTIFIER] Milestone Issue Needs Attention @crassirostris @kubernetes/sig-api-machinery-misc @kubernetes/sig-auth-misc Action required: During code freeze, issues in the milestone should be in progress. Action Required: This issue has not been updated since Mar 8. Please provide an update. Note: This issue is marked as Example update:
Issue Labels
|
doc PR is open and tagged for the 1.10 milestone. moving this issue to 1.11 |
Well, there's a separate issue for 1.11, so to avoid constant pinging from the bot and considering that docs PR is almost ready, I'll move this issue back to 1.10 and close |
This is a continuation of the work on the Advanced Auditing feature
Umbrella issue for 1.11 release: #60392
API-related changes
Graduate API to stableAdd RBAC information to audit logs by @CaoShuFengAdd PodSecurityPolicy admission information to audit logsWhat about webhooks? We want to answer: Who injected this side-car container?Bugfixes and improvements
Let a user choose an output API versionDon't duplicate ResponseStatus: Don't duplicate status in audit events #60108[Optional] Invalid api group for delete operations[Optional] Always retry audit webhookPolicy changes
[Optional] GCE audit policy should be made re-usable by other setups[Optional] Audit policy should be tested (e.g. that it includes all core resources, more context)Documentation
@crassirostris is working on this in kubernetes/website#7679
To discuss
[Optional] Auditing federation setupsAuditing multi-apiserver setups/cc @sttts @soltysh @tallclair @ericchiang @CaoShuFeng @hzxuzhonghu @piosz
Feel free to add what's missing
The text was updated successfully, but these errors were encountered: