Dynamic Audit Policy #71230

Closed
wants to merge 2 commits into from

Contributor

@pbarker pbarker commented Nov 19, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:
Adds policy rules to the AuditSink API object #70818

Special notes for your reviewer:
This PR is based off discussions with @lavalamp and @tallclair to make the audit policy more composable and readable for the API objects.

Does this PR introduce a user-facing change?:

Add AuditClass object that allows fine-grained filtering of audit events for AuditSink API objects

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 19, 2018
@k8s-ci-robot
Contributor

Hi @pbarker. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 19, 2018
@pbarker
Contributor Author

pbarker commented Nov 19, 2018

/cc @tallclair @lavalamp @liggitt
/sig auth

@pbarker
Contributor Author

pbarker commented Nov 19, 2018

/cc @liggitt

@yue9944882
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 20, 2018
@pbarker
Contributor Author

pbarker commented Nov 20, 2018

/retest

@WanLinghao
Contributor

/cc

@WanLinghao
Contributor

Hello, is there any reason to complicate the struct with ClassRule? Why don't we put AuditClass directly in Policy?

@pbarker
Contributor Author

pbarker commented Nov 22, 2018

@WanLinghao the idea was to make it composable, so that Sinks could share the classes and an app developer could create a class for their application and package it with everything else. This was just the first direction we wanted to explore. I'm building a small CRD version now to test these ideas and serve as input; I'll ping you with the repo once it is live.

@WanLinghao
Contributor

@pbarker thank you!

@lavalamp
Member

lavalamp commented Apr 2, 2019

Just checking in--looks like it's not my turn yet. Please ping me when it is! :)

@WanLinghao
Contributor

Hi everyone, is this patch ready to merge?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 2, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pbarker
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: lavalamp

If they are not already assigned, you can assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// required
Policy Policy `json:"policy" protobuf:"bytes,1,opt,name=policy"`

- // Webhook to send events
+ // `webhook` to send events
Member

Complete sentence? e.g. webhook describes how to contact the sink.

// +optional
Stages []Stage `json:"stages" protobuf:"bytes,2,opt,name=stages"`

// `rules` define how `auditClass` objects should be handled.
Member

Suggested change
- // `rules` define how `auditClass` objects should be handled.
+ // `rules` define how requests having a matching `auditClass` should be handled.

?

// +optional
Stages []Stage `json:"stages" protobuf:"bytes,2,opt,name=stages"`

// `rules` define how `auditClass` objects should be handled.
// A request may fall under multiple `auditClass` objects.
Member

s/fall under/match/

?

type WebhookThrottleConfig struct {
- // ThrottleQPS maximum number of batches per second
+ // `qps` maximum number of batches per second.
Member

qps is the maximum sustained batches per second.

?

type Webhook struct {
- // Throttle holds the options for throttling the webhook
+ // `throttle` holds the options for throttling the webhook.
Member

What if we called it rate limits? This isn't throttling the webhook, it's throttling requests sent?

What happens if there are more events than can be fit in the given rate limits?

Users []string `json:"users,omitempty" protobuf:"bytes,2,rep,name=users"`
// The user groups in this attribute group. A user is considered matching
// if it is a member of any of the UserGroups.
// An empty list implies every user group.
Member

The empty behavior still seems quite wrong; if you leave users empty and set a group, every user is included due to users being empty.

I think empty lists have to mean the criterion is not used, i.e., it selects nothing.
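
To make the concern concrete, here is a minimal Go sketch; it is not code from this PR. It assumes the user and group criteria are OR-ed, which is the reading the comment implies, and compares "empty means everything" with "empty means the criterion is unused". The function names `matchEmptyIsAll` and `matchEmptyIsUnused` are hypothetical.

```go
package main

import "fmt"

// contains reports whether list includes s.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// anyOf reports whether any element of have appears in want.
func anyOf(want, have []string) bool {
	for _, h := range have {
		if contains(want, h) {
			return true
		}
	}
	return false
}

// matchEmptyIsAll treats an empty list as "matches everything" and ORs the
// two criteria. The OR is an assumption made only to show the problem.
func matchEmptyIsAll(users, groups []string, user string, userGroups []string) bool {
	userOK := len(users) == 0 || contains(users, user)
	groupOK := len(groups) == 0 || anyOf(groups, userGroups)
	return userOK || groupOK
}

// matchEmptyIsUnused treats an empty list as "criterion not used", so an
// empty list selects nothing on its own.
func matchEmptyIsUnused(users, groups []string, user string, userGroups []string) bool {
	return contains(users, user) || anyOf(groups, userGroups)
}

func main() {
	groups := []string{"system:masters"}
	// alice is not in system:masters, yet the first variant still matches her.
	fmt.Println(matchEmptyIsAll(nil, groups, "alice", []string{"dev"}))    // true
	fmt.Println(matchEmptyIsUnused(nil, groups, "alice", []string{"dev"})) // false
}
```

If the criteria were AND-ed instead, the first variant would not match alice, so the outcome depends on how the fields combine as much as on the empty-list rule.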


// RequestSelector selects requests by matching on the given fields. Selectors are
// used to compose `auditClass` objects.
type RequestSelector struct {
Member

Is this an "all must match" or an "any may match" list? I think it wants to be the latter?

Member

Now that I read through to the bottom of this type, I think it still needs significant work :/

I think users need to be able to express {must include one of [list], ignore/don't care, must not include one of [list]} for many or all of these clauses. Do we have a list of use cases we want to make easy?

Member

> Is this an "all must match" or an "any may match" list?

I think all fields in the selector must match, but within a field it's "any". For example, if I specify verbs=[get, create] and resources=[pods, nodes], then that would logically be (verb=get OR verb=create) AND (res=pods OR res=nodes)
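
The AND-across-fields, OR-within-a-field reading described above can be sketched in Go as follows. This is an illustration, not the PR's implementation: `selector` is a hypothetical, reduced stand-in for `RequestSelector` with only two fields, and the empty-list rule follows the "An empty list implies every ..." doc comments.

```go
package main

import "fmt"

// selector is a hypothetical, reduced stand-in for RequestSelector.
type selector struct {
	Verbs     []string
	Resources []string
}

// matchesAny reports whether value is in list. An empty list matches any
// value, following the "empty list implies every ..." doc comments.
func matchesAny(list []string, value string) bool {
	if len(list) == 0 {
		return true
	}
	for _, v := range list {
		if v == value {
			return true
		}
	}
	return false
}

// matches ANDs the fields and ORs the values inside each field.
func (s selector) matches(verb, resource string) bool {
	return matchesAny(s.Verbs, verb) && matchesAny(s.Resources, resource)
}

func main() {
	s := selector{
		Verbs:     []string{"get", "create"},
		Resources: []string{"pods", "nodes"},
	}
	fmt.Println(s.matches("get", "nodes"))   // true
	fmt.Println(s.matches("delete", "pods")) // false
	fmt.Println(s.matches("get", "secrets")) // false
}
```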

> I think users need to be able to express {must include one of [list], ignore/don't care, must not include one of [list]} for many or all of these clauses.

For "must not include one of list", I think the way to express that in the current model would be to have a separate class for the excluded requests, and then handle it before the positive class in the policy rules. I think the common use case will be "I want to handle this type of request in this way, except ignore (don't log) these specific noisy requests".
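
A rough Go sketch of that ordering idea follows. It assumes rules are evaluated in order and the first matching class decides the level; that evaluation order, the `rule` type, and the example class names are all hypothetical. The level names `None` and `Metadata` are the ones listed in the PR's doc comments.

```go
package main

import "fmt"

// request is a hypothetical, reduced view of an API request.
type request struct {
	User, Verb, Resource string
}

// rule pairs a class predicate with an audit level. First match wins,
// which is an assumption of this sketch.
type rule struct {
	Class string
	Match func(request) bool
	Level string
}

// levelFor returns the level of the first rule whose class matches, or
// "None" when no rule matches.
func levelFor(rules []rule, r request) string {
	for _, ru := range rules {
		if ru.Match(r) {
			return ru.Level
		}
	}
	return "None"
}

// exampleRules lists the excluded (noisy) class before the positive class.
func exampleRules() []rule {
	return []rule{
		{
			Class: "noisy-watches",
			Match: func(r request) bool { return r.User == "system:kube-proxy" && r.Verb == "watch" },
			Level: "None",
		},
		{
			Class: "pods",
			Match: func(r request) bool { return r.Resource == "pods" },
			Level: "Metadata",
		},
	}
}

func main() {
	rules := exampleRules()
	fmt.Println(levelFor(rules, request{"system:kube-proxy", "watch", "pods"})) // None
	fmt.Println(levelFor(rules, request{"alice", "get", "pods"}))               // Metadata
}
```

Swapping the two rules would log the noisy watches at `Metadata`, which is why the exclusion has to come first under this model.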

> Do we have a list of use cases we want to make easy?

I think the default audit policy is a good starting point: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L824-L950

UserGroups []string `json:"userGroups,omitempty" protobuf:"bytes,3,rep,name=userGroups"`

// `verbs` included in this selector.
// An empty list implies every verb.
Member

I think you need both IncludeVerbs and ExcludeVerbs :/

I think you may need this for users and groups, too :/
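
One possible shape for the include/exclude suggestion is sketched below in Go. Only the field names `IncludeVerbs` and `ExcludeVerbs` come from the comment; the `verbFilter` type and the precedence (exclude is checked first, an empty include list means "don't care") are assumptions.

```go
package main

import "fmt"

// verbFilter is a hypothetical shape for the include/exclude idea.
type verbFilter struct {
	IncludeVerbs []string // empty: no include constraint ("don't care")
	ExcludeVerbs []string // any listed verb is rejected
}

// contains reports whether list includes s.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// allows applies the exclude list first, then the include list.
func (f verbFilter) allows(verb string) bool {
	if contains(f.ExcludeVerbs, verb) {
		return false
	}
	return len(f.IncludeVerbs) == 0 || contains(f.IncludeVerbs, verb)
}

func main() {
	f := verbFilter{ExcludeVerbs: []string{"watch", "list"}}
	fmt.Println(f.allows("create")) // true
	fmt.Println(f.allows("watch"))  // false
}
```

The same pair of lists could be repeated for users and groups, as the comment suggests.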


// Selectors can apply to API resources (such as "pods" or "secrets"),
// non-resource URL paths (such as "/api"), or neither, but not both.
// If neither is specified, the selector is treated as a default for all URLs.
Member

Missing field? Zombie comment?

// *s are allowed, but only as the full, final step in the path, and are delimited by the path separator
// Examples:
// "/metrics" - Log requests for apiserver metrics
// "/healthz/*" - Log all health checks
Member

Seems like we should make it easy to get all or none of these?
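
For reference, the wildcard rule in the doc comment under review (a `*` only as the full, final path step) can be sketched in Go like this. `pathMatches` is a hypothetical helper written for illustration; it is not taken from the PR or from the apiserver.

```go
package main

import (
	"fmt"
	"strings"
)

// pathMatches reports whether path matches pattern. A "*" is honored only
// as the whole final segment of the pattern, e.g. "/healthz/*".
func pathMatches(pattern, path string) bool {
	if pattern == path {
		return true
	}
	if strings.HasSuffix(pattern, "/*") {
		prefix := strings.TrimSuffix(pattern, "*") // keeps the trailing "/"
		return strings.HasPrefix(path, prefix)
	}
	return false
}

func main() {
	fmt.Println(pathMatches("/metrics", "/metrics"))        // true
	fmt.Println(pathMatches("/healthz/*", "/healthz/etcd")) // true
	fmt.Println(pathMatches("/healthz/*", "/healthz"))      // false
	fmt.Println(pathMatches("/*", "/version"))              // true
}
```

Under this sketch a single `/*` pattern already selects every non-resource path, which is one way to make "all of these" easy; selecting none would still need an explicit mechanism.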

// Using this field requires resources to be specified.
// An empty list implies that every instance of the resource is matched.
// +optional
ObjectNames []string `json:"objectNames,omitempty" protobuf:"bytes,3,rep,name=objectNames"`

Shouldn't this be ResourceNames to be consistent with RBAC, AdmissionWebhooks and the current AuditPolicy?

Then copying existing rules into classes can be done easily.

Users []string `json:"users,omitempty" protobuf:"bytes,2,rep,name=users"`
// The user groups in this attribute group. A user is considered matching
// if it is a member of any of the UserGroups.
// An empty list implies every user group.

@lavalamp This is more or less a copy of the current documentation and behavior of AuditPolicy.

// available options: None, Metadata, Request, RequestResponse
// required
Level Level `json:"level" protobuf:"bytes,1,opt,name=level"`

// Stages is a list of stages for which events are created.
// `stages` is a list of stages for which events are created.
// If no stages are given nothing will be logged
Contributor

If no stages have been specified, nothing will be logged even if the rules specify stages of their own?
Can we make any correlation more explicit if there is one?

Level Level `json:"level" protobuf:"bytes,2,opt,name=level"`

// `stages` is a list of stages for which events are created.
// If no stages are given nothing will be logged.
Contributor

Does this work as a union with the Policy stages, as in the audit API, or does it completely override them? This should also be made explicit.
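
The two readings this question distinguishes can be sketched in Go as below. Neither is confirmed as the PR's behavior; `unionStages` and `overrideStages` are hypothetical helpers, and the stage names are borrowed from the existing audit API.

```go
package main

import "fmt"

// unionStages returns the policy stages plus any rule stages not already
// present, preserving first-seen order.
func unionStages(policy, rule []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, s := range append(append([]string{}, policy...), rule...) {
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out
}

// overrideStages returns the rule stages when the rule sets any, and
// falls back to the policy stages otherwise.
func overrideStages(policy, rule []string) []string {
	if len(rule) > 0 {
		return rule
	}
	return policy
}

func main() {
	policy := []string{"ResponseComplete"}
	rule := []string{"RequestReceived"}
	fmt.Println(unionStages(policy, rule))    // [ResponseComplete RequestReceived]
	fmt.Println(overrideStages(policy, rule)) // [RequestReceived]
}
```

The two functions disagree as soon as both levels set stages, so the doc comment needs to say which one applies.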

@liggitt
Member

liggitt commented Jul 15, 2019

met with @lavalamp, @liggitt, @mvladev, @shturec, @tallclair: recording at https://zoom.us/recording/share/DJa-lhV7gX_3z5-H89LswcvUF6lePS4b6rslRNMLng2wIumekTziMw

next steps:
• work on expanding on the personas/use cases we want to cover from the KEP (https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/0014-dynamic-audit-configuration.md#user-stories)
• enumerate a handful of very specific use cases ("as , with , configure an audit webhook to receive <specific requests/levels> for ")
• share that list of use cases with sig-auth/api-machinery (maybe kubernetes-dev / kubernetes-users) asking for feedback for any obviously missing and substantively different use cases
• for each use case, write out the API config required to accomplish it with the API as proposed in the current PR, and with some of the proposed API changes (audit classes, include/exclude lists, etc)
• evaluate the API shape that enables essential use cases in a natural way

the use cases and example API configs should fold back into the KEP, and eventually be usable in some form as end-user documentation of the feature (xref the sample admission webhook configurations referenced from https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 13, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 12, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@pbarker: PR needs rebase.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 12, 2019
@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

@liggitt liggitt removed the api-review Categorizes an issue or PR as actively needing an API review. label Feb 21, 2020
@xuchen-xiaoying

xuchen-xiaoying commented Jun 18, 2022

@liggitt is this PR still worth moving forward? I have a similar need to change the audit policy dynamically without restarting the apiserver.

@xuchen-xiaoying

@pbarker I think this PR is very helpful. May I ask why it was closed?

@pbarker
Contributor Author

pbarker commented Jun 20, 2022

Hey @xuchen-xiaoying, the company I was working for when building this got acquired and this work was deprioritized. The community also didn't have enough interest to keep pushing it forward. I think this is largely because most k8s clusters are provisioned on a cloud provider, and the cloud providers have their own audit integrations built in.

@xuchen-xiaoying

@pbarker I really appreciate your reply and the related enhancement, which I think is valuable and elegant for hot restart and dynamic configuration. Many thanks.

Labels
• cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
• do-not-merge/hold - Indicates that a PR should not merge because someone has issued a /hold command.
• kind/api-change - Categorizes issue or PR as related to adding, removing, or otherwise changing an API
• kind/feature - Categorizes issue or PR as related to a new feature.
• lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
• needs-priority - Indicates a PR lacks a `priority/foo` label and requires one.
• needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
• ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
• release-note - Denotes a PR that will be considered when it comes time to generate release notes.
• sig/api-machinery - Categorizes an issue or PR as relevant to SIG API Machinery.
• sig/architecture - Categorizes an issue or PR as relevant to SIG Architecture.
• sig/auth - Categorizes an issue or PR as relevant to SIG Auth.
• size/XXL - Denotes a PR that changes 1000+ lines, ignoring generated files.