Synchronous & unbatched audit log writes #67223

Merged
merged 1 commit into from
Aug 16, 2018

Conversation

tallclair
Member

@tallclair tallclair commented Aug 10, 2018

What this PR does / why we need it:
When buffered audit log file writes are enabled to reduce latency under high load, the writes shouldn't be batched, because a single large write can have an adverse (though unpredictable) impact on latency. Additionally, batched audit log writes should not be done asynchronously, as this just creates lock contention on the log writer.

This is a cleaned-up version of #61217

Which issue(s) this PR fixes
Fixes #61932

Release note:

Defaults for file audit logging backend in batch mode changed:
- Logs are written 1 at a time (no batching)
- Only a single writer process is used (avoids lock contention)

/sig auth
/priority important-soon
/kind bug
/milestone v1.12

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Aug 10, 2018
@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Aug 10, 2018
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 10, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 10, 2018
@tallclair tallclair force-pushed the audit-log branch 2 times, most recently from b738900 to 1989ecb Compare August 10, 2018 01:44
@tallclair
Member Author

Fixed broken tests.

@tallclair
Member Author

This could be considered a breaking change as it changes some default flag values. However, I think we should consider it a bug fix since those values didn't make any sense. I regret that we even exposed those options as flags, but we're stuck with them now.

@liggitt
Member

liggitt commented Aug 10, 2018

staging/src/k8s.io/apiserver/pkg/server/options/audit.go:48:16: "paramaters" is a misspelling of "parameters"

thanks, bots

infiniteTimeCh <-chan time.Time = make(chan time.Time)
closedTimeCh = func() <-chan time.Time {
infiniteTimeCh <-chan time.Time
closedTimeCh = func() <-chan time.Time {
Contributor

closedTimeCh is not used any more, I think.

Member Author

Done. Also inlined closedStopCh.

@x13n
Member

x13n commented Aug 13, 2018

Copying my question from #61932:

I may be missing some context, but what's the point of having a synchronous buffer? Why is it better than having no buffer at all?

@tallclair
Member Author

Copying my response from #61932 :)

The buffer is still async, but calling the delegate backend can be synchronous for the buffer worker. More specifically, this is the old behavior, with each line being a separate goroutine:

  1. request handler: queue audit event in buffer
  2. buffer worker: dequeue audit events, batch them, pass batch to delegate goroutine
  3. delegate goroutine: process batched events

This makes sense for the webhook backend, where a separate goroutine handles each audit request, but it doesn't make sense for the log backend, where the separate writing goroutines just create lock contention. With the new PR and AsyncDelegate = false, you get:

  1. request handler: queue audit event in buffer
  2. buffer worker: dequeue audit events, batch them, call delegate to process the events (in the same goroutine)
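
To make the two flows above concrete, here is a minimal sketch of a buffer worker with a switch between the old and new behavior. It is not the code from this PR: `Event`, `Backend`, `logBackend`, `runWorker`, and `writeAll` are hypothetical names chosen for illustration, and the `asyncDelegate` parameter only mirrors the idea of the AsyncDelegate option mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for an audit event.
type Event struct{ ID int }

// Backend is a hypothetical stand-in for the delegate backend.
type Backend interface {
	ProcessEvents(events ...Event)
}

// logBackend mimics a log file writer: every write takes the same lock,
// so concurrent callers contend with each other.
type logBackend struct {
	mu      sync.Mutex
	written []int
}

func (b *logBackend) ProcessEvents(events ...Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, e := range events {
		b.written = append(b.written, e.ID)
	}
}

// runWorker drains the buffer in batches of at most batchSize events.
// With asyncDelegate set, each batch goes to a new goroutine (old flow).
// Otherwise the worker calls the delegate itself (new flow).
func runWorker(buffer <-chan Event, delegate Backend, batchSize int, asyncDelegate bool) {
	var wg sync.WaitGroup
	batch := make([]Event, 0, batchSize)
	flush := func() {
		if len(batch) == 0 {
			return
		}
		events := batch
		batch = make([]Event, 0, batchSize)
		if asyncDelegate {
			wg.Add(1)
			go func() {
				defer wg.Done()
				delegate.ProcessEvents(events...)
			}()
			return
		}
		delegate.ProcessEvents(events...)
	}
	for e := range buffer {
		batch = append(batch, e)
		if len(batch) >= batchSize {
			flush()
		}
	}
	flush()   // write whatever is left once the buffer is closed
	wg.Wait() // wait for any delegate goroutines still running
}

// writeAll queues n events, runs the worker, and returns the IDs in the
// order the backend wrote them.
func writeAll(n, batchSize int, asyncDelegate bool) []int {
	buffer := make(chan Event, n)
	for i := 0; i < n; i++ {
		buffer <- Event{ID: i} // request handler: queue event in buffer
	}
	close(buffer)
	backend := &logBackend{}
	runWorker(buffer, backend, batchSize, asyncDelegate)
	return backend.written
}

func main() {
	// Batch size 1 with a synchronous delegate: one writer, one event per write.
	fmt.Println(writeAll(5, 1, false)) // prints [0 1 2 3 4]
}
```

In the synchronous case only the worker goroutine ever takes the backend lock, so events are written in queue order. In the asynchronous case the delegate goroutines compete for the lock, and the order in which batches are written is not guaranteed.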

Make sense?

@tallclair
Member Author

/retest

@x13n
Member

x13n commented Aug 14, 2018

Thanks! Looks good, please just squash the commits.

@tallclair
Member Author

Squashed.

/assign @sttts
for approval

@x13n
Member

x13n commented Aug 16, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2018
@sttts
Contributor

sttts commented Aug 16, 2018

/retest
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts, tallclair, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 16, 2018
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 87e7b9f into kubernetes:master Aug 16, 2018
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 16, 2018
Development

Successfully merging this pull request may close these issues.

Fix batching log audit backend
9 participants