Synchronous & unbatched audit log writes #67223

Merged
merged 1 commit into from Aug 16, 2018

Conversation

@tallclair
Member

tallclair commented Aug 10, 2018

What this PR does / why we need it:
When buffered audit log file writes are enabled to reduce latency under high load, we shouldn't batch the writes, as a large data write can have an adverse (though unpredictable) impact. Additionally, batched audit log writes should not be done asynchronously, as this just creates lock contention on the log writer.

This is a cleaned-up version of #61217

Which issue(s) this PR fixes
Fixes #61932

Release note:

Defaults for the file audit logging backend in batch mode changed:
- Logs are written one at a time (no batching)
- Only a single writer is used (avoids lock contention)

/sig auth
/priority important-soon
/kind bug
/milestone v1.12

@tallclair

Member

tallclair commented Aug 10, 2018

Fixed broken tests.

@tallclair

Member

tallclair commented Aug 10, 2018

This could be considered a breaking change as it changes some default flag values. However, I think we should consider it a bug fix since those values didn't make any sense. I regret that we even exposed those options as flags, but we're stuck with them now.

@liggitt

Member

liggitt commented Aug 10, 2018

staging/src/k8s.io/apiserver/pkg/server/options/audit.go:48:16: "paramaters" is a misspelling of "parameters"

thanks, bots

@x13n

Member

x13n commented Aug 13, 2018

Copying my question from #61932:

I may be missing some context, but what's the point of having a synchronous buffer? Why is it better than having no buffer at all?

@tallclair

Member

tallclair commented Aug 13, 2018

Copying my response from #61932 :)

The buffer is still async, but calling the delegate backend can be synchronous for the buffer worker. More specifically, this is the old behavior, with each line being a separate goroutine:

  1. request handler: queue audit event in buffer
  2. buffer worker: dequeue audit events, batch them, pass batch to delegate goroutine
  3. delegate goroutine: process batched events

This makes sense for the webhook backend, where a different goroutine handles each audit request, but it doesn't make sense for the log backend, where the different writing goroutines just create lock contention. With the new PR and AsyncDelegate = false, you get:

  1. request handler: queue audit event in buffer
  2. buffer worker: dequeue audit events, batch them, call delegate to process the events (in the same goroutine)

Make sense?
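
For readers who want to see the two flows side by side, here is a minimal sketch in Go. It is not the code from this PR: the `Event`, `Backend`, `logBackend`, `runWorker`, and `collect` names are hypothetical, and only the idea of an `asyncDelegate` switch is taken from the comment above. The worker drains a buffered channel, groups events into batches, and either spawns a goroutine per batch (old behavior) or calls the delegate in its own goroutine (new behavior for the log backend).

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified, hypothetical stand-in for an audit event.
type Event string

// Backend is a hypothetical stand-in for the delegate audit backend.
type Backend interface {
	ProcessEvents(events ...Event)
}

// logBackend mimics a file log backend: every write takes the same lock,
// so concurrent callers only contend with each other.
type logBackend struct {
	mu    sync.Mutex
	lines []string
}

func (b *logBackend) ProcessEvents(events ...Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, e := range events {
		b.lines = append(b.lines, string(e))
	}
}

// runWorker plays the role of the buffer worker. It dequeues events, groups
// them into batches of at most maxBatch, and hands each batch to the delegate.
// With asyncDelegate set, each batch gets its own goroutine; otherwise the
// delegate is called in the worker's goroutine.
func runWorker(buffer <-chan Event, delegate Backend, maxBatch int, asyncDelegate bool) {
	var wg sync.WaitGroup
	batch := make([]Event, 0, maxBatch)

	flush := func() {
		if len(batch) == 0 {
			return
		}
		events := batch
		batch = make([]Event, 0, maxBatch)
		if asyncDelegate {
			wg.Add(1)
			go func() {
				defer wg.Done()
				delegate.ProcessEvents(events...)
			}()
			return
		}
		delegate.ProcessEvents(events...)
	}

	for e := range buffer {
		batch = append(batch, e)
		if len(batch) >= maxBatch {
			flush()
		}
	}
	flush()   // hand over any partial final batch
	wg.Wait() // wait for delegate goroutines, if any were started
}

// collect queues n events, runs the worker to completion, and returns the
// lines the log backend recorded.
func collect(n, maxBatch int, asyncDelegate bool) []string {
	buffer := make(chan Event, n)
	for i := 0; i < n; i++ {
		buffer <- Event(fmt.Sprintf("event-%d", i))
	}
	close(buffer)

	backend := &logBackend{}
	runWorker(buffer, backend, maxBatch, asyncDelegate)
	return backend.lines
}

func main() {
	// Batch size 1 with a synchronous delegate: one writer, one event per write.
	lines := collect(5, 1, false)
	fmt.Println(len(lines), lines[0], lines[len(lines)-1])
}
```

In the synchronous case there is exactly one caller of `ProcessEvents`, so the lock is never contended and events are written in queue order. In the asynchronous case all events still arrive, but their order across batches is not guaranteed.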

@tallclair

Member

tallclair commented Aug 14, 2018

/retest

@x13n

Member

x13n commented Aug 14, 2018

Thanks! Looks good, please just squash the commits.

@tallclair

Member

tallclair commented Aug 14, 2018

Squashed.

/assign @sttts
for approval

@x13n

Member

x13n commented Aug 16, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 16, 2018

@sttts

Contributor

sttts commented Aug 16, 2018

/retest
/approve

@k8s-ci-robot

Contributor

k8s-ci-robot commented Aug 16, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts, tallclair, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

fejta-bot commented Aug 16, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-merge-robot

Contributor

k8s-merge-robot commented Aug 16, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-merge-robot

Contributor

k8s-merge-robot commented Aug 16, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 87e7b9f into kubernetes:master Aug 16, 2018

17 of 18 checks passed

Submit Queue: Required Github CI test is not green: pull-kubernetes-kubemark-e2e-gce-big
cla/linuxfoundation: tallclair authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-e2e-kubeadm-gce: Skipped
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-local-e2e-containerized: Skipped
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.