Add option to k8s apiserver to reject incoming requests upon audit failure #65763

x13n · 2018-07-03T12:59:26Z

What this PR does / why we need it: Part of #65266

Release note:

apiserver can be configured to reject requests that cannot be audit-logged.

cc @loburm @tallclair @wojtek-t

tallclair

just a handful of nits, then LGTM.

tallclair · 2018-07-03T20:58:45Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/audit.go

+func auditEventProcessingError(w http.ResponseWriter, req *http.Request) {
+	w.Header().Set("Content-Type", "text/plain")
+	w.Header().Set("X-Content-Type-Options", "nosniff")
+	w.WriteHeader(http.StatusInsufficientStorage)


I don't think this is the right error code... maybe just return a 500?

If this is a plain 500, then we won't be able to differentiate it from other errors. The one I used is from 5xx family, since it's a server side error. At the same time, it is more specific than "something broke server side" which is what 500 means. Maybe someone from apimachinery should take a look?

Yeah, I was thinking maybe we don't want to distinguish it intentionally. Error messages like this can leak useful data to attackers, so distinguishing this case makes me a little nervous. Debug logs & metrics can provide information for those authorized, without clueing in the (possibly unprivileged) requester.

That is a valid point. At the same time, using the same error code will limit the ability to monitor how often requests are rejected because of audit backend failures. We'll be able to see just how often they are rejected in general. Is it possible to have both security and observability?

If you're talking about monitoring from the client side, then no, not without returning this information. But if you're talking about monitoring the server, then that's what prometheus metrics are for.

I was thinking of the existing Prometheus metric that already exposes the HTTP response code, but you're right, this can be a separate metric exclusively for audit errors.
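To illustrate where this thread ended up, here is a minimal sketch of answering with a generic 500 while counting audit rejections on the server side only. The names `rejectOnAuditFailure`, `auditRejections`, and `simulate` are hypothetical, and the plain atomic counter is only a stand-in for a dedicated Prometheus metric; this is not the code that was merged.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// auditRejections is a hypothetical stand-in for a dedicated server-side
// metric (for example a Prometheus counter) that only operators can read.
var auditRejections int64

// rejectOnAuditFailure answers with a generic 500 so the requester cannot
// tell an audit failure apart from any other server error. The real cause
// is recorded only in the server-side counter.
func rejectOnAuditFailure(w http.ResponseWriter, req *http.Request) {
	atomic.AddInt64(&auditRejections, 1)
	// http.Error sets a text/plain Content-Type and
	// X-Content-Type-Options: nosniff before writing the body.
	http.Error(w, "Internal Server Error", http.StatusInternalServerError)
}

// simulate sends one request through the handler and reports what the
// client sees (status code) and what the server has counted so far.
func simulate() (status int, counted int64) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil)
	rejectOnAuditFailure(rec, req)
	return rec.Code, atomic.LoadInt64(&auditRejections)
}

func main() {
	status, counted := simulate()
	fmt.Println("client sees status:", status)
	fmt.Println("server-side audit rejections:", counted)
}
```

The trade-off is the one discussed above: the client loses the ability to distinguish this failure, while anyone with access to server metrics keeps it.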

tallclair · 2018-07-03T21:00:58Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/errors.go

@@ -30,19 +30,19 @@ import (
 )

 // Avoid emitting errors that look like valid HTML. Quotes are okay.
-var sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")
+var Sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")


Is this rendered correctly for plain content? I would think setting content-type + nosniff should be sufficient...

tallclair · 2018-07-03T21:02:06Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/errors.go

@@ -30,19 +30,19 @@ import (
 )

 // Avoid emitting errors that look like valid HTML. Quotes are okay.
-var sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")
+var Sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")


Use html.EscapeString instead? https://golang.org/pkg/html/
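For comparison, a small sketch of how the replacer from the diff and `html.EscapeString` differ on the same input. The helper `escapeBoth` and the sample URI are made up for illustration. The visible difference is that `html.EscapeString` also escapes single and double quotes, which the existing comment says "are okay" to leave alone.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// sanitizer mirrors the replacer shown in the diff: it escapes only &, < and >.
var sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")

// escapeBoth is a hypothetical helper that returns the same input escaped
// by the replacer and by html.EscapeString, for side-by-side comparison.
func escapeBoth(s string) (viaReplacer, viaHTML string) {
	return sanitizer.Replace(s), html.EscapeString(s)
}

func main() {
	viaReplacer, viaHTML := escapeBoth(`/api?q=<b>&x="1"`)
	fmt.Println(viaReplacer)
	fmt.Println(viaHTML)
}
```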

tallclair · 2018-07-03T21:12:54Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/audit.go

+	w.Header().Set("Content-Type", "text/plain")
+	w.Header().Set("X-Content-Type-Options", "nosniff")
+	w.WriteHeader(http.StatusInsufficientStorage)
+	fmt.Fprintf(w, "Server couldn't store an audit event for request: %q", responsewriters.Sanitizer.Replace(req.RequestURI))


Maybe just use responsewriters.InternalError?

tallclair · 2018-07-03T21:16:10Z

staging/src/k8s.io/apiserver/plugin/pkg/audit/buffered/buffered.go

 		}
-	}
+	}()
+	return sendErr == nil


alternatively, just use a named return value. I'll let you decide which is cleaner.

Actually, buffer will not be used when blocking-strict is selected, changed to true to keep existing behavior.

tallclair · 2018-07-03T21:18:15Z

staging/src/k8s.io/apiserver/pkg/audit/types.go

@@ -25,7 +25,8 @@ type Sink interface {
 	// Errors might be logged by the sink itself. If an error should be fatal, leading to an internal
 	// error, ProcessEvents is supposed to panic. The event must not be mutated and is reused by the caller
 	// after the call returns, i.e. the sink has to make a deepcopy to keep a copy around if necessary.
-	ProcessEvents(events ...*auditinternal.Event)
+	// Returns true on success, may return false on error.
+	ProcessEvents(events ...*auditinternal.Event) bool


hmm, I was surprised you didn't just return the error, but I guess since we're handling the errors in-place it makes sense.

tallclair · 2018-07-03T21:21:18Z

staging/src/k8s.io/apiserver/plugin/pkg/audit/log/backend.go

 	for _, ev := range events {
-		b.logEvent(ev)
+		if !b.logEvent(ev) {
+			success = false


alternatively: success = success && b.logEvent(ev)

This would cause events to stop being logged after first failure.

success = b.logEvent(ev) && success would work though.

oh yeah, good call.
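The operand order matters because `&&` short-circuits in Go. A minimal sketch of the difference, with a hypothetical `countCalls` helper standing in for the backend loop:

```go
package main

import "fmt"

// countCalls replays a sequence of per-event results and reports the
// combined success flag plus how many times logEvent actually ran.
// With shortCircuit set, it uses `success && logEvent(r)`, which skips
// logEvent once success is false. Otherwise it uses
// `logEvent(r) && success`, which always logs the event first.
func countCalls(shortCircuit bool, results []bool) (success bool, calls int) {
	logEvent := func(ok bool) bool {
		calls++
		return ok
	}
	success = true
	for _, r := range results {
		if shortCircuit {
			success = success && logEvent(r)
		} else {
			success = logEvent(r) && success
		}
	}
	return success, calls
}

func main() {
	results := []bool{true, false, true, true}
	ok, calls := countCalls(true, results)
	fmt.Println("success && logEvent(ev):", ok, calls)
	ok, calls = countCalls(false, results)
	fmt.Println("logEvent(ev) && success:", ok, calls)
}
```

Both forms end with `success == false`, but only the second one attempts to log every event after the first failure.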

tallclair · 2018-07-03T21:23:18Z

staging/src/k8s.io/apiserver/plugin/pkg/audit/safe/safe.go

+
+type safeBackend struct {
+	// The delegate backend that actually exports events.
+	delegateBackend audit.Backend


nit: just embed it, then you only need to define ProcessEvents.

tallclair · 2018-07-03T21:24:10Z

staging/src/k8s.io/apiserver/plugin/pkg/audit/safe/safe.go

+limitations under the License.
+*/
+
+package safe


A dedicated package feels a bit heavy weight for this.... maybe just embed it in options? (should only need ~6 lines)

What do you mean by embedding this in options? Wouldn't it require handling this flag in every backend?

I meant just append https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/options/audit.go with:

type safeBackend struct {
	audit.Backend
}

func (s *safeBackend) ProcessEvents(ev ...*auditinternal.Event) bool {
	s.Backend.ProcessEvents(ev...)
	return true
}

Since it's only used in that file anyway.

Geez, you're right, that makes way more sense. Thanks, updated.
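A self-contained sketch of why embedding keeps the wrapper this short. The `Event` type, the reduced `Backend` interface, and `flakyBackend` are simplified stand-ins invented for this example, not the real `audit` package types.

```go
package main

import "fmt"

// Event and Backend are simplified, hypothetical stand-ins for the audit types.
type Event struct{ ID string }

type Backend interface {
	ProcessEvents(events ...*Event) bool
	Name() string
}

// flakyBackend always reports failure, to make the wrapper's effect visible.
type flakyBackend struct{}

func (flakyBackend) ProcessEvents(events ...*Event) bool { return false }
func (flakyBackend) Name() string                        { return "flaky" }

// safeBackend embeds Backend, so every method is promoted automatically.
// Only ProcessEvents is overridden, to hide the delegate's failures.
type safeBackend struct {
	Backend
}

func (s *safeBackend) ProcessEvents(events ...*Event) bool {
	s.Backend.ProcessEvents(events...)
	return true
}

func main() {
	var b Backend = &safeBackend{Backend: flakyBackend{}}
	fmt.Println(b.Name(), b.ProcessEvents(&Event{ID: "1"}))
}
```

With a named field instead of an embedded one, each remaining interface method would need its own forwarding definition.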

hzxuzhonghu · 2018-07-04T03:53:41Z

staging/src/k8s.io/apiserver/pkg/audit/types.go

@@ -25,7 +25,8 @@ type Sink interface {
 	// Errors might be logged by the sink itself. If an error should be fatal, leading to an internal
 	// error, ProcessEvents is supposed to panic. The event must not be mutated and is reused by the caller
 	// after the call returns, i.e. the sink has to make a deepcopy to keep a copy around if necessary.
-	ProcessEvents(events ...*auditinternal.Event)
+	// Returns true on success, may return false on error.
+	ProcessEvents(events ...*auditinternal.Event) bool


How about the buffered webhook audit? Does this function always return true in this scenario?

hzxuzhonghu · 2018-07-04T03:55:50Z

staging/src/k8s.io/apiserver/plugin/pkg/audit/buffered/buffered.go

 		}
-	}
+	}()
+	return sendErr == nil


Saw it, return false when the buffer is full.

Actually, changed to true since buffered backend will not be used when blocking-strict is selected.

hzxuzhonghu · 2018-07-04T04:05:34Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/audit.go

@@ -56,7 +56,11 @@ func WithAudit(handler http.Handler, sink audit.Sink, policy policy.Checker, lon
 		}

 		ev.Stage = auditinternal.StageRequestReceived
-		processAuditEvent(sink, ev, omitStages)
+		if !processAuditEvent(sink, ev, omitStages) {


This only rejects the request at the StageRequestReceived stage. It is inconsistent to ignore audit failures at the other stages.

It doesn't make sense to reject a request at later stages (at least for mutating requests), as the request will already have been processed at that point.

Yes, it has been processed, but without audit logs for those stages. What I want to say is that this PR does not handle lost audit events completely.

With my change, an auditor can get a guarantee that if there was no event with RequestReceived stage, there was no change to apiserver state and the caller will have a guarantee that if the request failed, the state wasn't changed.

With what you are suggesting, auditor will still get the guarantee that no event implies no change, but you will take away that guarantee from the caller. When request fails, you don't really know what happened.

One change I think might make sense here would be to reject non-mutating requests upon failures on other stages. WDYT?

Yeah, that would be an enhancement either way.

After some thought I don't like the idea of rejecting requests inconsistently - with audit policy configured to omit RequestReceived stage this would mean that audit backend issue will block all non-mutating requests, but allow mutating ones.

@x13n I don't understand your response to this - The code that is here only rejects requests when the RequestReceived event cannot be written. I think this is the correct behavior, but I'm not sure it matches what you said above?

Apologies for being unclear. Let me try to clarify what I meant above.

I think this check should be applied only at RequestReceived stage. Later on, a failure to audit log anything shouldn't affect apiserver response, because the request was already processed, possibly modifying etcd state. Now, one could argue that there is no harm in rejecting the request if it was non-mutating, but failed to be audit logged e.g. at ResponseCompleted stage. However, this would lead to some weirdness, when RequestReceived stage is omitted and audit backend has problems: mutating requests would ignore audit logging failures and simply work, while read only requests would be rejected because of audit backend problems.
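The behaviour described above can be sketched as a small HTTP middleware: reject before the handler runs if the first audit event cannot be written, and ignore audit failures afterwards. The `sink` type, the stage strings, and the `serve` helper are assumptions made for this example, not the `WithAudit` filter itself.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sink is a hypothetical audit sink: it reports whether the event for the
// given stage was written successfully.
type sink func(stage string) bool

// withAudit rejects a request only when the RequestReceived event cannot be
// written. At that point nothing has been processed, so rejecting is safe.
// A failure at the later stage is ignored, because the handler may already
// have changed state.
func withAudit(next http.Handler, s sink) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !s("RequestReceived") {
			http.Error(w, "Internal Server Error", http.StatusInternalServerError)
			return
		}
		next.ServeHTTP(w, r)
		_ = s("ResponseComplete") // result deliberately ignored
	})
}

// serve runs one request with a sink that fails only at failStage and
// reports the status code and whether the wrapped handler ran.
func serve(failStage string) (status int, handled bool) {
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		handled = true
		w.WriteHeader(http.StatusOK)
	})
	s := func(stage string) bool { return stage != failStage }
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPut, "/api/v1/namespaces/default", nil)
	withAudit(next, s).ServeHTTP(rec, req)
	return rec.Code, handled
}

func main() {
	fmt.Println(serve("RequestReceived"))
	fmt.Println(serve("ResponseComplete"))
}
```

This keeps both guarantees from the discussion: a rejected request never reached the handler, and a missing RequestReceived event implies no state change.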

x13n · 2018-07-06T10:12:18Z

/retest

x13n · 2018-07-06T14:25:06Z

/retest

x13n · 2018-07-10T11:39:28Z

/assign @deads2k

tallclair

lgtm, please squash commits

tallclair · 2018-07-12T01:27:02Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/audit.go

@@ -56,7 +56,11 @@ func WithAudit(handler http.Handler, sink audit.Sink, policy policy.Checker, lon
 		}

 		ev.Stage = auditinternal.StageRequestReceived
-		processAuditEvent(sink, ev, omitStages)
+		if !processAuditEvent(sink, ev, omitStages) {


@x13n I don't understand your response to this - The code that is here only rejects requests when the RequestReceived event cannot be written. I think this is the correct behavior, but I'm not sure it matches what you said above?

x13n · 2018-07-12T08:11:33Z

Thanks! Squashed.

x13n · 2018-07-18T14:19:30Z

Ping. Is there any additional work needed in this PR?

tallclair · 2018-07-18T18:15:20Z

/lgtm

CaoShuFeng · 2018-08-21T03:38:04Z

staging/src/k8s.io/apiserver/pkg/server/options/audit.go

@@ -327,10 +333,26 @@ func (o *AuditBatchOptions) AddFlags(pluginName string, fs *pflag.FlagSet) {
 			"moment if ThrottleQPS was not utilized before. Only used in batch mode.")
 }

+type safeBackend struct {
+	audit.Backend
+}


Looks like type safeBackend should be put in package pluginbuffered

I initially had a separate package for this, but @tallclair convinced me this makes more sense here, as it is quite trivial. Since nothing in safeBackend is pluginbuffered-specific, I'd rather leave it here.

Sure. Thanks.

CaoShuFeng · 2018-08-21T03:44:40Z

staging/src/k8s.io/apiserver/plugin/pkg/audit/truncate/truncate.go

@@ -57,6 +57,8 @@ type backend struct {

 	// Encoder used to calculate audit event sizes.
 	e runtime.Encoder
+
+	// Value returned from ProcessEvents when there was any failure.


What is this?

Good catch, some leftover. Removed now.

fejta-bot · 2018-11-14T11:44:36Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

x13n · 2018-11-15T15:18:30Z

/hold

I need to fix some merge conflicts and failing tests...

k8s-ci-robot · 2018-11-16T09:33:16Z

New changes are detected. LGTM label has been removed.

x13n · 2018-11-16T11:11:44Z

/retest

x13n · 2018-11-16T12:35:01Z

@sttts I rebased this and fixed failing tests. PTAL.
/hold cancel

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 3, 2018

k8s-ci-robot requested review from enj and soltysh July 3, 2018 12:59

tallclair reviewed Jul 3, 2018

hzxuzhonghu reviewed Jul 4, 2018

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 6, 2018

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 6, 2018

x13n force-pushed the audit-logging branch from aff6e60 to adeb800 Compare July 6, 2018 12:50

x13n force-pushed the audit-logging branch from 99a7d44 to 5b85c1e Compare July 9, 2018 09:45

k8s-ci-robot assigned deads2k Jul 10, 2018

tallclair reviewed Jul 12, 2018

x13n force-pushed the audit-logging branch from 464c8c0 to 71768e0 Compare July 12, 2018 08:11

k8s-ci-robot assigned tallclair Jul 18, 2018

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2018

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2018

CaoShuFeng reviewed Aug 21, 2018

x13n force-pushed the audit-logging branch from 71768e0 to d6fbdf5 Compare August 21, 2018 08:06

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 15, 2018

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2018

Add option to k8s apiserver to reject incoming requests upon audit fa…

7a10f4e

…ilure

x13n force-pushed the audit-logging branch from 79a0da2 to 7a10f4e Compare November 16, 2018 09:33

k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 16, 2018

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 16, 2018

mikedanese added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Nov 17, 2018

k8s-ci-robot merged commit 46ebebc into kubernetes:master Nov 17, 2018
