New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to k8s apiserver to reject incoming requests upon audit failure #65763

Merged
merged 1 commit into from Nov 17, 2018

Conversation

@x13n
Copy link
Member

x13n commented Jul 3, 2018

What this PR does / why we need it: Part of #65266

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Special notes for your reviewer:

Release note:

apiserver can be configured to reject requests that cannot be audit-logged.

cc @loburm @tallclair @wojtek-t

@tallclair
Copy link
Member

tallclair left a comment

just a handful of nits, then LGTM.

func auditEventProcessingError(w http.ResponseWriter, req *http.Request) {
w.Header().Set("Content-Type", "text/plain")
w.Header().Set("X-Content-Type-Options", "nosniff")
w.WriteHeader(http.StatusInsufficientStorage)

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

I don't think this is the right error code... maybe just return a 500?

This comment has been minimized.

@x13n

x13n Jul 4, 2018

Member

If this is a plain 500, then we won't be able to differentiate it from other errors. The one I used is from 5xx family, since it's a server side error. At the same time, it is more specific than "something broke server side" which is what 500 means. Maybe someone from apimachinery should take a look?

This comment has been minimized.

@tallclair

tallclair Jul 6, 2018

Member

Yeah, I was thinking maybe we don't want to distinguish it intentionally. Error messages like this can leak useful data to attackers, so distinguishing this case makes me a little nervous. Debug logs & metrics can provide information for those authorized, without clueing in the (possibly unprivileged) requester.

This comment has been minimized.

@x13n

x13n Jul 9, 2018

Member

That is a valid point. At the same time, using the same error code will limit the ability to monitor how often requests are rejected because of audit backend failures. We'll be able to see just how often they are rejected in general. Is it possible to have both security and observability?

This comment has been minimized.

@tallclair

tallclair Jul 9, 2018

Member

If you're talking about monitoring from the client side, then no, not without returning this information. But if you're talking about monitoring the server, then that's what prometheus metrics are for.

This comment has been minimized.

@x13n

x13n Jul 10, 2018

Member

I thought about existing prometheus metric that already exposes http response code, but you're right, this can be a separate metric exclusively for audit errors.

@@ -30,19 +30,19 @@ import (
)

// Avoid emitting errors that look like valid HTML. Quotes are okay.
var sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")
var Sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

Is this rendered correctly for plain content? I would think setting content-type + nosniff should be sufficient...

@@ -30,19 +30,19 @@ import (
)

// Avoid emitting errors that look like valid HTML. Quotes are okay.
var sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")
var Sanitizer = strings.NewReplacer(`&`, "&amp;", `<`, "&lt;", `>`, "&gt;")

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

Use html.EscapeString instead? https://golang.org/pkg/html/

w.Header().Set("Content-Type", "text/plain")
w.Header().Set("X-Content-Type-Options", "nosniff")
w.WriteHeader(http.StatusInsufficientStorage)
fmt.Fprintf(w, "Server couldn't store an audit event for request: %q", responsewriters.Sanitizer.Replace(req.RequestURI))

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

Maybe just use responsewriters.InternalErorr?

}
}
}()
return sendErr == nil

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

alternatively, just use a named return value. I'll let you decide which is cleaner.

This comment has been minimized.

@x13n

x13n Jul 6, 2018

Member

Actually, buffer will not be used when blocking-strict is selected, changed to true to keep existing behavior.

@@ -25,7 +25,8 @@ type Sink interface {
// Errors might be logged by the sink itself. If an error should be fatal, leading to an internal
// error, ProcessEvents is supposed to panic. The event must not be mutated and is reused by the caller
// after the call returns, i.e. the sink has to make a deepcopy to keep a copy around if necessary.
ProcessEvents(events ...*auditinternal.Event)
// Returns true on success, may return false on error.
ProcessEvents(events ...*auditinternal.Event) bool

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

hmm, I was surprised you didn't just return the error, but I guess since we're handling the errors in-place it makes sense.

for _, ev := range events {
b.logEvent(ev)
if !b.logEvent(ev) {
success = false

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

alternatively: success = success && b.logEvent(ev)

This comment has been minimized.

@x13n

x13n Jul 4, 2018

Member

This would cause events to stop being logged after first failure.

This comment has been minimized.

@x13n

x13n Jul 4, 2018

Member

success = b.logEvent(ev) && success would work though.

This comment has been minimized.

@tallclair

tallclair Jul 6, 2018

Member

oh yeah, good call.


type safeBackend struct {
// The delegate backend that actually exports events.
delegateBackend audit.Backend

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

nit: just embed it, then you only need to define ProcessEvents.

limitations under the License.
*/

package safe

This comment has been minimized.

@tallclair

tallclair Jul 3, 2018

Member

A dedicated package feels a bit heavy weight for this.... maybe just embed it in options? (should only need ~6 lines)

This comment has been minimized.

@x13n

x13n Jul 4, 2018

Member

What do you mean by embedding this in options? Wouldn't it require handling this flag in every backend?

This comment has been minimized.

@tallclair

tallclair Jul 6, 2018

Member

I meant just append https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/options/audit.go with:

type safeBackend struct {
	audit.Backend
}

func (s *safeBackend) ProcessEvents(ev ...*auditinternal.Event) bool {
	s.Backend.ProcessEvents(ev...)
	return true
}

Since it's only used in that file anyway.

This comment has been minimized.

@x13n

x13n Jul 9, 2018

Member

Geez, you're right, that makes way more sense. Thanks, updated.

@@ -25,7 +25,8 @@ type Sink interface {
// Errors might be logged by the sink itself. If an error should be fatal, leading to an internal
// error, ProcessEvents is supposed to panic. The event must not be mutated and is reused by the caller
// after the call returns, i.e. the sink has to make a deepcopy to keep a copy around if necessary.
ProcessEvents(events ...*auditinternal.Event)
// Returns true on success, may return false on error.
ProcessEvents(events ...*auditinternal.Event) bool

This comment has been minimized.

@hzxuzhonghu

hzxuzhonghu Jul 4, 2018

Member

How about the buffered webhook audit? Does this function always return true in this scenario?

}
}
}()
return sendErr == nil

This comment has been minimized.

@hzxuzhonghu

hzxuzhonghu Jul 4, 2018

Member

Saw it, return false when the buffer is full.

This comment has been minimized.

@x13n

x13n Jul 6, 2018

Member

Actually, changed to true since buffered backend will not be used when blocking-strict is selected.

@@ -56,7 +56,11 @@ func WithAudit(handler http.Handler, sink audit.Sink, policy policy.Checker, lon
}

ev.Stage = auditinternal.StageRequestReceived
processAuditEvent(sink, ev, omitStages)
if !processAuditEvent(sink, ev, omitStages) {

This comment has been minimized.

@hzxuzhonghu

hzxuzhonghu Jul 4, 2018

Member

Only reject request on stage StageRequestReceived, It is not consistent ignoring other stags audit failure.

This comment has been minimized.

@x13n

x13n Jul 4, 2018

Member

I doesn't make sense to reject request at later stages (at least for mutating requests), as the request will already have been processed at this point.

This comment has been minimized.

@hzxuzhonghu

hzxuzhonghu Jul 5, 2018

Member

Yes,it has been processed. But also without audit logs of these stages. So what i want to say is this pr does not handle lost audit events completely.

This comment has been minimized.

@x13n

x13n Jul 5, 2018

Member

With my change, an auditor can get a guarantee that if there was no event with RequestReceived stage, there was no change to apiserver state and the caller will have a guarantee that if the request failed, the state wasn't changed.

With what you are suggesting, auditor will still get the guarantee that no event implies no change, but you will take away that guarantee from the caller. When request fails, you don't really know what happened.

One change I think might make sense here would be to reject non-mutating requests upon failures on other stages. WDYT?

This comment has been minimized.

@hzxuzhonghu

hzxuzhonghu Jul 5, 2018

Member

Yeah, this is kind of enhancement whatever .

This comment has been minimized.

@x13n

x13n Jul 6, 2018

Member

After some thought I don't like the idea of rejecting requests inconsistently - with audit policy configured to omit RequestReceived stage this would mean that audit backend issue will block all non-mutating requests, but allow mutating ones.

This comment has been minimized.

@tallclair

tallclair Jul 12, 2018

Member

@x13n I don't understand your response to this - The code that is here only rejects requests when the RequestReceived event cannot be written. I think this is the correct behavior, but I'm not sure it matches what you said above?

This comment has been minimized.

@x13n

x13n Jul 12, 2018

Member

Apologies for being unclear. Let me try to clarify what I meant above.

I think this check should be applied only at RequestReceived stage. Later on, a failure to audit log anything shouldn't affect apiserver response, because the request was already processed, possibly modifying etcd state. Now, one could argue that there is no harm in rejecting the request if it was non-mutating, but failed to be audit logged e.g. at ResponseCompleted stage. However, this would lead to some weirdness, when RequestReceived stage is omitted and audit backend has problems: mutating requests would ignore audit logging failures and simply work, while read only requests would be rejected because of audit backend problems.

@k8s-ci-robot k8s-ci-robot added size/M and removed size/L labels Jul 6, 2018

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Jul 6, 2018

/retest

@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Jul 6, 2018

@x13n x13n force-pushed the x13n:audit-logging branch from aff6e60 to adeb800 Jul 6, 2018

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Jul 6, 2018

/retest

@x13n x13n force-pushed the x13n:audit-logging branch from 99a7d44 to 5b85c1e Jul 9, 2018

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Jul 10, 2018

/assign @deads2k

@tallclair
Copy link
Member

tallclair left a comment

lgtm, please squash commits

@@ -56,7 +56,11 @@ func WithAudit(handler http.Handler, sink audit.Sink, policy policy.Checker, lon
}

ev.Stage = auditinternal.StageRequestReceived
processAuditEvent(sink, ev, omitStages)
if !processAuditEvent(sink, ev, omitStages) {

This comment has been minimized.

@tallclair

tallclair Jul 12, 2018

Member

@x13n I don't understand your response to this - The code that is here only rejects requests when the RequestReceived event cannot be written. I think this is the correct behavior, but I'm not sure it matches what you said above?

@x13n x13n force-pushed the x13n:audit-logging branch from 464c8c0 to 71768e0 Jul 12, 2018

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Jul 12, 2018

Thanks! Squashed.

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Jul 18, 2018

Ping. Is there any additional work needed in this PR?

@tallclair

This comment has been minimized.

Copy link
Member

tallclair commented Jul 18, 2018

/lgtm

@@ -327,10 +333,26 @@ func (o *AuditBatchOptions) AddFlags(pluginName string, fs *pflag.FlagSet) {
"moment if ThrottleQPS was not utilized before. Only used in batch mode.")
}

type safeBackend struct {
audit.Backend
}

This comment has been minimized.

@CaoShuFeng

CaoShuFeng Aug 21, 2018

Member

Looks like type safeBackend should be put in package pluginbuffered

This comment has been minimized.

@x13n

x13n Aug 21, 2018

Member

I initially had a separate package for this, but @tallclair convinced me this makes more sense here, as it is quite trivial. Since nothing in safeBackend is pluginbuffered-specific, I'd rather leave it here.

This comment has been minimized.

@CaoShuFeng

CaoShuFeng Aug 21, 2018

Member

Sure. Thanks.

@@ -57,6 +57,8 @@ type backend struct {

// Encoder used to calculate audit event sizes.
e runtime.Encoder

// Value returned from ProcessEvents when there was any failure.

This comment has been minimized.

@CaoShuFeng

CaoShuFeng Aug 21, 2018

Member

What is this?

This comment has been minimized.

@x13n

x13n Aug 21, 2018

Member

Good catch, some leftover. Removed now.

@x13n x13n force-pushed the x13n:audit-logging branch from 71768e0 to d6fbdf5 Aug 21, 2018

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 14, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

11 similar comments
@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 14, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 14, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 14, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 14, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 15, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 15, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 15, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 15, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 15, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 15, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 15, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Nov 15, 2018

/hold

I need to fix some merge conflicts and failing tests...

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Nov 16, 2018

New changes are detected. LGTM label has been removed.

@x13n x13n force-pushed the x13n:audit-logging branch from 79a0da2 to 7a10f4e Nov 16, 2018

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Nov 16, 2018

/retest

@x13n

This comment has been minimized.

Copy link
Member

x13n commented Nov 16, 2018

@sttts I rebased this and fixed failing tests. PTAL.
/hold cancel

@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 17, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

1 similar comment
@fejta-bot

This comment has been minimized.

Copy link

fejta-bot commented Nov 17, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 46ebebc into kubernetes:master Nov 17, 2018

18 checks passed

cla/linuxfoundation x13n authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment