WIP: Strict schema validation for POST and PUT #104433

kevindelgado · 2021-08-18T13:53:11Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Performs strict server side schema validation for POST and PUT requests via the validate=strict query parameter.

Step one of deprecating the very painful client side validation.

Which issue(s) this PR fixes:

First step in solving #39434 and #5889

Does this PR introduce a user-facing change?

Performs strict server side schema validation for POST and PUT requests via the `validate=strict` query parameter.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

design doc
KEP-2885

[design doc](https://docs.google.com/document/d/1MmEkA2xEr_vOJ9SrpWu92rAzvaxEY47PHeRKcfO86mc/edit?usp=sharing)
[KEP-2885](https://github.com/kubernetes/enhancements/issues/2885)

kevindelgado · 2021-08-18T13:53:38Z

/sig api-machinery

kevindelgado

Most open questions are in line as TODOs. Still working on the KEP, but just thought it would be easier to ask some of the questions I have via code rather than through a doc.

PTAL @apelisse @jpbetz

cc @lavalamp (I still owe you a benchmark of the performance of strict vs non-strict decoding that I will run shortly)

kevindelgado · 2021-08-18T14:05:07Z

staging/src/k8s.io/apimachinery/pkg/runtime/serializer/codec_factory.go

@@ -69,11 +70,19 @@ func newSerializersForScheme(scheme *runtime.Scheme, mf json.MetaFactory, option
 			json.SerializerOptions{Yaml: false, Pretty: true, Strict: options.Strict},
 		)
 	}
-


Something about this function and the slice of serializerTypes it returns doesn't sit well with me.

We don't have a serializer for pretty && strict, every option we add is going to balloon the number of serializerTypes exponentially.

We don't seem to use options.Strict and options.Pretty consistently. We have a separate serializer for pretty, but just pass the strict options through.

Having both a serializer that is always strict and a serializer that is variably strict based on options smells repugnant.

I have a hunch that we could do something about the ballooning constructors in negotiation/negotiate.go that would make our lives easier here, but I haven't fully thought it through yet.

That being said, rewriting all the negotiation constructors might be too much yak shaving for the validation task at hand, let me know if there's a simpler way I'm not thinking of.

This is (should) only be used for DEserialization, it may seem less bad if you name it accordingly? Pretty only makes sense when writing and strict only makes sense when reading.

... the next time we add something we'll probably have this problem though.

kevindelgado · 2021-08-18T20:11:30Z

Added some benchmarking per our discussion from the sig/apimachinery meeting @lavalamp.

Let me know if you think there’s any issues with the methodology (I’ve noted that we’re only benchmarking one simple resource).

The point of the benchmarking was to determine whether we want to do strict validation only when requested, or if we should do it on every request and warn instead of error when the user doesn’t want strict validation. What do you think now given these results?

❯ benchstat bench_non_strict.txt bench_strict.txt
name                        old time/op    new time/op    delta
StrictValidationSimples-16     853µs ±10%     945µs ± 5%     ~     (p=0.056 n=5+5)

name                        old alloc/op   new alloc/op   delta
StrictValidationSimples-16    74.8kB ± 1%    94.6kB ± 1%  +26.51%  (p=0.008 n=5+5)

name                        old allocs/op  new allocs/op  delta
StrictValidationSimples-16       459 ± 0%       596 ± 0%  +29.79%  (p=0.016 n=5+4)

apelisse · 2021-08-18T20:17:07Z

You can possibly use benchstat to display these results, and you probably should include memory usage too

kevindelgado · 2021-08-18T20:41:40Z

You can possibly use benchstat to display these results, and you probably should include memory usage too

Done, updated the above with benchstat output

apelisse · 2021-08-18T20:46:43Z

If we wanted to have these as warnings, we would have to run twice then. We would have to decide in what order (probably strict first and then non-strict, since strict shouldn't fail in most cases). The nominal case would then be to do it mostly once with strict.

Something we could do is to try to run this against a huge cluster and see if it has any impact on latency.

fedebongio · 2021-08-19T20:04:53Z

/assign @apelisse
/triage accepted

smarterclayton · 2021-09-03T17:37:21Z

I don't understand why strict serializer is so expensive. While 20-30% is acceptable for opt in, the actual checks should be amortizable (we already have to iterate every field before assignment). Naively I would expect that strict serialization should be capable of being free (<3-5%)

lavalamp · 2021-09-07T21:55:05Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

+// TODO: currently strict validation is set via
+// query param. Alternatively we could use a request header.
+// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)
+// TODO: what do we want the query param to actually be (I went with ?validate=strict but am open to whatever)


?validation=strict

Changed to ?fieldValidation=Strict based on comment from KEP. Param name not set in stone yet though, if you or anyone else has a better suggestion.

lavalamp · 2021-09-07T21:55:32Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
 	return len(url.Query()["dryRun"]) != 0
 }

+// TODO: currently strict validation is set via
+// query param. Alternatively we could use a request header.
+// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)


If it doesn't work, it needs to return an error.

First pass at doing this. It's a little ugly, so I will take another pass writing it more cleanly once I add further testing for this.

lavalamp · 2021-09-07T21:55:59Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
 	return len(url.Query()["dryRun"]) != 0
 }

+// TODO: currently strict validation is set via
+// query param. Alternatively we could use a request header.


Query param is good (easier to use from some languages).

Agreed. Removed the TODO about this.

lavalamp · 2021-09-07T22:01:23Z

I don't understand why strict serializer is so expensive

Judging by sub-ms times, I think "simples" are very tiny and any constant overhead is going to be pronounced. Can we try with bigger objects?

k8s-ci-robot · 2021-09-14T20:58:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kevindelgado
To complete the pull request process, please assign liggitt after the PR has been reviewed.
You can assign the PR to them by writing /assign @liggitt in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kevindelgado · 2021-09-14T21:01:35Z

I don't understand why strict serializer is so expensive

Judging by sub-ms times, I think "simples" are very tiny and any constant overhead is going to be pronounced. Can we try with bigger objects?

@lavalamp This has sent me down a bit of a rabbit hole, so I wanted to check in before I go too deep.

I’m trying to benchmark objects bigger than simples. The place that I was benchmarking the simples apiserver/pkg/endpoints/apiserver_test.go, doesn’t seem like an appropriate place to benchmark bigger objects, because you have to manually add the rest.Storage map to the test server (and just from a consistency perspective, nothing else in this package uses any objects beyond simples).

My approach has instead been to add a benchmark to the integration test in test/integration/apiserver/apiserver_test.gosince this gives us access to the full suite of resources.

This has given promising results with strict validation being ~11% slower and ~25% more memory.

❯ benchstat benchvalidation-10x-nonstrict.log benchvalidation-10x-strict.log
name                                  old time/op    new time/op    delta
FieldValidation/strict-deployment-16    5.18ms ± 5%    5.73ms ± 9%  +10.76%  (p=0.001 n=9+9)
 
name                                  old alloc/op   new alloc/op   delta
FieldValidation/strict-deployment-16     264kB ± 0%     323kB ± 1%  +22.47%  (p=0.000 n=10+10)
 
name                                  old allocs/op  new allocs/op  delta
FieldValidation/strict-deployment-16     3.23k ± 0%     4.07k ± 0%  +26.04%  (p=0.000 n=10+10)

Is this a better place to do the benchmarking? Are there other/better places to add benchmarking? I also considered adding benchmarking to the low level serializer/json package but that has its own issues (doesn’t seem like it would benchmark anything that jsoniter wouldn’t already do, doesn’t actually capture the effect on the whole API request, etc).

If this is a good place to put it, I can go ahead and add even bigger objects to the testing table so that we can see the effect of object size on slowness of strict validation. (Aside: do you know where I can find objects of various sizes in raw json/yaml that I can test here, otherwise I can hand craft them myself, but wanted to check first?).

kevindelgado

Addressed remaining feedback. Still peppered with TODOs, but wanted to start conversation on proper benchmarking before getting too far into the review.

kevindelgado · 2021-09-15T00:31:52Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
 	return len(url.Query()["dryRun"]) != 0
 }

+// TODO: currently strict validation is set via
+// query param. Alternatively we could use a request header.


Agreed. Removed the TODO about this.

kevindelgado · 2021-09-15T00:32:19Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
 	return len(url.Query()["dryRun"]) != 0
 }

+// TODO: currently strict validation is set via
+// query param. Alternatively we could use a request header.
+// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)


First pass at doing this. It's a little ugly, so I will take another pass writing it more cleanly once I add further testing for this.

kevindelgado · 2021-09-15T00:32:22Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

+// TODO: currently strict validation is set via
+// query param. Alternatively we could use a request header.
+// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)
+// TODO: what do we want the query param to actually be (I went with ?validate=strict but am open to whatever)


Changed to ?fieldValidation=Strict based on comment from KEP. Param name not set in stone yet though, if you or anyone else has a better suggestion.

luxas

Thanks @kevindelgado!

luxas · 2021-09-16T12:55:07Z

staging/src/k8s.io/apimachinery/pkg/runtime/serializer/codec_factory.go

@@ -85,6 +94,7 @@ func newSerializersForScheme(scheme *runtime.Scheme, mf json.MetaFactory, option
 			FileExtensions:     []string{"yaml"},
 			EncodesAsText:      true,
 			Serializer:         yamlSerializer,


This reminds me of the fact (about code smell with this setup that you pointed out above) that there is no PrettySerializer for YAML, which has bitten me at least once.

luxas · 2021-09-16T12:56:28Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

+// media type, because the list of media types that support field validation are a subset of
+// all supported media types (only json and yaml supports field validation).
+func fieldValidation(req *http.Request) (fieldValidationDirective, error) {
+	supportedMediaTypes := []string{"application/json", "application/yaml"}


Aha, I thought from reading the KEP that we would not support this way, and hence, keep the API area as small as possible to begin with at least?

Ultimately, I’m just trying to make sure we error if the user requests validation on protobuf content.

I figured it would be more correct to have an explicit allowlist of json and yaml rather than a denylist of protobuf. I’m not sure if that’s better or not though, since protobuf is the only disallowed content type (that wouldn’t trigger an unsupported content type error later on). Let me know what you think is the best way to do this

luxas · 2021-09-16T12:56:42Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

+func fieldValidation(req *http.Request) (fieldValidationDirective, error) {
+	supportedMediaTypes := []string{"application/json", "application/yaml"}
+	supported := false
+	contentType := req.Header.Get("ContentType")


Content-Type?

Good catch, I've fixed and added tests for this

luxas · 2021-09-16T12:57:16Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go

+	case 0:
+		return ignoreFieldValidation, nil
+	case 1:
+		switch validationParam[0] {


Does casing matter, if not, I'd do a strings.ToLower here

Hmm, that’s a good question and one I’d pose to an API reviewer. I was using uppercase based on this comment, but agree that it seems a little weird to require uppercase. I’ve added strings.ToLower for now. @lavalamp, do you know if query parameters are or should be case-sensitive?

luxas · 2021-09-16T12:58:17Z

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/update.go

@@ -103,7 +103,17 @@ func UpdateResource(r rest.Updater, scope *RequestScope, admit admission.Interfa
 		original := r.New()

 		trace.Step("About to convert to expected version")
-		decoder := scope.Serializer.DecoderToVersion(s.Serializer, scope.HubGroupVersion)
+		decodeSerializer := s.Serializer


is this use-case common enough to warrant some kind of internal helper like decodeSerializer := serializerFor(req), which also takes care of the feature gate?

I think it’s a little more involved given the serializer requirements of the other handlers (specifically the various ways patch handling works). I’m going to follow up on this in another commit that tries to better unify decode serializer generation across all the handlers.

luxas · 2021-09-16T13:33:12Z

I don't understand why strict serializer is so expensive. While 20-30% is acceptable for opt in, the actual checks should be amortizable (we already have to iterate every field before assignment). Naively I would expect that strict serialization should be capable of being free (<3-5%)

As far as I understand, this is because in the strict deserialization codepath we

decode the object in a non-strict manner into obj (same as the non-strict path, i.e. no overhead yet) https://github.com/kubernetes/apimachinery/blob/master/pkg/runtime/serializer/json/json.go#L279
run a YAMLToJSONStrict (validates duplicate fields) which in turn does
- yaml.UnmarshalStrict https://github.com/kubernetes-sigs/yaml/blob/master/yaml.go#L124
- json.Marshal https://github.com/kubernetes-sigs/yaml/blob/master/yaml.go#L139
deep-copy obj https://github.com/kubernetes/apimachinery/blob/master/pkg/runtime/serializer/json/json.go#L288 (makes sure we don't modify obj in any way as a side-effect of 3.)
run the strict decoder https://github.com/kubernetes/apimachinery/blob/master/pkg/runtime/serializer/json/json.go#L289 (unknown field checking)

So in fact, with this in mind, I'm surprised that it's not actually more expensive than shown. If one compares the strict and non-strict decode codepaths, I think it could be an order of magnitude more expensive even.

E.g. for a benchmark of

var testData = []byte(`{"foo": "bar", "baz": 1234, "is": true, "arr": [], "obj": {}}`)

type testStruct struct {
	Foo string            `json:"foo" yaml:"foo"`
	Baz int64             `json:"baz" yaml:"baz"`
	Is  bool              `json:"is" yaml:"is"`
	Arr []string          `json:"arr" yaml:"arr"`
	Obj map[string]string `json:"obj" yaml:"obj"`
}

I got benchmarking results (for encoding/json.Unmarshal, json-iter (the way we have it configured in k8s.io/apimachinery currently), and yaml.v2 that we're using in sigs.k8s.io/yaml) as follows:

BenchmarkJSONUnmarshal-8                   	  540610	      1893 ns/op	     360 B/op	       9 allocs/op
BenchmarkJSONIterUnmarshal-8               	 2344988	       505.5 ns/op	     123 B/op	       4 allocs/op
BenchmarkYAMLv2Unmarshal-8                 	   91000	     13636 ns/op	   15248 B/op	      72 allocs/op

The reason for doing the above flow at the time (#74111) was that there was (and still is) no builtin way in neither encoding/json or json-iter to support detecting duplicate fields, hence the duplicate fields are checked using YAMLToJSONStrict, which is able to do this. Of course, it would be much better/sane/faster to do this directly in the decoder, so I have good hopes for #105030 to address this fully 👍. A side note is that once we upgrade to yaml.v3 (YAML 1.2) in sigs.k8s.io/yaml, duplicate fields will always error, as this is a violation of the YAML spec.

The deep copy was added mostly for being super safe and consistent with the non-strict codepath, but if we're confident in the strict decoder, then we should (at a first sight at least) be able to drop that, and just run either the non-strict or strict decoder, for which I think the performance hit will become negligible. We will need struct errors from the underlying decoder to give different reasons for failing, so we can return the appropriate runtime Unknown/Duplicate field errors.

liggitt · 2021-09-16T13:45:38Z

The reason for doing the above flow at the time (#74111) was that there was (and still is) no builtin way in neither encoding/json or json-iter to support detecting duplicate fields, hence the duplicate fields are checked using YAMLToJSONStrict, which is able to do this. Of course, it would be much better/sane/faster to do this directly in the decoder, so I have good hopes for #105030 to address this fully

#105030 adds the ability to error on duplicate fields directly in the json decoder, which would let us avoid calling the yaml decoder, then the json decoder (for json input).

However, callers of the strict unmarshalers currently depend on being able to distinguish a strict error from a normal error (see uses of IsStrictDecodingError). To continue to distinguish that, we would still have to do the regular decode, then the strict decode. I'd be open to dropping that capability and just making those callers try decoding twice on their own.

liggitt · 2021-09-16T13:46:35Z

We will need struct errors from the underlying decoder to give different reasons for failing, so we can return the appropriate runtime Unknown/Duplicate field errors.

Or make callers that want to distinguish double decode themselves... we currently have exactly two uses in-tree, and both are at component startup, so decoding config twice would not be a performance issue.

apelisse · 2021-09-16T17:00:59Z

staging/src/k8s.io/apiserver/pkg/endpoints/apiserver_test.go

+// runRequestOrDie is like runRequest, but used for benchmarking.
+func runRequestOrDie(path, verb string, data []byte, contentType string) *http.Response {
+	request, err := http.NewRequest(verb, path, bytes.NewBuffer(data))
+	if err != nil {
+		panic(err)
+	}
+	if contentType != "" {
+		request.Header.Set("Content-Type", contentType)
+	}
+	response, err := http.DefaultClient.Do(request)
+	if err != nil {
+		panic(err)
+	}
+	return response
+}


I feel like we should return the error in the previous runRequest function, avoid the duplication, and the caller should decide whether they want to Fatal or panic.

But stepping back a notch, I think panic and Fatal have almost the same goal, so I'm guessing your problem is that you can't give testing.B to runRequest. You could merely change runRequest to take a TB

Ya, I was being dumb and forgot TB existed, changed it.

kevindelgado · 2021-10-26T17:36:36Z

Superseded by #105916

k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 18, 2021

k8s-ci-robot requested review from jpbetz and logicalhan August 18, 2021 13:54

k8s-ci-robot added the area/apiserver label Aug 18, 2021

Strict schema validation for POST and PUT

b18af56

kevindelgado force-pushed the strict-validation-post-put branch from 5ccbd65 to b18af56 Compare August 18, 2021 14:01

kevindelgado commented Aug 18, 2021

View reviewed changes

Add benchmarking for strict validation

c2b5ebb

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 18, 2021

k8s-ci-robot assigned apelisse Aug 19, 2021

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 19, 2021

kevindelgado mentioned this pull request Aug 26, 2021

WIP: Strict Schema Validation for JSON Patch Requests #104619

Closed

lavalamp reviewed Sep 7, 2021

View reviewed changes

add benchmarking to integration tests

fbbb9bd

k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 14, 2021

kevindelgado commented Sep 15, 2021

View reviewed changes

fieldValidation returns enum, error

a50d996

kevindelgado force-pushed the strict-validation-post-put branch from d6244b9 to a50d996 Compare September 15, 2021 00:38

This was referenced Sep 16, 2021

Error for duplicate fields json-iterator/go#575

Draft

Upgrade yaml.v2 to yaml.v3, improve testing and bugfixes kubernetes-sigs/yaml#61

Closed

luxas reviewed Sep 16, 2021

View reviewed changes

apelisse reviewed Sep 16, 2021

View reviewed changes

kevindelgado added 2 commits September 21, 2021 17:38

WIP start addressing PR feedback

8fdf47d

Address feedback, improve integration/bench tests

40ef1fc

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 1, 2021

kevindelgado force-pushed the strict-validation-post-put branch from 193c40d to 40ef1fc Compare October 1, 2021 20:25

k8s-ci-robot removed the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Oct 1, 2021

kevindelgado mentioned this pull request Oct 26, 2021

Server Side Field Validation liggitt/kubernetes#2

Closed

kevindelgado closed this Oct 26, 2021

WIP: Strict schema validation for POST and PUT #104433

WIP: Strict schema validation for POST and PUT #104433

Conversation

kevindelgado commented Aug 18, 2021 • edited

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

kevindelgado commented Aug 18, 2021

kevindelgado left a comment • edited

Choose a reason for hiding this comment

kevindelgado Aug 18, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevindelgado commented Aug 18, 2021 • edited

apelisse commented Aug 18, 2021

kevindelgado commented Aug 18, 2021

apelisse commented Aug 18, 2021

fedebongio commented Aug 19, 2021

smarterclayton commented Sep 3, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevindelgado Sep 15, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lavalamp commented Sep 7, 2021

k8s-ci-robot commented Sep 14, 2021

kevindelgado commented Sep 14, 2021

kevindelgado left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevindelgado Sep 15, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luxas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luxas commented Sep 16, 2021

liggitt commented Sep 16, 2021 • edited

liggitt commented Sep 16, 2021

apelisse Sep 16, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevindelgado commented Oct 26, 2021

kevindelgado commented Aug 18, 2021 •

edited

kevindelgado left a comment •

edited

kevindelgado Aug 18, 2021 •

edited

kevindelgado commented Aug 18, 2021 •

edited

kevindelgado Sep 15, 2021 •

edited

kevindelgado Sep 15, 2021 •

edited

liggitt commented Sep 16, 2021 •

edited

apelisse Sep 16, 2021 •

edited