Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Strict schema validation for POST and PUT #104433

Closed

Conversation

kevindelgado
Copy link
Contributor

@kevindelgado kevindelgado commented Aug 18, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

Performs strict server side schema validation for POST and PUT requests via the validate=strict query parameter.

Step one of deprecating the very painful client side validation.

Which issue(s) this PR fixes:

First step in solving #39434 and #5889

Does this PR introduce a user-facing change?

Performs strict server side schema validation for POST and PUT requests via the `validate=strict` query parameter.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

design doc
KEP-2885

[design doc](https://docs.google.com/document/d/1MmEkA2xEr_vOJ9SrpWu92rAzvaxEY47PHeRKcfO86mc/edit?usp=sharing)
[KEP-2885](https://github.com/kubernetes/enhancements/issues/2885)

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 18, 2021
@kevindelgado
Copy link
Contributor Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 18, 2021
Copy link
Contributor Author

@kevindelgado kevindelgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most open questions are in line as TODOs. Still working on the KEP, but just thought it would be easier to ask some of the questions I have via code rather than through a doc.

PTAL @apelisse @jpbetz

cc @lavalamp (I still owe you a benchmark of the performance of strict vs non-strict decoding that I will run shortly)

@@ -69,11 +70,19 @@ func newSerializersForScheme(scheme *runtime.Scheme, mf json.MetaFactory, option
json.SerializerOptions{Yaml: false, Pretty: true, Strict: options.Strict},
)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about this function and the slice of serializerTypes it returns doesn't sit well with me.

  1. We don't have a serializer for pretty && strict, every option we add is going to balloon the number of serializerTypes exponentially.
  2. We don't seem to use options.Strict and options.Pretty consistently. We have a separate serializer for pretty, but just pass the strict options through.
  3. Having both a serializer that is always strict and a serializer that is variably strict based on options smells repugnant.

I have a hunch that we could do something about the ballooning constructors in negotiation/negotiate.go that would make our lives easier here, but I haven't fully thought it through yet.

That being said, rewriting all the negotiation constructors might be too much yak shaving for the validation task at hand, let me know if there's a simpler way I'm not thinking of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is (should) only be used for DEserialization, it may seem less bad if you name it accordingly? Pretty only makes sense when writing and strict only makes sense when reading.

... the next time we add something we'll probably have this problem though.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 18, 2021
@kevindelgado
Copy link
Contributor Author

kevindelgado commented Aug 18, 2021

Added some benchmarking per our discussion from the sig/apimachinery meeting @lavalamp.

Let me know if you think there’s any issues with the methodology (I’ve noted that we’re only benchmarking one simple resource).

The point of the benchmarking was to determine whether we want to do strict validation only when requested, or if we should do it on every request and warn instead of error when the user doesn’t want strict validation. What do you think now given these results?

❯ benchstat bench_non_strict.txt bench_strict.txt
name                        old time/op    new time/op    delta
StrictValidationSimples-16     853µs ±10%     945µs ± 5%     ~     (p=0.056 n=5+5)

name                        old alloc/op   new alloc/op   delta
StrictValidationSimples-16    74.8kB ± 1%    94.6kB ± 1%  +26.51%  (p=0.008 n=5+5)

name                        old allocs/op  new allocs/op  delta
StrictValidationSimples-16       459 ± 0%       596 ± 0%  +29.79%  (p=0.016 n=5+4)

@apelisse
Copy link
Member

You can possibly use benchstat to display these results, and you probably should include memory usage too

@kevindelgado
Copy link
Contributor Author

You can possibly use benchstat to display these results, and you probably should include memory usage too

Done, updated the above with benchstat output

@apelisse
Copy link
Member

If we wanted to have these as warnings, we would have to run twice then. We would have to decide in what order (probably strict first and then non-strict, since strict shouldn't fail in most cases). The nominal case would then be to do it mostly once with strict.

Something we could do is to try to run this against a huge cluster and see if it has any impact on latency.

@fedebongio
Copy link
Contributor

/assign @apelisse
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 19, 2021
@smarterclayton
Copy link
Contributor

I don't understand why strict serializer is so expensive. While 20-30% is acceptable for opt in, the actual checks should be amortizable (we already have to iterate every field before assignment). Naively I would expect that strict serialization should be capable of being free (<3-5%)

// TODO: currently strict validation is set via
// query param. Alternatively we could use a request header.
// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)
// TODO: what do we want the query param to actually be (I went with ?validate=strict but am open to whatever)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?validation=strict

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to ?fieldValidation=Strict based on comment from KEP. Param name not set in stone yet though, if you or anyone else has a better suggestion.

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
return len(url.Query()["dryRun"]) != 0
}

// TODO: currently strict validation is set via
// query param. Alternatively we could use a request header.
// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't work, it needs to return an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass at doing this. It's a little ugly, so I will take another pass writing it more cleanly once I add further testing for this.

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
return len(url.Query()["dryRun"]) != 0
}

// TODO: currently strict validation is set via
// query param. Alternatively we could use a request header.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query param is good (easier to use from some languages).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Removed the TODO about this.

@lavalamp
Copy link
Member

lavalamp commented Sep 7, 2021

I don't understand why strict serializer is so expensive

Judging by sub-ms times, I think "simples" are very tiny and any constant overhead is going to be pronounced. Can we try with bigger objects?

@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 14, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kevindelgado
To complete the pull request process, please assign liggitt after the PR has been reviewed.
You can assign the PR to them by writing /assign @liggitt in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kevindelgado
Copy link
Contributor Author

I don't understand why strict serializer is so expensive

Judging by sub-ms times, I think "simples" are very tiny and any constant overhead is going to be pronounced. Can we try with bigger objects?

@lavalamp This has sent me down a bit of a rabbit hole, so I wanted to check in before I go too deep.

I’m trying to benchmark objects bigger than simples. The place that I was benchmarking the simples apiserver/pkg/endpoints/apiserver_test.go, doesn’t seem like an appropriate place to benchmark bigger objects, because you have to manually add the rest.Storage map to the test server (and just from a consistency perspective, nothing else in this package uses any objects beyond simples).

My approach has instead been to add a benchmark to the integration test in test/integration/apiserver/apiserver_test.gosince this gives us access to the full suite of resources.

This has given promising results with strict validation being ~11% slower and ~25% more memory.

❯ benchstat benchvalidation-10x-nonstrict.log benchvalidation-10x-strict.log
name                                  old time/op    new time/op    delta
FieldValidation/strict-deployment-16    5.18ms ± 5%    5.73ms ± 9%  +10.76%  (p=0.001 n=9+9)
 
name                                  old alloc/op   new alloc/op   delta
FieldValidation/strict-deployment-16     264kB ± 0%     323kB ± 1%  +22.47%  (p=0.000 n=10+10)
 
name                                  old allocs/op  new allocs/op  delta
FieldValidation/strict-deployment-16     3.23k ± 0%     4.07k ± 0%  +26.04%  (p=0.000 n=10+10)

Is this a better place to do the benchmarking? Are there other/better places to add benchmarking? I also considered adding benchmarking to the low level serializer/json package but that has its own issues (doesn’t seem like it would benchmark anything that jsoniter wouldn’t already do, doesn’t actually capture the effect on the whole API request, etc).

If this is a good place to put it, I can go ahead and add even bigger objects to the testing table so that we can see the effect of object size on slowness of strict validation. (Aside: do you know where I can find objects of various sizes in raw json/yaml that I can test here, otherwise I can hand craft them myself, but wanted to check first?).

Copy link
Contributor Author

@kevindelgado kevindelgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed remaining feedback. Still peppered with TODOs, but wanted to start conversation on proper benchmarking before getting too far into the review.

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
return len(url.Query()["dryRun"]) != 0
}

// TODO: currently strict validation is set via
// query param. Alternatively we could use a request header.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Removed the TODO about this.

@@ -456,6 +456,15 @@ func isDryRun(url *url.URL) bool {
return len(url.Query()["dryRun"]) != 0
}

// TODO: currently strict validation is set via
// query param. Alternatively we could use a request header.
// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass at doing this. It's a little ugly, so I will take another pass writing it more cleanly once I add further testing for this.

// TODO: currently strict validation is set via
// query param. Alternatively we could use a request header.
// TODO: maybe we should check content-type here and error for protobuf (ie take the full req instead of just the URL?)
// TODO: what do we want the query param to actually be (I went with ?validate=strict but am open to whatever)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to ?fieldValidation=Strict based on comment from KEP. Param name not set in stone yet though, if you or anyone else has a better suggestion.

Copy link
Member

@luxas luxas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kevindelgado!

@@ -85,6 +94,7 @@ func newSerializersForScheme(scheme *runtime.Scheme, mf json.MetaFactory, option
FileExtensions: []string{"yaml"},
EncodesAsText: true,
Serializer: yamlSerializer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me of the fact (about code smell with this setup that you pointed out above) that there is no PrettySerializer for YAML, which has bitten me at least once.

// media type, because the list of media types that support field validation are a subset of
// all supported media types (only json and yaml supports field validation).
func fieldValidation(req *http.Request) (fieldValidationDirective, error) {
supportedMediaTypes := []string{"application/json", "application/yaml"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I thought from reading the KEP that we would not support this way, and hence, keep the API area as small as possible to begin with at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, I’m just trying to make sure we error if the user requests validation on protobuf content.

I figured it would be more correct to have an explicit allowlist of json and yaml rather than a denylist of protobuf. I’m not sure if that’s better or not though, since protobuf is the only disallowed content type (that wouldn’t trigger an unsupported content type error later on). Let me know what you think is the best way to do this

func fieldValidation(req *http.Request) (fieldValidationDirective, error) {
supportedMediaTypes := []string{"application/json", "application/yaml"}
supported := false
contentType := req.Header.Get("ContentType")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content-Type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I've fixed and added tests for this

case 0:
return ignoreFieldValidation, nil
case 1:
switch validationParam[0] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does casing matter, if not, I'd do a strings.ToLower here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that’s a good question and one I’d pose to an API reviewer. I was using uppercase based on this comment, but agree that it seems a little weird to require uppercase. I’ve added strings.ToLower for now. @lavalamp, do you know if query parameters are or should be case-sensitive?

@@ -103,7 +103,17 @@ func UpdateResource(r rest.Updater, scope *RequestScope, admit admission.Interfa
original := r.New()

trace.Step("About to convert to expected version")
decoder := scope.Serializer.DecoderToVersion(s.Serializer, scope.HubGroupVersion)
decodeSerializer := s.Serializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this use-case common enough to warrant some kind of internal helper like decodeSerializer := serializerFor(req), which also takes care of the feature gate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s a little more involved given the serializer requirements of the other handlers (specifically the various ways patch handling works). I’m going to follow up on this in another commit that tries to better unify decode serializer generation across all the handlers.

@luxas
Copy link
Member

luxas commented Sep 16, 2021

I don't understand why strict serializer is so expensive. While 20-30% is acceptable for opt in, the actual checks should be amortizable (we already have to iterate every field before assignment). Naively I would expect that strict serialization should be capable of being free (<3-5%)

As far as I understand, this is because in the strict deserialization codepath we

  1. decode the object in a non-strict manner into obj (same as the non-strict path, i.e. no overhead yet) https://github.com/kubernetes/apimachinery/blob/master/pkg/runtime/serializer/json/json.go#L279
  2. run a YAMLToJSONStrict (validates duplicate fields) which in turn does
  3. deep-copy obj https://github.com/kubernetes/apimachinery/blob/master/pkg/runtime/serializer/json/json.go#L288 (makes sure we don't modify obj in any way as a side-effect of 3.)
  4. run the strict decoder https://github.com/kubernetes/apimachinery/blob/master/pkg/runtime/serializer/json/json.go#L289 (unknown field checking)

So in fact, with this in mind, I'm surprised that it's not actually more expensive than shown. If one compares the strict and non-strict decode codepaths, I think it could be an order of magnitude more expensive even.

E.g. for a benchmark of

var testData = []byte(`{"foo": "bar", "baz": 1234, "is": true, "arr": [], "obj": {}}`)

type testStruct struct {
	Foo string            `json:"foo" yaml:"foo"`
	Baz int64             `json:"baz" yaml:"baz"`
	Is  bool              `json:"is" yaml:"is"`
	Arr []string          `json:"arr" yaml:"arr"`
	Obj map[string]string `json:"obj" yaml:"obj"`
}

I got benchmarking results (for encoding/json.Unmarshal, json-iter (the way we have it configured in k8s.io/apimachinery currently), and yaml.v2 that we're using in sigs.k8s.io/yaml) as follows:

BenchmarkJSONUnmarshal-8                   	  540610	      1893 ns/op	     360 B/op	       9 allocs/op
BenchmarkJSONIterUnmarshal-8               	 2344988	       505.5 ns/op	     123 B/op	       4 allocs/op
BenchmarkYAMLv2Unmarshal-8                 	   91000	     13636 ns/op	   15248 B/op	      72 allocs/op

The reason for doing the above flow at the time (#74111) was that there was (and still is) no builtin way in neither encoding/json or json-iter to support detecting duplicate fields, hence the duplicate fields are checked using YAMLToJSONStrict, which is able to do this. Of course, it would be much better/sane/faster to do this directly in the decoder, so I have good hopes for #105030 to address this fully 👍. A side note is that once we upgrade to yaml.v3 (YAML 1.2) in sigs.k8s.io/yaml, duplicate fields will always error, as this is a violation of the YAML spec.

The deep copy was added mostly for being super safe and consistent with the non-strict codepath, but if we're confident in the strict decoder, then we should (at a first sight at least) be able to drop that, and just run either the non-strict or strict decoder, for which I think the performance hit will become negligible. We will need struct errors from the underlying decoder to give different reasons for failing, so we can return the appropriate runtime Unknown/Duplicate field errors.

@liggitt
Copy link
Member

liggitt commented Sep 16, 2021

The reason for doing the above flow at the time (#74111) was that there was (and still is) no builtin way in neither encoding/json or json-iter to support detecting duplicate fields, hence the duplicate fields are checked using YAMLToJSONStrict, which is able to do this. Of course, it would be much better/sane/faster to do this directly in the decoder, so I have good hopes for #105030 to address this fully

#105030 adds the ability to error on duplicate fields directly in the json decoder, which would let us avoid calling the yaml decoder, then the json decoder (for json input).

However, callers of the strict unmarshalers currently depend on being able to distinguish a strict error from a normal error (see uses of IsStrictDecodingError). To continue to distinguish that, we would still have to do the regular decode, then the strict decode. I'd be open to dropping that capability and just making those callers try decoding twice on their own.

@liggitt
Copy link
Member

liggitt commented Sep 16, 2021

We will need struct errors from the underlying decoder to give different reasons for failing, so we can return the appropriate runtime Unknown/Duplicate field errors.

Or make callers that want to distinguish double decode themselves... we currently have exactly two uses in-tree, and both are at component startup, so decoding config twice would not be a performance issue.

Comment on lines 4371 to 4385
// runRequestOrDie is like runRequest, but used for benchmarking.
func runRequestOrDie(path, verb string, data []byte, contentType string) *http.Response {
request, err := http.NewRequest(verb, path, bytes.NewBuffer(data))
if err != nil {
panic(err)
}
if contentType != "" {
request.Header.Set("Content-Type", contentType)
}
response, err := http.DefaultClient.Do(request)
if err != nil {
panic(err)
}
return response
}
Copy link
Member

@apelisse apelisse Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should return the error in the previous runRequest function, avoid the duplication, and the caller should decide whether they want to Fatal or panic.

But stepping back a notch, I think panic and Fatal have almost the same goal, so I'm guessing your problem is that you can't give testing.B to runRequest. You could merely change runRequest to take a TB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I was being dumb and forgot TB existed, changed it.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 1, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Oct 1, 2021
@kevindelgado
Copy link
Contributor Author

Superseded by #105916

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants