apimachinery: Add a strict YAML and JSON deserializer option #71589
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull request has been approved by: neolit123
If they are not already assigned, you can assign the PR to them by writing
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks @neolit123! Great to see this moving forward 👏
As a consumer, I want something in between in the meantime: something that warns the user rather than exiting the application entirely on a failed strict decode. So I think we need to be able to check the error type, and it would also be nice to get more programmatic information from the error than "just" the string, e.g. the GVK metadata of the type whose decode failed.
@neolit123 I edited the title and release note a bit to make it clear that this is optional and not enforced by default.
I will definitely include the GVK and properly type the errors. Currently there are a couple of ways to do this:
Both ways would be computationally similar, as unmarshaling has to be done twice.
I prefer option 2; it is much better.
See my comment in #71589 (comment) on how to avoid unmarshaling twice. In any case, I think we need Go benchmarks to measure how much slower strict decoding is if we are ever going to use it in places like the API server, where milliseconds matter.
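One way to structure such a benchmark is with the standard `testing.B` harness. This sketch swaps in the standard library's `encoding/json` with `Decoder.DisallowUnknownFields` as a stand-in for the json-iterator based strict path in this PR, and decodes into a hypothetical `config` type, so its numbers would not transfer to apimachinery; only the measuring approach does.

```go
package main

import (
	"bytes"
	"encoding/json"
	"testing"
)

// config is a hypothetical target type; a real benchmark would decode into
// a registered API type such as a kubeadm ClusterConfiguration.
type config struct {
	KubernetesVersion string            `json:"kubernetesVersion"`
	ClusterName       string            `json:"clusterName"`
	ExtraArgs         map[string]string `json:"extraArgs"`
}

var input = []byte(`{"kubernetesVersion":"v1.12.0","clusterName":"example-cluster","extraArgs":{"authorization-mode":"Node,RBAC"}}`)

// decode decodes data into a fresh config; strict rejects unknown fields.
func decode(data []byte, strict bool) (config, error) {
	var c config
	dec := json.NewDecoder(bytes.NewReader(data))
	if strict {
		dec.DisallowUnknownFields()
	}
	err := dec.Decode(&c)
	return c, err
}

// benchmarkDecode returns a benchmark body for one decoding mode, so the
// strict and non-strict variants share exactly the same loop.
func benchmarkDecode(strict bool) func(b *testing.B) {
	return func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if _, err := decode(input, strict); err != nil {
				b.Fatal(err)
			}
		}
	}
}
```

In a `_test.go` file this would be wired up as `func BenchmarkStrict(b *testing.B) { benchmarkDecode(true)(b) }` plus a non-strict twin, and run with `go test -bench . -benchmem`. Unlike a hand-rolled `time.Now()` loop, the harness picks the iteration count and reports ns/op and allocations.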
Updated from 07bf78f to 79837db.
@liggitt @luxas this ends up being along the lines of:
Please take a look at this part:
Updated from 79837db to 0aa052a.
Added unit tests for valid input to the strict decoders.
cc @smarterclayton for the strict decoding mechanism.
/assign @smarterclayton
I will update the PR to address the comments by @luxas on Monday.
Add a new universal decoder and universal deserializer.
This enables checks for unknown and duplicate fields in input YAML
and JSON data.

Example usage:

    runtime.DecodeInto(MyCodecFactory.UniversalStrictDecoder(), content, into)
    MyCodecFactory.UniversalStrictDeserializer().Decode(content, gvk, into)

The same CodecFactory can also return the non-strict variants.
A custom json-iterator API object is used to check for unknown fields.
For duplicate fields the sigs.k8s.io/yaml.YAMLToJSONStrict() function
is used.

Also add:
- Unit tests in json_test.go.
- New error types StrictDecoderError, DuplicateFieldError, UnknownFieldError.
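The commit uses a custom json-iterator configuration for unknown fields and `sigs.k8s.io/yaml.YAMLToJSONStrict()` for duplicates. As a dependency-free illustration of the same two checks, and of why strict decoding needs a second pass over the input, here is a sketch using only `encoding/json`. `findDuplicateKeys`, `walk` and `strictDecode` are hypothetical helpers, not part of this PR, and YAML input would first have to be converted to JSON.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// findDuplicateKeys walks the JSON token stream and returns the dotted paths
// of object keys that appear more than once within the same object.
func findDuplicateKeys(data []byte) ([]string, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var dups []string
	if err := walk(dec, "", &dups); err != nil {
		return nil, err
	}
	return dups, nil
}

// walk consumes exactly one JSON value from dec, recursing into objects/arrays.
func walk(dec *json.Decoder, path string, dups *[]string) error {
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	delim, ok := tok.(json.Delim)
	if !ok {
		return nil // scalar value: nothing to check
	}
	switch delim {
	case '{':
		seen := map[string]bool{}
		for dec.More() {
			keyTok, err := dec.Token()
			if err != nil {
				return err
			}
			key := keyTok.(string) // object keys are always strings
			child := key
			if path != "" {
				child = path + "." + key
			}
			if seen[key] {
				*dups = append(*dups, child)
			}
			seen[key] = true
			if err := walk(dec, child, dups); err != nil {
				return err
			}
		}
	case '[':
		for i := 0; dec.More(); i++ {
			if err := walk(dec, fmt.Sprintf("%s[%d]", path, i), dups); err != nil {
				return err
			}
		}
	}
	_, err = dec.Token() // consume the closing '}' or ']'
	return err
}

// strictDecode rejects duplicate keys (first pass over the token stream) and
// unknown fields (second pass, the actual decode into the target).
func strictDecode(data []byte, into interface{}) error {
	dups, err := findDuplicateKeys(data)
	if err != nil {
		return err
	}
	if len(dups) > 0 {
		return fmt.Errorf("duplicate fields: %v", dups)
	}
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(into)
}
```

The plain `encoding/json` decode silently keeps the last value for a repeated key, which is why the duplicate check has to look at the token stream separately.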
Updated from 0aa052a to 7cd866d.
Updated.
Some benchmark pseudo test code:

    // myScheme, fileContent and targetObject are placeholders for a scheme with
    // the types registered, the raw YAML file and the object to decode into.
    start := time.Now()
    for i := 0; i < 10000; i++ {
        // Non-strict decoding: unknown and duplicate fields are ignored.
        if err := runtime.DecodeInto(myScheme.Codecs.UniversalDecoder(), fileContent, targetObject); err != nil {
            panic(err)
        }
    }
    elapsed := time.Since(start)
    fmt.Printf("elapsed non-strict %v\n", elapsed)

    start = time.Now()
    for i := 0; i < 10000; i++ {
        // Strict decoding: unknown or duplicate fields make DecodeInto fail.
        if err := runtime.DecodeInto(myScheme.Codecs.UniversalStrictDecoder(), fileContent, targetObject); err != nil {
            panic(err)
        }
    }
    elapsed = time.Since(start)
    fmt.Printf("elapsed strict %v\n", elapsed)

Test data:

    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterConfiguration
    etcd:
      local:
        imageRepository: "k8s.gcr.io"
        imageTag: "3.2.24"
        dataDir: "/var/lib/etcd"
        extraArgs:
          listen-client-urls: "http://10.100.0.1:2379"
        serverCertSANs:
          - "ec2-10-100-0-1.compute-1.amazonaws.com"
        peerCertSANs:
          - "10.100.0.1"
    networking:
      serviceSubnet: "10.96.0.0/12"
      podSubnet: "10.100.0.1/24"
      dnsDomain: "cluster.local"
    kubernetesVersion: "v1.12.0"
    controlPlaneEndpoint: "10.100.0.1:6443"
    apiServer:
      extraArgs:
        authorization-mode: "Node,RBAC"
      extraVolumes:
        - name: "some-volume"
          hostPath: "/etc/some-path"
          mountPath: "/etc/some-pod-path"
          readOnly: false
          pathType: File
      certSANs:
        - "10.100.1.1"
        - "ec2-10-100-0-1.compute-1.amazonaws.com"
      timeoutForControlPlane: 4m0s
    controllerManager:
      extraArgs:
        "node-cidr-mask-size": "20"
      extraVolumes:
        - name: "some-volume"
          hostPath: "/etc/some-path"
          mountPath: "/etc/some-pod-path"
          readOnly: false
          pathType: File
    scheduler:
      extraArgs:
        address: "10.100.0.1"
      extraVolumes:
        - name: "some-volume"
          hostPath: "/etc/some-path"
          mountPath: "/etc/some-pod-path"
          readOnly: false
          pathType: File
    certificatesDir: "/etc/kubernetes/pki"
    imageRepository: "k8s.gcr.io"
    useHyperKubeImage: false
    clusterName: "example-cluster"

Test cases:

Summary:
There are three types of behavior we'll eventually want from decoders:
- ignore duplicate/unknown fields (current behavior)
- warn on duplicate/unknown fields (useful for surfacing potential issues while keeping API compatibility)
- error on duplicate/unknown fields (what this PR partially adds)

All of those could be handled uniformly if the decoder returned structured duplicate/unknown field info separately, and the caller decided whether to ignore, warn, or error on it. I'm not sure adding factory APIs to construct alternate decoders with unstructured fail-fast errors for duplicate/unknown fields takes us in the right direction. Would like @smarterclayton's thoughts on the direction of that approach.
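The uniform approach described here can be sketched as follows, assuming hypothetical names (`Policy`, `decodeReportingUnknown`, `apply`) and using `encoding/json` rather than apimachinery. The decoder stays lenient and returns the unknown fields as data; the caller then chooses ignore, warn, or error. For brevity the sketch only checks top-level fields with exact-case matching and does not report duplicates; note that it also unmarshals the input twice, which is the multi-pass cost raised in this thread.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Policy is what the caller decides to do with reported unknown fields.
type Policy int

const (
	PolicyIgnore Policy = iota // current behavior
	PolicyWarn                 // surface problems, keep API compatibility
	PolicyError                // fail the decode
)

// knownFields returns the JSON names of the top-level fields of struct type t.
func knownFields(t reflect.Type) map[string]bool {
	known := map[string]bool{}
	for i := 0; i < t.NumField(); i++ {
		name := strings.Split(t.Field(i).Tag.Get("json"), ",")[0]
		if name == "" {
			name = t.Field(i).Name
		}
		known[name] = true
	}
	return known
}

// decodeReportingUnknown decodes data leniently into `into` (must be a pointer
// to a struct) and separately returns the sorted unknown top-level keys.
func decodeReportingUnknown(data []byte, into interface{}) ([]string, error) {
	if err := json.Unmarshal(data, into); err != nil {
		return nil, err
	}
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	known := knownFields(reflect.TypeOf(into).Elem())
	var unknown []string
	for k := range raw {
		if !known[k] {
			unknown = append(unknown, k)
		}
	}
	sort.Strings(unknown)
	return unknown, nil
}

// apply turns the structured report into the caller's chosen behavior.
func apply(p Policy, unknown []string, warn func(string)) error {
	if len(unknown) == 0 || p == PolicyIgnore {
		return nil
	}
	msg := fmt.Sprintf("unknown fields: %s", strings.Join(unknown, ", "))
	if p == PolicyWarn {
		warn(msg)
		return nil
	}
	return errors.New(msg)
}
```

Because the policy lives in the caller, an apiserver, a client, and a config loader could share one decoder and still differ in how strict they are.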
With the current usage of the low-level libraries, a warning state is not possible without a multi-pass unmarshal.
I can still see a use for separate strict and non-strict decoders, and for the space in between if one wants to handle warnings.
Yeah, I am really concerned about adding a new path to the factory that doesn't take those into account. We want to reduce the complexity of decoding, not increase it.

There are three rough decoding angles at play:

1. An apiserver needs to decode into a target version, get an accounting of everything it does not recognize, and then make a decision based on other API input whether to warn, error, or continue (and definitely needs structured errors à la the invalid structure, which identifies field names).
2. A client talking to the apiserver needs the choice of whether to warn or ignore, but handles it differently (based on the caller's needs for the use case).
3. Disk / stable storage reading code needs to perform minimal transformation of the input where possible and delegate to the server (unstructured / kubectl), or it needs to have strictly defined behavior (reading config from disk or loading data from etcd).

We have talked about dramatically simplifying the serialization stack for 1 and 2, and the first part of 3 (likely we would either remove or simplify the codec and the factory). The second part of 3 would probably also go through some simplification to make conversion explicit.

It might be best if we talk through what the changes above might mean before we grow the factory.
On Jan 7, 2019, at 12:36 PM, Jordan Liggitt wrote:
> There are three types of behavior we'll eventually want from decoders: [...]
OK, I'm going to leave this PR to the lifecycle bots.
Closing in favor of #72883.
It is useful to also apply the storage testsuite to "external" (= out-of-tree) storage drivers. One way of doing that is setting up a custom E2E test suite, but that's still quite a bit of work. An easier alternative is to parameterize the Kubernetes e2e.test binary at runtime so that it instantiates the testsuite for one or more drivers.

Some parameters have to be provided before starting the test because they define configuration and capabilities of the driver and its storage backend that cannot be discovered at runtime. This is done by populating the DriverDefinition with the content of the file that the new -storage.testdriver parameter points to. The universal .yaml and .json decoder from Kubernetes is used. It's flexible, but has some downsides:
- it currently ignores unknown fields (see kubernetes#71589)
- it gives poor error messages when fields have the wrong type

Storage drivers have to be installed in the test cluster before starting e2e.test. Only tests involving dynamically provisioned volumes are currently supported.
What type of PR is this?
/kind feature
What this PR does / why we need it:
pkg/runtime: implement a strict YAML and JSON deserializer
Add a new universal decoder and universal deserializer.
This enables checks for unknown and duplicate fields in input YAML
and JSON data.
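Example usage:

    runtime.DecodeInto(MyCodecFactory.UniversalStrictDecoder(), content, into)
    MyCodecFactory.UniversalStrictDeserializer().Decode(content, gvk, into)

The same CodecFactory can also return the non-strict variants.
A custom json-iterator API object is used to check for unknown fields.
For duplicate fields the sigs.k8s.io/yaml.YAMLToJSONStrict() function
is used.
Also add:
- Unit tests in json_test.go.
- New error types StrictDecoderError, DuplicateFieldError,
UnknownFieldError.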
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
xref: kubernetes/community#2977
Special notes for your reviewer:
NONE
Does this PR introduce a user-facing change?:
/assign @liggitt @luxas
cc @BenTheElder
/priority important-longterm
/sig api-machinery