Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable strict serializer in kubelet #83204

Merged
merged 1 commit into from Oct 31, 2019

Conversation

@obitech
Copy link
Contributor

obitech commented Sep 26, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
With strict serializers enabled, an error will be thrown on things such as duplicate or unknown fields when parsing a YAML file. This will improve code quality of user's configuration files as it enforces a proper configuration file to be present when starting a component.

Which issue(s) this PR fixes:

Ref #82924

Special notes for your reviewer:
In order to test for the error type of StrictDecodingError, the returned error in DecodeKubeletConfiguration has to be wrapped via errors.Wrap.

Does this PR introduce a user-facing change?:

kubelet: a configuration file specified via `--config` is now loaded with strict deserialization, which fails if the config file contains duplicate or unknown fields. This protects against accidentally running with config files that are malformed, mis-indented, or have typos in field names, and getting unexpected behavior.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE
@obitech

This comment has been minimized.

Copy link
Contributor Author

obitech commented Sep 26, 2019

/sig cluster-lifecycle
/sig api-machinery
/wg component-standard
/cc @stealthybox @mtaufen

@obitech

This comment has been minimized.

Copy link
Contributor Author

obitech commented Sep 26, 2019

/assign mtaufen

@jennybuckley

This comment has been minimized.

Copy link
Contributor

jennybuckley commented Sep 26, 2019

/cc @liggitt

@k8s-ci-robot k8s-ci-robot requested a review from liggitt Sep 26, 2019
@jennybuckley

This comment has been minimized.

Copy link
Contributor

jennybuckley commented Sep 26, 2019

/remove-sig api-machinery

@neolit123

This comment has been minimized.

Copy link
Member

neolit123 commented Sep 26, 2019

the changes requested in #82924 are quite breaking for existing kubelet and kube-proxy users. while kube-proxy is still v1alpha1 it has seen a lot of use over the years in production, mainly due to the fact it hasn't progressed to beta.

the better option would be to enable strict deserializer warnings for a grace period and not exit on errors.
kubeadm is doing that for it's API types (but this ends up as a two pass operation).

@obitech obitech force-pushed the obitech:kubelet_strict_serializer branch from 4b0d955 to a051f27 Sep 26, 2019
@obitech

This comment has been minimized.

Copy link
Contributor Author

obitech commented Sep 26, 2019

/retest

@mtaufen

This comment has been minimized.

Copy link
Contributor

mtaufen commented Sep 26, 2019

@neolit123 should we suspect that many people have invalid keys in their config files today?

@craiglpeters craiglpeters added this to In progress in Provider Azure Oct 7, 2019
@craiglpeters craiglpeters removed this from In progress in Provider Azure Oct 7, 2019
@@ -35,6 +35,6 @@ func NewSchemeAndCodecs() (*runtime.Scheme, *serializer.CodecFactory, error) {
if err := kubeletconfigv1beta1.AddToScheme(scheme); err != nil {
return nil, nil, err
}
codecs := serializer.NewCodecFactory(scheme)
codecs := serializer.NewCodecFactory(scheme, serializer.EnableStrict)

This comment has been minimized.

Copy link
@mtaufen

mtaufen Oct 11, 2019

Contributor

@liggitt, did you mean returning two codecs from this function (that's how I interpreted your recommendation)?

}
klog.Warningf("using lenient decoding as strict decoding failed: %v", strictErr)
} else {
return nil, fmt.Errorf("failed to decode: %w", err)

This comment has been minimized.

Copy link
@mtaufen

mtaufen Oct 11, 2019

Contributor

probably wait on K8s to support Go 1.13 for the %w

This comment has been minimized.

Copy link
@obitech

obitech Oct 12, 2019

Author Contributor

Yup, I changed it back to use errors.Wrap from pkg/errors. I'm also using errors.Cause in the tests, which checks recursively:

errors.Cause will recursively retrieve the topmost error which does not implement causer, which is assumed to be the original cause.

return nil, fmt.Errorf("failed to decode, error: %v", err)
// For now allow lenient decoding without strict serializer. This
// path is to be dropped when v1beta1 is deprecated and removed.
if runtime.IsStrictDecodingError(err) {

This comment has been minimized.

Copy link
@mtaufen

mtaufen Oct 11, 2019

Contributor

Rather than constructing the non-strict codec here, let's change DecodeKubeletConfiguration to take a strict and legacy non-strict codec as arguments. Then, NewSchemeAndCodecs can return both the strict and non-strict codecs. I think this is what @liggitt meant.

This comment has been minimized.

Copy link
@obitech

obitech Oct 12, 2019

Author Contributor

Ok got it. I just want to double check since changing the return values on NewSchemeAndCodecs would require me to touch a couple of files:

In those places where the returned codec is important, I'm assuming to use the strict codecs that would be returned by NewSchemeAndCodecs?

This comment has been minimized.

Copy link
@liggitt

liggitt Oct 12, 2019

Member

If it is fewer ripples to build a lenient one in place (like the PR currently does), and that is the only place we try to decode a kubelet config, I'm fine with the current approach.

This comment has been minimized.

Copy link
@obitech

obitech Oct 12, 2019

Author Contributor

@liggitt I think there are two places: when decoding a local and remote config.

pkg/kubelet/kubeletconfig/util/codec/codec.go Outdated Show resolved Hide resolved
pkg/kubelet/kubeletconfig/util/codec/codec.go Outdated Show resolved Hide resolved
Copy link
Member

rosti left a comment

Thanks @obitech !
I share @liggitt 's opinion, that we should enforce strict decoding for new config versions only.
Also, I think, that strict decoders combined with Kustomize use over component configs, that lack ObjectMeta will not work. Kustomize requires the presence of metadata.name and the only way you could use it with component configs is to just add a dummy metadata.name. However, with strict decoders this won't be possible now.
Hence, we should definitely have a lenient option for now.

@obitech

This comment has been minimized.

Copy link
Contributor Author

obitech commented Oct 16, 2019

@liggitt after talking with @mtaufen we realized that we would always need to do non-strict decoding in

func DecodeRemoteConfigSource(data []byte) (RemoteConfigSource, error) {
since that comes from the API server it may contain newer fields, which would break things with strict decoding.

My idea is to make NewSchemeAndCodecs variadic, accepting

type CodecFactoryOptionsMutator func(*CodecFactoryOptions)
and then be explicit about when to be strict and non-strict. Would that work?

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Oct 16, 2019

since that comes from the API server it may contain newer fields, which would break things with strict decoding.

isn't failing on unknown fields an explicit goal of strict decoding, since it's impossible to know if the unknown fields are new or typos?

that content comes from a configmap, not a typed API the API server defaults, right?

@mtaufen

This comment has been minimized.

Copy link
Contributor

mtaufen commented Oct 17, 2019

@liggitt there are two objects at play here:

  1. The component config from the config map.
  2. The SerializedNodeConfigSource, which is a copy of the core/v1 NodeConfigSource substructure of the Node object.

(2) is where I want to be careful and consider staying non-strict, since it's ultimately derived from the API.

In theory, it may work strict because Kubelets only know about the keys they got at build time, and we'd have decoded the NodeConfigSource non-strict and dropped any unrecognized keys (when we decode the Node from the API) before copying into SerializedNodeConfigSource and persisting it, but in principle because it's derived from the API and not a user-facing input, may be better to just keep using the same semantics as the API.

@obitech obitech force-pushed the obitech:kubelet_strict_serializer branch from 5d9b581 to 9ad2dc2 Oct 20, 2019
@obitech

This comment has been minimized.

Copy link
Contributor Author

obitech commented Oct 20, 2019

Squashed.

if !runtime.IsStrictDecodingError(err) {
return nil, errors.Wrap(err, "failed to decode")
}
klog.Warningf("using lenient decoding as strict decoding failed: %v", err)

This comment has been minimized.

Copy link
@liggitt

liggitt Oct 22, 2019

Member

it seems like we should only print the warning if we successfully decode with the lenient codec and continue with that object (otherwise, a v1 config with duplicate keys will always print this warning)

also, it seems like we should return the original strict error when lenient decoding also fails (otherwise, a v1 config would return an unknown kind error because we are not registering v1 into the lenient codec)

var lenientErr error
_, lenientCodecs, lenientErr := newLenientSchemeAndCodecs()
if lenientErr != nil {
  return nil, lenientErr
}

obj, gvk, lenientErr = lenientCodecs.UniversalDecoder().Decode(data, nil, nil)
if lenientErr != nil {
  // lenient decoding also failed, return the original strict error
  return nil, fmt.Errorf("failed to decode, error: %v", err)
}
// continue with the object decoded leniently, but warn
klog.Warningf("using lenient decoding as strict decoding failed: %v", err)

This comment has been minimized.

Copy link
@mtaufen

mtaufen Oct 22, 2019

Contributor

If we did the latter (return the original strict error), could that mask the real problem in the v1?
For instance, earlier in the file I have an unrecognized key, and later in the file I have a syntax error.

This comment has been minimized.

Copy link
@mtaufen

mtaufen Oct 22, 2019

Contributor

Either way it's a confusing error...

This comment has been minimized.

Copy link
@liggitt

liggitt Oct 22, 2019

Member

If we did the latter (return the original strict error), could that mask the real problem in the v1?

v1 should always be strict, and should always fail the lenient decoder (because it only has v1beta1 registered), right?

This comment has been minimized.

Copy link
@obitech

obitech Oct 24, 2019

Author Contributor

@liggitt fixed

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Oct 22, 2019

one comment about the warning/error returns in the lenient fallback case, lgtm otherwise

@obitech obitech force-pushed the obitech:kubelet_strict_serializer branch from 9ad2dc2 to 1f96aac Oct 24, 2019
CodecFactory is started with EnableStrict that throws an error when deserializing
a Kubelet component config that is malformed (e.g. unknown or duplicate keys).

When strict decoding a v1beta1 config fails, non-strict decoding is used and a warning is emitted.
For this, NewSchemeAndCodecs is now a variadic function that can take multiple
arguments for augmenting the returned codec factory. Strict decoding is
then explicitely enabled when decoding a kubelet config.
Additionally, decoding a RemoteConfigSource needs to be non-strict
to avoid an accidental error when it contains newer API fields that are not
yet known to the Kubelet.

DecodeKubeletConfiguration returns a wrapped error instead of a simple string
so its type can be tested.

Add unit tests for unhappy paths when loading a component config

Add keys for test cases struct fields, remove nil field initialization

Co-Authored-By: Jordan Liggitt <jordan@liggitt.net>
@obitech obitech force-pushed the obitech:kubelet_strict_serializer branch from 1f96aac to bba15d3 Oct 24, 2019
@obitech

This comment has been minimized.

Copy link
Contributor Author

obitech commented Oct 31, 2019

@liggitt can you take another pass? I addressed your last comment 🙂

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Oct 31, 2019

/lgtm
/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Oct 31, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, obitech

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 3383d7c into kubernetes:master Oct 31, 2019
15 checks passed
15 checks passed
cla/linuxfoundation obitech authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kind Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 31, 2019
@obitech obitech moved this from In progress to Done in component-base Nov 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.