Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set expected in-memory version when decoding unstructured objects from etcd #78713

Merged
merged 2 commits into from Jun 6, 2019

Conversation

@liggitt
Copy link
Member

commented Jun 5, 2019

What type of PR is this?
/kind bug
/kind flake

What this PR does / why we need it:

  • Fixes in-memory versions and conversion for custom resources (and adds tests)
  • Fixes defaulting of custom resource list items in list responses (and adds tests)

Which issue(s) this PR fixes:
Fixes #72493

Does this PR introduce a user-facing change?:

Fixed a spurious error where update requests to the status subresource of multi-version custom resources would complain about an incorrect API version.

/cc @sttts @jpbetz @roycaihw
/sig api-machinery

@liggitt

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

successfully added a test that recreates the issue:

TestWebhookConverterWithoutWatchCache/nontrivial-converter/check-3/getting_objects_created_as_v1alpha1:

conversion_test.go:340: MultiVersion.stable.example.com "converted-v1alpha1" is invalid: apiVersion: Invalid value: "stable.example.com/v1beta1": must be stable.example.com/v1alpha1

@liggitt liggitt force-pushed the liggitt:crd-status-conversion-error branch from 263b77d to 4451245 Jun 5, 2019

@liggitt liggitt changed the title WIP - Set expected in-memory kind for CRD StatusREST Set expected in-memory version when decoding unstructured objects from etcd Jun 5, 2019

@liggitt

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

/priority critical-urgent

this fix is required for webhook conversion

@liggitt

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

/milestone v1.15

@k8s-ci-robot k8s-ci-robot added this to the v1.15 milestone Jun 5, 2019

@liggitt liggitt force-pushed the liggitt:crd-status-conversion-error branch from 4451245 to 37acaec Jun 5, 2019

@liggitt

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

looks like defaulting has a similar issue when I disable the watch cache in the integration test, it fails 100% of the time. will run that test with and without the watch cache, and dig into fixing that in this PR as well

/hold

@sttts

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

/lgtm

@liggitt liggitt force-pushed the liggitt:crd-status-conversion-error branch from 83434db to 296e60f Jun 6, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jun 6, 2019

@liggitt liggitt force-pushed the liggitt:crd-status-conversion-error branch from 296e60f to 8f91440 Jun 6, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt

This comment has been minimized.

Copy link
Member Author

commented Jun 6, 2019

looks like defaulting has a similar issue when I disable the watch cache in the integration test, it fails 100% of the time. will run that test with and without the watch cache, and dig into fixing that in this PR as well

There were a couple issues here:

Issue the 1st
The TestCustomResourceDefaulting integration test was not correct. It did the following:

  • set up a single-version custom resource
  • set up defaults for spec.c and status.c fields
  • made a status update without specifying status.c and verified a get of the resulting object contained both spec.c and status.c set
  • removed the defaults
  • asserted the spec.c field should disappear from get responses

However, since defaulting is applied to the object in the incoming request and the object read from etcd, the status update combining the status stanza of the incoming object (with status.c defaulted) and the spec stanza of the existing object (with spec.c defaulted) should have persisted both fields in etcd. Even when the defaults were removed from the schema, those persisted values should remain.

How then, was this test passing in master? Soldier on, dear reader.

Issue the 2nd
Defaults were not being applied to custom resources in list responses. This was due to an incorrect short-circuit in versioning.codec#Decode, which skipped defaulting (incorrectly) along with conversion (correctly) when the provided object to decode into was used as-is:

if into == obj {
if isVersioned {
return versioned, gvk, nil
}
return into, gvk, nil
}
// perform defaulting if requested
if c.defaulter != nil {
// create a copy to ensure defaulting is not applied to the original versioned objects
if isVersioned {
versioned.Objects = []runtime.Object{obj.DeepCopyObject()}
}
c.defaulter.Default(obj)
} else {
if isVersioned {
versioned.Objects = []runtime.Object{obj}
}
}
if err := c.convertor.Convert(obj, into, c.decodeVersion); err != nil {
return nil, gvk, err
}

Only conversion should have been skipped there (there's no need to convert an object to itself). The result was that custom resource list responses did not have defaults applied to items.

A notable consumer of list responses is a lister/watcher that feeds the watch cache. That caused items in the watch cache from the initial list to be missing default values. Because updates attempt to use the last observed item from the watch cache as the "old object", that meant that usually the old object did not have defaults, so the spec.c field of the old object was not defaulted, and the incorrect test assertion that spec.c would disappear from get requests once the default was removed proved to be what actually happened, because spec.c never got persisted.

The fix
The fix correctly applies defaults on decode, and custom resource defaulting tests were added for the following operations:

  • with the watch cache enabled:
    • get
    • get cached (resourceVersion=0)
    • list
    • list cached (resourceVersion=0)
  • with the watch cache disabled:
    • get
    • get cached (resourceVersion=0)
    • list
    • list cached (resourceVersion=0)
    • watch

Additionally, a multi-version CRD was used, so that we can set different defaults for the storage version and REST API version, to make sure both phases are getting exercised.

@liggitt

This comment has been minimized.

Copy link
Member Author

commented Jun 6, 2019

/hold cancel

@liggitt liggitt force-pushed the liggitt:crd-status-conversion-error branch from 8f91440 to 0236aca Jun 6, 2019

@liggitt liggitt force-pushed the liggitt:crd-status-conversion-error branch from 0236aca to a3bb81f Jun 6, 2019

@sttts

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jun 6, 2019

@liggitt

This comment has been minimized.

Copy link
Member Author

commented Jun 6, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit f3f2b40 into kubernetes:master Jun 6, 2019

21 checks passed

cla/linuxfoundation liggitt authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@liggitt liggitt deleted the liggitt:crd-status-conversion-error branch Jun 6, 2019

@roycaihw

This comment has been minimized.

Copy link
Member

commented Sep 17, 2019

/area custom-resources

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.