
Let the garbage collector use json merge patch when SMP is not supported #63386

Merged
merged 1 commit into kubernetes:master from roycaihw:gc-json-patch on Jun 7, 2018

Conversation

@roycaihw (Member) commented May 3, 2018

What this PR does / why we need it:
Let the garbage collector fall back to JSON merge patch when strategic merge patch returns 415 (Unsupported Media Type). This enables orphan deletion of custom resources.
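
For context, a minimal sketch of the fallback flow, assuming the existing gc.patchObject helper (which in this PR takes a patch type, as seen in the diff below) and a caller-supplied JSON merge patch builder; the function name and exact signature here are illustrative, not the literal code in the PR:

// Sketch only: try strategic merge patch first; if the API server rejects it
// with 415 Unsupported Media Type (the case for custom resources today),
// build a JSON merge patch and retry with that content type.
func (gc *GarbageCollector) patchWithFallback(item *node, smp []byte, buildJMP func(*node) ([]byte, error)) (*unstructured.Unstructured, error) {
	result, err := gc.patchObject(item.identity, smp, types.StrategicMergePatchType)
	if err == nil {
		return result, nil
	}
	if !errors.IsUnsupportedMediaType(err) {
		return nil, err
	}
	// Strategic merge patch is not supported for this resource; fall back.
	jmp, err := buildJMP(item)
	if err != nil {
		return nil, err
	}
	return gc.patchObject(item.identity, jmp, types.MergePatchType)
}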

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #56348

Special notes for your reviewer:
This PR is developed based on #56595. Ref #56606 for more information.

Release note:

Orphan delete is now supported for custom resources

/sig api-machinery

@k8s-ci-robot k8s-ci-robot requested review from deads2k and gmarek May 3, 2018

@roycaihw roycaihw force-pushed the roycaihw:gc-json-patch branch 3 times, most recently from bb0424d to dbaff2b May 3, 2018

@roycaihw roycaihw force-pushed the roycaihw:gc-json-patch branch from dbaff2b to eaf04f9 May 24, 2018

@roycaihw roycaihw force-pushed the roycaihw:gc-json-patch branch from eaf04f9 to ea2512d May 24, 2018

@roycaihw (Member, Author) commented May 25, 2018

/retest

@roycaihw roycaihw force-pushed the roycaihw:gc-json-patch branch from ea2512d to 6408c69 May 25, 2018

@@ -39,21 +39,27 @@ func deleteOwnerRefStrategicMergePatch(dependentUID types.UID, ownerUIDs ...type
return []byte(patch)
}

// TODO: remove this function when we can use strategic merge patch

@jennybuckley (Contributor) commented May 25, 2018

Do we not plan on supporting SMP on CRs? Or do we just plan on always allowing JSON merge patch in the GC, even if we start supporting SMP on CRs?

@roycaihw (Member, Author) commented May 26, 2018

We dropped the plan for supporting SMP on CRs; see #58414 and #56606.

// unavailable.
return nil, fmt.Errorf("doesn't have a local cache for %s", gvr)
// If local cache doesn't exist for mapping.Resource, send a GET request to API server
apiResource, _, err := gc.apiResource(apiVersion, kind)

@jennybuckley (Contributor) commented May 25, 2018

Why not use mapping.Resource again? It looks like it does the same thing as the beginning of this function: https://github.com/roycaihw/kubernetes/blob/ebb54562d1789f0e198cad35ad34a0ae1175deb4/pkg/controller/garbagecollector/operations.go#L43-L50

@roycaihw (Member, Author) commented May 29, 2018

Nice catch! I guess I was copy-pasting something and didn't notice the redundancy.

return
}
_, err = gc.patchObject(dependent.identity, patch, types.MergePatchType)
if err != nil {

@jennybuckley (Contributor) commented May 25, 2018

Why not "if err != nil && !errors.IsNotFound(err)" like at line 534?

@roycaihw (Member, Author) commented May 29, 2018

JSON merge patch cannot manipulate individual array elements (the owner references in this case); it replaces the existing array wholesale with the array we constructed in deleteOwnerRefJSONMergePatch. So we don't need to handle the error when the target owner reference doesn't exist here.
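
For illustration only (made-up values): the merge patch states the complete set of owner references that should remain, so an owner that is already gone simply isn't in the list we send, and there is no per-entry delete that could fail with NotFound:

package main

import (
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical example: the dependent listed owners "owner-a" and "owner-b",
	// and we want to drop "owner-a". The JSON merge patch replaces the whole
	// ownerReferences array with the owners we expect to keep.
	remaining := []metav1.OwnerReference{
		{APIVersion: "apps/v1", Kind: "ReplicaSet", Name: "rs-b", UID: "owner-b"},
	}
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"ownerReferences": remaining},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", patch)
	// {"metadata":{"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"rs-b","uid":"owner-b"}]}}
}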

}
_, err = gc.patchObject(item.identity, patch, types.MergePatchType)
return err
}

@caesarxuchao (Member) commented May 29, 2018

Can you factor the "try SMP; if it fails, then try JSONMergePatch" logic into a function? Something like:

type jsonMergePatchFunc func() []byte
func (gc *GarbageCollector) patch(identity objectReference, smp []byte, jmp jsonMergePatchFunc) (*Unstructured, error)

@roycaihw (Member, Author) commented May 29, 2018

Thanks for the advice! It's much more readable now. One nit is that I have to add an unnamed parameter to unblockOwnerReferencesJSONMergePatch to implement the function interface: https://github.com/roycaihw/kubernetes/blob/2117b46c1ae5bb4565bbce1eb0c88ed3fb9ac568/pkg/controller/garbagecollector/patch.go#L155

mapping, err := gc.restMapper.RESTMapping(fqKind.GroupKind(), fqKind.Version)
if err != nil {
return nil, newRESTMappingError(kind, apiVersion)
}

@caesarxuchao (Member) commented May 29, 2018

These can be replaced by gc.apiResource(). And we can remove the call to apiResource at line 53

return nil, err
}
// Unstructured implements metav1 interface
return resource, nil

@caesarxuchao (Member) commented May 29, 2018

Why not just return gc.dynamicClient.Resource(apiResource).Namespace(namespace).Get(name, metav1.GetOptions{})

@jennybuckley (Contributor) commented May 29, 2018

/lgtm

/test pull-kubernetes-e2e-kops-aws

@k8s-ci-robot k8s-ci-robot added the lgtm label May 29, 2018

@deads2k (Contributor) commented May 30, 2018

If we could build a jsonmerge patch, why would we ever try to submit a strategic merge patch?

@liggitt (Member) commented May 30, 2018

If we could build a jsonmerge patch, why would we ever try to submit a strategic merge patch?

jsonmerge must include resourceVersion to safely remove a single item from finalizers, so it is more subject to conflicts the client has to retry
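
Concretely (made-up values), a sketch of what such a patch looks like; because it pins resourceVersion, any write that lands in between turns this into a 409 Conflict the client has to re-read and retry:

// Hypothetical JSON merge patch: keep only the "kubernetes" finalizer while
// the object is at resourceVersion "42". A concurrent update bumps the
// resourceVersion and the API server rejects this patch with a conflict.
const removeFinalizerJSONMergePatch = `{"metadata":{"resourceVersion":"42","finalizers":["kubernetes"]}}`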

@deads2k (Contributor) commented May 30, 2018

If we could build a jsonmerge patch, why would we ever try to submit a strategic merge patch?

jsonmerge must include resourceVersion to safely remove a single item from finalizers, so it is more subject to conflicts the client has to retry

Realistically, how many conflicts do you think you'll get on deleted objects? Not that it isn't a concern, but it does seem rather dubious.

I don't feel strongly; it just seems like it's optimizing for an edge case.

@liggitt (Member) commented May 30, 2018

Realistically, how many conflicts do you think you'll get on deleted objects? Not that it isn't a concern, but it does seem rather dubious.

I don't feel strongly, just seems like its optimizing for an edge.

I had similar thoughts (start with jsonmerge only plus client retry, and measure whether it matters in practice). @lavalamp suggested SMP falling back to jsonmerge at #56348 (comment)

@caesarxuchao (Member) commented May 30, 2018

@liggitt @deads2k I don't know if there exist realistic workload tests to measure the number of conflicts. Last time I checked, the kubemark load test only has ReplicaSets and Pods.
As a first step, maybe we can check how many conflicts there are when running the entire e2e suite.

_, err = gc.patchObject(item.identity, patch)
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
_, err = gc.patch(item, patch, gc.deleteOwnerRefJSONMergePatch, ownerUIDs...)

@caesarxuchao (Member) commented May 30, 2018

To get rid of the variadic parameter, you can do something like:

type jsonMergePatchFunc func(*node) ([]byte, error)

gc.patch(item, patch, func(n *node) ([]byte, error) {
	return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
})

@caesarxuchao (Member) commented May 30, 2018

@deads2k @liggitt This patch is to an object that's not being deleted.

patch := deleteOwnerRefPatch(dependent.identity.UID, owner.UID)
_, err := gc.patchObject(dependent.identity, patch)
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, gc.deleteOwnerRefJSONMergePatch, owner.UID)

@caesarxuchao (Member) commented May 30, 2018

@deads2k @liggitt This patch is to an object that's not being deleted, too.


if err != nil {
return nil, err
}
expectedObjectMeta := ObjectMetaForPatch{}

@caesarxuchao (Member) commented May 30, 2018

@deads2k @liggitt do you have better ideas for constructing the JSON merge patch? We used ObjectMetaForPatch to hold the patch contents instead of using ObjectMeta directly, because some optional fields in ObjectMeta are not pointers, so they are serialized as <field name>: null instead of being omitted, and the resulting JSON merge patch would be wrong.
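
Roughly, the idea behind ObjectMetaForPatch is a purpose-built struct whose only fields are the ones the patch should carry (a sketch, not necessarily the exact definition in this PR; assumes the usual metav1 import):

// Sketch: serializing this never emits creationTimestamp (or any other
// ObjectMeta field), so the resulting JSON merge patch touches only
// resourceVersion and ownerReferences.
type ObjectMetaForPatch struct {
	ResourceVersion string                  `json:"resourceVersion"`
	OwnerReferences []metav1.OwnerReference `json:"ownerReferences"`
}

type objectForPatch struct {
	Metadata ObjectMetaForPatch `json:"metadata"`
}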

@liggitt (Member) commented May 30, 2018

so they are serialized as <field name>: null instead of being omitted, so the resulting json merge patch would be wrong.

seems like you need to ensure those fields are never null, e.g. expectedOwners := []metav1.OwnerReference{}

@liggitt (Member) commented May 30, 2018

which fields were problematic? non-pointer slice fields can be set to null

@caesarxuchao (Member) commented May 30, 2018

Actually only one problematic field: CreationTimestamp.

	meta := v1.ObjectMeta{}
	meta.OwnerReferences = []v1.OwnerReference{
		{UID: "freaf"},
	}
	meta.ResourceVersion = "1234"
	data, err := json.Marshal(meta)
	if err != nil {
		panic(err)
	}
	fmt.Printf("data=%s\n", data)

Output is

data={"resourceVersion":"1234","creationTimestamp":null,"ownerReferences":[{"apiVersion":"","kind":"","name":"","uid":"freaf"}]}

Maybe we can manually remove the creationTimestamp part, and add a unit test to make sure the patch only contains resourceVersion and ownerReferences.

@liggitt (Member) commented May 30, 2018

I'd probably use the unstructured object like this:

u := &unstructured.Unstructured{Object: map[string]interface{}{}}
u.SetResourceVersion(...)
u.SetOwnerReferences(...)
data, err := json.Marshal(u)

@liggitt (Member) commented May 31, 2018

The code mentioned above doesn't use that reflective conversion path. Please don't use Sprintf to form structured patches that contain any user-controlled data.

@roycaihw (Member, Author) commented May 31, 2018

SetOwnerReferences uses the conversion path

func (u *Unstructured) SetOwnerReferences(references []metav1.OwnerReference) {
newReferences := make([]interface{}, 0, len(references))
for _, reference := range references {
out, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&reference)

Is the concern about using Sprintf that we may change the json field name (ownerReferences) in future?

@liggitt (Member) commented May 31, 2018

Ah. I was considering the marshaling, not the SetOwnerReferences call. I'd benchmark the difference before doing anything too complicated in the code over that performance concern.

The concern about Sprintf is that data won't be escaped or formatted properly. Sprintf isn't a proper tool for marshaling/escaping structured data. I'd take a single-use typed struct like the current state of the PR over Sprintf.
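
A contrived illustration of the escaping concern (hypothetical values; real Kubernetes object names are far more restricted, but the general point stands):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Hypothetical value containing a quote.
	name := `rs-"prod"`

	// Sprintf does no escaping, so the embedded quote breaks the JSON.
	bad := fmt.Sprintf(`{"metadata":{"ownerReferences":[{"name":"%s"}]}}`, name)
	fmt.Println(json.Valid([]byte(bad))) // false

	// json.Marshal escapes the value and produces valid JSON.
	good, _ := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"ownerReferences": []map[string]string{{"name": name}},
		},
	})
	fmt.Println(json.Valid(good)) // true
}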

@roycaihw (Member, Author) commented May 31, 2018

Did a benchmark for marshaling the patch but it's hard to tell how it will affect GC performance:

import (
	"encoding/json"
	"fmt"
	"testing"

	"k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func BenchmarkReflective(b *testing.B) {
	owners := []v1.OwnerReference{
		{UID: "freaf"},
	}
	resourceVersion := "1234"
	for i := 0; i < b.N; i++ {
		u := &unstructured.Unstructured{Object: map[string]interface{}{}}
		u.SetResourceVersion(resourceVersion)
		u.SetOwnerReferences(owners)
		_, err := json.Marshal(u)
		if err != nil {
			panic(err)
		}
	}
}

func BenchmarkSprintf(b *testing.B) {
	owners := []v1.OwnerReference{
		{UID: "freaf"},
	}
	resourceVersion := "1234"
	for i := 0; i < b.N; i++ {
		expectedOwnersJSON, err := json.Marshal(owners)
		if err != nil {
			panic(err)
		}
		_ = []byte(fmt.Sprintf(`{"metadata":{"resourceVersion":"%s","ownerReferences":%s}}`, resourceVersion, expectedOwnersJSON))
	}
}

Result: 
BenchmarkReflective-12            200000              7368 ns/op
BenchmarkSprintf-12              2000000               894 ns/op

Withdrew the last commit to go with the single-use typed struct

@roycaihw (Member, Author) commented May 31, 2018

@liggitt @lavalamp I still prefer using the single-use typed ObjectMetaForPatch struct. If the current implementation looks good, I can squash.

Some alternatives:

  1. Use JSON patch instead of JSON merge patch. The benefit is that the patch is smaller when we delete owner references: the JSON patch contains only the owner references we want to delete, instead of the entire ownerReferences array as in a JSON merge patch. (For unblocking owner references, the JSON patch still has to include the entire array.) If we want to avoid the allocation, we can use Sprintf to construct the JSON patch, but the downside remains that Sprintf isn't a proper tool for marshaling/escaping structured data, and JSON patch may need more careful implementation and testing around deleting array elements by index; see the sketch after these alternatives.

  2. Allocate a map[string]interface{} and marshal it, e.g.:

metadata := map[string]interface{}{
    "metadata": map[string]interface{}{
        "resourceVersion": resourceVersion,
        "ownerReferences": owners,
    },
}
patch, err := json.Marshal(metadata)
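
And as a sketch of alternative 1 (made-up index and UID), a JSON patch (RFC 6902) can target just the entry to delete, optionally guarded by a test op so the index-based removal is safe:

// Hypothetical JSON patch: verify that the entry at index 0 is the owner we
// intend to remove, then remove it. Unlike JSON merge patch, the remaining
// ownerReferences do not have to be restated.
const deleteOwnerRefJSONPatch = `[
	{"op": "test", "path": "/metadata/ownerReferences/0/uid", "value": "owner-a"},
	{"op": "remove", "path": "/metadata/ownerReferences/0"}
]`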


@k8s-ci-robot (Contributor) commented Jun 6, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, jennybuckley, lavalamp, roycaihw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot commented Jun 6, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

1 similar comment

@lavalamp (Member) commented Jun 6, 2018

I added some labels. If this misses the release, we will cherry-pick. I think it's important that GC works on CRs.

@jennybuckley (Contributor) commented Jun 6, 2018

/priority critical-urgent

@jennybuckley (Contributor) commented Jun 6, 2018

/remove-priority important-soon

@k8s-github-robot (Contributor) commented Jun 6, 2018

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@caesarxuchao @jennybuckley @roycaihw

Pull Request Labels
  • sig/api-machinery: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Help
@jennybuckley (Contributor) commented Jun 6, 2018

/retest

2 similar comments
@roycaihw (Member, Author) commented Jun 6, 2018

/retest

@roycaihw (Member, Author) commented Jun 6, 2018

/retest

@k8s-github-robot (Contributor) commented Jun 7, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@dims (Member) commented Jun 7, 2018

/test pull-kubernetes-node-e2e

@k8s-ci-robot (Contributor) commented Jun 7, 2018

@roycaihw: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-node-e2e
Commit: 29d72a7
Details: link
Rerun command: /test pull-kubernetes-node-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot (Contributor) commented Jun 7, 2018

Automatic merge from submit-queue (batch tested with PRs 63386, 64624, 62297, 64847). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit ccb9590 into kubernetes:master Jun 7, 2018

16 of 18 checks passed

pull-kubernetes-node-e2e: Job failed.
Submit Queue: Required Github CI test is not green: pull-kubernetes-node-e2e
cla/linuxfoundation: caesarxuchao authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-local-e2e-containerized: Skipped
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.