Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let the garbage collector use json merge patch when SMP is not supported #63386

Merged
merged 1 commit into from
Jun 7, 2018

Conversation

roycaihw
Copy link
Member

@roycaihw roycaihw commented May 3, 2018

What this PR does / why we need it:
Let garbage collector fallback to use json merge patch when strategic merge patch returns 415. This enables orphan delete on custom resources.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #56348

Special notes for your reviewer:
This PR is developed based on #56595. Ref #56606 for more information.

Release note:

Orphan delete is now supported for custom resources

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 3, 2018
@roycaihw roycaihw force-pushed the gc-json-patch branch 3 times, most recently from bb0424d to dbaff2b Compare May 3, 2018 17:38
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2018
@roycaihw
Copy link
Member Author

/retest

@@ -39,21 +39,27 @@ func deleteOwnerRefStrategicMergePatch(dependentUID types.UID, ownerUIDs ...type
return []byte(patch)
}

// TODO: remove this function when we can use strategic merge patch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not plan on supporting smp on cr? or do we just plan on always allowing json patch in the gc, even if we start supporting smp on cr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dropped the plan for supporting smp on cr ref #58414 and #56606

// unavailable.
return nil, fmt.Errorf("doesn't have a local cache for %s", gvr)
// If local cache doesn't exist for mapping.Resource, send a GET request to API server
apiResource, _, err := gc.apiResource(apiVersion, kind)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use mapping.Resource again? looks like it does the same thing as the beginning of this function https://github.com/roycaihw/kubernetes/blob/ebb54562d1789f0e198cad35ad34a0ae1175deb4/pkg/controller/garbagecollector/operations.go#L43-L50

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I guess I was copypasting something and didn't realize the redundant.

return
}
_, err = gc.patchObject(dependent.identity, patch, types.MergePatchType)
if err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not "if err != nil && !errors.IsNotFound(err)" like at line 534?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON merge patch cannot manipulate arrays (owner reference in this case). JSON merge patch replaces the existing array with an entire array (which we constructed in deleteOwnerRefJSONMergePatch). So we don't need to handle error if the target owner reference doesn't exist here.

}
_, err = gc.patchObject(item.identity, patch, types.MergePatchType)
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you catch the "try SMP, if is fails then try JSONMergePatch" logic into a function? Something like:

type jsonMergePatchFunc func() []byte
func (gc *GarbageCollector) patch(identity objectReference, smp []byte, jmp jsonMergePatchFunc) (*Unstructured, error)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advise! It's much more readable now. One nit is that I have to add a unnamed parameter to unblockOwnerReferencesJSONMergePatch to implement the function interface https://github.com/roycaihw/kubernetes/blob/2117b46c1ae5bb4565bbce1eb0c88ed3fb9ac568/pkg/controller/garbagecollector/patch.go#L155

mapping, err := gc.restMapper.RESTMapping(fqKind.GroupKind(), fqKind.Version)
if err != nil {
return nil, newRESTMappingError(kind, apiVersion)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be replaced by gc.apiResource(). And we can remove the call to apiResource at line 53

return nil, err
}
// Unstructed implements metav1 interface
return resource, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just return gc.dynamicClient.Resource(apiResource).Namespace(namespace).Get(name, metav1.GetOptions{})

@jennybuckley
Copy link

/lgtm

/test pull-kubernetes-e2e-kops-aws

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2018
@deads2k
Copy link
Contributor

deads2k commented May 30, 2018

If we could build a jsonmerge patch, why would we ever try to submit a strategic merge patch?

@liggitt
Copy link
Member

liggitt commented May 30, 2018

If we could build a jsonmerge patch, why would we ever try to submit a strategic merge patch?

jsonmerge must include resourceVersion to safely remove a single item from finalizers, so it is more subject to conflicts the client has to retry

@deads2k
Copy link
Contributor

deads2k commented May 30, 2018

If we could build a jsonmerge patch, why would we ever try to submit a strategic merge patch?

jsonmerge must include resourceVersion to safely remove a single item from finalizers, so it is more subject to conflicts the client has to retry

Realistically, how many conflicts do you think you'll get on deleted objects? Not that it isn't a concern, but it does seem rather dubious.

I don't feel strongly, just seems like its optimizing for an edge.

@liggitt
Copy link
Member

liggitt commented May 30, 2018

Realistically, how many conflicts do you think you'll get on deleted objects? Not that it isn't a concern, but it does seem rather dubious.

I don't feel strongly, just seems like its optimizing for an edge.

I had similar thoughts (start with jsonmerge only and client retry and measure whether it matters in practice). @lavalamp suggested SMP falling back to jsonmerge at #56348 (comment)

@caesarxuchao
Copy link
Member

@liggitt @deads2k I don't know if there exists realistic workload tests to measure the number conflicts. Last time I checked, the kubemark load test only has replicasets and pods.
As a first step, maybe we can check how many conflicts there are when running the entire e2e suite.

_, err = gc.patchObject(item.identity, patch)
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
_, err = gc.patch(item, patch, gc.deleteOwnerRefJSONMergePatch, ownerUIDs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get rid of the variadic variable, you can

type jsonMergePatchFunc func(*node) ([]byte, error)

gc.patch(item, patch, func (n *node) ([]byte, error) {
   gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
 })

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k @liggitt This patch is to an object that's not being deleted.

patch := deleteOwnerRefPatch(dependent.identity.UID, owner.UID)
_, err := gc.patchObject(dependent.identity, patch)
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, gc.deleteOwnerRefJSONMergePatch, owner.UID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k @liggitt This patch is to an object that's not being deleted, too.

@lavalamp
Copy link
Member

lavalamp commented May 30, 2018 via email

if err != nil {
return nil, err
}
expectedObjectMeta := ObjectMetaForPatch{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k @liggitt do you have better ideas to construct json merge patch? We used ObjectMetaForPatch to hold patch contents instead of using ObjectMeta directly because some optional fields in ObjectMeta is not pointer, so they are serialized as <field name>: null instead of being omitted, so the resulting json merge patch would be wrong.

Copy link
Member

@liggitt liggitt May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so they are serialized as : null instead of being omitted, so the resulting json merge patch would be wrong.

seems like you need to ensure those fields are never null, e.g. expectedOwners := []metav1.OwnerReference{}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which fields were problematic? non-pointer slice fields can be set to null

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually only one problematic field: CreationTimestamp.

	meta := v1.ObjectMeta{}
	meta.OwnerReferences = []v1.OwnerReference{
		{UID: "freaf"},
	}
	meta.ResourceVersion = "1234"
	data, err := json.Marshal(meta)
	if err != nil {
		panic(err)
	}
	fmt.Printf("data=%s\n", data)

Output is

data={"resourceVersion":"1234","creationTimestamp":null,"ownerReferences":[{"apiVersion":"","kind":"","name":"","uid":"freaf"}]}

Maybe we can manually remove the creationTimestamp part, and add a unit test to make sure the patch only contains resourceVersion and ownerReferences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably use the unstructured object like this:

u := &unstructured.Unstructured{Object: map[string]interface{}{}}
u.SetResourceVersion(...)
u.SetOwnerReferences(...)
data, err := json.Marshal(u)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code mentioned above doesn’t use that reflective conversion path. Please don’t use Sprintf to form structured patches that contain any user controlled data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetOwnerReferences uses the conversion path

func (u *Unstructured) SetOwnerReferences(references []metav1.OwnerReference) {
newReferences := make([]interface{}, 0, len(references))
for _, reference := range references {
out, err := runtime.DefaultUnstructuredConverter.ToUnstructured(&reference)

Is the concern about using Sprintf that we may change the json field name (ownerReferences) in future?

Copy link
Member

@liggitt liggitt May 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I was considering the marshaling, not the SetOwnerReferences call. I'd benchmark the difference before doing anything too complicated in the code over that performance concern.

The concern about sprintf is that data won't be escaped or formatted properly. Sprintf isn't a proper tool for marshaling/escaping structured data. I'd take a single-use typed struct like the current state of the PR over Sprintf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a benchmark for marshaling the patch but it's hard to tell how it will affect GC performance:

func BenchmarkReflective(b *testing.B) {
	owners := []v1.OwnerReference{
		{UID: "freaf"},
	}
	resourceVersion := "1234"
	for i := 0; i < b.N; i++ {
		u := &unstructured.Unstructured{Object: map[string]interface{}{}}
		u.SetResourceVersion(resourceVersion)
		u.SetOwnerReferences(owners)
		_, err := json.Marshal(u)
		if err != nil {
			panic(err)
		}
	}
}

func BenchmarkSprintf(b *testing.B) {
	owners := []v1.OwnerReference{
		{UID: "freaf"},
	}
	resourceVersion := "1234"
	for i := 0; i < b.N; i++ {
		expectedOwnersJSON, err := json.Marshal(owners)
		if err != nil {
			panic(err)
		}
		_ = []byte(fmt.Sprintf(`{"metadata":{"resourceVersion":"%s","ownerReferences":%s}}`, resourceVersion, expectedOwnersJSON))
	}
}

Result: 
BenchmarkReflective-12            200000              7368 ns/op
BenchmarkSprintf-12              2000000               894 ns/op

Withdrew the last commit to go with the single-use typed struct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt @lavalamp I still prefer using the single-user typed ObjectMetaForPatch struct. If the current implementation looks good I can squash.

Some alternatives:

  1. Use json patch instead of jsonmerge patch. The benefit is that the patch size will be smaller when we delete some owner references. The json patch only contains the owner references we want to delete, instead of the entire owner references array in json merge patch. For unblocking owner references json patch still has to include the entire array. If we want to avoid allocation, we can use Sprintf to construct the json patch. The downside is still that Sprintf isn't a proper tool for marshaling/escaping structured data. And json patch may require more careful implementation & testing regarding deleting element in array by indexes.

  2. Allocate a map[string]interface{} and marshal it e.g.:

metadata := map[string]interface{} {
    "metadata": map[string]interface{} {
        "resourceVersion": string{resourceVersion},
        "ownerReferences": []metav1.OwnerReference {owners},
    },
}
patch, err := json.Marshal(metadata)

@lavalamp
Copy link
Member

lavalamp commented May 30, 2018 via email

@lavalamp
Copy link
Member

lavalamp commented Jun 6, 2018 via email

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, jennybuckley, lavalamp, roycaihw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2018
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@lavalamp lavalamp added this to the v1.11 milestone Jun 6, 2018
@lavalamp lavalamp added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/bug Categorizes issue or PR as related to a bug. and removed milestone/incomplete-labels labels Jun 6, 2018
@lavalamp
Copy link
Member

lavalamp commented Jun 6, 2018

I added some labels. If this misses the release, we will cherrypick. I think it's important that GC works on CRs.

@jennybuckley
Copy link

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 6, 2018
@jennybuckley
Copy link

/remove-priority important-soon

@k8s-ci-robot k8s-ci-robot removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 6, 2018
@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@caesarxuchao @jennybuckley @roycaihw

Pull Request Labels
  • sig/api-machinery: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Help

@jennybuckley
Copy link

/retest

2 similar comments
@roycaihw
Copy link
Member Author

roycaihw commented Jun 6, 2018

/retest

@roycaihw
Copy link
Member Author

roycaihw commented Jun 6, 2018

/retest

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@dims
Copy link
Member

dims commented Jun 7, 2018

/test pull-kubernetes-node-e2e

@k8s-ci-robot
Copy link
Contributor

@roycaihw: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-node-e2e 29d72a7 link /test pull-kubernetes-node-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 63386, 64624, 62297, 64847). If you want to cherry-pick this change to another branch, please follow the instructions here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CR is stuck with "orphan" finalizer
10 participants