add ability to specify base resourceVersion for three-way server-side patch #63104

lavalamp · 2018-04-24T22:30:15Z

The PATCH handler has some ... interesting behavior. From the comments:

	// first time through,
	// 1. apply the patch
	// 2. save the original and patched to detect whether there were conflicting changes on retries

	// on a conflict (which is the only reason to have more than one attempt),
	// 1. build a strategic merge patch from originalJS and the patchedJS.  Different patch types can
	//    be specified, but a strategic merge patch should be expressive enough handle them.  Build the
	//    patch with this type to handle those cases.
	// 2. build a strategic merge patch from originalJS and the currentJS
	// 3. ensure no conflicts between the two patches
	// 4. apply the #1 patch to the currentJS object

I think there are three modes of operation which it potentially makes sense for patch to have:

User specifies RV; patch makes one try. Fail if RV no longer matches. (This makes sense if the user's patch depends on some part of the object in an unpredictable way.)
User does not specify RV; patch makes multiple tries, fails if the patch stops applying cleanly. (User is opting out of any locking--'.spec.replicas' should be X, I don't care what anyone else said.)
User specifies a base object and a patch; patch interprets the patch changes as having an implicit "from" precondition for every change. It retries as long as the patch with the implied preconditions continues to apply cleanly. (User wants concurrency-safe application and wants the server to do the retry loop. And the patch is predictable.)

~~Unfortunately the code today does NONE of those sensible things, and is instead a conflation of 2 and 3 above.~~ edit: 1 and 2 were addressed by #63146

If a user tries case 1 today, we think that it still does 5 retries, even though if the first one has a conflict it is certain that the others will, too.

If a user today doesn't specify an RV, today's retry loop does a bunch of conflict detection that is pretty pointless: the user didn't specify any relation to the RV, therefore using the RV when the request first came in as a "base object" is an arbitrary choice. If a conflict is generated, user is just going to retry with the exact same patch.

And if a user wants behavior 3, there is no way to get it today. I personally recall 3 as being the originally advertised behavior of PATCH but this doesn't seem to be a universal memory.

So, I propose changes:

~~If the user specifies a RV in the patch, do not do any retries. This is backwards compatible because they all would have failed anyway.~~ done in collapse patch conflict retry onto GuaranteedUpdate #63146
If the user doesn't specify an RV, don't do the conflict detection mentioned in the comments, just retry with the original patch. I think this is backwards compatible because all the failures this causes are actually spurious. done in collapse patch conflict retry onto GuaranteedUpdate #63146
Add a mechanism for the user to deliberately invoke behavior 3. This would be a "new" feature.

The text was updated successfully, but these errors were encountered:

lavalamp · 2018-04-24T22:31:46Z

People who might be interested: @liggitt @wojtek-t @bgrant0607 @apelisse

wojtek-t · 2018-04-25T08:03:28Z

@lavalamp one question that I have is:
"how user can specify RV in the patch"

As part of the request, you're specifying only "patch that should be applied". Not really how the initial object looked like (and on what version it has). So I'm not really sure how that is possible.
[I agree that if it is possible, not retrying make sense here]

liggitt · 2018-04-25T14:22:03Z

"how user can specify RV in the patch"

like this: {"metadata":{"resourceVersion":"10"},"foo":"bar"}

/assign

wojtek-t · 2018-04-25T14:28:49Z

Wait, but that means that we want to overwrite RV to 10, isn't it?

liggitt · 2018-04-25T14:31:55Z

that takes whatever the object coming from etcd is, sets resourceVersion=10, and foo=bar, then tries to store it. that lets you set the resourceVersion=10 precondition, which gives you normal resourceVersion semantics when it tries to persist

wojtek-t · 2018-04-25T14:37:44Z

aah ok - I forgot that that object with rv=10 is effectively a precondition on that version.
Thanks for explanation.

@lavalamp

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. collapse patch conflict retry onto GuaranteedUpdate xref #63104 This PR builds on #62868 1. When the incoming patch specified a resourceVersion that failed as a precondition, the patch handler would retry uselessly 5 times. This PR collapses onto GuaranteedUpdate, which immediately stops retrying in that case. 2. When the incoming patch did not specify a resourceVersion, and persisting to etcd contended with other etcd updates, the retry would try to detect patch conflicts with deltas from the first 'current object' retrieved from etcd and fail with a conflict error in that case. Given that the user did not provide any information about the starting version they expected their patch to apply to, this does not make sense, and results in arbitrary conflict errors, depending on when the patch was submitted relative to other changes made to the resource. This PR changes the patch application to be performed on the object retrieved from etcd identically on every attempt. fixes #58017 SMP is no longer computed for CRD objects fixes #42644 No special state is retained on the first attempt, so the patch handler correctly handles the cached storage optimistically trying with a cached object first /assign @lavalamp ```release-note fixed spurious "unable to find api field" errors patching custom resources ```

@lavalamp

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. collapse patch conflict retry onto GuaranteedUpdate xref kubernetes/kubernetes#63104 This PR builds on kubernetes/kubernetes#62868 1. When the incoming patch specified a resourceVersion that failed as a precondition, the patch handler would retry uselessly 5 times. This PR collapses onto GuaranteedUpdate, which immediately stops retrying in that case. 2. When the incoming patch did not specify a resourceVersion, and persisting to etcd contended with other etcd updates, the retry would try to detect patch conflicts with deltas from the first 'current object' retrieved from etcd and fail with a conflict error in that case. Given that the user did not provide any information about the starting version they expected their patch to apply to, this does not make sense, and results in arbitrary conflict errors, depending on when the patch was submitted relative to other changes made to the resource. This PR changes the patch application to be performed on the object retrieved from etcd identically on every attempt. fixes #58017 SMP is no longer computed for CRD objects fixes #42644 No special state is retained on the first attempt, so the patch handler correctly handles the cached storage optimistically trying with a cached object first /assign @lavalamp ```release-note fixed spurious "unable to find api field" errors patching custom resources ``` Kubernetes-commit: 6aad80cce3cc429f04e22238ce9be13574c61cd4

@lavalamp

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. collapse patch conflict retry onto GuaranteedUpdate xref kubernetes/kubernetes#63104 This PR builds on kubernetes/kubernetes#62868 1. When the incoming patch specified a resourceVersion that failed as a precondition, the patch handler would retry uselessly 5 times. This PR collapses onto GuaranteedUpdate, which immediately stops retrying in that case. 2. When the incoming patch did not specify a resourceVersion, and persisting to etcd contended with other etcd updates, the retry would try to detect patch conflicts with deltas from the first 'current object' retrieved from etcd and fail with a conflict error in that case. Given that the user did not provide any information about the starting version they expected their patch to apply to, this does not make sense, and results in arbitrary conflict errors, depending on when the patch was submitted relative to other changes made to the resource. This PR changes the patch application to be performed on the object retrieved from etcd identically on every attempt. fixes #58017 SMP is no longer computed for CRD objects fixes #42644 No special state is retained on the first attempt, so the patch handler correctly handles the cached storage optimistically trying with a cached object first /assign @lavalamp ```release-note fixed spurious "unable to find api field" errors patching custom resources ``` Kubernetes-commit: 6aad80cce3cc429f04e22238ce9be13574c61cd4

fejta-bot · 2018-08-02T19:06:29Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

nikhita · 2018-08-10T14:38:05Z

/remove-lifecycle stale

lavalamp added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Apr 24, 2018

lavalamp mentioned this issue Apr 24, 2018

Refactor patch #62868

Merged

liggitt mentioned this issue Apr 25, 2018

collapse patch conflict retry onto GuaranteedUpdate #63146

Merged

k8s-ci-robot assigned liggitt Apr 25, 2018

liggitt added the kind/feature Categorizes issue or PR as related to a new feature. label May 4, 2018

liggitt removed their assignment May 4, 2018

liggitt changed the title ~~PATCH behavior~~ add ability to specify base resourceVersion for three-way server-side patch May 4, 2018

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2018

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2018

bgrant0607 added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Sep 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add ability to specify base resourceVersion for three-way server-side patch #63104

add ability to specify base resourceVersion for three-way server-side patch #63104

lavalamp commented Apr 24, 2018 •

edited by liggitt

lavalamp commented Apr 24, 2018

wojtek-t commented Apr 25, 2018

liggitt commented Apr 25, 2018

wojtek-t commented Apr 25, 2018

liggitt commented Apr 25, 2018 •

edited

wojtek-t commented Apr 25, 2018

fejta-bot commented Aug 2, 2018

nikhita commented Aug 10, 2018

add ability to specify base resourceVersion for three-way server-side patch #63104

add ability to specify base resourceVersion for three-way server-side patch #63104

Comments

lavalamp commented Apr 24, 2018 • edited by liggitt

lavalamp commented Apr 24, 2018

wojtek-t commented Apr 25, 2018

liggitt commented Apr 25, 2018

wojtek-t commented Apr 25, 2018

liggitt commented Apr 25, 2018 • edited

wojtek-t commented Apr 25, 2018

fejta-bot commented Aug 2, 2018

nikhita commented Aug 10, 2018

lavalamp commented Apr 24, 2018 •

edited by liggitt

liggitt commented Apr 25, 2018 •

edited