Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ability to specify base resourceVersion for three-way server-side patch #63104

Open
lavalamp opened this issue Apr 24, 2018 · 8 comments
Open
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@lavalamp
Copy link
Member

lavalamp commented Apr 24, 2018

The PATCH handler has some ... interesting behavior. From the comments:

	// first time through,
	// 1. apply the patch
	// 2. save the original and patched to detect whether there were conflicting changes on retries

	// on a conflict (which is the only reason to have more than one attempt),
	// 1. build a strategic merge patch from originalJS and the patchedJS.  Different patch types can
	//    be specified, but a strategic merge patch should be expressive enough handle them.  Build the
	//    patch with this type to handle those cases.
	// 2. build a strategic merge patch from originalJS and the currentJS
	// 3. ensure no conflicts between the two patches
	// 4. apply the #1 patch to the currentJS object

I think there are three modes of operation which it potentially makes sense for patch to have:

  1. User specifies RV; patch makes one try. Fail if RV no longer matches. (This makes sense if the user's patch depends on some part of the object in an unpredictable way.)
  2. User does not specify RV; patch makes multiple tries, fails if the patch stops applying cleanly. (User is opting out of any locking--'.spec.replicas' should be X, I don't care what anyone else said.)
  3. User specifies a base object and a patch; patch interprets the patch changes as having an implicit "from" precondition for every change. It retries as long as the patch with the implied preconditions continues to apply cleanly. (User wants concurrency-safe application and wants the server to do the retry loop. And the patch is predictable.)

Unfortunately the code today does NONE of those sensible things, and is instead a conflation of 2 and 3 above. edit: 1 and 2 were addressed by #63146

If a user tries case 1 today, we think that it still does 5 retries, even though if the first one has a conflict it is certain that the others will, too.

If a user today doesn't specify an RV, today's retry loop does a bunch of conflict detection that is pretty pointless: the user didn't specify any relation to the RV, therefore using the RV when the request first came in as a "base object" is an arbitrary choice. If a conflict is generated, user is just going to retry with the exact same patch.

And if a user wants behavior 3, there is no way to get it today. I personally recall 3 as being the originally advertised behavior of PATCH but this doesn't seem to be a universal memory.

So, I propose changes:

  1. If the user specifies a RV in the patch, do not do any retries. This is backwards compatible because they all would have failed anyway. done in collapse patch conflict retry onto GuaranteedUpdate #63146
  2. If the user doesn't specify an RV, don't do the conflict detection mentioned in the comments, just retry with the original patch. I think this is backwards compatible because all the failures this causes are actually spurious. done in collapse patch conflict retry onto GuaranteedUpdate #63146
  3. Add a mechanism for the user to deliberately invoke behavior 3. This would be a "new" feature.
@lavalamp lavalamp added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Apr 24, 2018
@lavalamp
Copy link
Member Author

People who might be interested: @liggitt @wojtek-t @bgrant0607 @apelisse

@wojtek-t
Copy link
Member

@lavalamp one question that I have is:
"how user can specify RV in the patch"

As part of the request, you're specifying only "patch that should be applied". Not really how the initial object looked like (and on what version it has). So I'm not really sure how that is possible.
[I agree that if it is possible, not retrying make sense here]

@liggitt
Copy link
Member

liggitt commented Apr 25, 2018

"how user can specify RV in the patch"

like this: {"metadata":{"resourceVersion":"10"},"foo":"bar"}

/assign

@wojtek-t
Copy link
Member

Wait, but that means that we want to overwrite RV to 10, isn't it?

@liggitt
Copy link
Member

liggitt commented Apr 25, 2018

that takes whatever the object coming from etcd is, sets resourceVersion=10, and foo=bar, then tries to store it. that lets you set the resourceVersion=10 precondition, which gives you normal resourceVersion semantics when it tries to persist

@wojtek-t
Copy link
Member

aah ok - I forgot that that object with rv=10 is effectively a precondition on that version.
Thanks for explanation.

k8s-github-robot pushed a commit that referenced this issue Apr 26, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

collapse patch conflict retry onto GuaranteedUpdate

xref #63104

This PR builds on #62868

1. When the incoming patch specified a resourceVersion that failed as a precondition, the patch handler would retry uselessly 5 times. This PR collapses onto GuaranteedUpdate, which immediately stops retrying in that case.

2. When the incoming patch did not specify a resourceVersion, and persisting to etcd contended with other etcd updates, the retry would try to detect patch conflicts with deltas from the first 'current object' retrieved from etcd and fail with a conflict error in that case. Given that the user did not provide any information about the starting version they expected their patch to apply to, this does not make sense, and results in arbitrary conflict errors, depending on when the patch was submitted relative to other changes made to the resource. This PR changes the patch application to be performed on the object retrieved from etcd identically on every attempt.

fixes #58017
SMP is no longer computed for CRD objects

fixes #42644
No special state is retained on the first attempt, so the patch handler correctly handles the cached storage optimistically trying with a cached object first

/assign @lavalamp

```release-note
fixed spurious "unable to find api field" errors patching custom resources
```
k8s-publishing-bot added a commit to kubernetes/apiserver that referenced this issue Apr 27, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

collapse patch conflict retry onto GuaranteedUpdate

xref kubernetes/kubernetes#63104

This PR builds on kubernetes/kubernetes#62868

1. When the incoming patch specified a resourceVersion that failed as a precondition, the patch handler would retry uselessly 5 times. This PR collapses onto GuaranteedUpdate, which immediately stops retrying in that case.

2. When the incoming patch did not specify a resourceVersion, and persisting to etcd contended with other etcd updates, the retry would try to detect patch conflicts with deltas from the first 'current object' retrieved from etcd and fail with a conflict error in that case. Given that the user did not provide any information about the starting version they expected their patch to apply to, this does not make sense, and results in arbitrary conflict errors, depending on when the patch was submitted relative to other changes made to the resource. This PR changes the patch application to be performed on the object retrieved from etcd identically on every attempt.

fixes #58017
SMP is no longer computed for CRD objects

fixes #42644
No special state is retained on the first attempt, so the patch handler correctly handles the cached storage optimistically trying with a cached object first

/assign @lavalamp

```release-note
fixed spurious "unable to find api field" errors patching custom resources
```

Kubernetes-commit: 6aad80cce3cc429f04e22238ce9be13574c61cd4
k8s-publishing-bot added a commit to kubernetes/apiextensions-apiserver that referenced this issue Apr 27, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

collapse patch conflict retry onto GuaranteedUpdate

xref kubernetes/kubernetes#63104

This PR builds on kubernetes/kubernetes#62868

1. When the incoming patch specified a resourceVersion that failed as a precondition, the patch handler would retry uselessly 5 times. This PR collapses onto GuaranteedUpdate, which immediately stops retrying in that case.

2. When the incoming patch did not specify a resourceVersion, and persisting to etcd contended with other etcd updates, the retry would try to detect patch conflicts with deltas from the first 'current object' retrieved from etcd and fail with a conflict error in that case. Given that the user did not provide any information about the starting version they expected their patch to apply to, this does not make sense, and results in arbitrary conflict errors, depending on when the patch was submitted relative to other changes made to the resource. This PR changes the patch application to be performed on the object retrieved from etcd identically on every attempt.

fixes #58017
SMP is no longer computed for CRD objects

fixes #42644
No special state is retained on the first attempt, so the patch handler correctly handles the cached storage optimistically trying with a cached object first

/assign @lavalamp

```release-note
fixed spurious "unable to find api field" errors patching custom resources
```

Kubernetes-commit: 6aad80cce3cc429f04e22238ce9be13574c61cd4
@liggitt liggitt added the kind/feature Categorizes issue or PR as related to a new feature. label May 4, 2018
@liggitt liggitt removed their assignment May 4, 2018
@liggitt liggitt changed the title PATCH behavior add ability to specify base resourceVersion for three-way server-side patch May 4, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2018
@nikhita
Copy link
Member

nikhita commented Aug 10, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2018
@bgrant0607 bgrant0607 added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Sep 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests

7 participants