replace gets conflict error from watch cache #41892
Comments
I've seen this multiple times as well. It is coming from here: kubernetes/hack/make-rules/test-cmd-util.sh Lines 669 to 678 in 7a8c467
I'm not sure what the expectation is with this particular part of the test suite. It creates a node, then immediately tries to run
kubectl replace is effectively a "stomp" (all PUTs are stomps from the perspective of the user). As a user, I don't think I'd expect kubectl replace to fail on conflict, at least up to some reasonable retry limit.
I think this might be happening because the
@ncdc Assigning to you since you are investigating.
cc @ymqytw
I've yet to reproduce this locally but I think my analysis is probably correct.
If you want an unconditional update (retry on conflict), then I think you want something different. Why don't we just create something different to replace on?
from a client's perspective, kubernetes/hack/make-rules/test-cmd-util.sh Lines 669 to 678 in 7a8c467
There is
@liggitt I'm not clear on the code path that results in the conflict error surfacing. I tried looking at the REST code that handles updates but it wasn't the easiest to follow.
Yeah, that's where I was looking. I am confused about what happens w.r.t. resource version for the updated object (the one supplied to
@wojtek-t I think this is the issue: the caching storage implements GuaranteedUpdate and uses the object in the watch cache as the current object.
I think I'd expect it to do that, but on a conflict failure, I'd expect it to retry live. Seems like most of the time, the cache would save us a get.
@deads2k Can I mark this as non-release-blocker to reduce the noise for the release managers?
@pwittrock I'd say yes
This issue is likely related to a recent failure: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/3796/
Potential fix #43152
At least for the replace issue
After further investigation, this doesn't affect unconditional replace. kubectl turns unconditional replaces into conditional ones by pre-fetching the existing object and filling in a resource version. Given that behavior of kubectl, the test script is vulnerable to legitimate version conflicts. However, the watch cache issue still exposes clients to illegitimate version conflicts:
the referenced test-cmd test can flake for the following reasons:
Automatic merge from submit-queue
Retry kubectl test replace on conflict
Since kubectl is doing a resource-version-constrained replace, it is subject to conflicts on a contentious resource (like a node managed by the node controller).
Fixes #41892 (the specific flake, not the watch cache issue)
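For readers following along, here is a rough sketch of why a resource-version-constrained replace can hit a legitimate conflict on a busy resource, and how retrying helps. It is not kubectl's code or the test script's fix: `Store`, `Object`, `ReplaceWithRetry`, and the `interference` counter are hypothetical names invented for this illustration, and the in-memory store only stands in for the API server.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict stands in for the API server's Conflict response.
var ErrConflict = errors.New("conflict: the object has been modified")

// Object is a hypothetical, minimal stand-in for an API object.
type Object struct {
	ResourceVersion int
	Label           string
}

// Store is a hypothetical in-memory stand-in for the API server.
type Store struct {
	current Object
	// interference is the number of times a simulated concurrent writer
	// (think: the node controller) bumps the version right after a Get.
	interference int
}

// Get returns a snapshot, then lets the simulated concurrent writer run.
func (s *Store) Get() Object {
	snapshot := s.current
	if s.interference > 0 {
		s.interference--
		s.current.ResourceVersion++
	}
	return snapshot
}

// Update is a conditional PUT: it succeeds only when the caller's
// resource version matches the stored one.
func (s *Store) Update(o Object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return ErrConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

// ReplaceWithRetry pre-fetches the object, fills in its resource version,
// and retries on conflict up to maxAttempts. It returns the attempts used.
func ReplaceWithRetry(s *Store, desired Object, maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		live := s.Get()
		desired.ResourceVersion = live.ResourceVersion
		err := s.Update(desired)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, ErrConflict) {
			return attempt, err
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, ErrConflict)
}

func main() {
	// Two concurrent writes land between our Get and Update calls.
	s := &Store{current: Object{ResourceVersion: 1}, interference: 2}
	attempts, err := ReplaceWithRetry(s, Object{Label: "b"}, 5)
	fmt.Println(attempts, err)
}
```

A single attempt (`maxAttempts` of 1) would fail here, which matches the flake: the node is modified between the pre-fetch and the PUT.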
reopening to track server-side issue, moving to 1.6.1
still seen in kubectl test in https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/43489/pull-kubernetes-unit/23566/
…conflict
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md
etcd3 store: retry with live object on conflict if there was a suggestion
Retry with a live object instead of the cached version if the watch cache receives a conflict trying to do the update.
Fixes #41892
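To make the server-side behavior concrete, the following is a simplified sketch of the idea described in that commit message: start from the cached object (the "suggestion"), and if the update function reports a conflict against it, re-read the live object and try once more before surfacing the error. This is an illustration under assumptions, not the etcd3 store's actual code; `backend`, `object`, `updateFunc`, and `guaranteedUpdate` are hypothetical names, and the backend is a plain in-memory value.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict")

// object is a hypothetical, minimal stand-in for a stored API object.
type object struct {
	ResourceVersion int
	Value           string
}

// backend is a hypothetical stand-in for the live store (etcd).
type backend struct{ live object }

func (b *backend) get() object { return b.live }

// compareAndSwap writes next only if the live version still matches.
func (b *backend) compareAndSwap(expectedVersion int, next object) bool {
	if b.live.ResourceVersion != expectedVersion {
		return false
	}
	next.ResourceVersion = expectedVersion + 1
	b.live = next
	return true
}

// updateFunc computes the new object from the current one and returns
// errConflict when the caller's resource-version precondition fails.
type updateFunc func(current object) (object, error)

// guaranteedUpdate starts from the cached suggestion when one is given.
// A conflict against the suggestion may only mean the cache is stale,
// so it is retried against the live object before being returned.
func guaranteedUpdate(b *backend, suggestion *object, tryUpdate updateFunc) (object, error) {
	fromSuggestion := suggestion != nil
	var current object
	if fromSuggestion {
		current = *suggestion
	} else {
		current = b.get()
	}
	for {
		next, err := tryUpdate(current)
		if err != nil {
			if fromSuggestion && errors.Is(err, errConflict) {
				current = b.get() // the cache may be stale: retry live
				fromSuggestion = false
				continue
			}
			return object{}, err
		}
		if b.compareAndSwap(current.ResourceVersion, next) {
			return b.live, nil
		}
		// The live object moved on; recompute against the live copy.
		current = b.get()
		fromSuggestion = false
	}
}

func main() {
	b := &backend{live: object{ResourceVersion: 5, Value: "old"}}
	stale := object{ResourceVersion: 4, Value: "older"} // lagging watch cache
	// The client read version 5 from the live store and submits against it.
	clientUpdate := func(current object) (object, error) {
		if current.ResourceVersion != 5 {
			return object{}, errConflict
		}
		return object{Value: "new"}, nil
	}
	got, err := guaranteedUpdate(b, &stale, clientUpdate)
	fmt.Println(got, err)
}
```

Without the retry branch, the client in `main` would receive a conflict even though its resource version matches the live object, which is the "illegitimate" conflict discussed above. A genuinely outdated client still gets its conflict, because the second check runs against the live object.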
I0222 06:34:25.223] node "node-v1-test" created
W0222 06:34:25.446] Error from server (Conflict): error when replacing "STDIN": Operation cannot be fulfilled on nodes "node-v1-test": the object has been modified; please apply your changes to the latest version and try again
W0222 06:34:25.453] !!! [0222 06:34:25] Call tree:
W0222 06:34:25.455] !!! [0222 06:34:25] 1: /go/src/k8s.io/kubernetes/hack/make-rules/test-cmd-util.sh:2853 run_pod_tests(...)
W0222 06:34:25.458] !!! [0222 06:34:25] 2: hack/make-rules/test-cmd.sh:142 runTests(...)
I0222 06:34:25.759] +++ [0222 06:34:25] Clean up complete
@kubernetes/sig-cli-api-reviews