Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace gets conflict error from watch cache #41892

Closed
deads2k opened this issue Feb 22, 2017 · 34 comments
Closed

replace gets conflict error from watch cache #41892

deads2k opened this issue Feb 22, 2017 · 34 comments
Assignees
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@deads2k
Copy link
Contributor

deads2k commented Feb 22, 2017

I0222 06:34:25.223] node "node-v1-test" created
W0222 06:34:25.446] Error from server (Conflict): error when replacing "STDIN": Operation cannot be fulfilled on nodes "node-v1-test": the object has been modified; please apply your changes to the latest version and try again
W0222 06:34:25.453] !!! [0222 06:34:25] Call tree:
W0222 06:34:25.455] !!! [0222 06:34:25] 1: /go/src/k8s.io/kubernetes/hack/make-rules/test-cmd-util.sh:2853 run_pod_tests(...)
W0222 06:34:25.458] !!! [0222 06:34:25] 2: hack/make-rules/test-cmd.sh:142 runTests(...)
I0222 06:34:25.759] +++ [0222 06:34:25] Clean up complete

@kubernetes/sig-cli-api-reviews

@deads2k deads2k added kind/flake Categorizes issue or PR as related to a flaky test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Feb 22, 2017
@deads2k
Copy link
Contributor Author

deads2k commented Feb 22, 2017

@ncdc
Copy link
Member

ncdc commented Feb 24, 2017

I've seen this multiple times as well. It is coming from here:

kubectl replace -f - "${kube_flags[@]}" << __EOF__
{
"kind": "Node",
"apiVersion": "v1",
"metadata": {
"name": "node-v1-test",
"annotations": {"a":"b"}
}
}
__EOF__

@ncdc
Copy link
Member

ncdc commented Feb 24, 2017

I'm not sure what the expectation is with this particular part of the test suite. It creates a node, then immediately tries to run kubectl replace on it. Should it always succeed, regardless of conflict?

@smarterclayton
Copy link
Contributor

kubectl replace is effectively a "stomp" (all PUTs are stomps from the perspective of the user). I don't think as a user I expect kubectl replace to fail on conflict up to some reasonable retry limit.

@ncdc
Copy link
Member

ncdc commented Feb 28, 2017

I think this might be happening because the kubectl replace call is fighting with the attach/detach reconciler, which is doing a two way merge patch to update the node's status. I'm continuing to investigate.

@pwittrock
Copy link
Member

@ncdc Assigning to you since you are investigating.

@pwittrock
Copy link
Member

cc @ymqytw

@ncdc
Copy link
Member

ncdc commented Mar 6, 2017

I've yet to reproduce this locally but I think my analysis is probably correct. kubectl replace does not currently retry on conflict. Do we think it should? cc @kubernetes/sig-cli-bugs @fabianofranz @soltysh @deads2k @liggitt

@deads2k
Copy link
Contributor Author

deads2k commented Mar 6, 2017

kubectl replace does not currently retry on conflict. Do we think it should?

If you want an unconditional update (retry on conflict), then I think you want something different. Why don't we just create something different to replace on?

@liggitt
Copy link
Member

liggitt commented Mar 6, 2017

from a client's perspective,

kubectl replace -f - "${kube_flags[@]}" << __EOF__
{
"kind": "Node",
"apiVersion": "v1",
"metadata": {
"name": "node-v1-test",
"annotations": {"a":"b"}
}
}
__EOF__
is an unconditional replace (no resource version provided in the input), which is what makes surfacing that error really odd...

@ncdc
Copy link
Member

ncdc commented Mar 6, 2017

There is --force but that does a delete followed by a create

@ncdc
Copy link
Member

ncdc commented Mar 6, 2017

@liggitt I'm not clear on the code path that results in the conflict error surfacing. I tried looking at the rest code that handles updates but it wasn't the easiest to follow.

@ncdc
Copy link
Member

ncdc commented Mar 6, 2017

Yeah, that's where I was looking. I am confused about what happens w.r.t. resource version for the updated object (the one supplied to kubectl replace) if the input doesn't contain a resource version.

@liggitt
Copy link
Member

liggitt commented Mar 6, 2017

@wojtek-t I think this is the issue:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/cacher.go#L518-L535

The caching storage implements guaranteedupdate and uses the object in the watch cache as the current object

@deads2k
Copy link
Contributor Author

deads2k commented Mar 6, 2017

The caching storage implements guaranteedupdate and uses the object in the watch cache as the current object

I think I'd expect it to do that, but on a conflict failure, I'd expect it to retry live. Seems like most of the time, the cache would save us a get.

@pwittrock
Copy link
Member

@deads2k Can I mark this as non-release-blocker to reduce to noise for the release managers?

@ncdc
Copy link
Member

ncdc commented Mar 6, 2017

@pwittrock I'd say yes

@marun
Copy link
Contributor

marun commented Mar 15, 2017

This issue is likely related to a recent failure: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/3796/

@ncdc
Copy link
Member

ncdc commented Mar 15, 2017

Potential fix #43152

@ncdc
Copy link
Member

ncdc commented Mar 15, 2017

At least for the replace issue

@liggitt liggitt changed the title test-cmd failure probably kubectl replace unconditional replace gets conflict error Mar 15, 2017
@liggitt liggitt modified the milestones: v1.6, v1.7 Mar 15, 2017
@liggitt liggitt changed the title unconditional replace gets conflict error replace gets conflict error from watch cache Mar 16, 2017
@liggitt
Copy link
Member

liggitt commented Mar 16, 2017

after further investigation, this doesn't affect unconditional replace

kubectl turns unconditional replaces into conditional ones by pre-fetching the existing object and filling in a resource version. given that behavior of kubectl, the test script is vulnerable to legitimate version conflicts.

however, the watch cache issue still exposes clients to illegitimate version conflicts:

  • a client that updated, then updated again with the RV returned from the first update should not see a conflict, but can

the referenced test-cmd test can flake for the following reasons:

k8s-github-robot pushed a commit that referenced this issue Mar 16, 2017
Automatic merge from submit-queue

Retry kubectl test replace on conflict

Since kubectl is doing a resource-version-constrained replace, it is subject to conflicts on a contentious resource (like a node managed by the node controller)

Fixes #41892 (the specific flake, not the watch cache issue)
@liggitt
Copy link
Member

liggitt commented Mar 16, 2017

reopening to track server-side issue, moving to 1.6.1

@liggitt liggitt reopened this Mar 16, 2017
@liggitt liggitt modified the milestones: v1.6.1, v1.6 Mar 16, 2017
@liggitt liggitt added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Mar 16, 2017
@liggitt
Copy link
Member

liggitt commented Mar 22, 2017

@k8s-github-robot
Copy link

This Issue hasn't been active in 75 days. It will be closed in 14 days (Jun 20, 2017).

cc @deads2k @ncdc

You can add 'keep-open' label to prevent this from happening, or add a comment to keep it open another 90 days

k8s-github-robot pushed a commit that referenced this issue Sep 16, 2017
…conflict

Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..

etcd3 store: retry with live object on conflict if there was a suggestion

Retry with a live object instead of the cached version if the watch
cache receives a conflict trying to do the update.

Fixes #41892
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests

9 participants