Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idempotency: repeated Node Unpublish operations cause failure #105

Closed
okartau opened this issue Oct 23, 2019 · 6 comments · Fixed by #139
Closed

Idempotency: repeated Node Unpublish operations cause failure #105

okartau opened this issue Oct 23, 2019 · 6 comments · Fixed by #139

Comments

@okartau
Copy link
Contributor

okartau commented Oct 23, 2019

Background: in pmem-csi driver we added more idempotency conformance.
To test, the repeated Node Volume operations were added in csi-test.
In process of upstreaming this change, the CI tests that use host-path-driver, fail:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-csi_csi-test/229/pull-kubernetes-csi-csi-test/1185891170576764931

host-path driver currently does not pay attention to repeated requests possibility, there are no mutexes that protect against related race conditions.

@pohly
Copy link
Contributor

pohly commented Oct 30, 2019

We discussed this today in the Kubernetes-CSI standup and everyone agreed that a) this should be fixed in the hostpath driver and b) that having these idempotency tests in csi-test will be very useful.

@BenTheElder
Copy link

what is necessary to fix this? just limiting the concurrency of operations in the driver?

@BenTheElder
Copy link

xref: #72, this sounds like a superset of that

@okartau
Copy link
Contributor Author

okartau commented Jan 15, 2020

I created #139 to tolerate repeated NodeUnpublish requests.
My local testing shows that this allows hostpath driver to pass the added repeated csi-sanity testing currently coming in kubernetes-csi/csi-test#229, so right now there is no need to add mutex locking to satisfy those repeated tests, as I can't see failures.

@pohly
Copy link
Contributor

pohly commented Jan 15, 2020

so right now there is no need to add mutex locking to satisfy those repeated tests, as I can't see failures.

This is a dangerous argument: that you haven't been able to trigger a problem caused by missing locking does not mean that the problem doesn't exist or that we don't need to do anything about it.

In practice the problem should be exceedingly rare. It only happens when two gRPC calls reach the driver at the same time and get processed in parallel. I think the idempotency tests don't cover that, they just issue the same call multiple times sequentially.

So while we don't need mutex locking to make the NodeUnpublishVolume idempotent, we still need it to make the code more robust.

@okartau
Copy link
Contributor Author

okartau commented Jan 15, 2020

This is a dangerous argument: that you haven't been able to trigger a problem caused by missing locking does not mean that the problem doesn't exist or that we don't need to do anything about it.

I agree, I did not mean we don't need to deal with locking at all, just that currently to-be-added test cases do not expose problem that we can see fixed by added locking.
We should add locking, but the progress will be more assuring if we can have test case(s) improved by locking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants