Backport of CSI: reorder controller volume detachment into release/1.1.x #12701

hc-github-team-nomad-core · 2022-04-20T00:05:25Z

Backport

This PR is auto-generated from #12387 to be assessed for backporting due to the inclusion of the label backport/1.1.x.

The below text is copied from the body of the original PR.

Ran into this while working on #12384

In #12112 and #12113 we solved the problem of races in releasing
volume claims, but there was a case that we missed. During a node
drain with a controller attach/detach, we can hit a race where we call
controller publish before the unpublish has completed. The spec
discourages this, but plugins are supposed to handle it safely.
However, if the storage provider's API is slow enough and the
plugin doesn't handle the case safely, the volume can get "locked"
into a state where the provider's API won't detach it cleanly.

Check the claim before making any external controller publish RPC
calls so that Nomad is responsible for the canonical information about
whether a volume is currently claimed.

This has a couple of side effects that also had to get fixed here:

* Changing the order means that the volume will have a past claim
  without a valid external node ID because it came from the client, and
  this uncovered a separate bug where we didn't assert the external node
  ID was valid before returning it. Fall through to getting the ID from
  the plugins in the state store in this case. We avoided this
  originally because of concerns around plugins getting lost during node
  drain, but now that we've fixed that we may want to revisit it in
  future work.
* We should make sure we're handling FailedPrecondition cases from
  the controller plugin the same way we handle other retryable cases.
* Several tests had to be updated because they were assuming we fail
  in a particular order that no longer applies.

No changelog entry because this is updating code that hasn't yet shipped.

Fixed E2E test from #12384

$ go test -v . -suite CSI -run 'TestE2E/CSI/\*csi\.CSIControllerPluginEBSTest/TestNodeDrain'
=== RUN   TestE2E
=== RUN   TestE2E/CSI
=== RUN   TestE2E/CSI/*csi.CSIControllerPluginEBSTest
=== RUN   TestE2E/CSI/*csi.CSIControllerPluginEBSTest/TestNodeDrain
--- PASS: TestE2E (114.61s)
    --- PASS: TestE2E/CSI (114.61s)
        --- PASS: TestE2E/CSI/*csi.CSIControllerPluginEBSTest (114.60s)
            --- PASS: TestE2E/CSI/*csi.CSIControllerPluginEBSTest/TestNodeDrain (42.92s)
PASS
ok      github.com/hashicorp/nomad/e2e  115.513s

github-actions · 2022-10-16T02:47:27Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

tgross added 2 commits March 28, 2022 15:43

backport of commit 91ead5d

ca15b0c

backport of commit 2639770

1e55af5

hc-github-team-nomad-core force-pushed the backport/b-csi-volume-detach-order/uniformly-busy-locust branch from 2ab7f5b to 1e55af5 April 20, 2022 00:05

hc-github-team-nomad-core merged commit b81dcb9 into release/1.1.x Apr 20, 2022

hc-github-team-nomad-core deleted the backport/b-csi-volume-detach-order/uniformly-busy-locust branch April 20, 2022 00:05

vercel bot deployed to Preview – nomad-storybook-and-ui April 20, 2022 00:13

github-actions bot locked as resolved and limited conversation to collaborators Oct 16, 2022
