unset selected node when storage is exhausted for topology segment #405

Merged
4 commits merged into kubernetes-csi:master on Apr 1, 2020

Conversation

@pohly (Contributor) commented Feb 13, 2020

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

This makes sense under some specific circumstances:

  • volume is supposed to be created only in a certain topology segment
  • that segment is chosen via the pod scheduler via late binding
  • the CSI driver supports topology
  • the CSI driver reports that it ran out of storage

Previously, external-provisioner would keep retrying to create the
volume instead of notifying the scheduler to pick a node anew.

Which issue(s) this PR fixes:
Related-to: kubernetes/kubernetes#72031

Special notes for your reviewer:

One downside of this change is that if the conditions above are met
and some other final error occurs, rescheduling is also triggered. A
richer API in the sig-storage-lib-external-provisioner library would
be needed to avoid this.

It's okay to treat ResourceExhausted as a final error; the CSI spec
explicitly describes this case. Previously it was treated as a non-final
error merely because retrying made more sense (resources might become
available).

The check for "strict topology" was added because without it, a CSI driver should already go ahead and pick some other topology segment than the one preferred for the selected node. It seems to make little sense to do another scheduling round if no topology segment has enough space.

Does this PR introduce a user-facing change?:

If a CSI driver returns ResourceExhausted for CreateVolume, supports topology, the volume uses late binding, and the driver has requested strict topology, then external-provisioner now asks for rescheduling of the pod that triggered the volume creation. This may then result in a retry with a different selected node.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 13, 2020
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 13, 2020
Comment on lines 1192 to 1198
case codes.ResourceExhausted: // CSI: operation not pending, "Unable to provision in `accessible_topology`"
    if mayReschedule {
        // may succeed elsewhere -> give up for now
        return true
    }
    // may still succeed at a later time -> continue
    return false
@jsafrane (Contributor) commented Mar 3, 2020
This is an unfortunate case of mixing gRPC transport error codes with application / CSI errors.

The Go gRPC implementation returns ResourceExhausted when it processes messages that are too big (both when sending and when receiving). That is the logic behind the current code: the operation itself may be in progress or may even have finished successfully (a too-big success response may simply have been thrown away), so the provisioner should retry.

Looking at the code, the gRPC max message size is sometimes defaulted to 4 KiB.

When a CSI driver intentionally returns ResourceExhausted because it ran out of space on the storage backend, the error is final and we can be 100% sure that the volume is not being provisioned. Rescheduling the pod / PVC to a different node is correct.

The questions are:

  • How to recognize who sent the error. IMO it's impossible to distinguish the CSI driver from the gRPC implementation.
  • Whether we can assume that a ResourceExhausted returned by the gRPC implementation is safe not to retry.
    • If the CreateVolumeResponse / CreateVolumeRequest was too big, retrying should yield the same result and one of the messages is going to be too big again. Retrying is useless and there is a leaked volume. Trying on a different node won't do any harm.

@pohly (Contributor, Author):

This is a problem in the CSI spec, right? Instead of re-using gRPC status codes, it would have been better to pick new ones to avoid such ambiguity.

I agree that it is impossible to determine the sender. If sending the request already fails, then trying elsewhere is safe and in fact the only way the volume creation might still (theoretically) succeed, because retrying locally would just send the same message over and over again.

When receiving the reply fails with ResourceExhausted, the situation is worse in the sense that the volume probably has been provisioned, external-provisioner just never gets to know that. Retrying won't help much here either, unless an admin steps in and reconfigures the external-provisioner with larger max message sizes.

I think we should bring this up as a problem with the spec. But given the spec as it is right now, avoiding a gRPC-generated ResourceExhausted by using large enough message size limits, and then assuming that the error really comes from the CSI driver, seems viable to me.
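For reference, the message size limits are a client-side gRPC dial option. A minimal, generic grpc-go sketch (the endpoint string and the 64 MiB limit are made-up example values; external-provisioner's actual connection setup lives in csi-lib-utils and may differ):

package main

import (
    "log"

    "google.golang.org/grpc"
)

func main() {
    // Example value only: large enough that a big CreateVolumeRequest or
    // CreateVolumeResponse does not trigger a transport-level ResourceExhausted.
    const maxMsgSize = 64 * 1024 * 1024

    conn, err := grpc.Dial(
        "/csi/csi.sock", // placeholder CSI endpoint
        grpc.WithInsecure(),
        grpc.WithDefaultCallOptions(
            grpc.MaxCallRecvMsgSize(maxMsgSize),
            grpc.MaxCallSendMsgSize(maxMsgSize),
        ),
    )
    if err != nil {
        log.Fatalf("connect: %v", err)
    }
    defer conn.Close()
}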

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 24, 2020
sig-storage-lib-external-provisioner v5.0.0 switched to Go modules,
therefore we have to adapt the import path.
This makes sense under some specific circumstances:
- volume is supposed to be created only in a certain topology segment
- that segment is chosen via the pod scheduler via late binding
- the CSI driver supports topology
- the CSI driver reports that it ran out of storage

Previously, external-provisioner would keep retrying to create the
volume instead of notifying the scheduler to pick a node anew.

It's okay to treat ResourceExhausted as a final error, the CSI spec
explicitly describes this case. However, it could also come from the
gRPC transport layer, and thus previously it was treated as a non-final
error merely because retrying made more sense (resources might become
available, except when the root cause is "message size exceeded", which
is unlikely to change).
@pohly (Contributor, Author) commented Mar 24, 2020

@jsafrane I've updated this to use sig-storage-lib-external-provisioner v5.0.0. I also filed an issue against the CSI spec about the RESOURCE_EXHAUSTED ambiguity and added that as a comment in the code.

What's missing is a local test in pkg/controller for the new code path. Do we need one, and if so, any suggestions for a good place to put it?

E2E testing (kubernetes/kubernetes#88114) will cover this, too.
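As an illustration of what such a local test could look like, here is a table-driven sketch against the illustrative giveUpAndReschedule helper from the sketch near the top of this PR description; it is not the test that was eventually added to pkg/controller, and the real checkError/ProvisionExt signatures differ:

package sketch

import (
    "testing"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

func TestGiveUpAndReschedule(t *testing.T) {
    testcases := map[string]struct {
        err           error
        mayReschedule bool
        want          bool
    }{
        "out of space, late binding and topology": {
            err:           status.Error(codes.ResourceExhausted, "no space in segment"),
            mayReschedule: true,
            want:          true, // give up so the scheduler picks another node
        },
        "out of space, rescheduling not possible": {
            err:           status.Error(codes.ResourceExhausted, "no space in segment"),
            mayReschedule: false,
            want:          false, // keep retrying, space may become available
        },
        "transient error": {
            err:           status.Error(codes.DeadlineExceeded, "timed out"),
            mayReschedule: true,
            want:          false, // not final, retry as before
        },
    }
    for name, tc := range testcases {
        t.Run(name, func(t *testing.T) {
            if got := giveUpAndReschedule(tc.err, tc.mayReschedule); got != tc.want {
                t.Errorf("giveUpAndReschedule = %v, want %v", got, tc.want)
            }
        })
    }
}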

@pohly pohly changed the title WIP: unset selected node when storage is exhausted for topology segment unset selected node when storage is exhausted for topology segment Mar 24, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2020
@msau42 (Collaborator) left a comment:

// only makes sense if:
// - The CSI driver supports topology: without that, the next CreateVolume call after
// rescheduling will be exactly the same.
// - We are using strict topology: otherwise the CSI driver is already allowed
@msau42 (Collaborator):

I think we can remove the check for strict topology. I'm not aware of any drivers that retry provisioning on subsequent topologies in the list if the first fails.

@pohly (Contributor, Author):

Okay.

p.strictTopology &&
options.SelectedNode != nil
state := checkError(err, mayReschedule)
klog.V(5).Infof("CreateVolume failed, supports topology = %v, strict topology %v, node selected %v => may reschedule = %v => state = %v: %v",
@msau42 (Collaborator):

should this be a warning?

@pohly (Contributor, Author):

I think a warning is too strong because this is a situation that is now supposed to be handled automatically, without admin attention. The log entry is really just information that this is happening.

Even drivers which did not explicitly ask for strict topology may
benefit from rescheduling (kubernetes-csi#405 (comment)).
The full set of return values of ProvisionExt and the code paths in
checkError are covered by the new test.

While at it, klog logging gets enabled and a cut-and-paste error for
the temp directory name gets fixed.
@pohly (Contributor, Author) left a comment:

Can this unit test be extended? https://github.com/kubernetes-csi/external-provisioner/blob/master/pkg/controller/controller_test.go#L2473

I ended up creating a new test specifically for error handling. ProvisionExt and its state return value had not been covered at all before.

@msau42: please have another look.

p.strictTopology &&
options.SelectedNode != nil
state := checkError(err, mayReschedule)
klog.V(5).Infof("CreateVolume failed, supports topology = %v, strict topology %v, node selected %v => may reschedule = %v => state = %v: %v",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a warning is too strong because it is a situation that now is supposed to be handled automatically without admin attention. The log entry is really just information that this is happening.

// only makes sense if:
// - The CSI driver supports topology: without that, the next CreateVolume call after
// rescheduling will be exactly the same.
// - We are using strict topology: otherwise the CSI driver is already allowed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@pohly (Contributor, Author) commented Mar 31, 2020

/assign @msau42

@msau42 (Collaborator) left a comment:

/approve

Just have one question about the dependencies

k8s.io/apiserver v0.17.0
k8s.io/client-go v0.17.0
k8s.io/component-base v0.17.0
k8s.io/csi-translation-lib v0.17.0
k8s.io/klog v1.0.0
k8s.io/kubernetes v1.14.0
sigs.k8s.io/sig-storage-lib-external-provisioner v4.1.0+incompatible
sigs.k8s.io/sig-storage-lib-external-provisioner v4.1.0+incompatible // indirect
@msau42 (Collaborator):

This is kind of strange, we have dependencies on both versions?

@pohly (Contributor, Author):

I had noticed that too, but ignored it because we never actually pull in any code from it (it's not in vendor/).

This might be a bug in sig-storage-lib-external-provisioner:

$ go mod why -m sigs.k8s.io/sig-storage-lib-external-provisioner
# sigs.k8s.io/sig-storage-lib-external-provisioner
github.com/kubernetes-csi/external-provisioner/cmd/csi-provisioner
sigs.k8s.io/sig-storage-lib-external-provisioner/v5/controller
sigs.k8s.io/sig-storage-lib-external-provisioner/v5/controller.test
sigs.k8s.io/sig-storage-lib-external-provisioner/controller/metrics

=> controller.test (which we don't use) pulls in the unversioned sigs.k8s.io/sig-storage-lib-external-provisioner here: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/7d9e8b4c678a803070309639110a4494a4010efe/controller/controller_test.go#L26

Not sure how that works; perhaps go test simply maps that to the checked out module source code. Will fix it.
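For completeness, the fix on the library side is simply to use the versioned module path in the remaining imports; a sketch along the lines of the go mod why output above (assuming the metrics package is present under the /v5 path):

package example

// The unversioned path
// "sigs.k8s.io/sig-storage-lib-external-provisioner/controller/metrics"
// resolves to the old v4.1.0+incompatible module and drags it in as an
// indirect dependency; the versioned path stays within the v5 module.
import _ "sigs.k8s.io/sig-storage-lib-external-provisioner/v5/controller/metrics"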

@msau42 (Collaborator):

Thanks! Since we're not actually using it here, we don't need to wait for it.

@k8s-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2020
@msau42 (Collaborator) commented Apr 1, 2020

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 1, 2020
pohly added a commit to pohly/sig-storage-lib-external-provisioner that referenced this pull request Apr 1, 2020
The controller.test binary and the example still imported without the
/v5 suffix. This resulted in spurious
"sigs.k8s.io/sig-storage-lib-external-provisioner v4.1.0+incompatible
// indirect" go.mod entries in projects using the
lib (kubernetes-csi/external-provisioner#405 (comment)).

The example needs to use the sources that it was checked out
with. Otherwise it is impossible to change the API and the example in
a single commit and/or there is a risk that breaking changes go
undetected because the example continues to build with the unmodified
lib.
@msau42 (Collaborator) commented Apr 1, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 1, 2020
@k8s-ci-robot k8s-ci-robot merged commit 309b38a into kubernetes-csi:master Apr 1, 2020
kbsonlong pushed a commit to kbsonlong/external-provisioner that referenced this pull request Dec 29, 2023
…go_modules/k8s.io/api-0.26.1

Bump k8s.io/api from 0.26.0 to 0.26.1