
fix nestedPendingOperations mount and umount parallel bug #109190

Closed

Conversation

249043822
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #109047

Special notes for your reviewer:

/cc @jingxu97 @gnufied

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Mar 31, 2022
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 31, 2022
Contributor

@yangjunmyfm192085 left a comment


/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Mar 31, 2022
@249043822
Member Author

/retest

@@ -256,10 +257,18 @@ func (grm *nestedPendingOperations) isOperationExists(key operationKey) (bool, i
previousOp.key.nodeName == key.nodeName

if volumeNameMatch && podNameMatch && nodeNameMatch {
Member


recommend no else

		if volumeNameMatch && podNameMatch && nodeNameMatch {
			if previousOp.operationPending {
				return true, previousOpIndex
			}
			if opIndex == -1 {
				opIndex = previousOpIndex
			}
		}

} else {
	if opIndex == -1 {
		opIndex = previousOpIndex
	}
Member


t0 Operations: {umount, pvname2, pod2, mount, pending=true} {umount, pvname, pod1, pending=true} {umount, pvname, pod2, pending=true}
t1 Operations: {umount, pvname2, pod2, mount, pending=true} {umount, pvname, pod1, pending=false} {umount, pvname, pod2, pending=false} // two umount failed
t2 Operations: {umount, pvname2, pod2, mount, pending=true} {mount, pvname, EmptyUniquePodName, pending=true} {umount, pvname, pod2, pending=false} // add pvname mount
t3 Operations: {umount, pvname2, pod2, mount, pending=true} {mount, pvname, EmptyUniquePodName, pending=false} {umount, pvname, pod2, pending=false} // pvname mount failed
t4 Operations: {umount, pvname2, pod2, mount, pending=true} {umount, pvname, pod2, pending=true} {umount, pvname, pod2, pending=false} // pvname pod2 umount add
t5 Operations: {umount, pvname, pod2, pending=false} {umount, pvname, pod2, pending=true} // pvname2 umount success, replace index 0 with index 2, and delete index 2
t6 Operations: {umount, pvname, pod2, pending=true} // umount pod2 success, delete index 0

Maybe in this case we also leak an operation that can block the PV mount.
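
To make the matching difference concrete, here is a minimal, self-contained sketch. It does not use the real nestedPendingOperations types; opKey, op, and the two lookup helpers are simplified stand-ins that only model how a first-loose-match lookup (treating EmptyUniquePodName as a wildcard) can reuse the wrong slot, while an exact-match-preferred lookup reuses the (volume, pod) slot:

package main

import "fmt"

// Simplified stand-ins for the real operation key and operation record.
type opKey struct{ volumeName, podName string }

type op struct {
	key     opKey
	pending bool
}

const emptyPodName = "" // stands in for EmptyUniquePodName

// firstLooseMatch returns the first reusable (completed) slot whose key
// loosely matches, treating an empty pod name as a wildcard.
func firstLooseMatch(ops []op, k opKey) int {
	for i, o := range ops {
		podMatch := o.key.podName == emptyPodName || k.podName == emptyPodName || o.key.podName == k.podName
		if o.key.volumeName == k.volumeName && podMatch && !o.pending {
			return i
		}
	}
	return -1
}

// exactMatchPreferred prefers the slot whose pod name matches exactly and
// only falls back to the first loose match.
func exactMatchPreferred(ops []op, k opKey) int {
	loose := -1
	for i, o := range ops {
		if o.key.volumeName != k.volumeName || o.pending {
			continue
		}
		if o.key.podName == k.podName {
			return i
		}
		if loose == -1 && (o.key.podName == emptyPodName || k.podName == emptyPodName) {
			loose = i
		}
	}
	return loose
}

func main() {
	// Roughly the pvname entries at t3: a completed mount keyed with an empty
	// pod name and a completed umount keyed with pod2.
	ops := []op{
		{opKey{"pvname", emptyPodName}, false},
		{opKey{"pvname", "pod2"}, false},
	}
	newKey := opKey{"pvname", "pod2"} // the new umount for (pvname, pod2) at t4

	fmt.Println(firstLooseMatch(ops, newKey))     // 0: reuses the EmptyUniquePodName slot
	fmt.Println(exactMatchPreferred(ops, newKey)) // 1: reuses the exact (pvname, pod2) slot
}

With the first-match behaviour, the stale {umount, pvname, pod2} entry is left behind in the slice, which is the leak described in the timeline above.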

Member Author


Thanks for the review; the exact-match check was missing first. I will add a test case for this.

@Dingshujie
Member

	opIndex := -1
	for previousOpIndex, previousOp := range grm.operations {
		volumeNameMatch := previousOp.key.volumeName == key.volumeName

		podNameMatch := previousOp.key.podName == EmptyUniquePodName ||
			key.podName == EmptyUniquePodName ||
			previousOp.key.podName == key.podName

		podNameExactMatch := previousOp.key.podName == key.podName

		nodeNameMatch := previousOp.key.nodeName == EmptyNodeName ||
			key.nodeName == EmptyNodeName ||
			previousOp.key.nodeName == key.nodeName

		nodeNameExactMatch := previousOp.key.nodeName == key.nodeName

		if volumeNameMatch && podNameMatch && nodeNameMatch {
			if previousOp.operationPending {
				return true, previousOpIndex
			}
			if opIndex == -1 || (podNameExactMatch && nodeNameExactMatch) {
				opIndex = previousOpIndex
			}
		}
	}

Prefer to overwrite the operation whose pod name and node name both match exactly; if none is found, overwrite the first matching one.

@249043822 force-pushed the br-detach-fail branch 4 times, most recently from a914014 to d05470a on April 1, 2022 04:56
@249043822
Member Author

/test pull-kubernetes-integration

	nodeNameExactMatch := previousOp.key.nodeName == key.nodeName
	if opIndex == -1 || (podNameExactMatch && nodeNameExactMatch) {
		opIndex = previousOpIndex
	}
}
Member


Thanks for making the fix. Does the following logic seem more direct?

	exactMatch := -1
	nonExactMatch := -1

	for previousOpIndex, previousOp := range grm.operations {
		volumeNameMatch := previousOp.key.volumeName == key.volumeName

		if volumeNameMatch {
			if key.podName == previousOp.key.podName && key.nodeName == previousOp.key.nodeName {
				exactMatch = previousOpIndex
				break
			}

			podNameMatch := previousOp.key.podName == EmptyUniquePodName ||
				key.podName == EmptyUniquePodName ||
				previousOp.key.podName == key.podName

			nodeNameMatch := previousOp.key.nodeName == EmptyNodeName ||
				key.nodeName == EmptyNodeName ||
				previousOp.key.nodeName == key.nodeName

			if podNameMatch && nodeNameMatch {
				nonExactMatch = previousOpIndex
			}
		}
	}
	if exactMatch != -1 {
		return true, exactMatch
	}
	if nonExactMatch != -1 {
		return true, nonExactMatch
	}

Basically we want to pick the exactly matching operation over other ones.

Member

@gnufied Apr 1, 2022


Well, my version of the code does not return the first pending match though (it can be tweaked, but tbh I am okay with either version). One downside of the current approach is that we must iterate through all operations before IsOperationExists can return. This could potentially be expensive on a busy node.

Member Author


Yea, in the worst case there are no pending matching operations at all but there is a non-exact match at the head, so we must still iterate over everything. But I think we don't change the overall O(N) time complexity, because we always have to iterate over the slice to find a matching volume operation when operations for many different volumes exist. Shall we add a TODO to improve the algorithm with a different data structure in the future?

Member


Yeah I wonder if we can change the data structure used here from a slice to something that can accommodate these requirements in a better way. The code is also kinda hard to follow because of underlying complexity (not this PR's fault, it was always hard to follow).
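
Purely as an illustration of the kind of restructuring being wondered about here (hypothetical, not part of this PR, and using simplified stand-in types rather than the real nestedPendingOperations ones), one option is to group operations by volume name so a lookup only scans the operations that belong to the same volume:

package main

import "fmt"

// Hypothetical sketch: group operations by volume name so a lookup only has
// to scan the (usually short) list of operations for that one volume.
type key struct{ volumeName, podName, nodeName string }

type operation struct {
	key     key
	pending bool
}

type operationIndex struct {
	byVolume map[string][]operation // volumeName -> operations for that volume
}

func newOperationIndex() *operationIndex {
	return &operationIndex{byVolume: make(map[string][]operation)}
}

// add records a new pending operation under its volume name.
func (x *operationIndex) add(k key) {
	x.byVolume[k.volumeName] = append(x.byVolume[k.volumeName], operation{key: k, pending: true})
}

// hasPendingConflict reports whether a pending operation that would conflict
// with k already exists, scanning only operations on the same volume.
func (x *operationIndex) hasPendingConflict(k key) bool {
	for _, o := range x.byVolume[k.volumeName] {
		podMatch := o.key.podName == "" || k.podName == "" || o.key.podName == k.podName
		nodeMatch := o.key.nodeName == "" || k.nodeName == "" || o.key.nodeName == k.nodeName
		if o.pending && podMatch && nodeMatch {
			return true
		}
	}
	return false
}

func main() {
	idx := newOperationIndex()
	idx.add(key{"pvname", "pod1", "node1"})

	fmt.Println(idx.hasPendingConflict(key{"pvname", "pod1", "node1"}))  // true: conflicting op on the same volume
	fmt.Println(idx.hasPendingConflict(key{"pvname2", "pod1", "node1"})) // false: different volume, nothing scanned
}

Whatever shape it takes, deletion and slot reuse would still need the same exact-match preference discussed above, so this only sketches the lookup side.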

Member


Agree, we need to refactor this data structure.

Should we propose a KEP to do this?

@@ -41,15 +45,15 @@ import (

// Validate PV/PVC, create and verify writer pod, delete the PVC, and validate the PV's
// phase. Note: the PV is deleted in the AfterEach, not here.
func completeTest(f *framework.Framework, c clientset.Interface, ns string, pv *v1.PersistentVolume, pvc *v1.PersistentVolumeClaim) {
func completeTest(f *framework.Framework, c clientset.Interface, ns string, pv *v1.PersistentVolume, pvc *v1.PersistentVolumeClaim, act func(c clientset.Interface, t *framework.TimeoutContext, ns string, pvc *v1.PersistentVolumeClaim, command string) error) {
// 1. verify that the PV and PVC have bound correctly
Member


Does this test case reproduce the issue reliably?

Member


Hard to say. Certain operation orderings are required to reproduce the problem, and this test case can't guarantee them.

Member Author


Agree, it's not reliable; it only adds e2e test coverage to prevent a regression.

@249043822
Member Author

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 3, 2022
@gnufied
Member

gnufied commented Apr 4, 2022

Thanks for refactoring the nested pending operation code. We have failing unit tests though:

{Failed  === RUN   Test_NestedPendingOperations_SecondOpBeforeFirstCompletes/Test_1
    nestedpendingoperations_test.go:894: NestedPendingOperations failed. Expected: <no error> Actual: <Failed to create operation with name {volumeName: podName: nodeName: operationName:}. An operation with that name is already executing.>
    --- FAIL: Test_NestedPendingOperations_SecondOpBeforeFirstCompletes/Test_1 (0.00s)
}

@249043822
Member Author

Thanks for refactoring the nested pending operation code. We have failing unit tests though:

{Failed  === RUN   Test_NestedPendingOperations_SecondOpBeforeFirstCompletes/Test_1
    nestedpendingoperations_test.go:894: NestedPendingOperations failed. Expected: <no error> Actual: <Failed to create operation with name {volumeName: podName: nodeName: operationName:}. An operation with that name is already executing.>
    --- FAIL: Test_NestedPendingOperations_SecondOpBeforeFirstCompletes/Test_1 (0.00s)
}

Sorry, will look

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 249043822
To complete the pull request process, please assign saad-ali after the PR has been reviewed.
You can assign the PR to them by writing /assign @saad-ali in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@249043822
Member Author

/retest

@249043822
Member Author

@gnufied All the tests pass now, but I have a small nit: when volumeName is empty, which is the case for the verify_volumes_are_attached_per_node operation, if an operation for the same node is still pending, shall we reject the current operation for that node?
From the test case

func TestOperationExecutor_AttachSingleNodeVolumeConcurrentlyToSameNode(t *testing.T) {
we can see that verify_volumes_are_attached_per_node for the same node can run concurrently.

@jingxu97
Contributor

jingxu97 commented Apr 6, 2022

@249043822, thanks a lot for finding the issue and working on the fix.
I was also thinking about this logic and proposed a fix here: #109343; the change is smaller. Will that fix work for your case?

@249043822
Member Author

249043822 commented Apr 7, 2022

@249043822, thanks a lot for finding the issue and working on the fix. I was also thinking about this logic and proposed a fix here: #109343; the change is smaller. Will that fix work for your case?

@jingxu97 thanks for your reply. The fix in #109343 is similar to the first version (d05470a) that I committed; it's really a minimal change for the bug. Talking with @gnufied, Hemant thinks that fix has one downside: we must iterate through all operations before IsOperationExists can return, which could potentially be expensive on a busy node. #109190 (comment)

So I refactored a new fix like this:

if volumeName == EmptyUniqueVolumeName {
	// volumeName is empty, so nodeName should exist (for verify_volumes_are_attached_per_node)
	// to make the opKey unique. If the operation for the same node is still pending,
	// should we reject the current operation?
Contributor


Right now only the verifyVolumeAttached function uses an empty volumeName, to allow it to run in parallel no matter what other volume operations are running. I think we can still move this if part to the beginning of the function.

@249043822
Member Author

/retest

@endocrimes endocrimes moved this from Triage to Done in SIG Node PR Triage May 4, 2022
@Dingshujie
Member

@SergeyKanzhelev @jingxu97 can you review this PR?

@xhejtman

Hello, is there any progress here? The bug is actually a big problem for Nextflow computations, as tasks get stuck in ContainerCreating.

@249043822
Member Author

@gnufied @jingxu97 @Dingshujie As this refactor is a big change and may not make sense to everyone, can we do this work in two steps? First, we can consider landing the smaller change in #110951 to solve this bug; then we can propose a cleaner refactor in a follow-up to complete the work.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2022
@k8s-ci-robot
Contributor

@249043822: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xing-yang
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 24, 2022
@249043822
Member Author

/close

@k8s-ci-robot
Contributor

@249043822: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

kubelet nestedPendingOperations may leak operation lead same pv not to do mount or umount operation
8 participants