fix nestedPendingOperations mount and umount parallel bug #109190
Conversation
/sig node
Force-pushed from 02727b6 to bfa0f8d
/retest
@@ -256,10 +257,18 @@ func (grm *nestedPendingOperations) isOperationExists(key operationKey) (bool, i
		previousOp.key.nodeName == key.nodeName

	if volumeNameMatch && podNameMatch && nodeNameMatch {
recommend no else
if volumeNameMatch && podNameMatch && nodeNameMatch {
	if previousOp.operationPending {
		return true, previousOpIndex
	}
	if opIndex == -1 {
		opIndex = previousOpIndex
	}
}
} else {
	if opIndex == -1 {
		opIndex = previousOpIndex
	}
t0 Operations: {umount, pvname2, pod2, mount, pending=true} {umount, pvname, pod1, pending=true} {umount, pvname, pod2, pending=true}
t1 Operations: {umount, pvname2, pod2, mount, pending=true} {umount, pvname, pod1, pending=false} {umount, pvname, pod2, pending=false} // two umount failed
t2 Operations: {umount, pvname2, pod2, mount, pending=true} {mount, pvname, EmptyUniquePodName, pending=true} {umount, pvname, pod2, pending=false} // add pvname mount
t3 Operations: {umount, pvname2, pod2, mount, pending=true} {mount, pvname, EmptyUniquePodName, pending=false} {umount, pvname, pod2, pending=false} // pvname mount failed
t4 Operations: {umount, pvname2, pod2, mount, pending=true} {umount, pvname, pod2, pending=true} {umount, pvname, pod2, pending=false} // pvname pod2 umount add
t5 Operations: {umount, pvname, pod2, pending=false} {umount, pvname, pod2, pending=true} // pvname2 umount success, replace index 0 with index 2, and delete index 2
t6 Operations: {umount, pvname, pod2, pending=true} // umount pod2 success, delete index 0
Maybe in this case we also leak an operation that can block the pv mount.
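For illustration, here is a minimal, self-contained Go sketch of the overwrite step that makes this leak possible. The types and the firstMatch helper are simplified stand-ins, not the real nestedPendingOperations code:

```go
package main

import "fmt"

// Simplified stand-ins for operationKey/operation; "" plays the wildcard role
// that EmptyUniquePodName plays in nestedPendingOperations.
const emptyPodName = ""

type operationKey struct {
	volumeName string
	podName    string
}

type operation struct {
	key     operationKey
	pending bool
}

// firstMatch mimics a first-match lookup that treats emptyPodName as a wildcard
// and does not prefer exact matches.
func firstMatch(ops []operation, key operationKey) int {
	for i, op := range ops {
		volMatch := op.key.volumeName == key.volumeName
		podMatch := op.key.podName == emptyPodName ||
			key.podName == emptyPodName ||
			op.key.podName == key.podName
		if volMatch && podMatch {
			return i
		}
	}
	return -1
}

func main() {
	// State similar to t3: a failed mount with a wildcard pod name sits in front
	// of a failed umount for pod2 on the same volume.
	ops := []operation{
		{operationKey{"pvname", emptyPodName}, false}, // failed mount, wildcard pod
		{operationKey{"pvname", "pod2"}, false},       // failed umount for pod2
	}

	// t4: a new umount for (pvname, pod2) arrives. First-match picks index 0
	// (the wildcard entry) and overwrites it, so two entries now describe the
	// same (pvname, pod2) umount, only one of which is the live operation.
	i := firstMatch(ops, operationKey{"pvname", "pod2"})
	ops[i] = operation{operationKey{"pvname", "pod2"}, true}
	fmt.Println(ops)

	// When that umount later completes, a lookup by key can just as easily hit
	// the stale pending=false duplicate, leaving the pending=true entry behind
	// to block the next mount of pvname -- the leak described above. Preferring
	// an exact (pod name, node name) match when choosing the slot to reuse
	// avoids creating the duplicate in the first place.
}
```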
Thanks for the review; it was missing the exact match first. I will add a test case for this.
opIndex := -1
for previousOpIndex, previousOp := range grm.operations {
	volumeNameMatch := previousOp.key.volumeName == key.volumeName
	podNameMatch := previousOp.key.podName == EmptyUniquePodName ||
		key.podName == EmptyUniquePodName ||
		previousOp.key.podName == key.podName
	podNameExactMatch := previousOp.key.podName == key.podName
	nodeNameMatch := previousOp.key.nodeName == EmptyNodeName ||
		key.nodeName == EmptyNodeName ||
		previousOp.key.nodeName == key.nodeName
	nodeNameExactMatch := previousOp.key.nodeName == key.nodeName
	if volumeNameMatch && podNameMatch && nodeNameMatch {
		if previousOp.operationPending {
			return true, previousOpIndex
		}
		if opIndex == -1 || (podNameExactMatch && nodeNameExactMatch) {
			opIndex = previousOpIndex
		}
	}
}

I prefer to overwrite the operation whose pod name and node name both exactly match; if none is found, overwrite the first matching one.
Force-pushed from a914014 to d05470a
/test pull-kubernetes-integration
nodeNameExactMatch := previousOp.key.nodeName == key.nodeName
	if opIndex == -1 || (podNameExactMatch && nodeNameExactMatch) {
		opIndex = previousOpIndex
	}
}
Thanks for making the fix. Does the following logic seem more direct?
exactMatch := -1
nonExactMatch := -1
for previousOpIndex, previousOp := range grm.operations {
	volumeNameMatch := previousOp.key.volumeName == key.volumeName
	if volumeNameMatch {
		if key.podName == previousOp.key.podName && key.nodeName == previousOp.key.nodeName {
			exactMatch = previousOpIndex
			break
		}
		podNameMatch := previousOp.key.podName == EmptyUniquePodName ||
			key.podName == EmptyUniquePodName ||
			previousOp.key.podName == key.podName
		nodeNameMatch := previousOp.key.nodeName == EmptyNodeName ||
			key.nodeName == EmptyNodeName ||
			previousOp.key.nodeName == key.nodeName
		if podNameMatch && nodeNameMatch {
			nonExactMatch = previousOpIndex
		}
	}
}
if exactMatch != -1 {
	return true, exactMatch
}
if nonExactMatch != -1 {
	return true, nonExactMatch
}
Basically we want to pick an exactly matching operation over the other candidates.
Well, my version of the code does not return the first pending match though (it can be tweaked, but tbh I am okay with either version). One downside of the current approach is that we must iterate through all operations before IsOperationExists can return. This could potentially be expensive on a busy node.
Yeah, in the worst case, when there is no pending matching operation at all but there is a non-exact match at the head, we must still iterate over everything. But I don't think we change the O(N) time complexity from a global perspective, because we always have to iterate the slice to find a matching volume operation when operations for many different volumes are present. Shall we add a TODO to improve the algorithm with another data structure in the future?
Yeah I wonder if we can change the data structure used here from a slice to something that can accommodate these requirements in a better way. The code is also kinda hard to follow because of underlying complexity (not this PR's fault, it was always hard to follow).
Agree, we need to refactor this data structure.
How about proposing a KEP to do this?
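One hypothetical direction for such a refactor, sketched below: group operations per volume name so a lookup only scans that volume's entries instead of the whole slice. The types and field names are illustrative only, not the real nestedPendingOperations API, and empty-volume-name operations would still need separate handling.

```go
// Hypothetical sketch only -- illustrative types, not the real
// nestedPendingOperations structures.
type opEntry struct {
	podName  string // "" acts as the wildcard, like EmptyUniquePodName
	nodeName string // "" acts as the wildcard, like EmptyNodeName
	pending  bool
}

type opIndex struct {
	byVolume map[string][]opEntry // volumeName -> operations for that volume
}

// findSlot reports whether a pending operation already conflicts with the new
// key and, if not, which existing entry (if any) may be reused for it,
// preferring an exact (podName, nodeName) match over a wildcard match.
func (idx *opIndex) findSlot(volumeName, podName, nodeName string) (pendingConflict bool, slot int) {
	slot = -1
	for i, e := range idx.byVolume[volumeName] {
		podMatch := e.podName == "" || podName == "" || e.podName == podName
		nodeMatch := e.nodeName == "" || nodeName == "" || e.nodeName == nodeName
		if !(podMatch && nodeMatch) {
			continue
		}
		if e.pending {
			return true, i
		}
		if slot == -1 || (e.podName == podName && e.nodeName == nodeName) {
			slot = i
		}
	}
	return false, slot
}
```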
@@ -41,15 +45,15 @@ import (

// Validate PV/PVC, create and verify writer pod, delete the PVC, and validate the PV's
// phase. Note: the PV is deleted in the AfterEach, not here.
func completeTest(f *framework.Framework, c clientset.Interface, ns string, pv *v1.PersistentVolume, pvc *v1.PersistentVolumeClaim) {
func completeTest(f *framework.Framework, c clientset.Interface, ns string, pv *v1.PersistentVolume, pvc *v1.PersistentVolumeClaim, act func(c clientset.Interface, t *framework.TimeoutContext, ns string, pvc *v1.PersistentVolumeClaim, command string) error) {
// 1. verify that the PV and PVC have bound correctly
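For context, a hypothetical caller might wire the new act parameter up roughly like this; the callback body here is illustrative only, not the helper the actual e2e tests pass in:

```go
// Hypothetical caller sketch -- the act body is illustrative, not the actual
// e2e helper that this PR passes to completeTest.
act := func(c clientset.Interface, t *framework.TimeoutContext, ns string,
	pvc *v1.PersistentVolumeClaim, command string) error {
	// e.g. create a pod that runs `command` against the claim and wait for it
	// to finish, so mount/unmount activity is exercised while completeTest
	// drives the PV/PVC lifecycle.
	return nil
}
completeTest(f, c, ns, pv, pvc, act)
```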
Does this test case reproduce the issue reliably?
Hard to say. A specific ordering of operations is required to reproduce the problem, and this test case can't guarantee that.
Agree, it's not reliable; it only adds e2e test coverage to prevent a regression.
/test pull-kubernetes-integration
Force-pushed from d05470a to 588513c
Thanks for refactoring the nested pending operation code. We have failing unit tests though:
Sorry, will look.
Force-pushed from cc01c17 to 071d07c
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: 249043822. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from 071d07c to 226a7e3
/retest
@gnufied All the tests have passed, but I have a little nit: when volumeName is empty, which is the case for the verify_volumes_are_attached_per_node operation, if the operation for the same node is still pending, shall we reject the current operation for that node?
Force-pushed from 226a7e3 to dc5dcac
@249043822, thanks a lot for finding the issue and working on the fix.
@jingxu97 thanks for your reply. The fix in #109343 is similar to the first version (d05470a) that I committed; it's really a minimal change for the bug. After talking with @gnufied, Hemant thinks that fix has one downside: we must iterate through all operations before IsOperationExists can return, which could potentially be expensive on a busy node (#109190 (comment)). So I refactored it into the new fix in this PR.
if volumeName == EmptyUniqueVolumeName {
	// volumeName is empty, so nodeName should exist (for verify_volumes_are_attached_per_node)
	// to make the opKey unique. If the operation for the same node is still pending,
	// should we reject the current operation?
Right now only the verifyVolumeAttached function uses an empty volumeName, to allow it to run in parallel no matter what other volume operations are running. I think we can still move this if block to the beginning of the function.
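A rough sketch of that suggestion (illustrative function shape only, not the exact upstream code): hoist the empty-volume-name check above any other matching logic.

```go
// Rough sketch only -- the surrounding function is illustrative, not the real
// upstream signature.
func checkForPendingConflict(volumeName, podName, nodeName string) bool {
	if volumeName == "" { // EmptyUniqueVolumeName
		// Only verify_volumes_are_attached_per_node uses an empty volume name;
		// its operation key is made unique by the node name, and it is meant to
		// run in parallel regardless of other in-flight volume operations, so
		// it can be handled before the matching loop below.
		return false
	}
	// ... existing exact-match / wildcard matching over the operations slice ...
	return false
}
```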
Force-pushed from dc5dcac to 9077161
/retest
@SergeyKanzhelev @jingxu97 can you review this PR?
Hello, is there any progress here? The bug is actually a big problem for Nextflow computation, as tasks get stuck in ContainerCreating.
@gnufied @jingxu97 @Dingshujie As this refactor is a big change and may not make sense to everyone, can we do this work in two steps? First we can consider landing the smaller change in #110951 to solve this bug, and then we can propose a nicer refactor in the next step to complete the work.
@249043822: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted
/close
@249043822: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #109047
Special notes for your reviewer:
/cc @jingxu97 @gnufied
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: