Add list of pods that use a volume to multiattach events #56288
Conversation
/retest
/assign @jingxu97
@@ -269,7 +270,18 @@ func (rc *reconciler) attachDesiredVolumes() {
 	nodes := rc.actualStateOfWorld.GetNodesForVolume(volumeToAttach.VolumeName)
 	if len(nodes) > 0 {
 		if !volumeToAttach.MultiAttachErrorReported {
-			simpleMsg, detailedMsg := volumeToAttach.GenerateMsg("Multi-Attach error", "Volume is already exclusively attached to one node and can't be attached to another")
+			podNames := rc.desiredStateOfWorld.GetVolumePodsOnNodes(nodes, volumeToAttach.VolumeName)
+			var simpleMsg, detailedMsg string
Should we exclude current pod that is trying to attach this volume? Perhaps it does not matter. Also, it may be worth not "leaking" information about pods that do not exist in current user's namespace.
Added a separate commit with more elaborate logic that does not reveal pod names to users in other namespaces, while still logging all pods to the controller-manager logs.
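The "no leaking" rule described here can be sketched as follows: pods in the requesting pod's namespace are listed by name in the user-facing event, while pods in other namespaces are only counted. This is an illustrative sketch with a hypothetical `userVisibleMsg` helper; the real logic lives in the attach/detach reconciler:

```go
package main

import (
	"fmt"
	"strings"
)

type pod struct{ Namespace, Name string }

// userVisibleMsg builds the user-facing part of the multi-attach event.
// Pods in the requester's namespace are named; the rest are only counted,
// so pod names never leak across namespace boundaries.
func userVisibleMsg(requestNS string, pods []pod) string {
	var visible []string
	hidden := 0
	for _, p := range pods {
		if p.Namespace == requestNS {
			visible = append(visible, p.Namespace+"/"+p.Name)
		} else {
			hidden++
		}
	}
	switch {
	case len(visible) > 0 && hidden > 0:
		return fmt.Sprintf("Volume is already used by pod(s) %s and %d pod(s) in different namespaces", strings.Join(visible, ", "), hidden)
	case len(visible) > 0:
		return fmt.Sprintf("Volume is already used by pod(s) %s", strings.Join(visible, ", "))
	default:
		return fmt.Sprintf("Volume is already used by %d pod(s) in different namespaces", hidden)
	}
}

func main() {
	pods := []pod{{"ns1", "pod3"}, {"ns2", "pod2"}}
	fmt.Println(userVisibleMsg("ns1", pods))
}
```

A full, unfiltered pod list would still go to the controller-manager log for the admin.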
Force-pushed from e850109 to d4ff86f.
/retest
 	}

+	// Get list of pods that use the volume on the other nodes.
+	pods := rc.desiredStateOfWorld.GetVolumePodsOnNodes(otherNodes, volumeToAttach.VolumeName)
Should this information be retrieved from the desired state of world or the actual state of world? It is possible that multiple pods are requesting the same volume on more than one node and none of them are getting attached because some other pod (on some other node) is actually using the volume.
In that case, while the Multi-Attach error is still mostly accurate, we may be printing wrong information back to the user and to the admin.
The problem is that the ASW (actual state of world) does not keep a list of pods for attached volumes. It's reconstructed from node.status when the controller is restarted, and we don't put pods there.
I updated the comment below to:
// We did not find any pods that request the volume. The pod must have been deleted already.
The message sent to the user is: "Volume is already exclusively attached to one node and can't be attached to another", and IMO there is not much we can do about it.
So users know which pods are blocking a volume and can realize their error.
Force-pushed from d4ff86f to e46c886.
/lgtm
/retest
/approve no-issue
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gnufied, jsafrane Associated issue requirement bypassed by: jsafrane The full list of commands accepted by this bot can be found here.
/retest
/test all [submit-queue is verifying that this PR is safe to merge]
/retest Review the full test history for this PR.
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.
So users know which pods are blocking a volume and can realize their error.
Release note:
UX:
When the node where the volume is attached is known:
Multi-Attach error for volume "volume-name" (UniqueName: "fake-plugin/volume-name") from node "node1" Volume is already used by pods ns2/pod2, ns1/pod3 on node node2, node3
When the node where the volume is attached is not known:
Multi-Attach error for volume "volume-name" (UniqueName: "fake-plugin/volume-name") from node "node1" Volume is already exclusively attached to node node2 and can't be attached to another
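The two event formats quoted above can be reconstructed with a small sketch. The `multiAttachEvent` helper here is hypothetical (the real code uses the reconciler's message generators); it only illustrates how the known-pods and unknown-pods cases diverge:

```go
package main

import (
	"fmt"
	"strings"
)

// multiAttachEvent formats the user-facing event. When the blocking pods are
// known (and visible to the user), list them together with their nodes;
// otherwise only name the node(s) holding the attachment.
func multiAttachEvent(volume, unique, fromNode string, pods, nodes []string) string {
	prefix := fmt.Sprintf("Multi-Attach error for volume %q (UniqueName: %q) from node %q ", volume, unique, fromNode)
	if len(pods) > 0 {
		return prefix + fmt.Sprintf("Volume is already used by pods %s on node %s",
			strings.Join(pods, ", "), strings.Join(nodes, ", "))
	}
	return prefix + fmt.Sprintf("Volume is already exclusively attached to node %s and can't be attached to another",
		strings.Join(nodes, ", "))
}

func main() {
	// Known pods: mirrors the first example message above.
	fmt.Println(multiAttachEvent("volume-name", "fake-plugin/volume-name", "node1",
		[]string{"ns2/pod2", "ns1/pod3"}, []string{"node2", "node3"}))
	// Unknown pods: mirrors the second example message above.
	fmt.Println(multiAttachEvent("volume-name", "fake-plugin/volume-name", "node1",
		nil, []string{"node2"}))
}
```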
/kind bug
/sig storage
/assign @gnufied