Add list of pods that use a volume to multiattach events #56288

Merged
1 commit merged into kubernetes:master on Jan 25, 2018

Conversation

jsafrane
Member

@jsafrane jsafrane commented Nov 23, 2017

So that users know which pods are blocking a volume and can recognize their error.

Release note:

NONE

UX:

  • Users can get one of the following events, depending on which other pod(s) are already using the volume and which namespaces they are in:
    Multi-Attach error for volume "volume-name" Volume is already exclusively attached to one node and can't be attached to another
    Multi-Attach error for volume "volume-name" Volume is already used by pod(s) pod3 and 1 pod(s) in different namespaces
  • controller-manager always gets full logs:
    • When the node where the volume is attached is known:
      Multi-Attach error for volume "volume-name" (UniqueName: "fake-plugin/volume-name") from node "node1" Volume is already used by pods ns2/pod2, ns1/pod3 on node node2, node3

    • When the node where the volume is attached is not known:
      Multi-Attach error for volume "volume-name" (UniqueName: "fake-plugin/volume-name") from node "node1" Volume is already exclusively attached to node node2 and can't be attached to another
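
As a rough illustration only (this is not the code in this PR), the two controller-manager log formats above could be produced by a helper like the Go sketch below. `podRef` and `detailedMultiAttachMsg` are hypothetical names, and choosing between the two messages based on whether any pods using the volume were found is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// podRef is a hypothetical stand-in for a pod, identified by namespace and name.
type podRef struct {
	Namespace string
	Name      string
}

// detailedMultiAttachMsg builds the admin-facing log line (illustrative helper).
// With no known pods it falls back to the "exclusively attached" wording.
func detailedMultiAttachMsg(volume, uniqueName, fromNode string, pods []podRef, otherNodes []string) string {
	prefix := fmt.Sprintf("Multi-Attach error for volume %q (UniqueName: %q) from node %q",
		volume, uniqueName, fromNode)
	nodeList := strings.Join(otherNodes, ", ")
	if len(pods) == 0 {
		return fmt.Sprintf("%s Volume is already exclusively attached to node %s and can't be attached to another",
			prefix, nodeList)
	}
	// Admin logs may show fully qualified namespace/name pairs.
	names := make([]string, 0, len(pods))
	for _, p := range pods {
		names = append(names, p.Namespace+"/"+p.Name)
	}
	return fmt.Sprintf("%s Volume is already used by pods %s on node %s",
		prefix, strings.Join(names, ", "), nodeList)
}
```

Calling it with pods `ns2/pod2` and `ns1/pod3` on nodes `node2` and `node3` reproduces the first log line above; calling it with no pods and node `node2` reproduces the second.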

/kind bug
/sig storage
/assign @gnufied

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 23, 2017
@jsafrane
Member Author

/retest
#56262

@jsafrane
Member Author

/assign @jingxu97
I hoped a bot would assign a second reviewer randomly; apparently it does not.

@@ -269,7 +270,18 @@ func (rc *reconciler) attachDesiredVolumes() {
 		nodes := rc.actualStateOfWorld.GetNodesForVolume(volumeToAttach.VolumeName)
 		if len(nodes) > 0 {
 			if !volumeToAttach.MultiAttachErrorReported {
-				simpleMsg, detailedMsg := volumeToAttach.GenerateMsg("Multi-Attach error", "Volume is already exclusively attached to one node and can't be attached to another")
+				podNames := rc.desiredStateOfWorld.GetVolumePodsOnNodes(nodes, volumeToAttach.VolumeName)
+				var simpleMsg, detailedMsg string
Member

Should we exclude the current pod that is trying to attach this volume? Perhaps it does not matter. Also, it may be worth not "leaking" information about pods that do not exist in the current user's namespace.

Member Author

Added a separate commit with more elaborate logic: it does not reveal pod names to users in other namespaces but, on the other hand, logs all pods to the controller-manager logs.
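
A minimal Go sketch of that kind of namespace filtering, assuming a hypothetical `podRef` type and a hypothetical helper `userMultiAttachMsg`; it is not the code from the commit. It builds only the reason part of the event (the text after the volume name). The wording for the case where every blocking pod is in another namespace is an assumption, since the description above does not show it.

```go
package main

import (
	"fmt"
	"strings"
)

// podRef is a hypothetical stand-in for a pod, identified by namespace and name.
type podRef struct {
	Namespace string
	Name      string
}

// userMultiAttachMsg builds the reason shown to a user in viewerNamespace.
// Pods in the same namespace are named; pods elsewhere are only counted,
// so their names are not leaked across namespaces.
func userMultiAttachMsg(viewerNamespace string, pods []podRef) string {
	var local []string // names of pods the viewer is allowed to see
	hidden := 0        // number of pods in other namespaces
	for _, p := range pods {
		if p.Namespace == viewerNamespace {
			local = append(local, p.Name)
		} else {
			hidden++
		}
	}
	switch {
	case len(local) > 0 && hidden > 0:
		return fmt.Sprintf("Volume is already used by pod(s) %s and %d pod(s) in different namespaces",
			strings.Join(local, ", "), hidden)
	case len(local) > 0:
		return fmt.Sprintf("Volume is already used by pod(s) %s", strings.Join(local, ", "))
	case hidden > 0:
		// Assumed wording: only the count is revealed.
		return fmt.Sprintf("Volume is already used by %d pod(s) in different namespaces", hidden)
	default:
		return "Volume is already exclusively attached to one node and can't be attached to another"
	}
}
```

The message is computed per viewing namespace, so two pods that fail to attach the same volume from different namespaces would each see only the names they are entitled to.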

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 27, 2017
@jsafrane
Member Author

/retest

@jsafrane
Member Author

/retest
@gnufied @jingxu97, PTAL

@jsafrane
Member Author

@gnufied @jingxu97, PTAL

}

// Get list of pods that use the volume on the other nodes.
pods := rc.desiredStateOfWorld.GetVolumePodsOnNodes(otherNodes, volumeToAttach.VolumeName)
Member

@gnufied gnufied Jan 23, 2018

Should this information be retrieved from the desired state of world or the actual state of world? It is possible that multiple pods are requesting the same volume on more than one node and none of them are getting attached because some other pod (on some other node) is actually using the volume.

In that case, while the Multi-Attach error is still mostly accurate, we may be printing wrong information back to the user and to the admin.

Member Author

The problem is that the ASW (actual state of world) does not keep a list of pods for attached volumes. It is reconstructed from node.status when the controller is restarted, and we don't put pods there.
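
To make the constraint concrete, here is a deliberately simplified Go model. The types `actualState` and `desiredState` and the method `podsOnNodes` are hypothetical and far simpler than the real controller caches. The actual state only maps volumes to nodes, so pod names have to come from the desired state, with the caveat raised above that those pods merely request the volume.

```go
package main

// podRef is a hypothetical stand-in for a pod, identified by namespace and name.
type podRef struct {
	Namespace string
	Name      string
}

// actualState is a simplified, hypothetical model of the ASW:
// volume name -> nodes it is attached to. It carries no pod information.
type actualState map[string][]string

// desiredState is a simplified, hypothetical model of the DSW:
// node name -> volume name -> pods scheduled there that want the volume.
type desiredState map[string]map[string][]podRef

// podsOnNodes collects the pods that request the volume on the given nodes,
// in the order the nodes are listed. Unknown nodes contribute nothing.
func (d desiredState) podsOnNodes(nodes []string, volume string) []podRef {
	var pods []podRef
	for _, node := range nodes {
		// Indexing a missing node or volume yields nil, which appends nothing.
		pods = append(pods, d[node][volume]...)
	}
	return pods
}
```

In this model, the nodes come from `actualState` and the pods from `desiredState`; an empty result corresponds to the case where the pods using the volume have already been deleted.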

Member Author

I updated the comment below to:

// We did not find any pods that requests the volume. The pod must have been deleted already.

The message sent to the user is: "Volume is already exclusively attached to one node and can't be attached to another" and IMO there is not much we can do about it.

So that users know which pods are blocking a volume and can recognize their error.
@gnufied
Member

gnufied commented Jan 24, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 24, 2018
@jsafrane
Member Author

/retest
unrelated flakes

@jsafrane
Member Author

/approve no-issue

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, jsafrane

Associated issue requirement bypassed by: jsafrane

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 25, 2018
@jsafrane
Member Author

/retest

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 7de1a8e into kubernetes:master Jan 25, 2018
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants