Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubernetes attempts to unmount a wrong vSphere volume and stops making any progress after that #37332

Closed
wants to merge 2 commits into from

Conversation

BaluDontu
Copy link
Contributor

@BaluDontu BaluDontu commented Nov 23, 2016

Kubernetes attempts to unmount a wrong vSphere volume in v1.4.5. This fix addresses issue #37022

The following steps are made to reproduce the issue:

  1. kubectl create -f guestbook.yaml
    A deployment is created with a volume spec - "hello-master" and volumePath - "[datastore1] kubevols/sample-volume". At this point, the pod has the volume "[datastore1]kubevols/sample-volume" mounted on it.

  2. kubectl delete -f guestbook.yaml
    When the deployment is deleted, kubernetes attempts to unmount the volume with volumeName - hello-master. Since it cannot find the volume "hello-master" mounted on the node, it stalls and doesn't make any progress after it.

Because of this, any subsequent recreation of deployments fails to mount the volume on the node as the node stalls.

This problem occurred after fixing for issue #33203 on kubernetes repo - Fix volume states out of sync problem after kubelet restarts. There were changes to SYNC loop logic in this fix.

There are 2 fixes that needs to made on 1.4.5 branch

  1. The changes to constructVolumeSpec() in the vSphere plugin to get the volumePath instead of volumeName.
  2. The changes to getDeviceNameFromMount() to get the full volumePath - "[datastore1]kubevols/sample-volume" instead of the just base of the volumePath -"sample-volume". I just observed that this function is fixed as a part of fix issue in converting aws volume id from mount paths #36840 on master branch.

After these changes, the deployment creations and deletions worked perfectly fine. Also the recreation of the deployment worked fine. The volumes are mounted to the pods correctly.

@abrarshivani @kerneltime


This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands will still work. Regular contributors should join the org to skip this step.

If you have questions or suggestions related to this bot's behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.


If you have questions or suggestions related to this bot's behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

1 similar comment
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.


If you have questions or suggestions related to this bot's behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Nov 23, 2016
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Nov 23, 2016
@saad-ali
Copy link
Member

@abrarshivani @abithap can you PTAL

@saad-ali saad-ali added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Nov 23, 2016
@saad-ali saad-ali added this to the v1.5 milestone Nov 23, 2016
@saad-ali
Copy link
Member

@k8s-bot ok to test

@saad-ali
Copy link
Member

Bug fix ok for post-code-freeze merge for 1.5.

Once this is approved and merged, it should be cherry picked back to 1.4 branch.

@BaluDontu please make sure you sign the CLA

@@ -119,11 +119,14 @@ func (plugin *vsphereVolumePlugin) newUnmounterInternal(volName string, podUID t
}

func (plugin *vsphereVolumePlugin) ConstructVolumeSpec(volumeName, mountPath string) (*volume.Spec, error) {
mounter := plugin.host.GetMounter()
pluginDir := plugin.host.GetPluginDir(plugin.GetPluginName())
volumePath, _ := mounter.GetDeviceNameFromMount(mountPath, pluginDir)
vsphereVolume := &api.Volume{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add volumePath to log

@saad-ali
Copy link
Member

@BaluDontu please make sure you sign the CLA

@BaluDontu please make sure to sign the CLA or we will not be able to accept this fix.

@abrarshivani
Copy link
Contributor

@saad-ali LGTM

@BaluDontu BaluDontu closed this Nov 23, 2016
@BaluDontu BaluDontu deleted the FixUnmountVolume branch November 23, 2016 23:55
@BaluDontu
Copy link
Contributor Author

I accidentally closed this PR. Can somebody please open this PR if possible. Otherwise, I would post a new PR.

@saad-ali
Copy link
Member

New PR: #37413

k8s-github-robot pushed a commit that referenced this pull request Dec 1, 2016
Automatic merge from submit-queue

kubernetes attempts to unmount a wrong vSphere volume and stops making any progress after that

This is in reference to the bug #37332 which was accidentally closed. So created this new PR.

The code is already reviewed as part of PR #37332 

Fixes issue #37022 

@saad-ali @jingxu97 @abrarshivani @kerneltime
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants