Pod with an inline volume created (and then deleted) post vSphere CSI migration is stuck in Terminating state #103745

Closed
sashrith opened this issue Jul 16, 2021 · 12 comments
Assignees
divyenpatel
Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@sashrith
Contributor

sashrith commented Jul 16, 2021

What happened:

Enabled CSI migration for vSphere in-tree volumes.
Created a pod with an inline vSphere volume (an illustrative manifest follows the kubectl output below).
After the pod reached the Running state, deleted the pod.
The pod is stuck in the Terminating state.
From the CSI driver logs and the kubelet logs, the volume appears to have been detached.

root@k8s-control1:~# kubectl get pod -n vcp-2-csi-syncer-8081 pvc-tester-l2hzk -o wide
NAME               READY   STATUS        RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
pvc-tester-l2hzk   0/1     Terminating   0          39m   10.244.2.3   k8s-worker2   <none>           <none>
root@k8s-control1:~#
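
For reference, a minimal sketch of the kind of pod spec involved, assuming an in-tree inline vsphereVolume; the pod name, image, and command below are hypothetical placeholders, while the volumePath is the one visible in the kubelet logs:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-tester-inline              # hypothetical name for illustration
  namespace: vcp-2-csi-syncer-8081
spec:
  containers:
  - name: write-pod
    image: busybox                     # placeholder image
    command: ["/bin/sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /mnt/volume1
  volumes:
  - name: test-volume
    vsphereVolume:                     # in-tree inline source that CSI migration routes to csi.vsphere.vmware.com
      volumePath: "[vsanDatastore] e2e/test-1626462720247166001-271.vmdk"
      fsType: ext4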

Kubelet logs:
For the detach:

Jul 16 19:13:21 k8s-worker2 kubelet[2881332]: I0716 19:13:21.839013 2881332 operation_generator.go:930] UnmountDevice succeeded for volume "csi.vsphere.vmware.com-[vsanDatastore] e2e/test-1626462720247166001-271.vmdk" %!(EXTRA string=UnmountDevice succeeded for volume "csi.vsphere.vmware.com-[vsanDatastore] e2e/test-1626462720247166001-271.vmdk" (UniqueName: "kubernetes.io/csi/csi.vsphere.vmware.com^[vsanDatastore] e2e/test-1626462720247166001-271.vmdk") on node "k8s-worker2" )
Jul 16 19:13:21 k8s-worker2 kubelet[2881332]: I0716 19:13:21.916279 2881332 reconciler.go:319] "Volume detached for volume \"csi.vsphere.vmware.com-[vsanDatastore] e2e/test-1626462720247166001-271.vmdk\" (UniqueName: \"kubernetes.io/csi/csi.vsphere.vmware.com^[vsanDatastore] e2e/test-1626462720247166001-271.vmdk\") on node \"k8s-worker2\" DevicePath \"csi-0cf4501058ac832a28bca0d49e847c862376eb88e39048e03c47b06d3f809197\""
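
As a cross-check (not part of the original report), the detach can also be confirmed from the API side by listing VolumeAttachment objects for the node; a generic sketch:

kubectl get volumeattachments
kubectl describe volumeattachment <name>   # <name> taken from the list above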

For the pod termination:

Jul 16 19:13:04 k8s-worker2 kubelet[2881332]: I0716 19:13:04.780042 2881332 operation_generator.go:616] MountVolume.MountDevice succeeded for volume "csi.vsphere.vmware.com-[vsanDatastore] e2e/test-1626462720247166001-271.vmdk" (UniqueName: "kubernetes.io/csi/csi.vsphere.vmware.com^[vsanDatastore] e2e/test-1626462720247166001-271.vmdk") pod "pvc-tester-l2hzk" (UID: "83fb70e4-6f7f-44fe-986a-0bbdbdfe2154") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/csi.vsphere.vmware.com-[vsanDatastore] e2e/test-1626462720247166001-271.vmdk/globalmount"
Jul 16 19:13:21 k8s-worker2 kubelet[2881332]: E0716 19:13:21.772364 2881332 kuberuntime_container.go:691] "Kill container failed" err="rpc error: code = Unknown desc = Error: No such container: 538c0c53d54734663b1024a9a2ac86416240acc33e3fa536162db85362c0bef7" pod="vcp-2-csi-syncer-8081/pvc-tester-l2hzk" podUID=83fb70e4-6f7f-44fe-986a-0bbdbdfe2154 containerName="write-pod" containerID={Type:docker ID:538c0c53d54734663b1024a9a2ac86416240acc33e3fa536162db85362c0bef7}
Jul 16 19:13:21 k8s-worker2 kubelet[2881332]: E0716 19:13:21.776238 2881332 kubelet_pods.go:1288] "Failed killing the pod" err="failed to \"KillContainer\" for \"write-pod\" with KillContainerError: \"rpc error: code = Unknown desc = Error: No such container: 538c0c53d54734663b1024a9a2ac86416240acc33e3fa536162db85362c0bef7\"" podName="pvc-tester-l2hzk"

What you expected to happen:

The pod to be deleted without manual intervention.
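
For context, the manual intervention typically needed for a pod wedged in Terminating is a force delete; a generic example (not something the report says was run) would be:

kubectl delete pod pvc-tester-l2hzk -n vcp-2-csi-syncer-8081 --grace-period=0 --force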

How to reproduce it (as minimally and precisely as possible):

Enable CSI migration for vSphere in-tree volumes (a feature-gate sketch follows this list).
Create a pod with an inline vSphere volume.
After the pod reaches the Ready/Running state, delete the pod.
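
As a sketch of the first step (assuming this is roughly how the cluster was configured, which the report does not state), the migration is driven by feature gates on the kubelet and kube-controller-manager, with the vSphere CSI driver already installed:

# KubeletConfiguration snippet; the same gates go on kube-controller-manager via --feature-gates
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true          # already on by default in v1.21
  CSIMigrationvSphere: true   # routes in-tree vsphereVolume volumes to csi.vsphere.vmware.com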

Anything else we need to know?:

Native Kubernetes cluster configured using kubeadm.

Environment:

  • Kubernetes version (use kubectl version): v1.21.3
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
root@k8s-control1:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
root@k8s-control1:~#
@sashrith sashrith added the kind/bug Categorizes issue or PR as related to a bug. label Jul 16, 2021
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 16, 2021
@k8s-ci-robot
Contributor

@sashrith: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 16, 2021
@sashrith
Contributor Author

/cc @chethanv28 @divyenpatel

@sashrith
Contributor Author

/sig sig-storage

@k8s-ci-robot
Contributor

@sashrith: The label(s) sig/sig-storage cannot be applied, because the repository doesn't have them.

In response to this:

/sig sig-storage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sashrith
Contributor Author

@kubernetes/sig-storage-bugs

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 16, 2021
@k8s-ci-robot
Contributor

@sashrith: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-bugs

In response to this:

@kubernetes/sig-storage-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msau42
Member

msau42 commented Jul 16, 2021

/assign @divyenpatel

@divyenpatel
Member

Here the volume is detached by the CSI driver.

The error messages for the pod say:

Jul 16 19:13:21 k8s-worker2 kubelet[2881332]: E0716 19:13:21.772364 2881332 kuberuntime_container.go:691] "Kill container failed" err="rpc error: code = Unknown desc = Error: No such container: 538c0c53d54734663b1024a9a2ac86416240acc33e3fa536162db85362c0bef7" pod="vcp-2-csi-syncer-8081/pvc-tester-l2hzk" podUID=83fb70e4-6f7f-44fe-986a-0bbdbdfe2154 containerName="write-pod" containerID={Type:docker ID:538c0c53d54734663b1024a9a2ac86416240acc33e3fa536162db85362c0bef7}
Jul 16 19:13:21 k8s-worker2 kubelet[2881332]: E0716 19:13:21.776238 2881332 kubelet_pods.go:1288] "Failed killing the pod" err="failed to "KillContainer" for "write-pod" with KillContainerError: "rpc error: code = Unknown desc = Error: No such container: 538c0c53d54734663b1024a9a2ac86416240acc33e3fa536162db85362c0bef7"" podName="pvc-tester-l2hzk"

The issue seems related to Docker.
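
A generic way to check the Docker side for the container the kubelet reports as missing (a sketch using the container ID from the log above):

docker ps -a | grep 538c0c53d547
docker inspect 538c0c53d54734663b1024a9a2ac86416240acc33e3fa536162db85362c0bef7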

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 17, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 16, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
