
Unable to detach the vSphere volume from Powered off node #52575

Conversation

@BaluDontu
Contributor

BaluDontu commented Sep 15, 2017

With the existing implementation, when a vSphere node is powered off, the node is not deleted by the node controller and stays in the "NotReady" state, following an approach similar to GCE as described in #46442.

I observe the following issues:

  • The pods on the powered-off node are not immediately recreated on another available node. Only after a 5-minute timeout are the pods created on other available nodes with the volume attached. This means an application downtime of around 5 minutes, which is not acceptable.
  • The volumes on the powered-off node are never detached, even after the pod using the volume has been moved to another available node. Hence any attempt to restart the powered-off node fails, because a volume still attached to this powered-off node is now also attached to another node. (Note that vSphere does not automatically detach volumes from a powered-off node, unlike GCE and AWS, where volumes are detached automatically when a node is powered off.)

So, in order to resolve this problem, we have decided to go back to the approach where the powered-off node is removed by the node controller. The above 2 problems are then resolved as follows:

  • Since the node is deleted, the pod from the powered-off node is immediately rescheduled onto another available node, with the volume attached to the new node. Hence there is essentially no application downtime.
  • After a period of 6 minutes (the timeout period), the volumes are automatically detached from the powered-off node. Hence any restart of the powered-off node after 6 minutes works without problems, as the volumes have already been detached.

For now, we want to go ahead with deleting the node from the node controller when a node is powered off in vCenter, until we have a better approach. I think the best possible solution would be to introduce a power-state handler in the volume controller that checks whether the node is powered off before taking any action for attach/detach operations. A rough sketch of the node-deletion idea is included at the end of this description.

None

@jingxu97 @saad-ali @divyenpatel @luomiao @rohitjogvmw
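
To make the chosen approach concrete, here is a minimal, self-contained sketch of the idea. This is illustrative only, not the actual diff in this PR; every type and name below is a simplified stand-in for the vSphere cloud provider and node controller plumbing. The point is simply that if the cloud provider reports a powered-off VM as not found, the node controller deletes the Node object, the pods are rescheduled, and the volumes can then be detached after the timeout.

```go
package main

import (
	"errors"
	"fmt"
)

// errInstanceNotFound stands in for the sentinel error that the node
// controller interprets as "this instance no longer exists, delete the Node".
var errInstanceNotFound = errors.New("instance not found")

// vm is a simplified stand-in for a vSphere virtual machine.
type vm struct {
	uuid       string
	powerState string // "poweredOn", "poweredOff", ...
}

// instanceID reports the VM's UUID, or errInstanceNotFound when the VM is
// powered off, so that the node controller removes the corresponding Node,
// its pods get rescheduled, and the volumes are detached after the timeout.
func instanceID(v vm) (string, error) {
	if v.powerState != "poweredOn" {
		return "", errInstanceNotFound
	}
	return v.uuid, nil
}

func main() {
	_, err := instanceID(vm{uuid: "4237-abcd", powerState: "poweredOff"})
	fmt.Println(errors.Is(err, errInstanceNotFound)) // true -> Node gets deleted
}
```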

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 15, 2017
@k8s-ci-robot
Contributor

Hi @BaluDontu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 15, 2017
@k8s-github-robot k8s-github-robot added the do-not-merge/release-note-label-needed and release-note-none labels, and removed the do-not-merge/release-note-label-needed label Sep 15, 2017
@BaluDontu BaluDontu force-pushed the vSphereInstanceNotFoundOnPowerOff branch from 301756b to c7fe832 Compare September 15, 2017 22:46
@xiangpengzhao
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 16, 2017
@jsafrane
Member

/unassign
/assign @jingxu97

@k8s-ci-robot k8s-ci-robot assigned jingxu97 and unassigned jsafrane Sep 18, 2017
@BaluDontu
Contributor Author

/retest

1 similar comment
@BaluDontu
Contributor Author

/retest

@jingxu97
Contributor

Do you have tests for this change? What are the test scenarios?

@BaluDontu
Contributor Author

@jingxu97: I plan to add the E2E test once this PR is checked in. The following test scenarios were run for this PR:
Consider a 4-node Kubernetes cluster.

  1. Deploy a pod using a dynamically provisioned volume. For example, say the pod is deployed on node3 with vol1 attached.
  2. Power off node3 in vCenter.
  3. Check that the node is deleted from the Kubernetes cluster by the node controller (after a timeout of approximately 2 minutes).
  4. Check that the pod is redeployed on another available node, say node1, with vol1 attached to node1 immediately. vol1 is still attached to node3 at this point, but since node3 is powered off, vol1 can also be attached to another available node, in this case node1.
  5. Access the application to confirm it is working.
  6. Check that volume vol1 is detached from the powered-off node (node3) after 6 minutes; a rough polling sketch for this step is shown after the list.
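
For step 6, the check can be automated with a simple polling loop. This is only a rough, self-contained sketch in plain Go (standard library only); attachedVolumes below is a hypothetical stand-in for reading node3's status.volumesAttached through the API, not an existing helper:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForDetach polls until the named volume no longer appears in the list
// returned by attachedVolumes (a stand-in for the node's status.volumesAttached),
// or until the timeout expires.
func waitForDetach(attachedVolumes func() []string, volume string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		stillAttached := false
		for _, v := range attachedVolumes() {
			if v == volume {
				stillAttached = true
				break
			}
		}
		if !stillAttached {
			return nil // the volume was detached from the powered-off node
		}
		time.Sleep(interval)
	}
	return errors.New("timed out waiting for volume to detach from powered-off node")
}

func main() {
	// Fake node status for illustration: the volume "disappears" after a while.
	start := time.Now()
	attached := func() []string {
		if time.Since(start) < 2*time.Second {
			return []string{"vol1"}
		}
		return nil
	}
	// In a real test the timeout would be a bit longer than the ~6-minute detach window.
	fmt.Println(waitForDetach(attached, "vol1", 10*time.Second, time.Second)) // <nil>
}
```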

@BaluDontu
Contributor Author

BaluDontu commented Sep 25, 2017

Please find below the logs from when the node goes into the "NotReady" state after being powered off in vCenter. Overall, I don't see the pod getting evicted from the powered-off node, nor the volume getting detached from it.

Initially, the pod is deployed on node3.

{"log":"I0925 04:13:30.631608       1 reconciler.go:274] Starting attacherDetacher.AttachVolume for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:13:30.631739036Z"}

{"log":"I0925 04:13:30.631836       1 reconciler.go:277] attacherDetacher.AttachVolume started for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:13:30.631926388Z"}

{"log":"I0925 04:13:30.632544       1 vsphere.go:487] Calling AttachDisk with vmDiskPath: [vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk, storagePolicyID: , nodeName: node3\n","stream":"stderr","time":"2017-09-25T04:13:30.632634353Z"}

{"log":"I0925 04:13:39.616474       1 operation_generator.go:278] AttachVolume.Attach succeeded for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:13:39.616527083Z"}

{"log":"I0925 04:13:39.616532       1 actual_state_of_world.go:461] Add new node \"node3\" to nodesToUpdateStatusFor\n","stream":"stderr","time":"2017-09-25T04:13:39.6165624Z"}

{"log":"I0925 04:13:39.666394       1 reconciler.go:249] Volume attached--touching for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:13:39.666453982Z"}

{"log":"I0925 04:13:39.675867       1 node_status_updater.go:143] Updating status for node \"node3\" succeeded. patchBytes: \"{\\\"status\\\":{\\\"volumesAttached\\\":[{\\\"devicePath\\\":\\\"/dev/disk/by-id/wwn-0x6000c2963a50d71c71b270ffa8728eb4\\\",\\\"name\\\":\\\"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\\\"}]}}\" VolumesAttached: [{kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk /dev/disk/by-id/wwn-0x6000c2963a50d71c71b270ffa8728eb4}]\n","stream":"stderr","time":"2017-09-25T04:13:39.676597437Z"}

Now, after node3 is powered off in vCenter, I see the following:

>>> kubectl get nodes
NAME      STATUS                     AGE       VERSION
master    Ready,SchedulingDisabled   1h        v1.9.0-alpha.1.276+a61bbac2370413
node1     Ready                      1h        v1.9.0-alpha.1.276+a61bbac2370413
node2     Ready                      1h        v1.9.0-alpha.1.276+a61bbac2370413
node3     NotReady                   1h        v1.9.0-alpha.1.276+a61bbac2370413
node4     Ready                      1h        v1.9.0-alpha.1.276+a61bbac2370413

From the logs below, I see that the pod is never evicted from the powered-off node, node3. A new pod is created on another node about 5 minutes after node3 is powered off, but the old pod is never deleted. Also, I don't see any detach call for node3 at all.

{"log":"I0925 04:21:19.463872       1 endpoints_controller.go:570] Pod is out of service: default/balu-master-ccd4cd947-9f84z\n","stream":"stderr","time":"2017-09-25T04:21:19.463900717Z"}

{"log":"I0925 04:21:19.464637       1 endpoints_controller.go:508] Update endpoints for default/balu-master, ready: 0 not ready: 1\n","stream":"stderr","time":"2017-09-25T04:21:19.4647036Z"}

log":"I0925 04:21:19.467327       1 disruption.go:340] updatePod called on pod \"balu-master-ccd4cd947-9f84z\"\n","stream":"stderr","time":"2017-09-25T04:21:19.470231827Z"}

{"log":"I0925 04:21:19.467745       1 disruption.go:403] No PodDisruptionBudgets found for pod balu-master-ccd4cd947-9f84z, PodDisruptionBudget controller will avoid syncing.\n","stream":"stderr","time":"2017-09-25T04:21:19.470235077Z"}

{"log":"I0925 04:21:19.467767       1 disruption.go:343] No matching pdb for pod \"balu-master-ccd4cd947-9f84z\"\n","stream":"stderr","time":"2017-09-25T04:21:19.470238516Z"}

"log":"I0925 04:21:29.854865       1 node_controller.go:975] node node3 hasn't been updated for 50.445153027s. Last ready condition is: {Type:Ready Status:Unknown LastHeartbeatTime:2017-09-25 04:20:39 +0000 UTC LastTransitionTime:2017-09-25 04:21:19 +0000 UTC Reason:NodeStatusUnknown Message:Kubelet stopped posting node status.}\n","stream":"stderr","time":"2017-09-25T04:21:29.855406386Z"}

{"log":"I0925 04:21:55.374329       1 node_controller.go:1009] node node3 hasn't been updated for 1m15.964620522s. Last DiskPressure is: \u0026NodeCondition{Type:DiskPressure,Status:Unknown,LastHeartbeatTime:2017-09-25 04:20:39 +0000 UTC,LastTransitionTime:2017-09-25 04:21:19 +0000 UTC,Reason:NodeStatusUnknown,Message:Kubelet stopped posting node status.,}\n","stream":"stderr","time":"2017-09-25T04:21:55.374426158Z"}

{"log":"E0925 04:21:55.469791       1 node_controller.go:717] Error determining if node node3 exists in cloud: The node \"node3\" is not active\n","stream":"stderr","time":"2017-09-25T04:21:55.469949901Z"}

{"log":"I0925 04:21:57.500153       1 reconciler.go:249] Volume attached--touching for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:21:57.500271311Z"}

{"log":"E0925 04:24:03.059675       1 node_controller.go:717] Error determining if node node3 exists in cloud: The node \"node3\" is not active\n","stream":"stderr","time":"2017-09-25T04:24:03.059828213Z"}

{"log":"I0925 04:24:08.060931       1 node_controller.go:1009] node node3 hasn't been updated for 3m28.651221549s. Last DiskPressure is: \u0026NodeCondition{Type:DiskPressure,Status:Unknown,LastHeartbeatTime:2017-09-25 04:20:39 +0000 UTC,LastTransitionTime:2017-09-25 04:21:19 +0000 UTC,Reason:NodeStatusUnknown,Message:Kubelet stopped posting node status.,}\n","stream":"stderr","time":"2017-09-25T04:24:08.061064303Z"}

{"log":"I0925 04:24:08.340132       1 reconciler.go:249] Volume attached--touching for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:24:08.340278537Z"}

{"log":"I0925 04:25:39.881642       1 controller_utils.go:237] Recording Deleting all Pods from Node node3. event message for node node3\n","stream":"stderr","time":"2017-09-25T04:25:39.882654741Z"}

{"log":"I0925 04:25:39.814780       1 node_controller.go:1009] node node3 hasn't been updated for 5m0.40506846s. Last DiskPressure is: \u0026NodeCondition{Type:DiskPressure,Status:Unknown,LastHeartbeatTime:2017-09-25 04:20:39 +0000 UTC,LastTransitionTime:2017-09-25 04:21:19 +0000 UTC,Reason:NodeStatusUnknown,Message:Kubelet stopped posting node status.,}\n","stream":"stderr","time":"2017-09-25T04:25:39.814871559Z"}

{"log":"I0925 04:25:39.814887       1 node_controller.go:683] Node is unresponsive. Adding Pods on Node node3 to eviction queues: 2017-09-25 04:25:39.814833791 +0000 UTC is later than 2017-09-25 04:21:19.428542391 +0000 UTC + 4m20s\n","stream":"stderr","time":"2017-09-25T04:25:39.814936144Z"}

{"log":"I0925 04:25:39.914513       1 controller_utils.go:99] Starting deletion of pod default/balu-master-ccd4cd947-9f84z\n","stream":"stderr","time":"2017-09-25T04:25:39.914566985Z"}

{"log":"I0925 04:25:39.919672       1 disruption.go:340] updatePod called on pod \"balu-master-ccd4cd947-9f84z\"\n","stream":"stderr","time":"2017-09-25T04:25:39.919702808Z"}

{"log":"I0925 04:25:39.919774       1 disruption.go:403] No PodDisruptionBudgets found for pod balu-master-ccd4cd947-9f84z, PodDisruptionBudget controller will avoid syncing.\n","stream":"stderr","time":"2017-09-25T04:25:39.919805761Z"}

{"log":"I0925 04:25:39.919824       1 disruption.go:343] No matching pdb for pod \"balu-master-ccd4cd947-9f84z\"\n","stream":"stderr","time":"2017-09-25T04:25:39.919845942Z"}

{"log":"I0925 04:25:39.919986       1 event.go:218] Event(v1.ObjectReference{Kind:\"Pod\", Namespace:\"default\", Name:\"balu-master-ccd4cd947-9f84z\", UID:\"e36b2fb2-a1a7-11e7-91e0-00505680abc3\", APIVersion:\"v1\", ResourceVersion:\"2338\", FieldPath:\"\"}): type: 'Normal' reason: 'NodeControllerEviction' Marking for deletion Pod balu-master-ccd4cd947-9f84z from Node node3\n","stream":"stderr","time":"2017-09-25T04:25:39.920017781Z"}

{"log":"I0925 04:25:39.968463       1 disruption.go:328] addPod called on pod \"balu-master-ccd4cd947-d4bfd\"\n","stream":"stderr","time":"2017-09-25T04:25:39.970452916Z"}

{"log":"I0925 04:25:39.968605       1 endpoints_controller.go:435] Failed to find an IP for pod default/balu-master-ccd4cd947-d4bfd\n","stream":"stderr","time":"2017-09-25T04:25:39.97052436Z"}

{"log":"I0925 04:25:39.968619       1 endpoints_controller.go:439] Pod is being deleted default/balu-master-ccd4cd947-9f84z\n","stream":"stderr","time":"2017-09-25T04:25:39.970532969Z"}

{"log":"I0925 04:25:40.133268       1 reconciler.go:274] Starting attacherDetacher.AttachVolume for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node2\" \n","stream":"stderr","time":"2017-09-25T04:25:40.133360391Z"}

{"log":"I0925 04:25:40.133323       1 reconciler.go:277] attacherDetacher.AttachVolume started for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node2\" \n","stream":"stderr","time":"2017-09-25T04:25:40.133415342Z"}

{"log":"I0925 04:25:40.133352       1 reconciler.go:249] Volume attached--touching for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:25:40.133425232Z"}

{"log":"I0925 04:25:40.133441       1 attacher.go:71] vSphere: Attach disk called for node node2\n","stream":"stderr","time":"2017-09-25T04:25:40.133484911Z"}

{"log":"I0925 04:25:40.133480       1 vsphere.go:487] Calling AttachDisk with vmDiskPath: [vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk, storagePolicyID: , nodeName: node2\n","stream":"stderr","time":"2017-09-25T04:25:40.133519732Z"}

{"log":"I0925 04:32:13.959283       1 reconciler.go:249] Volume attached--touching for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:32:13.959410021Z"}

{"log":"I0925 04:25:46.958658       1 operation_generator.go:278] AttachVolume.Attach succeeded for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node2\" \n","stream":"stderr","time":"2017-09-25T04:25:46.958992104Z"}

{"log":"I0925 04:25:46.958723       1 actual_state_of_world.go:310] Volume \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\" is already added to attachedVolume list to node \"node2\", update device path \"/dev/disk/by-id/wwn-0x6000c2963a50d71c71b270ffa8728eb4\"\n","stream":"stderr","time":"2017-09-25T04:25:46.958996524Z"}

{"log":"I0925 04:25:46.958741       1 actual_state_of_world.go:461] Add new node \"node2\" to nodesToUpdateStatusFor\n","stream":"stderr","time":"2017-09-25T04:25:46.959000086Z"}

{"log":"I0925 04:25:46.958753       1 actual_state_of_world.go:469] Report volume \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\" as attached to node \"node2\"\n","stream":"stderr","time":"2017-09-25T04:25:46.959003698Z"}

{"log":"I0925 04:25:46.963236       1 reconciler.go:249] Volume attached--touching for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-09-25T04:25:46.963393631Z"}

{"log":"I0925 04:25:46.963265       1 reconciler.go:249] Volume attached--touching for volume \"pvc-146177c0-a1a7-11e7-91e0-00505680abc3\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\") from node \"node2\" \n","stream":"stderr","time":"2017-09-25T04:25:46.963415659Z"}

{"log":"I0925 04:25:46.969057       1 attach_detach_controller.go:490] processVolumesInUse for node \"node2\"\n","stream":"stderr","time":"2017-09-25T04:25:46.969257987Z"}

{"log":"I0925 04:25:46.969076       1 actual_state_of_world.go:358] SetVolumeMountedByNode volume kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk to the node \"node2\" mounted true\n","stream":"stderr","time":"2017-09-25T04:25:46.969272432Z"}

{"log":"I0925 04:25:46.971265       1 node_status_updater.go:143] Updating status for node \"node2\" succeeded. patchBytes: \"{\\\"status\\\":{\\\"volumesAttached\\\":[{\\\"devicePath\\\":\\\"/dev/disk/by-id/wwn-0x6000c2963a50d71c71b270ffa8728eb4\\\",\\\"name\\\":\\\"kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk\\\"}]}}\" VolumesAttached: [{kubernetes.io/vsphere-volume/[vsanDatastore] b09de058-0074-1b5e-328d-020021aa0427/kubernetes-dynamic-pvc-146177c0-a1a7-11e7-91e0-00505680abc3.vmdk /dev/disk/by-id/wwn-0x6000c2963a50d71c71b270ffa8728eb4}]\n","stream":"stderr","time":"2017-09-25T04:25:46.971692569Z"}

{"log":"I0925 04:40:38.564699       1 node_controller.go:1009] node node3 hasn't been updated for 19m59.154990358s. Last DiskPressure is: \u0026NodeCondition{Type:DiskPressure,Status:Unknown,LastHeartbeatTime:2017-09-25 04:20:39 +0000 UTC,LastTransitionTime:2017-09-25 04:21:19 +0000 UTC,Reason:NodeStatusUnknown,Message:Kubelet stopped posting node status.,}\n","stream":"stderr","time":"2017-09-25T04:40:38.564829505Z"}

{"log":"E0925 04:40:38.658155       1 node_controller.go:717] Error determining if node node3 exists in cloud: The node \"node3\" is not active\n","stream":"stderr","time":"2017-09-25T04:40:38.658237473Z"}

{"log":"E0925 04:40:43.767559       1 node_controller.go:717] Error determining if node node3 exists in cloud: The node \"node3\" is not active\n","stream":"stderr","time":"2017-09-25T04:40:43.767724464Z"}

From the output below, you can see that a new pod has been created on node2, but the old pod has not been evicted from the powered-off node, node3.

>>> kubectl get pods
NAME                          READY     STATUS    RESTARTS   AGE
balu-master-ccd4cd947-9f84z   1/1       Unknown   0          31m
balu-master-ccd4cd947-d4bfd   1/1       Running   0          19m
balu-slave-dd89f6bd9-kk74l    1/1       Running   0          31m
frontend-685d7ff496-28gf5     1/1       Unknown   0          31m
frontend-685d7ff496-xh8n7     1/1       Running   0          19m

For the old pod, "balu-master-ccd4cd947-9f84z", which went into the "Unknown" state, I see the following:


Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath		Type		Reason			Message
  ---------	--------	-----	----			-------------		--------	------			-------
  32m		32m		1	default-scheduler				Normal		Scheduled		Successfully assigned balu-master-ccd4cd947-9f84z to node3
  32m		32m		1	kubelet, node3					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-2f47c" 
  32m		32m		1	kubelet, node3					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "pvc-146177c0-a1a7-11e7-91e0-00505680abc3" 
  32m		32m		1	kubelet, node3		spec.containers{master}	Normal		Pulling			pulling image "gcr.io/google_containers/redis:e2e"
  27m		27m		1	kubelet, node3		spec.containers{master}	Normal		Pulled			Successfully pulled image "gcr.io/google_containers/redis:e2e"
  27m		27m		1	kubelet, node3		spec.containers{master}	Normal		Created			Created container
  27m		27m		1	kubelet, node3		spec.containers{master}	Normal		Started			Started container
  20m		20m		1	controllermanager				Normal		NodeControllerEviction	Marking for deletion Pod balu-master-ccd4cd947-9f84z from Node node3

For the new pod, balu-master-ccd4cd947-d4bfd, which is now in the "Ready" state, I see the following events:

Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath		Type		Reason			Message
  ---------	--------	-----	----			-------------		--------	------			-------
  20m		20m		1	default-scheduler				Normal		Scheduled		Successfully assigned balu-master-ccd4cd947-d4bfd to node2
  20m		20m		1	kubelet, node2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-2f47c" 
  20m		20m		1	kubelet, node2					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "pvc-146177c0-a1a7-11e7-91e0-00505680abc3" 
  20m		20m		1	kubelet, node2		spec.containers{master}	Normal		Pulling			pulling image "gcr.io/google_containers/redis:e2e"
  18m		18m		1	kubelet, node2		spec.containers{master}	Normal		Pulled			Successfully pulled image "gcr.io/google_containers/redis:e2e"
  18m		18m		1	kubelet, node2		spec.containers{master}	Normal		Created			Created container
  18m		18m		1	kubelet, node2		spec.containers{master}	Normal		Started			Started container

@BaluDontu
Contributor Author

A few more logs where a new pod is created on an available node but the old pod goes into the Unknown state.
I have added kubectl describe pod output for the Unknown-status pod, which shows that the containers in the pod are still reported as running.
CC @jingxu97

bdontu-m02:redis bdontu$ kubectl get pods
NAME                          READY     STATUS    RESTARTS   AGE
balu-master-ccd4cd947-8vlmn   1/1       Unknown   0          33m
balu-master-ccd4cd947-wt88s   1/1       Running   0          13m
balu-slave-dd89f6bd9-qwgkr    1/1       Running   0          33m
frontend-685d7ff496-8pvdh     1/1       Unknown   0          33m
frontend-685d7ff496-l7s6x     1/1       Running   0          13m
bdontu-m02:redis bdontu$ kubectl describe pods balu-master-ccd4cd947-8vlmn
Name:				balu-master-ccd4cd947-8vlmn
Namespace:			default
Node:				node3/10.192.56.58
Start Time:			Tue, 26 Sep 2017 10:25:01 -0700
Labels:				app=balu
				pod-template-hash=778078503
				role=master
				tier=backend
Annotations:			kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"balu-master-ccd4cd947","uid":"a066a2bb-a2df-11e7-bdb6-0050569cc5...
Status:				Terminating (expires Tue, 26 Sep 2017 10:46:06 -0700)
Termination Grace Period:	30s
Reason:				NodeLost
Message:			Node node3 which was running pod balu-master-ccd4cd947-8vlmn is unresponsive
IP:				172.1.82.3
Created By:			ReplicaSet/balu-master-ccd4cd947
Controlled By:			ReplicaSet/balu-master-ccd4cd947
Containers:
  master:
    Container ID:	docker://914a63286b242a5d8fdb025f7bc5ad1b76293636f9bf304572990ec19b53b21c
    Image:		gcr.io/google_containers/redis:e2e
    Image ID:		docker://sha256:e5e67996c442f903cda78dd983ea6e94bb4e542950fd2eba666b44cbd303df42
    Port:		6379/TCP
    State:		Running
      Started:		Tue, 26 Sep 2017 10:31:44 -0700
    Ready:		True
    Restart Count:	0
    Requests:
      cpu:		100m
      memory:		100Mi
    Environment:	<none>
    Mounts:
      /data from balu-master-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zsrrs (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  balu-master-data:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	redis-master-claim
    ReadOnly:	false
  default-token-zsrrs:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-zsrrs
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath		Type		Reason			Message
  ---------	--------	-----	----			-------------		--------	------			-------
  30m		30m		1	default-scheduler				Normal		Scheduled		Successfully assigned balu-master-ccd4cd947-8vlmn to node3
  30m		30m		1	kubelet, node3					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-zsrrs" 
  30m		30m		1	kubelet, node3					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "pvc-8b9a8207-a2df-11e7-bdb6-0050569cc50f" 
  30m		30m		1	kubelet, node3		spec.containers{master}	Normal		Pulling			pulling image "gcr.io/google_containers/redis:e2e"
  24m		24m		1	kubelet, node3		spec.containers{master}	Normal		Pulled			Successfully pulled image "gcr.io/google_containers/redis:e2e"
  24m		24m		1	kubelet, node3		spec.containers{master}	Normal		Created			Created container
  24m		24m		1	kubelet, node3		spec.containers{master}	Normal		Started			Started container
  10m		10m		1	controllermanager				Normal		NodeControllerEviction	Marking for deletion Pod balu-master-ccd4cd947-8vlmn from Node node3

@jingxu97
Contributor

Debugged with @BaluDontu offline.
When the node is powered off, the node_controller tries to mark the pod as terminated by setting the DeletionTimestamp. By describing the pod, we can see that

Status:             Terminating (expires Tue, 26 Sep 2017 10:46:06 -0700)
Termination Grace Period:   30s
Reason:             NodeLost
Message:            Node node3 which was running pod balu-master-ccd4cd947-8vlmn is unresponsive

The attach_detach controller should try to detach volumes for terminated pods (since release 1.7; see PR #45286). However, the controller has a check to decide whether a pod is terminated, and this function also checks the containers' statuses (which are not the same as the pod status). If a container status still shows running, the pod is not considered terminated.

In this scenario, since the node is powered off, the kubelet does not update the container status, so it still shows as running. Therefore the attach_detach controller does not treat this pod as terminated and does not detach the volume for it.
See the result of describing the terminated pod:

Containers:
 master:
   Container ID:   docker://914a63286b242a5d8fdb025f7bc5ad1b76293636f9bf304572990ec19b53b21c
   Image:      gcr.io/google_containers/redis:e2e
   Image ID:       docker://sha256:e5e67996c442f903cda78dd983ea6e94bb4e542950fd2eba666b44cbd303df42
   Port:       6379/TCP
   State:      Running

I think we should remove the container-status check when determining whether the pod is terminated.
cc @yujuhong @saad-ali @gnufied

@jingxu97 jingxu97 added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. labels Sep 26, 2017
@BaluDontu
Contributor Author

Thanks @jingxu97 for helping me out on this. I have one more query.
I also see that the pod gets scheduled onto an available node only after the 5-minute timeout. Until that point, the application running in the pod is down and not accessible. Is there any way we could determine that the node is powered off (through an additional handler that queries the cloud provider once the node becomes unresponsive), so that we can move the pods to available nodes well before the 5 minutes are up?

I also wanted to know how GCE/AWS handle this scenario.

@gnufied
Member

gnufied commented Sep 26, 2017

@jingxu97 shouldn't the pod enter the Failed state after the Terminating state, which would cause the detach regardless of container state? Or do we have a case where the pod never reaches the Failed state and is stuck in the Terminating state?

So what you are saying would mean detaching the volume if the pod is in the Terminating state and its DeletionTimestamp is set. I am wondering, though, whether that could cause problems and detach volumes before the pod has successfully terminated.

@jingxu97
Contributor

@gnufied The problem I see is this code: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/volumehelper/volumehelper.go#L102

It checks the container status, which is not updated by the kubelet since the node is down. Even though the pod is in the terminating state, this function returns false because of the stale container status (still reported as running).
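
To illustrate the point, here is a simplified, self-contained sketch of that check using stand-in types (an approximation, not the exact volumehelper code): even with the DeletionTimestamp set, the pod only counts as terminated once no container status reports a running state, and with the node powered off the kubelet can no longer update those statuses.

```go
package main

import (
	"fmt"
	"time"
)

// ContainerStatus and Pod are simplified stand-ins for the real API types.
type ContainerStatus struct{ Running bool }

type Pod struct {
	Phase             string // "Pending", "Running", "Succeeded", "Failed"
	DeletionTimestamp *time.Time
	ContainerStatuses []ContainerStatus
}

func noContainersRunning(statuses []ContainerStatus) bool {
	for _, s := range statuses {
		if s.Running {
			return false
		}
	}
	return true
}

// isPodTerminated mirrors the logic under discussion: a pod marked for
// deletion still counts as "not terminated" while any container status
// claims to be running.
func isPodTerminated(p Pod) bool {
	return p.Phase == "Failed" || p.Phase == "Succeeded" ||
		(p.DeletionTimestamp != nil && noContainersRunning(p.ContainerStatuses))
}

func main() {
	now := time.Now()
	stale := Pod{
		Phase:             "Running",
		DeletionTimestamp: &now,
		// Stale status: the kubelet on the powered-off node can no longer update it.
		ContainerStatuses: []ContainerStatus{{Running: true}},
	}
	// false -> the attach_detach controller keeps the volume attached.
	fmt.Println(isPodTerminated(stale))
}
```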

@jingxu97
Contributor

This issue is also related to #46442
Currently we have different behavior for stopped VM instances. For AWS and similar providers, a stopped VM is treated as no longer existing in the cloud provider, so Kubernetes deletes the node and the pods are garbage collected. For GCE and similar providers, a stopped VM is still considered present in the cloud, so Kubernetes does not delete the node; the pods stay in the Unknown state and are not garbage collected (deleted).
So I am OK with this change: with it, vSphere will behave like AWS, so the pods will be deleted and the volume will be detached.
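
To spell out the contrast described above, here is a condensed illustration with made-up names (not actual node controller code): the decision hinges entirely on whether the cloud provider still reports the stopped VM as existing.

```go
package main

import "fmt"

// handleUnresponsiveNode condenses the two behaviors: if the cloud provider
// reports the instance as gone (AWS-style, and vSphere after this change),
// the Node is deleted and its pods are garbage collected; if the stopped VM
// still "exists" (GCE-style), the Node only goes NotReady and its pods stay
// in the Unknown state.
func handleUnresponsiveNode(name string, existsInCloud bool) string {
	if !existsInCloud {
		return "delete node " + name + "; pods are garbage collected and volumes detached"
	}
	return "keep node " + name + " as NotReady; pods remain in Unknown state"
}

func main() {
	fmt.Println(handleUnresponsiveNode("node3", false)) // AWS-style / vSphere after this change
	fmt.Println(handleUnresponsiveNode("node3", true))  // GCE-style
}
```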

@BaluDontu Please also try to add an e2e test for this change. You can put it in a different PR.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 28, 2017
@k8s-github-robot

/test all

Tests are more than 96 hours old. Re-running tests.

@BaluDontu
Contributor Author

@jingxu97: I am already working on the E2E test and will have a PR ready soon.

@luomiao

luomiao commented Sep 28, 2017

/approve

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BaluDontu, jingxu97, luomiao

Associated issue: 46442

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 28, 2017
@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@BaluDontu
Contributor Author

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Sep 29, 2017

@BaluDontu: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-unit
Commit: c7fe832
Details: link
Rerun command: /test pull-kubernetes-unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 51311, 52575, 53169). If you want to cherry-pick this change to another branch, please follow the instructions here.

@BaluDontu
Contributor Author

@jingxu97: This fix resolves the issue when the node is powered off. However, it still fails when the kubelet is stopped on the node where the pod was provisioned, because in that case the node only goes into the NotReady state.
I have described the issue in detail in #53408.

k8s-github-robot pushed a commit that referenced this pull request Oct 26, 2017
…52575-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #52131 #52575 upstream release 1.8

Cherry pick of #52131, #52575 on release-1.8

#52131: Implement bulk polling of volumes for vSphere
#52575: Unable to detach the vSphere volume from Powered off node

@BaluDontu 

**Release note**:

```release-note
BulkVerifyVolumes() implementation for vSphere Cloud Provider
Fixed volume detachment from powered off node
```