
Automated cherry pick of #67825: Fix VMWare VM freezing bug by reverting #51066 #68056

Merged: 1 commit into release-1.9 on Sep 7, 2018

Conversation

@nikopen (Contributor) commented Aug 30, 2018

Cherry pick of #67825 on release-1.9.

#67825: Fix VMWare VM freezing bug by reverting #51066

@k8s-ci-robot k8s-ci-robot added do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 30, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 30, 2018
@nikopen (Contributor, Author) commented Aug 30, 2018

/assign @mbohlool

@davidz627 (Contributor):

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 30, 2018
@gnufied (Member) commented Aug 30, 2018

/assign @divyenpatel

Someone more familiar with the vSphere cloud provider should approve these for cherry-picks.

@mbohlool mbohlool added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. and removed do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. labels Aug 31, 2018
@mbohlool (Contributor):

Is it possible to add a test for this?

@nikopen (Contributor, Author) commented Aug 31, 2018

@mbohlool if you mean a repro, it has been tested here: #67825 (comment)

@divyenpatel @gnufied I believe consensus has to be reached for #65392 before merging this PR to 1.9, right?

@divyenpatel (Member) commented Aug 31, 2018

Created a Deployment:

# kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
wordpress-mysql-556d6bd778-m9hx7   1/1       Running   0          6m

Pod wordpress-mysql-556d6bd778-m9hx7 is scheduled on Node kubernetes-node2.

# kubectl describe  pods | grep "Node:"
Node:           kubernetes-node2/10.160.222.133

Powered off node kubernetes-node2. A new pod is spawned on node kubernetes-node4:

# kubectl get pods
NAME                               READY     STATUS              RESTARTS   AGE
wordpress-mysql-556d6bd778-z52sn   0/1       ContainerCreating   0          2m
# kubectl describe pod wordpress-mysql-556d6bd778-z52sn | grep "Node:"
Node:           kubernetes-node4/10.160.212.13

kubernetes-node2 is removed from the API server.

# kubectl get nodes
NAME                STATUS                     ROLES     AGE       VERSION
kubernetes-master   Ready,SchedulingDisabled   <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node1    Ready                      <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node3    Ready                      <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node4    Ready                      <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a

The pod remains in the ContainerCreating state for about 6 minutes.

Observed the following errors:

{"log":"I0831 17:21:26.836461       1 vsphere.go:973] Starting DisksAreAttached API for vSphere with nodeVolumes: %+vmap[kubernetes-node2:[[sharedVMFS-0] kubevols/kubernetes-dynamic-pvc-fc331526-ad40-11e8-a85e-005056915ea0.vmdk]]\n","stream":"stderr","time":"2018-08-31T17:21:26.836554363Z"}
{"log":"I0831 17:21:26.836659       1 nodemanager.go:276] No VM found for node \"kubernetes-node2\". Initiating rediscovery.\n","stream":"stderr","time":"2018-08-31T17:21:26.836722126Z"}
{"log":"E0831 17:21:26.836778       1 nodemanager.go:279] Error \"No VM found\" node info for node \"kubernetes-node2\" not found\n","stream":"stderr","time":"2018-08-31T17:21:26.836809207Z"}
{"log":"E0831 17:21:26.836907       1 vsphere.go:986] Failed to convert volPaths to devicePaths: map[kubernetes-node2:[[sharedVMFS-0] kubevols/kubernetes-dynamic-pvc-fc331526-ad40-11e8-a85e-005056915ea0.vmdk]]. err: No VM found\n","stream":"stderr","time":"2018-08-31T17:21:26.836938773Z"}
{"log":"E0831 17:21:26.837027       1 attacher.go:130] Error checking if volumes are attached to nodes: map[kubernetes-node2:[[sharedVMFS-0] kubevols/kubernetes-dynamic-pvc-fc331526-ad40-11e8-a85e-005056915ea0.vmdk]]. err: No VM found\n","stream":"stderr","time":"2018-08-31T17:21:26.837059154Z"}
{"log":"E0831 17:21:26.837196       1 operation_generator.go:238] BulkVerifyVolume.BulkVerifyVolumes Error checking volumes are attached with No VM found\n","stream":"stderr","time":"2018-08-31T17:21:26.837326349Z"}

After 6 minutes, the pod state changed to Running and the volume successfully attached to the new node kubernetes-node4.

# kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
wordpress-mysql-556d6bd778-z52sn   1/1       Running   0          6m

But the volume was never detached from the previous node, kubernetes-node2.


To resolve the above behavior, we will either need to keep the powered-off node registered with the API server, or fix the cloud provider with #63413 so that volumes get detached from a powered-off node that has also been removed from the API server.

cc: @SandeepPissay

@divyenpatel (Member):

When kubelet is stopped on the node, we see the following behavior.

root@kubernetes-node4 [ ~ ]# systemctl stop kubelet
root@kubernetes-node4 [ ~ ]#

kubernetes-node4 becomes NotReady but is not removed from the API server.

# kubectl get nodes
NAME                STATUS                     ROLES     AGE       VERSION
kubernetes-master   Ready,SchedulingDisabled   <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node1    Ready                      <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node3    Ready                      <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node4    NotReady                   <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
# kubectl get pods
NAME                               READY     STATUS              RESTARTS   AGE
wordpress-mysql-556d6bd778-2hllv   0/1       ContainerCreating   0          24s
wordpress-mysql-556d6bd778-z52sn   1/1       Unknown             0          30m

wordpress-mysql-556d6bd778-z52sn is marked for deletion from node kubernetes-node4 but remains in the Unknown state.
wordpress-mysql-556d6bd778-2hllv remains in the ContainerCreating state indefinitely, with the following errors observed on the pod:

  Warning  FailedAttachVolume     2m    attachdetach-controller    Multi-Attach error for volume "pvc-fc331526-ad40-11e8-a85e-005056915ea0" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount            48s   kubelet, kubernetes-node3  Unable to mount volumes for pod "wordpress-mysql-556d6bd778-2hllv_default(4410a36f-ad46-11e8-a85e-005056915ea0)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-556d6bd778-2hllv". list of unattached/unmounted volumes=[mysql-persistent-storage]

To resolve this, we will need to force-detach volumes from the pod in the Unknown state, allowing them to attach to the new node with the change made in #67847.

@gnufied (Member) commented Aug 31, 2018

@divyenpatel so, in a nutshell, this PR only prevents attaching to another node for the first 6 minutes. I think what might be happening is that the DetachVolume call to vSphere returns success even when it does not detach the volume from the actual node. The volume is thus still attached to the node, but it gets removed from the attach/detach controller's state because DetachDisk returned success, and that allows the volume to be attached to another node.

I frankly do not see the point in backporting this PR to older releases like 1.9 until we fix the real detach issue. The volume is still being attached to multiple nodes, so why bother?
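The hypothesized failure mode can be sketched as a toy model: if DetachDisk reports success when the VM is not found, the real attachment survives while the controller forgets it, and a later attach produces a multi-attached disk. All types and names here are illustrative, not the real Kubernetes or vSphere APIs:

```go
package main

import "fmt"

// vsphereSim is a toy stand-in for vSphere: it tracks which nodes
// actually hold each disk, independent of what the controller believes.
type vsphereSim struct {
	realAttachments map[string][]string // volume -> nodes actually attached
	vmExists        map[string]bool     // powered-off/removed VMs are false
}

// DetachDisk models the buggy behaviour: when the VM is gone, nothing
// is detached, yet success (nil) is reported anyway.
func (v *vsphereSim) DetachDisk(volume, node string) error {
	if !v.vmExists[node] {
		return nil // "No VM found" path: spurious success
	}
	nodes := v.realAttachments[volume][:0:0]
	for _, n := range v.realAttachments[volume] {
		if n != node {
			nodes = append(nodes, n)
		}
	}
	v.realAttachments[volume] = nodes
	return nil
}

// AttachDisk attaches the disk to an additional node.
func (v *vsphereSim) AttachDisk(volume, node string) {
	v.realAttachments[volume] = append(v.realAttachments[volume], node)
}

func main() {
	sim := &vsphereSim{
		realAttachments: map[string][]string{"pvc-1": {"node2"}},
		vmExists:        map[string]bool{"node2": false, "node4": true},
	}
	// The controller detaches from powered-off node2: spurious success,
	// so it forgets the attachment and considers pvc-1 free.
	_ = sim.DetachDisk("pvc-1", "node2")
	// Believing pvc-1 is free, it attaches to node4.
	sim.AttachDisk("pvc-1", "node4")
	fmt.Println(sim.realAttachments["pvc-1"]) // [node2 node4]: multi-attached
}
```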

@gnufied (Member) commented Aug 31, 2018

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 31, 2018
@nikopen (Contributor, Author) commented Aug 31, 2018

@divyenpatel thanks for your repro work.

@gnufied the PR fixes vmware-archive#500 / vmware-archive#502, and backporting it fixes them in previous versions as well. It was custom behaviour for vSphere in the attach/detach reconciler that forced multi-attach. It's a showstopper for us, and we're running 1.9.10, hence the upstream backport. Many shops running older versions would happily bump the minor version instead of doing major upgrades.

Overall, it seems there are many open PRs that solve the above problems with broken and shutdown nodes and volumes, but there is a stalemate holding those PRs up because of the discussion about the shutdown taint in #65392. Once the deadlock is resolved and the solutions are merged (and backported), this backport should be good to go.

@gnufied (Member) commented Aug 31, 2018

@nikopen how does this PR prevent multi-attach? Didn't @divyenpatel just prove we are still multi-attaching? Am I missing something?

So if this PR does not fix multi-attach, I would like to know what it actually fixes and why we should backport it.

@gnufied (Member) commented Aug 31, 2018

I understand the validity of this PR on the master branch, btw; I know why we need it there. On master, this PR indeed prevents multi-attach, but on the 1.9 branch, afaict, it does not have much effect beyond delaying the attach to another node by 6 minutes.

@nikopen (Contributor, Author) commented Sep 1, 2018

As described on DM, this is the VM freeze: vmware-archive#500 (comment), step 4.

The observation is that k8s aggressively attempts to attach a disk to another node without first triggering a detach from the originating VM, causing the new VM to completely freeze for as long as the volume is 'unavailable'/stuck elsewhere, from 1 minute up to 3 minutes or more.

That is the main purpose of the original PR: it fixes the freeze. Backporting it to all affected versions makes sense to me.

Here are the PRs for 1.10 and 1.11:
#68057
#68058
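For reference, the behaviour this cherry-pick restores can be sketched as a single-attach guard in the reconciler: an attach to a second node fails fast with the Multi-Attach error instead of racing the detach. A simplified, illustrative sketch, not the actual reconciler code:

```go
package main

import (
	"errors"
	"fmt"
)

// errMultiAttach mirrors the error surfaced in the pod events above.
var errMultiAttach = errors.New("volume is already exclusively attached to one node and can't be attached to another")

// reconciler is a toy model of the attach/detach reconciler's view of
// which node currently holds each volume (illustrative names only).
type reconciler struct {
	attachedNode map[string]string // volume -> node currently attached
}

// attach refuses to start an attach while the volume is still recorded
// as attached to a different node, instead of force-attaching.
func (r *reconciler) attach(volume, node string) error {
	if owner, ok := r.attachedNode[volume]; ok && owner != node {
		return errMultiAttach
	}
	r.attachedNode[volume] = node
	return nil
}

func main() {
	r := &reconciler{attachedNode: map[string]string{"pvc-1": "kubernetes-node2"}}
	err := r.attach("pvc-1", "kubernetes-node4")
	fmt.Println(err != nil) // true: the second attach is refused
}
```

This is what trades the silent freeze for the explicit Multi-Attach error seen in divyenpatel's kubelet-stopped repro.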

@divyenpatel (Member):

I agree with @nikopen. We should allow this change to be cherry-picked to other branches, and provide a follow-up fix in either the vSphere Cloud Provider or Kubernetes.

At least when kubelet is not responding or is unable to communicate with the API server, instead of continuous failed attempts to attach volumes on the new node, the attachdetach-controller simply fails with the error Volume is already exclusively attached to one node and can't be attached to another.

Without this change, we will see lots of failure messages on the vSphere task console from failed attempts to hot-attach the disk on the new node.

We also avoid freezing the node VM, which can disturb existing workloads running on that node.

@nikopen (Contributor, Author) commented Sep 4, 2018

Thanks @divyenpatel !

cc @saad-ali @mbohlool for approval

@gnufied (Member) commented Sep 4, 2018

Okay, I think this still does not handle the real problem of actually detaching volumes from shutdown/deleted pods in 1.9, but we can merge it for now.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 4, 2018
@childsb (Contributor) commented Sep 6, 2018

/kind bug

@childsb (Contributor) commented Sep 6, 2018

/milestone v1.9

@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Sep 6, 2018
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 6, 2018
@childsb childsb added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. and removed do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. labels Sep 6, 2018
@childsb (Contributor) commented Sep 6, 2018

/approve

@childsb (Contributor) commented Sep 6, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2018
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: childsb, nikopen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2018
@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@childsb childsb added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 7, 2018
@k8s-github-robot:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot:

Automatic merge from submit-queue.
