
Automated cherry pick of #67825: Fix VMWare VM freezing bug by reverting #51066 #68056

Merged: 1 commit into release-1.9 on Sep 7, 2018

Conversation

@nikopen (Contributor) commented Aug 30, 2018

Cherry pick of #67825 on release-1.9.

#67825: Fix VMWare VM freezing bug by reverting #51066

@k8s-ci-robot k8s-ci-robot added do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 30, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 30, 2018
@nikopen (Contributor, Author) commented Aug 30, 2018

/assign @mbohlool

@davidz627 (Contributor):

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 30, 2018
@gnufied (Member) commented Aug 30, 2018

/assign @divyenpatel

Someone more familiar with the vSphere cloud provider should approve these for cherry-picks.

@mbohlool mbohlool added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. and removed do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. labels Aug 31, 2018
@mbohlool (Contributor):

Is it possible to add a test for this?

@nikopen (Contributor, Author) commented Aug 31, 2018

@mbohlool if you mean a repro, it has been tested here: #67825 (comment)

@divyenpatel @gnufied I believe consensus has to be reached for #65392 before merging this PR to 1.9, right?

@divyenpatel (Member) commented Aug 31, 2018

Created a Deployment:

# kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
wordpress-mysql-556d6bd778-m9hx7   1/1       Running   0          6m

Pod wordpress-mysql-556d6bd778-m9hx7 is scheduled on Node kubernetes-node2.

# kubectl describe  pods | grep "Node:"
Node:           kubernetes-node2/10.160.222.133

Powered off node kubernetes-node2. A new pod is spawned on node kubernetes-node4:

# kubectl get pods
NAME                               READY     STATUS              RESTARTS   AGE
wordpress-mysql-556d6bd778-z52sn   0/1       ContainerCreating   0          2m
# kubectl describe pod wordpress-mysql-556d6bd778-z52sn | grep "Node:"
Node:           kubernetes-node4/10.160.212.13

kubernetes-node2 is removed from the API server.

# kubectl get nodes
NAME                STATUS                     ROLES     AGE       VERSION
kubernetes-master   Ready,SchedulingDisabled   <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node1    Ready                      <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node3    Ready                      <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node4    Ready                      <none>    19m       v1.9.11-beta.0.21+2e8ae180915b4a

The pod remains in the ContainerCreating state for about 6 minutes.

Observed the following errors:

{"log":"I0831 17:21:26.836461       1 vsphere.go:973] Starting DisksAreAttached API for vSphere with nodeVolumes: %+vmap[kubernetes-node2:[[sharedVMFS-0] kubevols/kubernetes-dynamic-pvc-fc331526-ad40-11e8-a85e-005056915ea0.vmdk]]\n","stream":"stderr","time":"2018-08-31T17:21:26.836554363Z"}
{"log":"I0831 17:21:26.836659       1 nodemanager.go:276] No VM found for node \"kubernetes-node2\". Initiating rediscovery.\n","stream":"stderr","time":"2018-08-31T17:21:26.836722126Z"}
{"log":"E0831 17:21:26.836778       1 nodemanager.go:279] Error \"No VM found\" node info for node \"kubernetes-node2\" not found\n","stream":"stderr","time":"2018-08-31T17:21:26.836809207Z"}
{"log":"E0831 17:21:26.836907       1 vsphere.go:986] Failed to convert volPaths to devicePaths: map[kubernetes-node2:[[sharedVMFS-0] kubevols/kubernetes-dynamic-pvc-fc331526-ad40-11e8-a85e-005056915ea0.vmdk]]. err: No VM found\n","stream":"stderr","time":"2018-08-31T17:21:26.836938773Z"}
{"log":"E0831 17:21:26.837027       1 attacher.go:130] Error checking if volumes are attached to nodes: map[kubernetes-node2:[[sharedVMFS-0] kubevols/kubernetes-dynamic-pvc-fc331526-ad40-11e8-a85e-005056915ea0.vmdk]]. err: No VM found\n","stream":"stderr","time":"2018-08-31T17:21:26.837059154Z"}
{"log":"E0831 17:21:26.837196       1 operation_generator.go:238] BulkVerifyVolume.BulkVerifyVolumes Error checking volumes are attached with No VM found\n","stream":"stderr","time":"2018-08-31T17:21:26.837326349Z"}

After 6 minutes, the pod state changed to Running and the volume successfully attached to the new node kubernetes-node4.

# kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
wordpress-mysql-556d6bd778-z52sn   1/1       Running   0          6m

But the volume was never detached from the previous node, kubernetes-node2.


To resolve the above behavior, we will either need to keep the powered-off node registered with the API server, or fix the cloud provider with #63413 so that volumes get detached from a powered-off node that has also been removed from the API server.

cc: @SandeepPissay

@divyenpatel (Member):

When kubelet is stopped on the node, we see the following behavior.

root@kubernetes-node4 [ ~ ]# systemctl stop kubelet
root@kubernetes-node4 [ ~ ]#

kubernetes-node4 becomes NotReady but is not removed from the API server.

# kubectl get nodes
NAME                STATUS                     ROLES     AGE       VERSION
kubernetes-master   Ready,SchedulingDisabled   <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node1    Ready                      <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node3    Ready                      <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
kubernetes-node4    NotReady                   <none>    45m       v1.9.11-beta.0.21+2e8ae180915b4a
# kubectl get pods
NAME                               READY     STATUS              RESTARTS   AGE
wordpress-mysql-556d6bd778-2hllv   0/1       ContainerCreating   0          24s
wordpress-mysql-556d6bd778-z52sn   1/1       Unknown             0          30m

wordpress-mysql-556d6bd778-z52sn is marked for deletion from node kubernetes-node4 but remains in the Unknown state.
wordpress-mysql-556d6bd778-2hllv remains in the ContainerCreating state indefinitely, with the following errors observed on the pod:

  Warning  FailedAttachVolume     2m    attachdetach-controller    Multi-Attach error for volume "pvc-fc331526-ad40-11e8-a85e-005056915ea0" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount            48s   kubelet, kubernetes-node3  Unable to mount volumes for pod "wordpress-mysql-556d6bd778-2hllv_default(4410a36f-ad46-11e8-a85e-005056915ea0)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-556d6bd778-2hllv". list of unattached/unmounted volumes=[mysql-persistent-storage]

To resolve this, we will need to force-detach volumes from the pod in the Unknown state, allowing them to attach to the new node with the change made in #67847.

@gnufied (Member) commented Aug 31, 2018

@divyenpatel so, in a nutshell, this PR only prevents attaching to another node for the first 6 minutes. I think what might be happening is that the DetachVolume call to vSphere returns success even when it does not detach the volume from the actual node. The volume is thus still attached to the node, but it gets removed from the attach/detach controller's state because DetachDisk returned success, and that allows the volume to be attached to another node.

I frankly do not see the point in backporting this PR to older releases like 1.9 until we fix the real detach issue. The volume is still being attached to multiple nodes, so why bother?
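The hypothesized failure mode can be sketched as a toy model: if DetachDisk reports success when the VM is not found, the real attachment survives while the controller forgets it, and a later attach produces a multi-attached disk. All types and names here are illustrative, not the real Kubernetes or vSphere APIs:

```go
package main

import "fmt"

// vsphereSim is a toy stand-in for vSphere: it tracks which nodes
// actually hold each disk, independent of what the controller believes.
type vsphereSim struct {
	realAttachments map[string][]string // volume -> nodes actually attached
	vmExists        map[string]bool     // powered-off/removed VMs are false
}

// DetachDisk models the buggy behaviour: when the VM is gone, nothing
// is detached, yet success (nil) is reported anyway.
func (v *vsphereSim) DetachDisk(volume, node string) error {
	if !v.vmExists[node] {
		return nil // "No VM found" path: spurious success
	}
	nodes := v.realAttachments[volume][:0:0]
	for _, n := range v.realAttachments[volume] {
		if n != node {
			nodes = append(nodes, n)
		}
	}
	v.realAttachments[volume] = nodes
	return nil
}

// AttachDisk attaches the disk to an additional node.
func (v *vsphereSim) AttachDisk(volume, node string) {
	v.realAttachments[volume] = append(v.realAttachments[volume], node)
}

func main() {
	sim := &vsphereSim{
		realAttachments: map[string][]string{"pvc-1": {"node2"}},
		vmExists:        map[string]bool{"node2": false, "node4": true},
	}
	// The controller detaches from powered-off node2: spurious success,
	// so it forgets the attachment and considers pvc-1 free.
	_ = sim.DetachDisk("pvc-1", "node2")
	// Believing pvc-1 is free, it attaches to node4.
	sim.AttachDisk("pvc-1", "node4")
	fmt.Println(sim.realAttachments["pvc-1"]) // [node2 node4]: multi-attached
}
```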

@gnufied (Member) commented Aug 31, 2018

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 31, 2018
@nikopen (Contributor, Author) commented Aug 31, 2018

@divyenpatel thanks for your repro work.

@gnufied the PR fixes vmware-archive#500 / vmware-archive#502, and backporting it fixes them in previous versions as well. It was custom behaviour for vSphere in the attach/detach reconciler that forced multi-attach. It's a showstopper for us, and we're running 1.9.10, hence the upstream backport. Many shops running older versions would happily bump the minor version instead of doing major upgrades.

Overall, it seems there are many open PRs that solve the above problems with broken and shutdown nodes and volumes, but there is a stalemate holding those PRs up because of the discussion about the shutdown taint in #65392. Once the deadlock is resolved and the solutions are merged (and backported), this backport should be good to go.

@gnufied (Member) commented Aug 31, 2018

@nikopen how does this PR prevent multi-attach? Didn't @divyenpatel just prove we are still multi-attaching? Am I missing something?

So if this PR does not fix multi-attach, I would like to know what it actually fixes and why we should backport it.

@gnufied (Member) commented Aug 31, 2018

I understand the validity of this PR on the master branch, btw; I know why we need it there. On master, this PR indeed prevents multi-attach, but on the 1.9 branch, afaict, it does not have much effect beyond delaying the attach to another node by 6 minutes.

@nikopen (Contributor, Author) commented Sep 1, 2018

As described on DM, this is the VM freeze: vmware-archive#500 (comment), step 4.

The observation is that k8s aggressively attempts to attach a disk to another node without first triggering a detach from the originating VM, causing the new VM to completely freeze for as long as the volume is 'unavailable'/stuck elsewhere, from 1 minute up to 3 minutes or more.

That is the main purpose of the original PR: it fixes the freeze. Backporting it to all affected versions makes sense to me.

Here are the PRs for 1.10 and 1.11:
#68057
#68058
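For reference, the behaviour this cherry-pick restores can be sketched as a single-attach guard in the reconciler: an attach to a second node fails fast with the Multi-Attach error instead of racing the detach. A simplified, illustrative sketch, not the actual reconciler code:

```go
package main

import (
	"errors"
	"fmt"
)

// errMultiAttach mirrors the error surfaced in the pod events above.
var errMultiAttach = errors.New("volume is already exclusively attached to one node and can't be attached to another")

// reconciler is a toy model of the attach/detach reconciler's view of
// which node currently holds each volume (illustrative names only).
type reconciler struct {
	attachedNode map[string]string // volume -> node currently attached
}

// attach refuses to start an attach while the volume is still recorded
// as attached to a different node, instead of force-attaching.
func (r *reconciler) attach(volume, node string) error {
	if owner, ok := r.attachedNode[volume]; ok && owner != node {
		return errMultiAttach
	}
	r.attachedNode[volume] = node
	return nil
}

func main() {
	r := &reconciler{attachedNode: map[string]string{"pvc-1": "kubernetes-node2"}}
	err := r.attach("pvc-1", "kubernetes-node4")
	fmt.Println(err != nil) // true: the second attach is refused
}
```

This is what trades the silent freeze for the explicit Multi-Attach error seen in divyenpatel's kubelet-stopped repro.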

@divyenpatel (Member):

I agree with @nikopen. We should allow this change to be cherry-picked to other branches, and provide a follow-up fix in either the vSphere Cloud Provider or Kubernetes.

At least when kubelet is not responding or is unable to communicate with the API server, instead of continuous failed attempts to attach volumes on the new node, the attachdetach-controller simply fails with the error Volume is already exclusively attached to one node and can't be attached to another.

Without this change, we will see lots of failure messages on the vSphere task console from failed attempts to hot-attach the disk on the new node.

We also avoid freezing the node VM, which can disturb existing workloads running on that node.

@nikopen (Contributor, Author) commented Sep 4, 2018

Thanks @divyenpatel !

cc @saad-ali @mbohlool for approval

@gnufied (Member) commented Sep 4, 2018

Okay, I think this still does not handle the real problem of actually detaching volumes from shutdown/deleted pods in 1.9, but we can merge it for now.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 4, 2018
@childsb (Contributor) commented Sep 6, 2018

/kind bug

@childsb (Contributor) commented Sep 6, 2018

/milestone v1.9

@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Sep 6, 2018
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 6, 2018
@childsb childsb added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. and removed do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. labels Sep 6, 2018
@childsb (Contributor) commented Sep 6, 2018

/approve

@childsb (Contributor) commented Sep 6, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2018
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: childsb, nikopen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2018
@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@childsb childsb added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 7, 2018
@k8s-github-robot:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot:

Automatic merge from submit-queue.
