Failure during upgrade due to short drain_grace_period and drain_timeout #1453

Closed
Abdelsalam-Abbas opened this issue Jul 16, 2017 · 1 comment

Abdelsalam-Abbas commented Jul 16, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug Report

Environment:

  • Cloud provider or hardware configuration:
    azure

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 4.4.0-81-generic x86_64

  • Version of Ansible (ansible --version):
    ansible 2.3.1.0
    config file = /home/devops/kargo/ansible.cfg
    configured module search path = [u'./library']
    python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]

Kubespray version (commit) (git rev-parse --short HEAD):
02e0fb5

Network plugin used:
flannel

Command used to invoke ansible:
ansible-playbook -i contrib/azurerm/inventory -u devops -b -e "@inventory/group_vars/all.yml" -e "@inventory/group_vars/k8s-cluster.yml" upgrade-cluster.yml -e kube_version=v1.6.2

Output of ansible run:

....
TASK [upgrade/pre-upgrade : Cordon node] **************************************************************************************************************************
Sunday 16 July 2017  19:56:41 +0000 (0:00:00.125)       0:33:59.417 ***********
changed: [minion-2 -> None]

TASK [upgrade/pre-upgrade : Drain node] ***************************************************************************************************************************
Sunday 16 July 2017  19:56:42 +0000 (0:00:00.945)       0:34:00.363 ***********
fatal: [minion-2 -> None]: FAILED! => {"changed": true, "cmd": ["/usr/local/bin/kubectl", "drain", "--force", "--ignore-daemonsets", "--grace-period", "30", "--timeout", "40s", "--delete-local-data", "minion-2"], "delta": "0:00:43.120196", "end": "2017-07-16 19:57:26.411965", "failed": true, "rc": 1, "start": "2017-07-16 19:56:43.291769",
"stderr": "WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: flannel-minion-2, kube-proxy-minion-2, nginx-proxy-minion-2; Deleting pods with local storage: rook-api-2485995279-7j6vw; Ignoring DaemonSet-managed pods: rook-ceph-osd-09mhn\nWARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: flannel-minion-2, kube-proxy-minion-2, nginx-proxy-minion-2; Deleting pods with local storage: rook-api-2485995279-7j6vw; Ignoring DaemonSet-managed pods: rook-ceph-osd-09mhn\nThere are pending pods when an error occurred: Drain did not complete within 40s\npod/wordpress-2634894193-f10kn\npod/kube-dns-2117142060-4ghsl\npod/rook-api-2485995279-7j6vw\npod/rook-ceph-mon2-x9xzn\nerror: Drain did not complete within 40s",
"stderr_lines": ["WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: flannel-minion-2, kube-proxy-minion-2, nginx-proxy-minion-2; Deleting pods with local storage: rook-api-2485995279-7j6vw; Ignoring DaemonSet-managed pods: rook-ceph-osd-09mhn", "WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: flannel-minion-2, kube-proxy-minion-2, nginx-proxy-minion-2; Deleting pods with local storage: rook-api-2485995279-7j6vw; Ignoring DaemonSet-managed pods: rook-ceph-osd-09mhn", "There are pending pods when an error occurred: Drain did not complete within 40s", "pod/wordpress-2634894193-f10kn", "pod/kube-dns-2117142060-4ghsl", "pod/rook-api-2485995279-7j6vw", "pod/rook-ceph-mon2-x9xzn", "error: Drain did not complete within 40s"],
"stdout": "node \"minion-2\" already cordoned", "stdout_lines": ["node \"minion-2\" already cordoned"]}
        to retry, use: --limit @/home/devops/kargo/upgrade-cluster.retry

Anything else do we need to know:

I would like to extend drain_timeout and drain_grace_period, as proposed in PR #1454.
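For anyone hitting the same failure: assuming the variables keep the names drain_grace_period and drain_timeout from that PR, they could be raised in the inventory group vars (the values below are only illustrative, not defaults):

    # inventory/group_vars/k8s-cluster.yml -- illustrative values
    drain_grace_period: 300   # seconds passed to kubectl drain --grace-period
    drain_timeout: 360s       # duration passed to kubectl drain --timeout

or overridden on the command line with -e drain_grace_period=300 -e drain_timeout=360s.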

ykfq commented Mar 29, 2019

Raising the timeouts for draining nodes makes no sense to me; I just ignored the error and continued running.
Add ignore_errors: yes under the task.
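Roughly like this, in the Drain node task of the upgrade/pre-upgrade role (a sketch only; the real task's command line and delegation may differ):

    - name: Drain node
      command: >-
        {{ bin_dir }}/kubectl drain
        --force
        --ignore-daemonsets
        --grace-period {{ drain_grace_period }}
        --timeout {{ drain_timeout }}
        --delete-local-data {{ inventory_hostname }}
      delegate_to: "{{ groups['kube-master'][0] }}"
      ignore_errors: yes   # keep upgrading even if the drain times out

With ignore_errors: yes the playbook keeps going when the drain times out, which is acceptable if you don't mind some pods being stopped without a clean eviction.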
