docker gets stuck #84978

Open
yehaifeng opened this issue Nov 8, 2019 · 1 comment

yehaifeng commented Nov 8, 2019

What happened: The node went NotReady, and docker got stuck.

What you expected to happen: NodeReady

How to reproduce it (as minimally and precisely as possible): I do not know; this is the first time I have hit this problem.

Anything else we need to know?:
The command docker ps hangs and never returns.
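
If it helps with triage, here is a rough sketch of how the hang can be confirmed and a daemon stack trace captured. The timeout/exit-code check is standard coreutils; the SIGUSR1 goroutine dump is normal dockerd behavior, but the process name and journald unit for dockerd-current on RHEL are assumptions here:

# Confirm that docker ps actually hangs rather than just being slow
# (exit code 124 means the command was killed by timeout)
timeout 30 docker ps; echo "exit code: $?"

# Ask the daemon for a goroutine dump to see where it is blocked
# (dockerd dumps stack traces on SIGUSR1; the process name dockerd-current
#  and the docker journald unit are assumptions for this RHEL setup)
kill -SIGUSR1 $(pidof dockerd-current)
journalctl -u docker --since "2 minutes ago" | tail -n 100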

kubelet logs:
Nov 08 16:25:22 paas-node010014.99bill.com origin-node[29837]: I1108 16:25:22.761754 29837 kubelet.go:1758] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
Nov 08 16:25:27 paas-node010014.99bill.com origin-node[29837]: I1108 16:25:27.761869 29837 kubelet.go:1758] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
Nov 08 16:25:32 paas-node010014.99bill.com origin-node[29837]: I1108 16:25:32.762007 29837 kubelet.go:1758] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
Nov 08 16:25:37 paas-node010014.99bill.com origin-node[29837]: I1108 16:25:37.762136 29837 kubelet.go:1758] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]

docker debug logs:
Nov 08 16:26:45 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:45.137030750+08:00" level=debug msg="{Action=version, LoginUID=4294967295, PID=29837}"
Nov 08 16:26:45 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:45.137424210+08:00" level=debug msg="Calling GET /v1.26/version"
Nov 08 16:26:45 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:45.137440620+08:00" level=debug msg="Unableto determine container for v1.26"
Nov 08 16:26:45 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:45.137473772+08:00" level=debug msg="{Action=version, LoginUID=4294967295, PID=29837}"
Nov 08 16:26:45 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:45.583780896+08:00" level=debug msg="Calling GET /v1.26/images/json"
Nov 08 16:26:45 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:45.583814581+08:00" level=debug msg="Unableto determine container for images"
Nov 08 16:26:45 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:45.583882248+08:00" level=debug msg="{Action=json, LoginUID=4294967295, PID=29837}"
Nov 08 16:26:49 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:49.299579995+08:00" level=debug msg="Calling GET /v1.26/version"
Nov 08 16:26:49 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:49.299627054+08:00" level=debug msg="Unableto determine container for v1.26"
Nov 08 16:26:49 paas-node010014.99bill.com dockerd-current[1453]: time="2019-11-08T16:26:49.299689399+08:00" level=debug msg="{Action=version, LoginUID=4294967295, PID=29837}"
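
The "pleg was last seen active 2562047h47m16.854775807s ago" value is the maximum int64 duration, which corresponds to an unset (zero) relist timestamp and suggests the kubelet's relist never completed, likely because the container-listing call to docker is blocked. Since the debug log above only shows /version and /images/json requests, probing the docker API directly over the socket might show which endpoint hangs. A minimal sketch, assuming a curl built with --unix-socket support (>= 7.40; the stock RHEL 7.4 curl may be too old):

# Each call is capped at 10 seconds; a timeout points at the hanging endpoint
curl --unix-socket /var/run/docker.sock -m 10 http://localhost/v1.26/version
curl --unix-socket /var/run/docker.sock -m 10 http://localhost/v1.26/containers/json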

I also noticed a problem: there is more than one shim process per container:
[16:21:47 root@paas-node010014 ~]$ ps -ef | grep docker-containerd-shim-current | grep -v grep | awk '{print $9}' | sort | uniq -c |sort -nrk 1 | head -n 20
94 7b1520e468244a76030643230b1e252e73b9816744a07f2f319b9dd6539b445c
93 eea96e79dcfd440e927f9af661f01143a855a2f719286b0f48a8b965a05add28
93 e38a085ac498746d4e4dec3e498dcd3b884d43fd2353bc13ef24c37de370916b
93 d90ac9232a913b40f607ef23ad48d04f724252ad43028b6b5b5d11f67caa636d
93 d8eb0073c3b3cb0fc19849a1525f9d2e9f142109009e4c0b321c851dfaab513b
93 d0427a80e4ca7e54e11f578727dbc34859b59f33d574c57acf0b224574a31672
93 cb945971b91bce77df1746344ae7102453ad0af191bf474d899f7507d453f22b
93 c440f5130a9e8b481bb23327af62861b6da2a002c2aa9f092edbf7bc001c4a51
93 b884864d9e9b833d8f617d3da862db25037f9757f00834544a1a45f722f17145
93 b53eaa536b25df3f4082703cc3446e329783a8162aee3b59a94ff6565ba51436
93 a3e336103ed52173cc61ef28bbea2e23348f1166f0d4c471d291f6697d94a16d
93 7bd027d50666c31c4b48aa18ee51c37050b62f598fc79c444a77bec859321976
93 5958beefd58e8b4e41b14e606b573408e3ac47d62a910a68fe77c1e2ff2bc83d
93 513c5a3fe78bea3883ef9b3aeb4c2379f8d7e0fac0d41b7b356c61e5ec6b9501
93 4ab5680bc98f290dd0938ae6eadf146b80f1473a9730564502b4bcb405224517
93 28372a890148625f7826c5f4628cae77d593bdd7ed1352056d115f4bc289e26a
92 4bacb9a4a8f1661725fc9e363799985c731c6db1d617146090375339d5d3a375
1 fb23894a6351774b9e644ea3d692ad31a608f74fbe10685846eb413b65eff77c
1 f4b469a2b1430f4bfece83b3cbbccf6f472ad5ba6825d6276831307c4ab86c51
1 ee74bdcf7c0c7428ce92df0735ac8f6de9c3cbd8ae8ef918d25aad47c9591104

[16:26:54 root@paas-node010014 ~]$ ps -ef | grep docker-containerd-shim-current | wc -l
1651

Actually, there are only 86 containers:
[16:29:42 root@paas-node010014 ~]$ /usr/libexec/docker/docker-runc-current list | wc -l
86
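
A quick way to cross-check the two numbers and list which containers have leaked shims (assuming, as in the ps output above, that field $9 of the ps -ef output is the container ID passed to docker-containerd-shim-current):

# Print container IDs that have more than one shim process attached
ps -ef | grep docker-containerd-shim-current | grep -v grep \
  | awk '{print $9}' | sort | uniq -c \
  | awk '$1 > 1 {print $2 " has " $1 " shim processes"}'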

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-15T09:45:30Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2019-10-17T19:23:55Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

  • OS (e.g: cat /etc/os-release):
    NAME="Red Hat Enterprise Linux Server"
    VERSION="7.4 (Maipo)"
    ID="rhel"
    ID_LIKE="fedora"
    VARIANT="Server"
    VARIANT_ID="server"
    VERSION_ID="7.4"
    PRETTY_NAME="Red Hat Enterprise Linux Server 7.4 (Maipo)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:redhat:enterprise_linux:7.4:GA:server"
    HOME_URL="https://www.redhat.com/"
    BUG_REPORT_URL="https://bugzilla.redhat.com/"

    REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
    REDHAT_BUGZILLA_PRODUCT_VERSION=7.4
    REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
    REDHAT_SUPPORT_PRODUCT_VERSION="7.4"

  • Kernel (e.g. uname -a):
    Linux paas-node010014.99bill.com 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
    [16:04:15 openshift@st2-control-111a27 ~]$ oc version
    oc v3.11.0+62803d0-1
    kubernetes v1.11.0+d4cacc0
    features: Basic-Auth GSSAPI Kerberos SPNEGO

    Server https://paas-sh-01.99bill.com:443
    openshift v3.11.0+bd0bee4-337
    kubernetes v1.11.0+d4cacc0

  • Network plugin and version (if this is a network-related bug):
  • Others:

yehaifeng (Author) commented Nov 8, 2019

/sig node

k8s-ci-robot added sig/node and removed needs-sig labels Nov 8, 2019