shareProcessNamespace causes the container to stop responding, and restarting docker hangs at "Loading containers" #92214

Closed
xieyanker opened this issue Jun 17, 2020 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/node Categorizes an issue or PR as relevant to SIG Node.
sig/usability Categorizes an issue or PR as relevant to SIG Usability.

Comments

xieyanker (Contributor) commented Jun 17, 2020

What happened:

With shareProcessNamespace enabled, the container stops responding, and restarting docker hangs waiting at "Loading containers".

What you expected to happen:

The container should keep responding, and docker should restart without problems.

How to reproduce it (as minimally and precisely as possible):

  1. Create an nginx pod with shareProcessNamespace enabled, as follows:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      name: nginx
    spec:
      shareProcessNamespace: true
      nodeName: master1
      containers:
      - image: registry.icp.com:5000/library/common/nginx-amd64:1.17.5
        name: nginx
  2. Find the nginx master process (nginx: master process nginx -g daemon off;) and kill it:
root@xuewei81-tgquxn9spy-master-0:~/xjs# kubectl get pod -owide
NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE      NOMINATED NODE   READINESS GATES
nginx-699575b7df-2bxfd   1/1     Running   0          7s    10.151.161.19   master1   <none>           <none>
root@xuewei81-tgquxn9spy-master-0:~/xjs# docker ps | grep nginx-699575b7df-2bxfd
6e2231761cbc        540a289bab6c                                                             "nginx -g 'daemon of…"   14 seconds ago      Up 12 seconds                           k8s_nginx_nginx-699575b7df-2bxfd_default_24c7f7b4-b064-11ea-94b0-fa163e279982_0
849f76682e0a        registry.icp.com:5000/library/cke/kubernetes/pause-amd64:3.1             "/pause"                 17 seconds ago      Up 16 seconds                           k8s_POD_nginx-699575b7df-2bxfd_default_24c7f7b4-b064-11ea-94b0-fa163e279982_0
root@xuewei81-tgquxn9spy-master-0:~/xjs# 
root@xuewei81-tgquxn9spy-master-0:~/xjs# docker inspect 6e2231761cbc | grep -i pid
            "Pid": 2119,
            "PidMode": "container:849f76682e0a322d0109250d0d5eb3767248f18c2ee7457593729339fb400eef",
            "PidsLimit": 0,
root@xuewei81-tgquxn9spy-master-0:~/xjs# ps -ef | grep 2119 | grep -v color
root      2119  2093  0 14:31 ?        00:00:00 nginx: master process nginx -g daemon off;
systemd+  2152  2119  0 14:31 ?        00:00:00 nginx: worker process

Then, PID 2152's parent process becomes /pause (see the quick namespace check sketched after the reproduction steps):

root@xuewei81-tgquxn9spy-master-0:~/xjs# kill -9 2119
root@xuewei81-tgquxn9spy-master-0:~/xjs# 
root@xuewei81-tgquxn9spy-master-0:~/xjs# ps -f 2152
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
systemd+  2152  1897  0 14:31 ?        S      0:00 nginx: worker process
root@xuewei81-tgquxn9spy-master-0:~/xjs# ps -f 1897
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root      1897  1863  0 14:31 ?        Ss     0:00 /pause
root@xuewei81-tgquxn9spy-master-0:~/xjs# 
  3. At this moment, docker inspect, docker exec, and docker logs no longer respond:
root@xuewei81-tgquxn9spy-master-0:~/xjs# docker ps | grep nginx-699575b7df-2bxfd
6e2231761cbc        540a289bab6c                                                             "nginx -g 'daemon of…"   About a minute ago   Up About a minute                       k8s_nginx_nginx-699575b7df-2bxfd_default_24c7f7b4-b064-11ea-94b0-fa163e279982_0
849f76682e0a        registry.icp.com:5000/library/cke/kubernetes/pause-amd64:3.1             "/pause"                 About a minute ago   Up About a minute                       k8s_POD_nginx-699575b7df-2bxfd_default_24c7f7b4-b064-11ea-94b0-fa163e279982_0
root@xuewei81-tgquxn9spy-master-0:~/xjs# docker inspect 6e2231761cbc

^C
root@xuewei81-tgquxn9spy-master-0:~/xjs# docker exec -it 6e2231761cbc sh

^C
root@xuewei81-tgquxn9spy-master-0:~/xjs# docker logs 6e2231761cbc

^C
  4. Restarting docker then hangs waiting at "Loading containers":
root@xuewei81-tgquxn9spy-master-0:~/xjs# systemctl restart docker
Job for docker.service failed because a timeout was exceeded.
See "systemctl status docker.service" and "journalctl -xe" for details.

Jun 17 14:33:55 xuewei81-tgquxn9spy-master-0 dockerd[2814]: time="2020-06-17T14:33:55.525517831+08:00" level=info msg="Loading containers: start."
Jun 17 14:34:55 xuewei81-tgquxn9spy-master-0 systemd[1]: docker.service: Start operation timed out. Terminating.
Jun 17 14:34:55 xuewei81-tgquxn9spy-master-0 dockerd[2814]: time="2020-06-17T14:34:55.503401172+08:00" level=info msg="Processing signal 'terminated'"
  5. docker restarts successfully only after I kill the containerd-shim process:
root@xuewei81-tgquxn9spy-master-0:~# ps -ef | grep 6e2231761cbc | grep containerd-shim
root      2093 14254  0 14:31 ?        00:00:00 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/6e2231761cbc4b6ceff36dbcc4cfae67530377616aed47ba34cf0d057f37301d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
root@xuewei81-tgquxn9spy-master-0:~# kill -9 2093
root@xuewei81-tgquxn9spy-master-0:~# 
root@xuewei81-tgquxn9spy-master-0:~# journalctl -u docker -f
-- Logs begin at Wed 2020-01-01 11:46:34 CST. --
Jun 17 14:35:36 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:36.054660644+08:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Jun 17 14:35:36 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:36.054673960+08:00" level=warning msg="Your kernel does not support cgroup blkio weight"
Jun 17 14:35:36 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:36.054684371+08:00" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jun 17 14:35:36 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:36.055271704+08:00" level=info msg="Loading containers: start."
Jun 17 14:35:37 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:37.806790934+08:00" level=info msg="There are old running containers, the network config will not take affect"
Jun 17 14:35:39 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:39.038308094+08:00" level=info msg="Loading containers: done."
Jun 17 14:35:39 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:39.178666784+08:00" level=info msg="Docker daemon" commit=0dd43dd graphdriver(s)=overlay2 version=18.09.8
Jun 17 14:35:39 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:39.178815965+08:00" level=info msg="Daemon has completed initialization"
Jun 17 14:35:39 xuewei81-tgquxn9spy-master-0 dockerd[3409]: time="2020-06-17T14:35:39.258160782+08:00" level=info msg="API listen on /var/run/docker.sock"
Jun 17 14:35:39 xuewei81-tgquxn9spy-master-0 systemd[1]: Started Docker Application Container Engine.
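
A quick way to confirm that the shared PID namespace is really in effect (and therefore that /pause, not nginx, is PID 1 inside the nginx container) is to read /proc/1/cmdline from inside the pod. This is only an illustrative check; the pod name is the one from the kubectl get pod output above, and ps -ef only works if procps happens to be installed in the image:

kubectl exec nginx-699575b7df-2bxfd -- cat /proc/1/cmdline    # prints /pause when the namespace is shared
kubectl exec nginx-699575b7df-2bxfd -- ps -ef                 # full process list, if ps exists in the image

This also explains why the orphaned worker is re-parented to /pause rather than to a process of the nginx container after the master is killed.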

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    1.14.3
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
root@xuewei81-tgquxn9spy-master-0:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a):
root@xuewei81-tgquxn9spy-master-0:~# uname -a
Linux xuewei81-tgquxn9spy-master-0 5.0.0-29-generic #31~18.04.1-Ubuntu SMP Thu Sep 12 18:29:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
    My docker version is:
root@xuewei81-tgquxn9spy-master-0:~# docker version
Client:
 Version:           18.09.8
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        0dd43dd87f
 Built:             Wed Jul 17 17:41:19 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.8
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       0dd43dd
  Built:            Wed Jul 17 17:07:25 2019
  OS/Arch:          linux/amd64
  Experimental:     false
@xieyanker xieyanker added the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 17, 2020
xieyanker (Contributor, Author) commented:

/sig usability

@k8s-ci-robot k8s-ci-robot added sig/usability Categorizes an issue or PR as relevant to SIG Usability. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 17, 2020
zjj2wry (Contributor) commented Jun 18, 2020

/cc
/sig node

xieyanker (Contributor, Author) commented:

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jun 18, 2020
qingwave (Contributor) commented:

Same problem: after kill -9 on nginx, the nginx daemon exits but the nginx worker keeps running (it becomes an orphaned process whose ppid is containerd-shim). docker hangs when operating on the container, unless all orphaned processes are deleted, the pod is deleted (destroying the pid namespace), or the parent (containerd-shim) is deleted.

It seems like a docker bug, but I cannot reproduce it with docker alone using docker run -d --pid ...
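
For reference, a docker-only attempt along those lines might look like the sketch below; the image names and the killed PID are placeholders rather than the exact commands used:

docker run -d --name pause k8s.gcr.io/pause:3.1               # pause-style container that owns the PID namespace
docker run -d --name web --pid=container:pause nginx:1.17.5   # nginx joins that PID namespace, like a pod sandbox
docker top web                                                 # find the host PID of "nginx: master process"
kill -9 <master-pid>                                           # kill it from the host, as in the reproduction steps
docker inspect web                                             # then check whether inspect/exec/logs still respond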

fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2020
fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 25, 2020
fejta-bot commented:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot (Contributor) commented:

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

liupeng0518 (Member) commented:

1.18/1.19 do not have this problem.
