
journald restart crashes Docker #19728

Closed
jeffjohnston opened this issue Jan 26, 2016 · 17 comments · Fixed by #22460

@jeffjohnston
Contributor

It appears that a journald restart on Red Hat 7.2 crashes Docker. I was getting my logging settings in place when I noticed that after restarting journald my Docker daemon would crash. I backed things out to the point where, even with no containers running, restarting journald crashes Docker, which then restarts.

It is very easy to reproduce. Just restart journald.

$ systemctl restart  systemd-journald.service

Then check Docker processes. After just a few times you get this.

$ docker ps
An error occurred trying to connect: Get http:///var/run/docker.sock/v1.21/containers/json: EOF
$ docker ps
Error response from daemon: 404 page not found
$ docker ps
CONTAINER ID        IMAGE ...

You can see it recovers. However, if I have containers running, it stops all of them and even requires me to manually remove containers that show up as Dead. Sometimes I also have to remove containers because they end up in a state where they will not start without errors.

docker version:

Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:25:01 UTC 2015
OS/Arch: linux/amd64

Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:25:01 UTC 2015
OS/Arch: linux/amd64

docker info:

Containers: 13
Images: 62
Server Version: 1.9.1
Storage Driver: devicemapper
Pool Name: docker-253:0-67131623-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem:
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 4.038 GB
Data Space Total: 107.4 GB
Data Space Available: 31.62 GB
Metadata Space Used: 5.886 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.142 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
CPUs: 2
Total Memory: 7.64 GiB
Name: vmtstcon01.roomandboard.com
ID: 7J2R:VPDV:RZRX:NM3X:6JG3:5HGD:ZUIW:FGMW:RCJO:NULN:FS7Z:UD42
WARNING: bridge-nf-call-ip6tables is disabled

uname -a:

Linux vmtstcon01.roomandboard.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.):

vmware

@LK4D4
Contributor

LK4D4 commented Jan 26, 2016

@jeffjohnston Could you find logs from the crashed daemon, please? I don't see such behaviour; more than that, containers continue to write logs and docker logs works with minimal delay.

@jeffjohnston
Contributor Author

Where would I find the logs? I also have syslog installed, but in Red Hat 7 that is proxied through journald.

Here is an example of running a minimal container. I then ran journalctl -f to watch the messages come through. The steps below trigger the errors and show what the logs look like. Usually I run weave and 8 containers, but in this example I am just firing up an ubuntu container. Hope this helps.

First I run this:

$ docker run -t ubuntu

Then I restart journald:

$ systemctl restart  systemd-journald.service
Jan 26 12:51:02 vmtstcon01.roomandboard.com polkitd[741]: Registered Authentication Agent for unix-process:19257:102060055 (system bus name :1.1511 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd[1]: Stopping Flush Journal to Persistent Storage...
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd-journal[19082]: Journal stopped
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd-journal[19263]: Runtime journal is using 400.0M (max allowed 391.1M, trying to leave 586.7M free of 3.4G available → current limit 400.0M).
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd-journal[19263]: Runtime journal is using 400.0M (max allowed 391.1M, trying to leave 586.7M free of 3.4G available → current limit 400.0M).
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd-journald[19082]: Received SIGTERM from PID 1 (systemd).
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd-journal[19263]: Journal started
Jan 26 12:51:02 vmtstcon01.roomandboard.com polkitd[741]: Unregistered Authentication Agent for unix-process:19257:102060055 (system bus name :1.1511, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd[1]: Starting Flush Journal to Persistent Storage...
Jan 26 12:51:02 vmtstcon01.roomandboard.com systemd[1]: Started Flush Journal to Persistent Storage.

Then I look at docker processes: (sometimes I have to run this a few times before getting the error)

$ docker ps
An error occurred trying to connect: Get http:///var/run/docker.sock/v1.21/containers/json: EOF
Jan 26 12:51:58 vmtstcon01.roomandboard.com kernel: XFS (dm-18): Unmounting Filesystem
Jan 26 12:51:59 vmtstcon01.roomandboard.com kernel: docker0: port 3(veth7176452) entered disabled state
Jan 26 12:51:59 vmtstcon01.roomandboard.com kernel: device veth7176452 left promiscuous mode
Jan 26 12:51:59 vmtstcon01.roomandboard.com kernel: docker0: port 3(veth7176452) entered disabled state
Jan 26 12:51:59 vmtstcon01.roomandboard.com NetworkManager[720]: <info>  (docker0): bridge port veth7176452 was detached
Jan 26 12:51:59 vmtstcon01.roomandboard.com NetworkManager[720]: <info>  (veth7176452): released from master docker0
Jan 26 12:51:59 vmtstcon01.roomandboard.com NetworkManager[720]: <warn>  (veth7176452): failed to disable userspace IPv6LL address handling

Then I run the docker process again:

$ docker ps
Error response from daemon: 404 page not found
Jan 26 12:52:33 vmtstcon01.roomandboard.com systemd[1]: Starting Docker Application Container Engine...
Jan 26 12:52:33 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:33.750567284-06:00" level=warning msg="/!\\ DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING /!\\"
Jan 26 12:52:33 vmtstcon01.roomandboard.com systemd[1]: Device dev-disk-by\x2duuid-2abd3c86\x2d7f6d\x2d4331\x2d9cc0\x2d88f16a0971a5.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-3 and /sys/devices/virtual/block/loop0
Jan 26 12:52:33 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:33.762036601-06:00" level=warning msg="Usage of loopback devices is strongly discouraged for production use. Please use `--storage-opt dm.thinpooldev` or use `man docker` to refer to dm.thinpooldev section."
Jan 26 12:52:33 vmtstcon01.roomandboard.com systemd[1]: Device dev-disk-by\x2duuid-2abd3c86\x2d7f6d\x2d4331\x2d9cc0\x2d88f16a0971a5.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-3 and /sys/devices/virtual/block/dm-20
Jan 26 12:52:33 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:33.786302408-06:00" level=info msg="API listen on /var/run/docker.sock"
Jan 26 12:52:33 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:33.787215087-06:00" level=info msg="API listen on [::]:2375"
Jan 26 12:52:33 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:33.818301206-06:00" level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
Jan 26 12:52:33 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:33.851552649-06:00" level=info msg="Firewalld running: false"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.334913339-06:00" level=error msg="getNetworkFromStore for nid 0f55930e984f0b7795727e4979bfc49e6017e160224797c0d725674665300715 failed while trying to build sandbox for cleanup: network 0f55930e984f0b7795727e4979bfc49e6017e160224797c0d725674665300715 not found"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.335707950-06:00" level=warning msg="Failed getting network for ep 26070a70f81e1abfb16f61640714c8be9c4cc2112449120445ac7f2d55f9fbb6 during sandbox 0ca76370f67beb8494e228248a28d5d0ffda5542d09c9d6095aadd773a476150 delete: network 0f55930e984f0b7795727e4979bfc49e6017e160224797c0d725674665300715 not found"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.335801237-06:00" level=error msg="failed to delete sandbox 0ca76370f67beb8494e228248a28d5d0ffda5542d09c9d6095aadd773a476150 while trying to cleanup: could not cleanup all the endpoints in container b778508f65f7aa463e86714c5be4a77885be91633a3ccc32389ae42798faec71 / sandbox 0ca76370f67beb8494e228248a28d5d0ffda5542d09c9d6095aadd773a476150"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.387897386-06:00" level=error msg="getNetworkFromStore for nid 8639c6f541ee6b0fc73f8fb01c8390ecee8b54db241be32152d8b09d6b932aa5 failed while trying to build sandbox for cleanup: network 8639c6f541ee6b0fc73f8fb01c8390ecee8b54db241be32152d8b09d6b932aa5 not found"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.388041100-06:00" level=warning msg="Failed getting network for ep 9a93f9604bba41d443669e88e954ea3baaffa5cf4abe711240bdb4c5a519a5f2 during sandbox 139c01a72a19ace5799556ae4e30a9360783f854b75fb0bc021ce98e87dd0ea1 delete: network 8639c6f541ee6b0fc73f8fb01c8390ecee8b54db241be32152d8b09d6b932aa5 not found"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.388082597-06:00" level=error msg="failed to delete sandbox 139c01a72a19ace5799556ae4e30a9360783f854b75fb0bc021ce98e87dd0ea1 while trying to cleanup: could not cleanup all the endpoints in container 4f9aa17d20b802450a7dafd69eed082ed763cd1d8b73a450930efe1472215b0b / sandbox 139c01a72a19ace5799556ae4e30a9360783f854b75fb0bc021ce98e87dd0ea1"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.388201190-06:00" level=warning msg="Could not delete local endpoint gateway_b778508f65f7 during endpoint cleanup: endpoint with name gateway_b778508f65f7 id 40a0b5e2ab675e136dff1f5dc6ef2189eff78317a16ca4263937c1141ad13a95 has active containers"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.388267134-06:00" level=warning msg="Could not delete local endpoint gateway_4f9aa17d20b8 during endpoint cleanup: endpoint with name gateway_4f9aa17d20b8 id a2ec01fe990e03836794c1f319f12ac7ae54dad7368430801de34983aa0ce2c3 has active containers"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.411027875-06:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address"
Jan 26 12:52:34 vmtstcon01.roomandboard.com systemd[1]: Scope libcontainer-19298-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.465155549-06:00" level=info msg="Loading containers: start."
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: ...........................time="2016-01-26T12:52:34.476047154-06:00" level=error msg="Error unmounting device a7b7cd46d3ffd65d345af6aad0db9f69b927946e3ccdc523cf9b20e806aec427: UnmountDevice: device not-mounted id a7b7cd46d3ffd65d345af6aad0db9f69b927946e3ccdc523cf9b20e806aec427"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.476557827-06:00" level=info msg="Loading containers: done."
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.476587539-06:00" level=info msg="Daemon has completed initialization"
Jan 26 12:52:34 vmtstcon01.roomandboard.com docker[19298]: time="2016-01-26T12:52:34.476612585-06:00" level=info msg="Docker daemon" commit=a34a1d5 execdriver=native-0.2 graphdriver=devicemapper version=1.9.1
Jan 26 12:52:34 vmtstcon01.roomandboard.com systemd[1]: Started Docker Application Container Engine.

@marcofvera

I got the same error after upgrading to a new kernel version on CentOS 7.2.

uname -a
Linux dsv-marco01 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

And I can't start the docker daemon; it fails with a 'driver not supported' error.

docker daemon -D
DEBU[0000] docker group found. gid: 991
DEBU[0000] Server created for HTTP on unix (/var/run/docker.sock)
ERRO[0000] [graphdriver] prior storage driver "devicemapper" failed: driver not supported
FATA[0000] Error starting daemon: error initializing graphdriver: driver not supported
rpm -q docker-engine
docker-engine-1.9.1-1.el7.centos.x86_64

@ghost

ghost commented Feb 8, 2016

We are also having this problem with RHEL 7.2.

[rekop@tricia ~]$ docker version 
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:25:01 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:25:01 UTC 2015
 OS/Arch:      linux/amd64

[rekop@tricia ~]$ docker info
Containers: 15
Images: 96
Server Version: 1.9.1
Storage Driver: overlay
 Backing Filesystem: xfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: Red Hat Enterprise Linux
CPUs: 8
Total Memory: 15.51 GiB
Name: tricia.XXXXXX
ID: XXX
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
[rekop@tricia ~]$ date
Mon Feb  8 11:27:04 EET 2016
[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[rekop@tricia ~]$ docker run -td XXX/solr:5.3.1 bash
805b9706ceaa2e15254424cf5e95ef2098b2cf445a45c1264e9e6b339acd7431
[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE                      COMMAND             CREATED             STATUS              PORTS               NAMES
805b9706ceaa        XXX/solr:5.3.1   "bash"              2 seconds ago       Up 1 seconds        8983/tcp            high_almeida
[rekop@tricia ~]$ sudo systemctl restart systemd-journald.service
[sudo] password for rekop: 
[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE                      COMMAND             CREATED              STATUS              PORTS               NAMES
805b9706ceaa        XXX/solr:5.3.1   "bash"              About a minute ago   Up About a minute   8983/tcp            high_almeida
[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE                      COMMAND             CREATED              STATUS              PORTS               NAMES
805b9706ceaa        XXX/solr:5.3.1   "bash"              About a minute ago   Up About a minute   8983/tcp            high_almeida
[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE                      COMMAND             CREATED              STATUS              PORTS               NAMES
805b9706ceaa        XXX/solr:5.3.1   "bash"              About a minute ago   Up About a minute   8983/tcp            high_almeida
[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE                      COMMAND             CREATED              STATUS              PORTS               NAMES
805b9706ceaa        XXX/solr:5.3.1   "bash"              About a minute ago   Up About a minute   8983/tcp            high_almeida
[rekop@tricia ~]$ docker ps
An error occurred trying to connect: Get http:///var/run/docker.sock/v1.21/containers/json: EOF
[rekop@tricia ~]$ docker ps
Error response from daemon: 404 page not found
[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE                  COMMAND             CREATED             STATUS              PORTS               NAMES
d7203ed0f2e9        rancher/agent:v0.8.2   "/run.sh run"       4 weeks ago         Up 6 seconds                            rancher-agent
[rekop@tricia ~]$ 

Rancher-agent is set to autostart.

sudo journalctl -u docker.service
 ...
Feb 08 11:27:07 tricia.XXX docker[45326]: time="2016-02-08T11:27:07.214570725+02:00" level=info msg="GET /v1.21/containers/json"
Feb 08 11:27:09 tricia.XXX docker[45326]: time="2016-02-08T11:27:09.708630289+02:00" level=info msg="POST /v1.21/containers/create"
Feb 08 11:27:09 tricia.XXX docker[45326]: time="2016-02-08T11:27:09.723236707+02:00" level=info msg="POST /v1.21/containers/805b9706ceaa2e15254424cf5e95ef2098b2cf445a45c1264e9e6b339acd7431/start"
Feb 08 11:27:11 tricia.XXX docker[45326]: time="2016-02-08T11:27:11.695334151+02:00" level=info msg="GET /v1.21/containers/json"
Feb 08 11:28:45 tricia.XXX systemd[1]: Starting Docker Application Container Engine...
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.315280641+02:00" level=info msg="API listen on /var/run/docker.sock"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.334393696+02:00" level=info msg="Firewalld running: true"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.540341761+02:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.623593634+02:00" level=info msg="Loading containers: start."
Feb 08 11:28:45 tricia.XXX docker[46542]: ..............................
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.739554971+02:00" level=info msg="Loading containers: done."
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.739579371+02:00" level=info msg="Daemon has completed initialization"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.739595871+02:00" level=info msg="Docker daemon" commit=a34a1d5 execdriver=native-0.2 graphdriver=overlay version=1.9.1
Feb 08 11:28:45 tricia.XXX systemd[1]: Started Docker Application Container Engine.
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.754576493+02:00" level=info msg="GET /v1.18/containers/rancher-agent/json"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.772093134+02:00" level=info msg="GET /v1.18/containers/rancher-agent-upgrade/json"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.772261236+02:00" level=error msg="Handler for GET /v1.18/containers/rancher-agent-upgrade/json returned error: no such id: rancher-agent-upgrade"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.772286236+02:00" level=error msg="HTTP Error" err="no such id: rancher-agent-upgrade" statusCode=404
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.772602938+02:00" level=info msg="GET /v1.18/images/rancher-agent-upgrade/json"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.772796440+02:00" level=error msg="Handler for GET /v1.18/images/rancher-agent-upgrade/json returned error: No such image: rancher-agent-upgrade"
Feb 08 11:28:45 tricia.XXX docker[46542]: time="2016-02-08T11:28:45.772825340+02:00" level=error msg="HTTP Error" err="No such image: rancher-agent-upgrade" statusCode=404
Feb 08 11:28:52 tricia.XXX docker[46542]: time="2016-02-08T11:28:52.312412811+02:00" level=info msg="GET /v1.21/containers/json"

Also, I don't know if this is the same problem, but after Docker has been running for a few days, it sometimes crashes, taking all containers with it, as soon as I run any docker command.

I started our containers on Friday, and today I tried to stop 2 containers with docker-compose, but Docker crashed instead. This has happened twice before. Unfortunately I don't have any logs from those events, but they are similar:

Docker sits untouched for a few days, and then after running a docker command the daemon crashes/restarts, killing all containers.

Once I had to remove all liferay Docker images and re-import the latest one because Docker wouldn't start.
I think an overlay layer was missing after the crash. Just loading a new image with docker load didn't help; only after removing the current and all older images and reloading our latest one could I start the liferay containers.

[rekop@tricia ~]$ docker ps
CONTAINER ID        IMAGE                                                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
b07a5c8902ec        XXX/liferay-62-prod-with-tos:20160205_104149   "/sbin/my_init"          2 days ago          Up 2 days           0.0.0.0:8019->8009/tcp, 0.0.0.0:8081->8080/tcp   liferay1
12afcebdf625        XXX/liferay-62-prod-with-tos:20160205_104149   "/sbin/my_init"          2 days ago          Up 2 days           0.0.0.0:8029->8009/tcp, 0.0.0.0:8082->8080/tcp   liferay2
d7203ed0f2e9        rancher/agent:v0.8.2                                     "/run.sh run"            4 weeks ago         Up 2 days                                                            rancher-agent
4e609901fb6d        XXX/kriisi-admin:20160105_150409               "/sbin/my_init"          4 weeks ago         Up 2 days           0.0.0.0:8061->80/tcp, 0.0.0.0:8060->8080/tcp     kriisi
3157d08e8c29        XXX/solr:5.3.1                                 "/opt/solr/bin/solr -"   4 weeks ago         Up 2 days           0.0.0.0:8983->8983/tcp                           solr
e4e6cb04cc42        XXX/galera:10.1.r1                                     "/sbin/my_init -- /bi"   4 weeks ago         Up 2 days           3306/tcp                                         node2
ff94f56903b2        XXX/galera:10.1.r1                                     "/sbin/my_init -- /bi"   4 weeks ago         Up 2 days           3306/tcp                                         node3
196036282d52        XXX/galera:10.1.r1                                     "/sbin/my_init -- /bi"   4 weeks ago         Up 2 days           3306/tcp                                         node1
[rekop@tricia ~]$ cd XXX/
[rekop@tricia XXX]$ docker-compose --project-name XXX --x-networking -f docker-compose-liferays-only.yml stop
ERROR: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?

If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.
[rekop@tricia XXX]$ docker ps
Error response from daemon: 404 page not found
Feb 05 11:37:59 tricia.XXX docker[15562]: time="2016-02-05T11:37:59.085778938+02:00" level=info msg="GET /v1.21/containers/json"
Feb 05 11:43:44 tricia.XXX docker[15562]: time="2016-02-05T11:43:44.345459103+02:00" level=info msg="GET /v1.21/containers/json"
Feb 08 10:48:19 tricia.XXX systemd[1]: Starting Docker Application Container Engine...
Feb 08 10:48:19 tricia.XXX docker[44493]: time="2016-02-08T10:48:19.444462319+02:00" level=info msg="API listen on /var/run/docker.sock"
Feb 08 10:48:19 tricia.XXX docker[44493]: time="2016-02-08T10:48:19.471564837+02:00" level=info msg="Firewalld running: true"
Feb 08 10:48:20 tricia.XXX docker[44493]: time="2016-02-08T10:48:20.013128696+02:00" level=warning msg="driver error disconnecting container node1 : endpoint not found: 28963db0365d6abdfa2e513722a11e594ff7af9d935828c0383081fcf59ea48f"
Feb 08 10:48:20 tricia.XXX docker[44493]: time="2016-02-08T10:48:20.031113541+02:00" level=warning msg="driver error deleting endpoint node1 : endpoint not found: 28963db0365d6abdfa2e513722a11e594ff7af9d935828c0383081fcf59ea48f"
Feb 08 10:48:20 tricia.XXX docker[44493]: time="2016-02-08T10:48:20.413101616+02:00" level=warning msg="driver error disconnecting container liferay1 : endpoint not found: 7ef5456d962da63083b1f6f5019e8c6ce1f50fc7954e55b4278a1f5accafb21f"
Feb 08 10:48:20 tricia.XXX docker[44493]: time="2016-02-08T10:48:20.428098437+02:00" level=warning msg="driver error deleting endpoint liferay1 : endpoint not found: 7ef5456d962da63083b1f6f5019e8c6ce1f50fc7954e55b4278a1f5accafb21f"
Feb 08 10:48:20 tricia.XXX docker[44493]: time="2016-02-08T10:48:20.814100244+02:00" level=warning msg="driver error disconnecting container node2 : endpoint not found: 3d603c35244f45102024aedfe3fd0f6b8149fbd6c436a967fe598d2d37c4a638"
Feb 08 10:48:20 tricia.XXX docker[44493]: time="2016-02-08T10:48:20.838088937+02:00" level=warning msg="driver error deleting endpoint node2 : endpoint not found: 3d603c35244f45102024aedfe3fd0f6b8149fbd6c436a967fe598d2d37c4a638"
Feb 08 10:48:21 tricia.XXX docker[44493]: time="2016-02-08T10:48:21.203104275+02:00" level=warning msg="driver error disconnecting container solr : endpoint not found: fb7bb67040f3e1f177049b1da7a6102a5a1a8a40a96b952c7ba8995333558a44"
Feb 08 10:48:21 tricia.XXX docker[44493]: time="2016-02-08T10:48:21.218075296+02:00" level=warning msg="driver error deleting endpoint solr : endpoint not found: fb7bb67040f3e1f177049b1da7a6102a5a1a8a40a96b952c7ba8995333558a44"
Feb 08 10:48:21 tricia.XXX docker[44493]: time="2016-02-08T10:48:21.592122007+02:00" level=warning msg="driver error disconnecting container kriisi : endpoint not found: e00e82a516fdadef7ecd8c31043a34ffd9d90c03ffb4afb257e7057cd07f1aca"
Feb 08 10:48:21 tricia.XXX docker[44493]: time="2016-02-08T10:48:21.608091635+02:00" level=warning msg="driver error deleting endpoint kriisi : endpoint not found: e00e82a516fdadef7ecd8c31043a34ffd9d90c03ffb4afb257e7057cd07f1aca"
Feb 08 10:48:21 tricia.XXX docker[44493]: time="2016-02-08T10:48:21.976130898+02:00" level=warning msg="driver error disconnecting container liferay2 : endpoint not found: 81ebb05193fe61c22eddf600d79048d7aba6e47c7bfdedcfc3581cfaa601a730"
Feb 08 10:48:21 tricia.XXX docker[44493]: time="2016-02-08T10:48:21.994095242+02:00" level=warning msg="driver error deleting endpoint liferay2 : endpoint not found: 81ebb05193fe61c22eddf600d79048d7aba6e47c7bfdedcfc3581cfaa601a730"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.368102353+02:00" level=warning msg="driver error disconnecting container node3 : endpoint not found: a3d6deabd9ee0580136116f4874ed814e4e2d9e50229c93db3f5558bfef3a2c0"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.389110922+02:00" level=warning msg="driver error deleting endpoint node3 : endpoint not found: a3d6deabd9ee0580136116f4874ed814e4e2d9e50229c93db3f5558bfef3a2c0"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.433387878+02:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon option --bip can be used to set a preferred IP address"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.525608921+02:00" level=info msg="Loading containers: start."
Feb 08 10:48:22 tricia.XXX docker[44493]: ....................time="2016-02-08T10:48:22.646112991+02:00" level=warning msg="signal: killed"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.687079920+02:00" level=warning msg="wait: no child processes"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.889194347+02:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/d7203ed0f2e98c3f38fa1f07791b5151aa9a309d60511441511d3a60c26ce275/shm: invalid argument\nfailed to umount /var/lib/docker/containers/d7203ed0f2e98c3f38fa1f07791b5151aa9a309d60511441511d3a60c26ce275/mqueue: invalid argument"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.889393749+02:00" level=error msg="Failed to start container d7203ed0f2e98c3f38fa1f07791b5151aa9a309d60511441511d3a60c26ce275: [8] System error: Unit docker-d7203ed0f2e98c3f38fa1f07791b5151aa9a309d60511441511d3a60c26ce275.scope already exists."
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.889422449+02:00" level=info msg="Loading containers: done."
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.889433149+02:00" level=info msg="Daemon has completed initialization"
Feb 08 10:48:22 tricia.XXX docker[44493]: time="2016-02-08T10:48:22.889449649+02:00" level=info msg="Docker daemon" commit=a34a1d5 execdriver=native-0.2 graphdriver=overlay version=1.9.1
Feb 08 10:48:22 tricia.XXX systemd[1]: Started Docker Application Container Engine.

@LK4D4
Contributor

LK4D4 commented Feb 8, 2016

I don't see anything related to a crash in the logs, so I suppose it's a problem with socket activation or something like that. @nalind @rhatdan could you help us, please?

@rhatdan
Contributor

rhatdan commented Feb 8, 2016

We are investigating this in a bugzilla.

https://bugzilla.redhat.com/show_bug.cgi?id=1300076

Right now it looks like a golang problem: golang will not ignore SIGPIPE.

#7087 also has this problem.

@rhatdan
Contributor

rhatdan commented Feb 8, 2016

The basic problem is that when you run a unit file, systemd hooks up stdout/stderr to the journal. If the journal goes away, those sockets get closed, and when you write to them you get a SIGPIPE. systemd says programs should ignore SIGPIPE on stdout and stderr, but golang refuses to do this more than 10 times.

This bugzilla explains the behaviour from a systemd/journald point of view.

https://bugs.freedesktop.org/show_bug.cgi?id=84923
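
To make the failure mode concrete, here is a minimal standalone Go sketch (hypothetical demo code, not anything from Docker itself): a daemon loops writing log lines to stderr, which under a systemd unit is a stream socket into journald. When journald restarts, the peer closes, and with Go < 1.6 the runtime tolerates only 10 of the resulting SIGPIPEs before killing the process.

package main

import (
    "errors"
    "fmt"
    "os"
    "syscall"
    "time"
)

func main() {
    // Under a systemd unit, stderr is a stream socket connected to journald.
    // Once journald restarts, every write here raises SIGPIPE; with Go < 1.6
    // the runtime gave up after 10 SIGPIPEs and killed the process, which is
    // the daemon crash reported in this issue.
    for i := 0; ; i++ {
        _, err := fmt.Fprintf(os.Stderr, "heartbeat %d\n", i)
        if errors.Is(err, syscall.EPIPE) {
            // Only reachable when SIGPIPE is ignored (see the eventual fix
            // at the end of this thread); otherwise the signal kills the
            // process before the error is ever returned.
            return
        }
        time.Sleep(time.Second)
    }
}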

@rhatdan
Contributor

rhatdan commented Feb 8, 2016

One fix might be to redirect stdout from docker to syslog or journald directly. Most system services do not write the volume of content to stdout/stderr that the docker daemon does; they usually write to a logging system, which uses dgram sockets and would not be affected by this issue.
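
For reference, a minimal Go sketch of that idea (the socket path and FIELD=VALUE format are journald's native protocol; everything else here is illustrative): each record is sent as one datagram to /run/systemd/journal/socket, and a dead peer surfaces as an ordinary write error rather than a SIGPIPE.

package main

import (
    "fmt"
    "net"
)

func main() {
    // journald's native protocol: newline-separated FIELD=VALUE pairs sent
    // as a single datagram to its AF_UNIX SOCK_DGRAM socket.
    addr := &net.UnixAddr{Name: "/run/systemd/journal/socket", Net: "unixgram"}
    conn, err := net.DialUnix("unixgram", nil, addr)
    if err != nil {
        fmt.Println("cannot reach journald:", err)
        return
    }
    defer conn.Close()

    record := "MESSAGE=hello from a dgram logger\nPRIORITY=6\nSYSLOG_IDENTIFIER=demo\n"
    if _, err := conn.Write([]byte(record)); err != nil {
        // A restarted or absent journald shows up as an error here,
        // not as a process-killing SIGPIPE.
        fmt.Println("journald write failed:", err)
    }
}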

@LK4D4
Contributor

LK4D4 commented Mar 4, 2016

Maybe we can just catch SIGPIPE?

@rhatdan
Contributor

rhatdan commented Mar 4, 2016

@LK4D4 The problem is with golang < 1.6. Golang has some wacky code that ignores SIGPIPE 10 times and then dies, no matter what you do in your code.

We are looking to ship a stopgap which pipes the output of docker directly to journald until we can build with golang 1.6. Once we can build with 1.6, this tool will go away.
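
A rough Go sketch of what such a stopgap could look like (the tool Red Hat shipped was a small C helper; the paths and arguments below are illustrative): spawn the daemon with a pipe we own in place of its stdout/stderr, then relay each line to syslog, which on RHEL 7 is a dgram socket proxied through journald and therefore immune to the stream-socket SIGPIPE problem.

package main

import (
    "bufio"
    "log"
    "log/syslog"
    "os"
    "os/exec"
)

func main() {
    cmd := exec.Command("/usr/bin/docker", "daemon") // illustrative path/args

    // Give the child a pipe we control instead of the journald stream
    // socket, so a journald restart can never SIGPIPE the daemon.
    r, w, err := os.Pipe()
    if err != nil {
        log.Fatal(err)
    }
    cmd.Stdout = w
    cmd.Stderr = w
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }
    w.Close() // keep only the child's copy of the write end

    // Relay every line over the dgram-based syslog transport.
    logger, err := syslog.New(syslog.LOG_DAEMON|syslog.LOG_INFO, "docker")
    if err != nil {
        log.Fatal(err)
    }
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        logger.Info(scanner.Text())
    }
    cmd.Wait()
}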

@LK4D4
Contributor

LK4D4 commented Mar 4, 2016

@rhatdan cool, thanks for the update.

@dchen1107
Contributor

cc/ me

@thaJeztah
Member

@rhatdan I see the linked issue was resolved (https://bugzilla.redhat.com/show_bug.cgi?id=1300076); I guess that's the stopgap solution. Can you tell me if there are any changes needed in this repository (e.g., updates to our unit files: https://github.com/docker/docker/tree/master/contrib/init/systemd)?

@runcom
Member

runcom commented Apr 27, 2016

@thaJeztah Unfortunately no. We had to add a helper binary which spawns Docker itself and captures the logs.

@runcom
Member

runcom commented Apr 27, 2016

I mean, I believe we can update the contrib/ folder and unit files so that this binary is used, if you guys wish.

@rhatdan
Contributor

rhatdan commented Apr 27, 2016

Yes, the only way to fix this was to write a "C" program that launches docker, grabs its stdout and stderr, and sends them to journald. When we get to go 1.6 we will throw away the "C" program.

jwhonce added a commit to jwhonce/docker that referenced this issue May 23, 2016
Using golang 1.6, it is now possible to ignore SIGPIPE events on
stdout/stderr. Previous versions of the golang library cached 10
events and then killed the process receiving the events.

systemd-journald sends SIGPIPE events when journald is restarted and
the target of the unit file writes to stdout/stderr. Docker logs to stdout/stderr.

This patch silently ignores all SIGPIPE events.

Signed-off-by: Jhon Honce <jhonce@redhat.com>
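
As a rough illustration of the behaviour this commit message describes (a sketch assuming Go >= 1.6, not the literal patch): once SIGPIPE is ignored, a write to the dead journald socket comes back as an ordinary EPIPE error that the daemon can survive.

package main

import (
    "errors"
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // On Go >= 1.6 this sticks; older runtimes tolerated only 10 SIGPIPE
    // events no matter what the program asked for.
    signal.Ignore(syscall.SIGPIPE)

    for i := 0; ; i++ {
        // With SIGPIPE ignored, a write to a closed stdout/stderr returns
        // EPIPE instead of killing the process, so a journald restart only
        // loses the line; the daemon keeps running.
        if _, err := fmt.Fprintf(os.Stderr, "log line %d\n", i); err != nil && !errors.Is(err, syscall.EPIPE) {
            return // some unrelated write failure
        }
        time.Sleep(time.Second)
    }
}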