Docker daemon not really started on EL6 #162

Closed
dpetzel opened this issue Jun 4, 2014 · 17 comments

Comments

@dpetzel
Contributor

dpetzel commented Jun 4, 2014

After adding the service test in #161, I found the service check fails. The interesting part is that service status reports a running PID, but attempting to list images fails. Manually restarting the service seems to clear things up.

It seems the recipe may need to issue one more restart somewhere along the line?

Here is the output showing what I'm seeing on both RHEL 6.5 and CentOS 6.4:

[root@host ~]# service docker status
docker (pid  1630) is running...
[root@host ~]# docker images
2014/06/04 03:58:07 Cannot connect to the Docker daemon. Is 'docker -d' running on this host?
[root@host ~]# service docker restart
Stopping docker:                                           [  OK  ]
Starting docker:                                       [  OK  ]
[root@host ~]# service docker status
docker (pid  1873) is running...
[root@host ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
@bflad
Contributor

bflad commented Jun 24, 2014

Have you tried this with docker-io 1.0? What is in the Docker logs (/var/log/docker)?

@jperville
Contributor

@dpetzel can you check:

  • if the docker daemon is running (ps aux | grep 1630 in your example)
  • then if the process returned by ps is actually the docker process and not a parent shell (an exec in the service file might help)
  • finally if the docker socket (e.g. /var/run/docker.sock) is present, so that docker (the client) can communicate with docker (the daemon)

When docker (the client) says "Cannot connect to the Docker daemon", it is usually because the socket is missing.
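
The socket check from the last bullet can be sketched in plain Ruby. This is only a rough sketch and not part of the cookbook: the helper name docker_socket_accepting? is hypothetical, and the default path is simply the one mentioned above. It only tells you whether something accepts connections on the socket, not whether the daemon actually answers API requests.

```ruby
require 'socket'

# Hypothetical helper (not provided by the cookbook or by Docker).
# Returns true only if the path is a UNIX socket and a process is
# accepting connections on it; returns false if the socket is missing,
# the connection is refused, or permission is denied.
def docker_socket_accepting?(path = '/var/run/docker.sock')
  return false unless File.socket?(path)

  # The block form closes the connection again as soon as it succeeds.
  UNIXSocket.open(path) { |_sock| true }
rescue SystemCallError
  false
end

puts "socket accepting connections: #{docker_socket_accepting?}"
```

If this prints false while service docker status still reports a running PID, that would match the symptom described in this issue.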

@dpetzel
Contributor Author

dpetzel commented Jun 24, 2014

I'll try to do this in the next day or so. If you are able to, you can reproduce it the same way I did: run the Test Kitchen tests now that #161 is merged.

@dpetzel
Contributor Author

dpetzel commented Jun 24, 2014

@bflad Have not tried it with Docker 1.0. Will try to do so though.

@jperville
Contributor

I just ran Test Kitchen for the CentOS 6.5 platform. The Chef run consistently fails at the first docker command (docker_image 'docker-test-image'). Restarting the docker service and then converging again succeeds, as @dpetzel wrote.

Here is the information that I collected inside the VM:

[root@package-native-centos-65 ~]# docker --version
Docker version 1.0.0, build 63fe64c/1.0.0
[root@package-native-centos-65 ~]# service docker status
docker (pid  3832) is running...
[root@package-native-centos-65 ~]# ps aux | grep -v grep | grep 3832
root      3832  0.0  0.9 296180  9984 ?        Sl   08:39   0:00 /usr/bin/docker -d --host=unix:///var/run/docker.sock --restart=false --selinux-enabled=false
[root@package-native-centos-65 ~]# ls -l /var/run/docker.sock 
srw-rw----. 1 root docker 0 Jun 25 08:39 /var/run/docker.sock
[root@package-native-centos-65 ~]# lsof /var/run/docker.sock 
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF  NODE NAME
docker  3832 root    8u  unix 0xffff880037ca9980      0t0 51516 /var/run/docker.sock
[root@package-native-centos-65 ~]# docker images
2014/06/25 08:45:30 Cannot connect to the Docker daemon. Is 'docker -d' running on this host?
[root@package-native-centos-65 ~]# service docker restart
Stopping docker:                                           [  OK  ]
Starting docker:                                       [  OK  ]
[root@package-native-centos-65 ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE

I note that the docker daemon is instantiated with exactly the same arguments as on my Ubuntu 14.04, and that the socket exists with the right permissions. It should be working (think about upgrading the kernel to 3.8+ though).

@amaltson

amaltson commented Jul 7, 2014

I'm running into the same issue with Docker 1.1.0. As @jperville mentioned, when you manually restart the service it works. Any ideas why this might be?

@amaltson

amaltson commented Jul 8, 2014

I've tried adding a manual service restart, but it only helps after a reconverge. Any ideas on a way forward?

@StFS

StFS commented Jul 9, 2014

This is really silly. I'm having this problem too.

To be fair though, there seems to be some weirdness with the RPM package too. If I manually install docker (through EPEL) the service is not started automatically. You seem to have to do that yourself. However, when I install with chef using this cookbook I get the same results as described here. The service goes up and "service docker status" reports that it's running, but any docker command ends in a failure complaining that the daemon is not running.

@amaltson: I ended up having to add a bash block that did the manual restart of docker. So after you include_recipe "docker" put a block similar to this:

# NOTE: For some weird reason the docker service needs to be 
#       restarted after being set up.
bash "Docker service black magic" do
  user "root"
  code <<-EOH
  service docker restart
  touch /var/run/.nc-docker-installed
  EOH
  not_if { ::File.exist?('/var/run/.nc-docker-installed') }
end

This will fix your problem but of course it's a nasty hack.

@StFS

StFS commented Jul 9, 2014

@amaltson I cleaned up the hack and managed to make it look a little bit better. Still annoying to have to do this. So instead of the stuff above, put this in your recipe after including the docker recipe.

# NOTE: For some weird reason the docker service needs to be 
#       restarted after being set up.
#       To maintain idempotency we only restart if "docker info"
#       fails.
service "docker" do
  supports :status => true, :start => true, :stop => true, :restart => true, :reload => true
  action :restart
  not_if 'docker info'
end

@dpetzel
Contributor Author

dpetzel commented Jul 9, 2014

I think this approach (as written) is going to result in "cloning resource" warnings in the logs as it will overlap with the existing service definition. Maybe something like this (ensuring the resource name is unique):

service "docker_GH162" do
  action :restart
  not_if 'docker info'
  service_name 'docker'
end

@StFS
Copy link

StFS commented Jul 9, 2014

I just did a run of this again and I can't see any warnings... not that it matters all that much I guess... hopefully this hack will become unnecessary at some point :-)

@amaltson

amaltson commented Jul 9, 2014

Thanks @StFS, agreed it's not ideal. I've actually gone with:

log 'Done installing Docker, doing service restart' do
  notifies :restart, 'service[docker]', :delayed
  not_if 'docker info'
end

This seems to work too 😉. Lots of variations, but as you mention, hopefully it won't be needed soon.

P.S. I've also had this cookbook fail on RHEL installations b/c it looks like the cgroups recipe needs to be executed for that platform too.

@StFS

StFS commented Jul 10, 2014

Oooh I like yours better... gonna steal it ;)

@amaltson

Please do :). I changed the notification to :delayed; :immediately doesn't seem to work when I destroy and re-converge.

@amaltson

Just to report back: notifying the service to restart means that Docker is successfully up and running when the Chef run finishes. However, if a single run list installs Docker and then tries to pull down docker images and start up containers, that fails hard b/c Docker isn't up and running yet. The rather painful workaround is to have one Chef run with just the Docker install recipe, then another Chef run that uses docker.
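
One way to avoid splitting the run in two might be to gate the later Docker resources on the daemon actually answering. Below is a rough sketch in plain Ruby. It assumes the restart has already happened earlier in the same run, wait_until is a hypothetical helper (not something the cookbook provides), and it is untested against the hang described in this thread.

```ruby
# Hypothetical helper: calls the given block repeatedly until it returns
# a truthy value or the timeout (in seconds) elapses.
# Returns true on success, false on timeout.
def wait_until(timeout: 30, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

# Possible use from a recipe (assumption: the helper is loaded, for
# example from a cookbook library file):
#
#   ruby_block 'wait for docker daemon' do
#     block do
#       ok = wait_until(timeout: 30) { system('docker info > /dev/null 2>&1') }
#       raise 'Docker daemon did not respond in time' unless ok
#     end
#   end
```

The ruby_block would have to sit between the service restart and the first resource that talks to Docker. A monotonic clock is used so that wall-clock adjustments during the run do not shorten or extend the timeout.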

@dermusikman
Contributor

It seems discussion of the fundamental issue has ended, but I've determined that the daemon hangs on a FUTEX_WAIT:

[root@default-centos-65 ~]# strace -p $(pgrep docker)
Process 2507 attached - interrupt to quit
futex(0x12cfa08, FUTEX_WAIT, 0, NULL^C <unfinished ...>
Process 2507 detached

Don't have the cycles to try to figure out what it is, but figured I'd add what little I'd found to contribute to the fix. Restarting it seems to resolve the problem.

Implementing hack :-p

@someara
Contributor

someara commented Jul 14, 2015

Should work as of 0.40.x

@someara closed this as completed Jul 14, 2015