Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition with lxc-info #4298

Closed
farcaller opened this issue Feb 24, 2014 · 6 comments
Closed

Race condition with lxc-info #4298

farcaller opened this issue Feb 24, 2014 · 6 comments

Comments

@farcaller
Copy link
Contributor

For a while now I have a problem starting docker containers. It was flaky at first, now I hit the problem in 95% of container starts (I guess that might be related to my switch to raid1 on mmcblks, that slowed I/O even more).

docker version:

Client version: 0.8.0
Go version (client): go1.2
Git commit (client): cc3a8c8-dirty

Problem:

$ docker run -t -i cellofellow/rpi-arch /bin/bash
2014/02/24 08:18:37 Error: start: Cannot start container f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839: exit status 1

Docker log:

[debug] api.go:933 Calling POST /containers/create
2014/02/24 08:18:33 POST /v1.9/containers/create
[/var/lib/docker|f64b5fe6] +job create()
[/var/lib/docker|f64b5fe6] -job create() = OK (0)
[debug] api.go:933 Calling POST /containers/{name:.*}/attach
2014/02/24 08:18:37 POST /v1.9/containers/f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839/attach?stderr=1&stdin=1&stdout=1&stream=1
[/var/lib/docker|f64b5fe6] +job inspect(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839, container)
[/var/lib/docker|f64b5fe6] -job inspect(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839, container) = OK (0)
[/var/lib/docker|f64b5fe6] +job attach(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839)
[debug] container.go:421 attach: stdin: begin
[debug] container.go:458 attach: stdout: begin
[debug] container.go:496 attach: stderr: begin
[debug] container.go:542 attach: waiting for job 1/3
[debug] api.go:933 Calling POST /containers/{name:.*}/start
2014/02/24 08:18:37 POST /v1.9/containers/f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839/start
[/var/lib/docker|f64b5fe6] +job start(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839)
[/var/lib/docker|f64b5fe6] +job allocate_interface(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839)
[/var/lib/docker|f64b5fe6] -job allocate_interface(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839) = OK (0)
[debug] container.go:373 startPty: begin of stdout pipe
[debug] container.go:384 startPty: begin of stdin pipe
[error] driver.go:168 Run() failed to Wait [unshare -m -- /bin/sh -c mount --make-rslave /; exec lxc-start -n f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839 -f /var/lib/docker/containers/f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839/config.lxc -- /.dockerinit -driver lxc -g 172.17.42.1 -i 172.17.0.2/16 -mtu 1500 -- /bin/bash]: exit status 1

[error] container.go:1244 Error running container: exit status 1
[/var/lib/docker|f64b5fe6] +job release_interface(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839)
[/var/lib/docker|f64b5fe6] -job release_interface(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839) = OK (0)
[debug] container.go:386 startPty: end of stdin pipe
[debug] container.go:475 attach: stdout: end
[debug] container.go:513 attach: stderr: end
[debug] container.go:547 attach: job 1 completed successfully
[debug] container.go:542 attach: waiting for job 2/3
[debug] container.go:547 attach: job 2 completed successfully
[debug] container.go:542 attach: waiting for job 3/3
[debug] server.go:2234 Closing buffered stdin pipe
[debug] container.go:448 attach: stdin: end
[debug] container.go:547 attach: job 3 completed successfully
[debug] container.go:549 attach: all jobs completed successfully
[/var/lib/docker|f64b5fe6] -job attach(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839) = OK (0)
[/var/lib/docker|f64b5fe6] +job release_interface(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839)
[/var/lib/docker|f64b5fe6] -job release_interface(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839) = OK (0)
[error] container.go:1299 f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839: Error closing Pty master: invalid argument
Cannot start container f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839: exit status 1[/var/lib/docker|f64b5fe6] -job start(f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839) = ERR (1)
[error] api.go:959 Error: start: Cannot start container f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839: exit status 1
[error] api.go:91 HTTP Error: statusCode=500 start: Cannot start container f7f10491c6abe7d4c74d28288d581aaf3a785b5ac0ceb9b642a8f9ffe17df839: exit status 1
[debug] container.go:375 startPty: end of stdout pipe

.dockerinit is confirmed to run up to exec (additional logging to fs, as nothing else seems to be reachable).

I traced the problem up to lxc-info run, which is hard to debug further. My wrapper script

#!/bin/bash

lxc-info-real $@
EX=$?
echo "lxc-info $@: $EX" >> /tmp/lxclog
exit $EX

prevents the problem from happening. Well, even the simplest sleep 1 fixes it, actually.

I guess that either waitForStart logic is broken somehow, or there must be a better trigger criteria to detect "container not booted yet" vs "container died".

@farcaller
Copy link
Contributor Author

I just realised that this is a dup of #3824, please close one of them as you feel appropriate (this issue has more details about actual problem while the former has a bit more overall debugging details).

@farcaller
Copy link
Contributor Author

Upgraded docker to 0.8.1, failed to start containers at all. Downgraded lxc to 0.9, seem to be working in this combo for now.

@orospakr
Copy link

@farcaller, your dup hyperlink links back to this same issue, #4298. ;)

I think I'm having this same issue as well. Thank you very much for the lxc-info clue; I am going to try downgrading to lxc 0.9 and see if that helps (I'm on ArchLinux, btw).

@orospakr
Copy link

That's done it.

Thanks man, you just saved me a good number of hours!

@amackera
Copy link

@farcaller Thanks for the info on lxc!

@crosbymichael crosbymichael modified the milestone: 1.0 May 16, 2014
@cpuguy83
Copy link
Member

Closing because it looks like this is resolved, probably a version conflict with what version of LXC docker supported at the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants