Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix travis builds #38340

Merged
merged 1 commit into from
Dec 14, 2016
Merged

Fix travis builds #38340

merged 1 commit into from
Dec 14, 2016

Conversation

alexcrichton
Copy link
Member

After reading some articles 1 2 yesterday about Docker and the "init"
process I got to thinking about the problems that we've been seeing on Travis.
The basic problem is that a Linux system may need an "init" process to work
properly when processes become zombies. Docker by default doesn't handle this
and the root process typically isn't an init process, so this can occasionally
cause quite a few problems.

We've been seeing spurious errors on Travis inside containers which look like
OOM and such, but my guess is that zombie processes were being reparented to the
top-level shell. The shell didn't expect the zombies and then behaved very
strangely.

This commit fixes these problems by using Yelp's "dumb-init" program 2 as the
init process in all of our containers. This ensures that there's a valid init
ready to reap children when they're reparented, which our test suite apparently
generates a bunch of throughout the tests and such.

After reading some articles [1] [2] yesterday about Docker and the "init"
process I got to thinking about the problems that we've been seeing on Travis.
The basic problem is that a Linux system may need an "init" process to work
properly when processes become zombies. Docker by default doesn't handle this
and the root process typically isn't an init process, so this can occasionally
cause quite a few problems.

We've been seeing spurious errors on Travis inside containers which look like
OOM and such, but my guess is that zombie processes were being reparented to the
top-level shell. The shell didn't expect the zombies and then behaved very
strangely.

This commit fixes these problems by using Yelp's "dumb-init" program [2] as the
init process in all of our containers. This ensures that there's a valid init
ready to reap children when they're reparented, which our test suite apparently
generates a bunch of throughout the tests and such.

[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
[2]: https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html
@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

@bors: r+

@bors
Copy link
Contributor

bors commented Dec 13, 2016

📌 Commit 5e991e0 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Dec 14, 2016

⌛ Testing commit 5e991e0 with merge 01d53df...

bors added a commit that referenced this pull request Dec 14, 2016
Fix travis builds

After reading some articles [1] [2] yesterday about Docker and the "init"
process I got to thinking about the problems that we've been seeing on Travis.
The basic problem is that a Linux system may need an "init" process to work
properly when processes become zombies. Docker by default doesn't handle this
and the root process typically isn't an init process, so this can occasionally
cause quite a few problems.

We've been seeing spurious errors on Travis inside containers which look like
OOM and such, but my guess is that zombie processes were being reparented to the
top-level shell. The shell didn't expect the zombies and then behaved very
strangely.

This commit fixes these problems by using Yelp's "dumb-init" program [2] as the
init process in all of our containers. This ensures that there's a valid init
ready to reap children when they're reparented, which our test suite apparently
generates a bunch of throughout the tests and such.

[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
[2]: https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html
@luser
Copy link
Contributor

luser commented Dec 14, 2016

Well I'm glad to see all that digging we did yesterday had some useful outcome. :)

We really should just get your Linux CI moved to Taskcluster. Not that it would have necessarily made a difference for this specific issue, but we have a pretty nice working setup with in-tree Dockerfiles for Firefox builds, and I suspect you'd benefit from some of the other flexibility.

@bors bors merged commit 5e991e0 into rust-lang:master Dec 14, 2016
@alexcrichton alexcrichton deleted the fix-travis branch December 19, 2016 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants