CI: We should be able to complete a build in 3 hours even without Docker image cache #49278
kennytm added the C-enhancement, A-rustbuild, T-infra labels on Mar 22, 2018
I think the worst offenders here are definitely the dist images, where we're commonly building entire gcc toolchains for older compat and whatnot. One of the major pros, I think, is that we can test out changes to toolchains before they land. That is, if you update the toolchain in the CentOS image (for whatever reason) it'll get automatically rebuilt, and the PR is rejected if it causes a failure. A downside with precompiled binaries is that we typically don't know if they work, so we'd have to run the precompiled build multiple times.

That being said, I think there's a lot we can do to improve this, of course! I'd love to have incremental caching of docker images like we have with

Overall I think it's basically impossible for us to get uncached builds under 3 hours, mainly because of dist builders (which require building gcc toolchains) and three (!) versions of LLVM. In that sense I see the main takeaway here as: we should improve the caching strategy with docker layers on Travis, which I've always wanted to do for sure!
Another idea is that this may be a perfect application for Travis's "stages". If we figure out an easy way to share the docker layers between images (aka fast network transfers), then we could split up the docker build stage from the actual rustc build stage. I think that'd far extend our timeout and we'd always comfortably fit within the final time limit.
Er, scratch the idea of Travis stages; they wouldn't work here. It looks like Travis stages only support parallelism inside one stage, and otherwise stages are sequential. That's not quite what we want here...
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Mar 22, 2018
alexcrichton referenced this issue on Mar 22, 2018 — Merged: ci: Don't use Travis caches for docker images #49284
alexcrichton added four more commits to alexcrichton/rust that referenced this issue on Mar 22, 2018
bors added a commit that referenced this issue on Mar 22, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Mar 23, 2018
bors added a commit that referenced this issue on Mar 23, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Mar 23, 2018
bors added a commit that referenced this issue on Mar 25, 2018
Maybe, similarly to #49284, it would be better to upload all intermediate images (all layers) to S3?
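A rough sketch of what that S3-backed flow could look like — the bucket name, image tag, and paths below are hypothetical placeholders, not what the CI actually uses:

```bash
#!/usr/bin/env bash
set -euo pipefail

image=dist-x86_64-linux                 # one of the CI docker images
bucket=example-rust-ci-cache            # hypothetical S3 bucket
tag="rust-ci:$image"

# Restore the previously saved image, tolerating a missing cache object.
curl -sSf "https://$bucket.s3.amazonaws.com/docker/$image.tar.gz" | gunzip | docker load || true

# Rebuild, allowing docker to reuse layers from the restored image.
docker build --cache-from "$tag" -t "$tag" "src/ci/docker/$image"

# Save the fresh image (all of its layers) and stream it back to S3.
docker save "$tag" | gzip | aws s3 cp - "s3://$bucket/docker/$image.tar.gz"
```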
@steffengy seems plausible to me! Right now it's only done on success, but I think we could do it on failure as well.
@alexcrichton Was it an intentional design decision to tie caching to such a hash instead of to the $IMAGE itself?
@steffengy tying it purely to

I'm not sure which strategy is better; neither is working that great, I think :(
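For context, a sketch of the two cache-key strategies being contrasted here; the hashing scheme and paths are illustrative assumptions rather than the real CI scripts:

```bash
image=dist-x86_64-linux

# Strategy A: key the cache on a hash of the image's inputs. Any edit to the
# Dockerfile or its scripts yields a new key, so stale layers are never reused,
# but every edit also rebuilds the image from scratch.
content_key=$(find "src/ci/docker/$image" -type f | sort | xargs sha256sum | sha256sum | cut -d' ' -f1)

# Strategy B: key the cache on the image name alone. Edited Dockerfiles can
# still reuse earlier layers via --cache-from, but they may pick up layers
# built from an older version of the image.
name_key="$image"

echo "content-addressed key: $content_key"
echo "name-addressed key:    $name_key"
```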
@alexcrichton Yeah, the first one is definitely an issue (the second one is essentially the same). Was using a docker registry instead of S3 for caching discussed before?

Then after each build one would push to e.g.

Disclaimer: This is untested and only relies on assumptions and discussions I found regarding this topic.
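A sketch of the registry-based cycle being proposed, with a placeholder registry host and tag scheme:

```bash
image=dist-x86_64-linux
registry=registry.example.org:5000            # hypothetical private registry
tag="$registry/rust-ci/$image:latest"

# Pull the last pushed image; tolerate a missing tag on the very first run.
docker pull "$tag" || true

# Reuse its layers while rebuilding.
docker build --cache-from "$tag" -t "$tag" "src/ci/docker/$image"

# Push the result so the next build, on any machine, can reuse the layers.
docker push "$tag"
```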
@steffengy oh, I'd totally be down for using a docker registry, as I think it'd for sure solve most problems — I just have no idea how to run one or how we might set that up! Is this something that'd be relatively lightweight to do?
Setting up a registry is the easier part:

Adjustments to the approach above... Some local tests showed that

Resulting Possibilities

@alexcrichton Let me know what you think.
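Since the setup details above were lost, here is a generic sketch of running Docker's official registry image, purely as an assumption about what "setting up a registry" would involve:

```bash
# Run the official registry image on a hypothetical cache host.
docker run -d --restart=always --name rust-ci-registry \
    -p 5000:5000 \
    -v /srv/registry:/var/lib/registry \
    registry:2

# CI machines would then pull and push through it, e.g.:
docker pull cache-host.example.org:5000/rust-ci/dist-x86_64-linux:latest || true
```

A real deployment would also need TLS (or an insecure-registry exception on the CI machines) plus some storage and garbage-collection policy, which is presumably the non-trivial part.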
@steffengy hm, so I may not be fully understanding what this implies, but isn't the registry approach the same as just using a cache location determined by the image/branch name? We're sort of running a "pseudo registry" today with curl and

I could be missing something though!
Also take a look at buildkit: it offers more flexibility over
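For reference, a sketch of how BuildKit's buildctl can import and export a build cache through a registry; the cache reference below is a placeholder, and the flags reflect my understanding of buildctl rather than anything wired into this repo:

```bash
image=dist-x86_64-linux
cacheref=registry.example.org:5000/rust-ci/cache/$image    # hypothetical cache location

buildctl build \
    --frontend dockerfile.v0 \
    --local context="src/ci/docker/$image" \
    --local dockerfile="src/ci/docker/$image" \
    --import-cache type=registry,ref="$cacheref" \
    --export-cache type=registry,ref="$cacheref"
```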
@alexcrichton Yeah, it wouldn't really provide much to justify the additional work over tagging by image & branch on S3. @ishitatsuyuki's suggestion of buildkit might be interesting, but it seems to require quite a bit of work:

A few more ideas for inspiration:
@steffengy I think for now we should probably just switch to

Learning the branch isn't the easiest thing in the world right now, unfortunately, but I imagine we could throw something in there to make it not too bad.
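As an illustration, keying the cache on image and branch could look roughly like this on Travis — TRAVIS_BRANCH is a real Travis environment variable, while the key layout, bucket, and the master fallback are assumptions:

```bash
image=dist-x86_64-linux
branch="${TRAVIS_BRANCH:-master}"       # branch under test, per Travis's environment

primary="docker/$branch/$image.tar.gz"
fallback="docker/master/$image.tar.gz"

# Prefer the branch's own cache, fall back to master's, and tolerate neither existing.
curl -sSf "https://example-bucket.s3.amazonaws.com/$primary" | gunzip | docker load \
  || curl -sSf "https://example-bucket.s3.amazonaws.com/$fallback" | gunzip | docker load \
  || true
```

Falling back to master's cache would keep fresh branches from starting completely cold, at the cost of an extra (possibly large) download when the branch cache is missing.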
@alexcrichton Maybe it makes sense to combine that with trying to load from both
@steffengy perhaps, yeah, but I've found that the curl + docker load can often take quite a while for larger images, and they change rarely enough today anyway that I don't think it'd outweigh the cost.
kennytm commented on Mar 22, 2018

Our CI builders (except macOS and Windows) use Docker, and we cache the Docker repository on Travis. Thanks to the cache, the `docker build` command normally takes only a few seconds to complete. However, when the cache is invalidated for whatever reason, the Docker image needs to be actually built, and this may take a very long time.

Recently this happened with #49246 — the Docker image cache of `dist-x86_64-linux alt` became stale and thus needed to be built from scratch. One of the steps involves compiling GCC. The whole `docker build` command therefore takes over 40 minutes. Worse, the `alt` builders have assertions enabled, so all stage1+ `rustc` invocations are slower than their normal counterparts. Together, it is impossible to complete within 3 hours. Travis will not update the cache unless the build is successful; therefore, I needed to exclude RLS, Rustfmt and Clippy from the distribution to ensure the job passes.

I don't think we should rely entirely on Travis's cache for speed. Ideally, the `docker build` command should spend at most 10 minutes, assuming good network speed (~2 MB/s on Travis) and reasonable CPU performance (~2.3 GHz × 4 CPUs on Travis).

In the `dist-x86_64-linux alt` case, if we hosted the precompiled GCC 4.8.5 for CentOS 5, we could have trimmed 32 minutes off the Docker build time, which would allow us to complete the build without removing anything.
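A sketch of the kind of change proposed for the dist image: fetch a prebuilt GCC instead of compiling it during `docker build`. The URL and install prefix are hypothetical placeholders for wherever such a tarball would actually be hosted:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Instead of configuring and compiling GCC 4.8.5 (~30+ minutes), download a
# toolchain that was built once, ahead of time, for the CentOS 5 dist image.
url="https://ci-mirrors.example.org/gcc-4.8.5-centos5-x86_64.tar.xz"   # hypothetical mirror
prefix=/opt/gcc-4.8.5                                                  # illustrative install prefix

mkdir -p "$prefix"
curl -sSfL "$url" | tar -xJ -C "$prefix" --strip-components=1

# Make the prebuilt gcc visible to the remaining commands in this script.
export PATH="$prefix/bin:$PATH"
gcc --version
```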