Implement the Cargo half of pipelined compilation (take 2) #6883

alexcrichton · 2019-04-26T22:36:27Z

This commit starts to lay the groundwork for #6660 where Cargo will
invoke rustc in a "pipelined" fashion. The goal here is to execute one
command to produce both an *.rmeta file as well as an *.rlib file
for candidate compilations. In that case if another rlib depends on that
compilation, then it can start as soon as the *.rmeta is ready and not
have to wait for the *.rlib compilation.

Initially attempted in #6864 with a pretty invasive refactoring this
iteration is much more lightweight and fits much more cleanly into
Cargo's backend. The approach taken here is to update the
DependencyQueue structure to carry a piece of data on each dependency
edge. This edge information represents the artifact that one node
requires from another, and then we a node has no outgoing edges it's
ready to build.

A dependency on a metadata file is modeled as just that, a dependency on
just the metadata and not the full build itself. Most of cargo's backend
doesn't really need to know about this edge information so it's
basically just calculated as we insert nodes into the DependencyQueue.
Once that's all in place it's just a few pieces here and there to
identify compilations that can be pipelined and then they're wired up
to depend on the rmeta file instead of the rlib file.

Closes #6660

rust-highfive · 2019-04-26T22:36:38Z

r? @nrc

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2019-04-26T22:41:57Z

r? @ehuss

So I've got one open question on this implementation which is included as a failing test here. The general question is what do we want to happen in this scenario:

In the first compile, a user builds an rlib.
In the next compile, the user builds another rlib that depends on the first.

The test executes this as cargo build -p dependency followed by cargo build -p parent. In this scenario it's currently buggy because the second compilation assumes the first produced metadata, but we currently try to avoid producing metadata where possible. The options to solve this are:

Always produce metadata files for rlibs, whether or not the metadata will be consumed during pipelining
In the second compile recompile the dependency rlib to produce metadata. (we'd handle this by hashing whether metadata is produced into the fingerprint)

Option (1) would take up a bit more disk space while option (2) may take longer, but this seems unlikely to really come up in practice all that often. I'd slightly lean towards option (1), but I'm curious what you'd think as well?

bors · 2019-04-26T23:21:26Z

☔ The latest upstream changes (presumably #6772) made this pull request unmergeable. Please resolve the merge conflicts.

Eh2406 · 2019-04-26T23:47:08Z

Are these the same metadata files need for 'cargo check' to use the results of 'cargo build'?

alexcrichton · 2019-04-29T14:13:35Z

I think in theory they can be compatible, but this doesn't attempt to set things up so cargo check can be shared with cargo build, but we can perhaps tackle that not too difficultly in the future!

ehuss · 2019-04-29T20:16:37Z

I'd slightly lean towards option (1), but I'm curious what you'd think as well?

Either option sounds fine to me. It doesn't seem like a huge amount of extra disk space.

alexcrichton · 2019-04-29T20:41:28Z

Ok, I've implemented that strategy since it's easier!

ehuss · 2019-04-29T22:15:18Z

Looks like a bunch of tests need to be updated now that --emit has been expanded.

alexcrichton · 2019-04-30T15:15:33Z

Oops, updated that

ehuss · 2019-04-30T16:59:36Z

Looks good to me. r=me if you're ready to go. I'm not sure if you want to wait for the rustc changes or not. I figure worst case if things don't work out, this can be reverted?

alexcrichton · 2019-04-30T18:50:46Z

I think it makes sense to wait for an end-to-end story to prove out that it's working. To that end I've pushed up a few more commits which I believe fully implements pipelined compilation in Cargo given the support in upstream rustc for -Z emit-directives. Some caveats apply though:

This'll fail CI since we don't have a nightly which supports emit-directives
This'll have very little benefit for pipelined compilation since rustc doesn't emit the directive until the very end of compilation. The compiler is being changed in Emit metadata files earlier rust#60385 to emit the directive much sooner
Still requires opt in, for example through CARGO_BUILD_PIPELINING=1

alexcrichton · 2019-05-01T14:40:21Z

Ok I think that covers everything and I've verified it works with tonights nightly, so this should be good to go!

bors · 2019-05-01T22:56:21Z

☔ The latest upstream changes (presumably #6896) made this pull request unmergeable. Please resolve the merge conflicts.

alexcrichton · 2019-05-03T18:22:17Z

@ehuss did you want to give this a final once-over before merging for the commits I added after the initial review?

ehuss · 2019-05-05T00:32:55Z

I'm still working through this. I don't see any noticeable improvement in total compilation time with the various projects I've tried. Is there a particular project type or concurrency level that would see a benefit? I'm concerned about moving forward if it ultimately doesn't help.

I've been instrumenting this a little. I've been building some graphs of the queue sizes, which shows that the initial phase is kept more "full", but that drops off quickly. I've also been recording how much faster the rmeta files are emitted. On average, the rmeta file is emitted at about the 70% mark of the total crate build time (ranging from 30% to 98%). I'm not sure if that intuitively seems reasonable. I believe it is currently emitted just before codegen?

I'm going to continue instrumenting and try to get a better picture. Would like to get your perspective.

alexcrichton · 2019-05-06T16:15:11Z

Heh a good point on the investigations! My plan was moreso "land it and make a post asking for users to test it out" and debug from there :)

I was also slightly dismayed at the lack of benefit on Cargo itself in both release and debug builds. I suspect that this is not going to be as large of a win as I had previously hoped. That being said, I'm still hopeful we can get testers on internals to help show some other data.

This also isn't necessarily a win in all cases. The whole point of pipelining is to take advantage of otherwise idle parallelism, and we can't pipeline literally everything and what's here is a coarse approximation of the ideal. Some ingredients necessary to reap benefits from this PR:

You'll at least have to compile more than one crate. Incremental builds that build the top crate will see no benefit.
You'll probably need to be compiling in release mode to get the most benefit. Otherwise codegen is typically quick enough that it won't really feel all that different. (AKA if codegen is "fast enough" in debug mode pipelining won't actually start anything that much sooner).
There may not be a ton of benefit in full crate graph builds. There's lots of edges that still require full synchronization of build process, such as:
- Procedural macros (aka serde_derive) have to be entirely finished before proceeding
- Build scripts (aka building large chunks of C code) must entirely finish as well

Given all that I suspect that the scenario with the biggest win will be incremental release mode builds where a crate a few layers deep in the graph was modified. Incremental builds in debug mode may fit this same profile but will likely not feel all that much faster.

FWIW though I'm personally pretty happy with how low the complexity level turned out for this PR, so I think it'd be fine to land with the intention of backing it out if it doesn't pan out. That being said though I can't think of a use case where this regresses something, so I'm not sure there's a ton of downside to not landing this. (that's exclusively given the low complexity budget spend I'm perceiving here)

stuhood · 2019-05-06T16:44:27Z

I was also slightly dismayed at the lack of benefit on Cargo itself in both release and debug builds. I suspect that this is not going to be as large of a win as I had previously hoped. That being said, I'm still hopeful we can get testers on internals to help show some other data.

From simulations we've done with pipelining/"outlining", the benefits only begin to show on beefier machines with larger graphs. If you don't have enough total work, or if you don't have enough cores to run more of the unblocked work in parallel, then the outlining is overhead (but hopefully a very small amount of overhead).

A case where outlining helps even for smaller workspaces is the "long path" case. In a workspace with a chain of dependencies, if you edit a few deps away from a test/binary, the "path" between them can be compiled in parallel, whereas without outlining it would be sequential.

...So yea. I suspect that @alexcrichton's hypotheses are correct.

src/cargo/core/compiler/mod.rs

src/cargo/util/machine_message.rs

ehuss · 2019-05-06T20:46:45Z

Sounds reasonable.

I'd like to add some timing data to the debug logs, I'll follow up with a separate PR with some ideas.

I think this will address some other issues, like #6674 and #6119.

but it turns out that we unconditionally capture all output from the compiler/rustdoc today, no exceptions

~~This hasn't been true since #6289.~~ EDIT: Or maybe this means that the stdout/stderr are not inherited anymore, nevermind.

alexcrichton · 2019-05-09T19:03:02Z

Updated with @ehuss's comments.

@matthiaskrgr I'm not sure why, but I'm not really shooting for 100% fidelity works everywhere in this PR, that's something I hope we can fix and iterate on afterwards.

ehuss · 2019-05-09T20:25:09Z

Sorry, one more minor nit. I notice the progress bar jumping backwards at times. I think the following will fix it:

diff --git a/src/cargo/core/compiler/job_queue.rs b/src/cargo/core/compiler/job_queue.rs
index bff2d5b6..10ecf47f 100644
--- a/src/cargo/core/compiler/job_queue.rs
+++ b/src/cargo/core/compiler/job_queue.rs
@@ -320,7 +320,8 @@ impl<'a, 'cfg> JobQueue<'a, 'cfg> {
             // unnecessarily.
             let events: Vec<_> = self.rx.try_iter().collect();
             let events = if events.is_empty() {
-                self.show_progress(total);
+                let count = total - self.queue.len() - self.active.len() - queue.len();
+                self.show_progress(count, total);
                 vec![self.rx.recv().unwrap()]
             } else {
                 events
@@ -431,8 +432,7 @@ impl<'a, 'cfg> JobQueue<'a, 'cfg> {
         }
     }

-    fn show_progress(&mut self, total: usize) {
-        let count = total - self.queue.len() - self.active.len();
+    fn show_progress(&mut self, count: usize, total: usize) {
         let active_names = self
             .active
             .values()

Essentially when jobs enter the local queue, they get lost from the count until they get added to active. It also made the progress bar appear to be further along than it actually was.

EDIT: Alternatively, it could keep a running "finished" count.

alexcrichton · 2019-05-10T13:49:26Z

Ah yeah I think I like the idea of a running finished count, so pushed that!

ehuss · 2019-05-10T14:01:25Z

src/cargo/core/compiler/job_queue.rs

@@ -350,6 +351,7 @@ impl<'a, 'cfg> JobQueue<'a, 'cfg> {
                        print.print(&msg)?;
                    }
                    Message::Finish(id, artifact, result) => {
+                        finished += 1;


I think this needs to be moved down to the Artifact::All branch?

Hm yes indeed

Instead of juggling a few counters let's just keep an explicit counter of finished packages.

ehuss · 2019-05-10T14:36:26Z

👍

@bors r+

bors · 2019-05-10T14:36:27Z

📌 Commit dcd7c48 has been approved by ehuss

bors · 2019-05-10T14:36:36Z

⌛ Testing commit dcd7c48 with merge 2e09266...

Implement the Cargo half of pipelined compilation (take 2) This commit starts to lay the groundwork for #6660 where Cargo will invoke rustc in a "pipelined" fashion. The goal here is to execute one command to produce both an `*.rmeta` file as well as an `*.rlib` file for candidate compilations. In that case if another rlib depends on that compilation, then it can start as soon as the `*.rmeta` is ready and not have to wait for the `*.rlib` compilation. Initially attempted in #6864 with a pretty invasive refactoring this iteration is much more lightweight and fits much more cleanly into Cargo's backend. The approach taken here is to update the `DependencyQueue` structure to carry a piece of data on each dependency edge. This edge information represents the artifact that one node requires from another, and then we a node has no outgoing edges it's ready to build. A dependency on a metadata file is modeled as just that, a dependency on just the metadata and not the full build itself. Most of cargo's backend doesn't really need to know about this edge information so it's basically just calculated as we insert nodes into the `DependencyQueue`. Once that's all in place it's just a few pieces here and there to identify compilations that *can* be pipelined and then they're wired up to depend on the rmeta file instead of the rlib file. Closes #6660

bors · 2019-05-10T15:03:09Z

☀️ Test successful - checks-travis, status-appveyor
Approved by: ehuss
Pushing 2e09266 to master...

mark-i-m · 2019-05-10T15:07:57Z

So is pipelining implemented on nightly now?

Eh2406 · 2019-05-10T15:25:12Z

No. It is on master for cargo. Every week or so there is a PR on rust to update the cargo subrepo. The next nightly after that PR is merged will have this functionality. Then the functionality still requires an opt in with the env CARGO_BUILD_PIPELINING.

Update cargo 17 commits in 759b6161a328db1d4863139e90875308ecd25a75..c4fcfb725b4be00c72eb9cf30c7d8b095577c280 2019-05-06 20:47:49 +0000 to 2019-05-15 19:48:47 +0000 - tests: registry: revert readonly permission after running tests. (rust-lang/cargo#6947) - Remove Candidate (rust-lang/cargo#6946) - Fix for "Running cargo update without a Cargo.lock ignores arguments" rust-lang/cargo#6872 (rust-lang/cargo#6904) - Fix a minor mistake in the changelog. (rust-lang/cargo#6944) - Give a better error message when crates.io requests time out (rust-lang/cargo#6936) - Re-enable compatibility with readonly CARGO_HOME (rust-lang/cargo#6940) - Fix version of `ignore`. (rust-lang/cargo#6938) - Stabilize offline mode. (rust-lang/cargo#6934) - zsh: Add doc options to include non-public items documentation (rust-lang/cargo#6929) - zsh: Suggest --lib option as binary template now the default (rust-lang/cargo#6926) - Migrate package include/exclude to gitignore patterns. (rust-lang/cargo#6924) - Implement the Cargo half of pipelined compilation (take 2) (rust-lang/cargo#6883) - Always include `Cargo.toml` when packaging. (rust-lang/cargo#6925) - Remove unnecessary calls to masquerade_as_nightly_cargo. (rust-lang/cargo#6923) - download: fix "Downloaded 1 crates" message (crates -> crate) (rust-lang/cargo#6920) - Changed RUST_LOG usage to CARGO_LOG to avoid confusion. (rust-lang/cargo#6918) - crate download: don't print that a crate was the largest download if it was the only download (rust-lang/cargo#6916)

alexcrichton · 2019-05-17T16:09:59Z

I've posted an internals topic for getting more information here

@oli-obk

Fix compile-test from forcing a rebuild. If the `compile-test` test was run, then running a cargo build command immediately after caused everything to be rebuilt. This is because the `compile-test` test was deleting all `.rmeta` files in the target directory. Cargo recently switched to always generating `.rmeta` files (rust-lang/cargo#6883), and when the files are deleted, it thinks it needs to be rebuilt. I am not very familiar with compiletest or clippy, so please take a close look and test this out (with the most recent nightly). In particular, make sure it doesn't revert the fixes from #3380 (it seems to work for me). Also @oli-obk mentioned something related in rust-lang/rust#60190 (comment), and I want to make sure that is addressed as well. Fixes #4114

rust-highfive assigned nrc Apr 26, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 26, 2019

rust-highfive assigned ehuss and unassigned nrc Apr 26, 2019

alexcrichton mentioned this pull request Apr 26, 2019

Implement the Cargo half of pipelined compilation #6864

Closed

alexcrichton force-pushed the pipelining-v2 branch from 44bcea1 to 7179ecd Compare April 29, 2019 14:13

alexcrichton force-pushed the pipelining-v2 branch from 7179ecd to 25c53cd Compare April 29, 2019 20:35

alexcrichton force-pushed the pipelining-v2 branch from 25c53cd to 22f2740 Compare April 30, 2019 15:14

alexcrichton force-pushed the pipelining-v2 branch from 780d6df to bb67463 Compare May 1, 2019 14:39

alexcrichton force-pushed the pipelining-v2 branch from bb67463 to ab6b348 Compare May 1, 2019 15:19

alexcrichton force-pushed the pipelining-v2 branch from ab6b348 to d137068 Compare May 2, 2019 14:01

ehuss reviewed May 6, 2019

View reviewed changes

ehuss reviewed May 10, 2019

View reviewed changes

Keep better track of finished crates

dcd7c48

Instead of juggling a few counters let's just keep an explicit counter of finished packages.

alexcrichton force-pushed the pipelining-v2 branch from aca32d5 to dcd7c48 Compare May 10, 2019 14:03

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 10, 2019

bors merged commit dcd7c48 into rust-lang:master May 10, 2019

This was referenced May 10, 2019

Show stdout from the compiler with json error mode #6674

Closed

Fix nonconcurrent tests #6900

Merged

alexcrichton deleted the pipelining-v2 branch May 10, 2019 15:39

ehuss mentioned this pull request May 16, 2019

Update cargo rust-lang/rust#60874

Merged

alexcrichton mentioned this pull request May 17, 2019

Nightly Cargo passes wrong rlib path for doctests #6954

Open

ehuss mentioned this pull request May 19, 2019

Fix compile-test from forcing a rebuild. rust-lang/rust-clippy#4115

Merged

Eh2406 mentioned this pull request May 23, 2019

mtime-on-use as a config option #6978

Closed

Eh2406 mentioned this pull request Jun 14, 2019

-Zmtime-on-use causes spurious rebuilds in workspace #6972

Closed

cosmicexplorer mentioned this pull request Nov 26, 2019

build pants with its own rust backend! pantsbuild/pants#8717

Closed

3 tasks

adetaylor mentioned this pull request Jun 6, 2020

rlibs retain reference to proc-macro dependencies - possibly unnecessary? rust-lang/rust#73047

Closed

ehuss added this to the 1.36.0 milestone Feb 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement the Cargo half of pipelined compilation (take 2) #6883

Implement the Cargo half of pipelined compilation (take 2) #6883

alexcrichton commented Apr 26, 2019 •

edited

rust-highfive commented Apr 26, 2019

alexcrichton commented Apr 26, 2019

bors commented Apr 26, 2019

Eh2406 commented Apr 26, 2019

alexcrichton commented Apr 29, 2019

ehuss commented Apr 29, 2019

alexcrichton commented Apr 29, 2019

ehuss commented Apr 29, 2019

alexcrichton commented Apr 30, 2019

ehuss commented Apr 30, 2019

alexcrichton commented Apr 30, 2019

alexcrichton commented May 1, 2019

bors commented May 1, 2019

alexcrichton commented May 3, 2019

ehuss commented May 5, 2019

alexcrichton commented May 6, 2019

stuhood commented May 6, 2019 •

edited

ehuss commented May 6, 2019 •

edited

alexcrichton commented May 9, 2019

ehuss commented May 9, 2019 •

edited

alexcrichton commented May 10, 2019

ehuss May 10, 2019

alexcrichton May 10, 2019

ehuss commented May 10, 2019

bors commented May 10, 2019

bors commented May 10, 2019

bors commented May 10, 2019

mark-i-m commented May 10, 2019

Eh2406 commented May 10, 2019

alexcrichton commented May 17, 2019

Implement the Cargo half of pipelined compilation (take 2) #6883

Implement the Cargo half of pipelined compilation (take 2) #6883

Conversation

alexcrichton commented Apr 26, 2019 • edited

rust-highfive commented Apr 26, 2019

alexcrichton commented Apr 26, 2019

bors commented Apr 26, 2019

Eh2406 commented Apr 26, 2019

alexcrichton commented Apr 29, 2019

ehuss commented Apr 29, 2019

alexcrichton commented Apr 29, 2019

ehuss commented Apr 29, 2019

alexcrichton commented Apr 30, 2019

ehuss commented Apr 30, 2019

alexcrichton commented Apr 30, 2019

alexcrichton commented May 1, 2019

bors commented May 1, 2019

alexcrichton commented May 3, 2019

ehuss commented May 5, 2019

alexcrichton commented May 6, 2019

stuhood commented May 6, 2019 • edited

ehuss commented May 6, 2019 • edited

alexcrichton commented May 9, 2019

ehuss commented May 9, 2019 • edited

alexcrichton commented May 10, 2019

ehuss May 10, 2019

Choose a reason for hiding this comment

alexcrichton May 10, 2019

Choose a reason for hiding this comment

ehuss commented May 10, 2019

bors commented May 10, 2019

bors commented May 10, 2019

bors commented May 10, 2019

mark-i-m commented May 10, 2019

Eh2406 commented May 10, 2019

alexcrichton commented May 17, 2019

alexcrichton commented Apr 26, 2019 •

edited

stuhood commented May 6, 2019 •

edited

ehuss commented May 6, 2019 •

edited

ehuss commented May 9, 2019 •

edited