
Implement the Cargo half of pipelined compilation (take 2) #6883

Merged
12 commits merged into rust-lang:master from alexcrichton:pipelining-v2 on May 10, 2019

Conversation

@alexcrichton (Member) commented Apr 26, 2019

This commit starts to lay the groundwork for #6660 where Cargo will
invoke rustc in a "pipelined" fashion. The goal here is to execute one
command to produce both an *.rmeta file as well as an *.rlib file
for candidate compilations. In that case if another rlib depends on that
compilation, then it can start as soon as the *.rmeta is ready and not
have to wait for the *.rlib compilation.

Initially attempted in #6864 with a pretty invasive refactoring, this
iteration is much more lightweight and fits much more cleanly into
Cargo's backend. The approach taken here is to update the
DependencyQueue structure to carry a piece of data on each dependency
edge. This edge information represents the artifact that one node
requires from another, and when a node has no outgoing edges it's
ready to build.

A dependency on a metadata file is modeled as just that, a dependency on
just the metadata and not the full build itself. Most of cargo's backend
doesn't really need to know about this edge information so it's
basically just calculated as we insert nodes into the DependencyQueue.
Once that's all in place it's just a few pieces here and there to
identify compilations that can be pipelined and then they're wired up
to depend on the rmeta file instead of the rlib file.
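
To make that edge-data idea concrete, here's a minimal, hypothetical sketch (illustrative names only, not Cargo's actual types or APIs) of a dependency queue whose edges record which artifact one unit needs from another:

```rust
use std::collections::HashMap;

// Which artifact an edge requires from a dependency.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Artifact {
    Metadata, // only the *.rmeta is needed, so dependants can start early
    All,      // the full *.rlib (or other final artifact) is needed
}

// Maps each node to the (dependency, required artifact) edges it still has.
#[derive(Default)]
struct DependencyQueue {
    edges: HashMap<&'static str, Vec<(&'static str, Artifact)>>,
}

impl DependencyQueue {
    fn depend(&mut self, node: &'static str, dep: &'static str, needs: Artifact) {
        self.edges.entry(node).or_default().push((dep, needs));
    }

    // A node is ready once every artifact it depends on has been produced; for
    // a Metadata edge that can be well before the dependency's rlib exists.
    fn ready(&self, node: &str, produced: &dyn Fn(&str, Artifact) -> bool) -> bool {
        self.edges
            .get(node)
            .map_or(true, |deps| deps.iter().all(|&(dep, art)| produced(dep, art)))
    }
}

fn main() {
    let mut queue = DependencyQueue::default();
    queue.depend("parent", "dependency", Artifact::Metadata);

    // `parent` can start as soon as `dependency`'s rmeta exists, even while
    // `dependency` is still in codegen producing its rlib.
    let only_rmeta_done =
        |dep: &str, art: Artifact| dep == "dependency" && art == Artifact::Metadata;
    assert!(queue.ready("parent", &only_rmeta_done));
}
```

The point of the sketch is that the Metadata-versus-All distinction can stay localized to where nodes are inserted into the queue, while the rest of the backend only asks "is this unit ready?".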

Closes #6660

@rust-highfive commented Apr 26, 2019

r? @nrc

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton (Member, Author) commented Apr 26, 2019

r? @ehuss

So I've got one open question on this implementation which is included as a failing test here. The general question is what do we want to happen in this scenario:

  • In the first compile, a user builds an rlib.
  • In the next compile, the user builds another rlib that depends on the first.

The test executes this as cargo build -p dependency followed by cargo build -p parent. In this scenario it's currently buggy because the second compilation assumes the first produced metadata, but we currently try to avoid producing metadata where possible. The options to solve this are:

  1. Always produce metadata files for rlibs, whether or not the metadata will be consumed during pipelining
  2. In the second compile recompile the dependency rlib to produce metadata. (we'd handle this by hashing whether metadata is produced into the fingerprint)

Option (1) would take up a bit more disk space while option (2) may take longer, but this seems unlikely to really come up in practice all that often. I'd slightly lean towards option (1), but I'm curious what you'd think as well?

@bors (Contributor) commented Apr 26, 2019

☔️ The latest upstream changes (presumably #6772) made this pull request unmergeable. Please resolve the merge conflicts.

@Eh2406 (Contributor) commented Apr 26, 2019

Are these the same metadata files needed for 'cargo check' to use the results of 'cargo build'?

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from 44bcea1 to 7179ecd Apr 29, 2019

@alexcrichton (Member, Author) commented Apr 29, 2019

I think in theory they can be compatible, but this PR doesn't attempt to set things up so cargo check output can be shared with cargo build. We can perhaps tackle that without too much difficulty in the future!

@ehuss (Contributor) commented Apr 29, 2019

> I'd slightly lean towards option (1), but I'm curious what you'd think as well?

Either option sounds fine to me. It doesn't seem like a huge amount of extra disk space.

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from 7179ecd to 25c53cd Apr 29, 2019

@alexcrichton (Member, Author) commented Apr 29, 2019

Ok, I've implemented that strategy since it's easier!

@ehuss (Contributor) commented Apr 29, 2019

Looks like a bunch of tests need to be updated now that --emit has been expanded.

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from 25c53cd to 22f2740 Apr 30, 2019

@alexcrichton (Member, Author) commented Apr 30, 2019

Oops, updated that

@ehuss (Contributor) commented Apr 30, 2019

Looks good to me. r=me if you're ready to go. I'm not sure if you want to wait for the rustc changes or not. I figure worst case if things don't work out, this can be reverted?

@alexcrichton (Member, Author) commented Apr 30, 2019

I think it makes sense to wait for an end-to-end story to prove out that it's working. To that end I've pushed up a few more commits which I believe fully implement pipelined compilation in Cargo, given the support in upstream rustc for -Z emit-directives. Some caveats apply though:

  • This'll fail CI since we don't have a nightly which supports emit-directives
  • This'll have very little benefit for pipelined compilation since rustc doesn't emit the directive until the very end of compilation. The compiler is being changed in rust-lang/rust#60385 to emit the directive much sooner
  • Still requires opt-in, for example through CARGO_BUILD_PIPELINING=1
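
For reference (and assuming nothing about the opt-in changes before this lands), enabling it looks like running `CARGO_BUILD_PIPELINING=1 cargo build` against a nightly rustc that understands `-Z emit-directives`.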

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from 780d6df to bb67463 May 1, 2019

@alexcrichton (Member, Author) commented May 1, 2019

Ok, I think that covers everything and I've verified it works with tonight's nightly, so this should be good to go!

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from bb67463 to ab6b348 May 1, 2019

@bors (Contributor) commented May 1, 2019

☔️ The latest upstream changes (presumably #6896) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from ab6b348 to d137068 May 2, 2019

@alexcrichton (Member, Author) commented May 3, 2019

@ehuss did you want to give the commits I added after the initial review a final once-over before merging?

@ehuss (Contributor) commented May 5, 2019

I'm still working through this. I don't see any noticeable improvement in total compilation time with the various projects I've tried. Is there a particular project type or concurrency level that would see a benefit? I'm concerned about moving forward if it ultimately doesn't help.

I've been instrumenting this a little. I've been building some graphs of the queue sizes, which show that the initial phase is kept more "full", but that this drops off quickly. I've also been recording how much faster the rmeta files are emitted. On average, the rmeta file is emitted at about the 70% mark of the total crate build time (ranging from 30% to 98%). I'm not sure if that intuitively seems reasonable. I believe it is currently emitted just before codegen?

I'm going to continue instrumenting and try to get a better picture. I'd like to get your perspective.

@alexcrichton (Member, Author) commented May 6, 2019

Heh, good point on the investigations! My plan was more along the lines of "land it and make a post asking users to test it out" and debug from there :)

I was also slightly dismayed at the lack of benefit on Cargo itself in both release and debug builds. I suspect that this is not going to be as large of a win as I had previously hoped. That being said, I'm still hopeful we can get testers on internals to help show some other data.

This also isn't necessarily a win in all cases. The whole point of pipelining is to take advantage of otherwise idle parallelism, but we can't pipeline literally everything, and what's here is a coarse approximation of the ideal. Some ingredients necessary to reap benefits from this PR:

  • You'll at least have to compile more than one crate. Incremental builds that build the top crate will see no benefit.
  • You'll probably need to be compiling in release mode to get the most benefit. Otherwise codegen is typically quick enough that it won't really feel all that different. (AKA if codegen is "fast enough" in debug mode pipelining won't actually start anything that much sooner).
  • There may not be a ton of benefit in full crate graph builds. There are lots of edges that still require full synchronization of the build process, such as:
    • Procedural macros (aka serde_derive) have to be entirely finished before proceeding
    • Build scripts (aka building large chunks of C code) must entirely finish as well

Given all that I suspect that the scenario with the biggest win will be incremental release mode builds where a crate a few layers deep in the graph was modified. Incremental builds in debug mode may fit this same profile but will likely not feel all that much faster.

FWIW though I'm personally pretty happy with how low the complexity level turned out for this PR, so I think it'd be fine to land with the intention of backing it out if it doesn't pan out. That being said, I can't think of a use case where this regresses something, so I'm not sure there's a ton of downside to landing this (that's exclusively given the low complexity budget spend I'm perceiving here).

@stuhood commented May 6, 2019

> I was also slightly dismayed at the lack of benefit on Cargo itself in both release and debug builds. I suspect that this is not going to be as large of a win as I had previously hoped. That being said, I'm still hopeful we can get testers on internals to help show some other data.

From simulations we've done with pipelining/"outlining", the benefits only begin to show on beefier machines with larger graphs. If you don't have enough total work, or if you don't have enough cores to run more of the unblocked work in parallel, then the outlining is overhead (but hopefully a very small amount of overhead).

A case where outlining helps even for smaller workspaces is the "long path" case. In a workspace with a chain of dependencies, if you edit a few deps away from a test/binary, the "path" between them can be compiled in parallel, whereas without outlining it would be sequential.
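
(As a concrete illustration of that "long path" case, with hypothetical crate names: in a chain like leaf → mid → bin, editing leaf lets mid start as soon as leaf's rmeta is emitted, and bin as soon as mid's rmeta is emitted, so the codegen phases of the three crates can overlap instead of running strictly back-to-back.)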

...So yea. I suspect that @alexcrichton's hypotheses are correct.

@ehuss (Contributor) commented May 6, 2019

Sounds reasonable.

I'd like to add some timing data to the debug logs, I'll follow up with a separate PR with some ideas.

I think this will address some other issues, like #6674 and #6119.

> but it turns out that we unconditionally capture all output from the compiler/rustdoc today, no exceptions

This hasn't been true since #6289. EDIT: Or maybe this means that the stdout/stderr are not inherited anymore, nevermind.

@matthiaskrgr (Contributor) commented May 9, 2019

It looks like this breaks compatibility with sccache:

   Compiling unreachable v1.0.0
sccache: encountered fatal error
sccache: error : Failed to open file for hashing: "/home/matthias/vcs/github/fd/target/debug/deps/libvoid-f01d6c64b2fc03b4.rmeta"
sccache:  cause: Failed to open file for hashing: "/home/matthias/vcs/github/fd/target/debug/deps/libvoid-f01d6c64b2fc03b4.rmeta"
sccache:  cause: No such file or directory (os error 2)
error: Could not compile `unreachable`.

Is this expected?

Handle transitive `All` dependency edges
We need to synthesize some dependency edges in the dependency graph to
get everything ordered correctly! (more details in the commit itself)

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from 854e021 to 7892472 May 9, 2019

@alexcrichton (Member, Author) commented May 9, 2019

Updated with @ehuss's comments.

@matthiaskrgr I'm not sure why, but I'm not really shooting for 100% "works everywhere" fidelity in this PR; that's something I hope we can fix and iterate on afterwards.

@ehuss (Contributor) commented May 9, 2019

Sorry, one more minor nit. I notice the progress bar jumping backwards at times. I think the following will fix it:

diff --git a/src/cargo/core/compiler/job_queue.rs b/src/cargo/core/compiler/job_queue.rs
index bff2d5b6..10ecf47f 100644
--- a/src/cargo/core/compiler/job_queue.rs
+++ b/src/cargo/core/compiler/job_queue.rs
@@ -320,7 +320,8 @@ impl<'a, 'cfg> JobQueue<'a, 'cfg> {
             // unnecessarily.
             let events: Vec<_> = self.rx.try_iter().collect();
             let events = if events.is_empty() {
-                self.show_progress(total);
+                let count = total - self.queue.len() - self.active.len() - queue.len();
+                self.show_progress(count, total);
                 vec![self.rx.recv().unwrap()]
             } else {
                 events
@@ -431,8 +432,7 @@ impl<'a, 'cfg> JobQueue<'a, 'cfg> {
         }
     }

-    fn show_progress(&mut self, total: usize) {
-        let count = total - self.queue.len() - self.active.len();
+    fn show_progress(&mut self, count: usize, total: usize) {
         let active_names = self
             .active
             .values()

Essentially when jobs enter the local queue, they get lost from the count until they get added to active. It also made the progress bar appear to be further along than it actually was.

EDIT: Alternatively, it could keep a running "finished" count.

@alexcrichton (Member, Author) commented May 10, 2019

Ah yeah I think I like the idea of a running finished count, so pushed that!

@@ -350,6 +351,7 @@ impl<'a, 'cfg> JobQueue<'a, 'cfg> {
print.print(&msg)?;
}
Message::Finish(id, artifact, result) => {
finished += 1;

@ehuss (Contributor) commented May 10, 2019

I think this needs to be moved down to the Artifact::All branch?

@alexcrichton (Member, Author) commented May 10, 2019

Hm yes indeed

Keep better track of finished crates
Instead of juggling a few counters let's just keep an explicit counter
of finished packages.
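
As a hypothetical, simplified sketch of that change (stand-in types modeled on the diff above, not the actual job_queue code): only a crate's final artifact advances the finished count, while an early rmeta notification merely unblocks dependants without moving the progress bar.

```rust
// Which part of a compilation just finished (illustrative stand-in only).
#[derive(Clone, Copy, PartialEq)]
enum Artifact {
    Metadata, // the *.rmeta is ready; rmeta-only dependants may now start
    All,      // the whole compilation (e.g. the *.rlib) is done
}

struct Progress {
    finished: usize,
    total: usize,
}

impl Progress {
    fn on_finish(&mut self, artifact: Artifact) {
        // Count a crate only once it is completely built; counting on the
        // Metadata notification too would double-count pipelined crates and
        // make the bar jump around.
        if artifact == Artifact::All {
            self.finished += 1;
        }
    }
}

fn main() {
    let mut progress = Progress { finished: 0, total: 3 };
    progress.on_finish(Artifact::Metadata); // rmeta ready: progress unchanged
    progress.on_finish(Artifact::All);      // crate fully built: now 1/3
    println!("Building [{}/{}]", progress.finished, progress.total);
}
```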

@alexcrichton alexcrichton force-pushed the alexcrichton:pipelining-v2 branch from aca32d5 to dcd7c48 May 10, 2019

@ehuss (Contributor) commented May 10, 2019

👍

@bors r+

@bors (Contributor) commented May 10, 2019

📌 Commit dcd7c48 has been approved by ehuss

@bors (Contributor) commented May 10, 2019

⌛️ Testing commit dcd7c48 with merge 2e09266...

bors added a commit that referenced this pull request May 10, 2019

Auto merge of #6883 - alexcrichton:pipelining-v2, r=ehuss
Implement the Cargo half of pipelined compilation (take 2)

@bors (Contributor) commented May 10, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: ehuss
Pushing 2e09266 to master...

@bors bors merged commit dcd7c48 into rust-lang:master May 10, 2019

3 checks passed:
  • Travis CI - Pull Request: Build Passed
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • homu: Test successful
@mark-i-m commented May 10, 2019

So is pipelining implemented on nightly now?

@Eh2406 (Contributor) commented May 10, 2019

No. It is on master for Cargo. Every week or so there is a PR on rust to update the cargo subrepo, and the next nightly after that PR is merged will have this functionality. The functionality then still requires opt-in via the CARGO_BUILD_PIPELINING environment variable.

@alexcrichton alexcrichton deleted the alexcrichton:pipelining-v2 branch May 10, 2019

@ehuss ehuss referenced this pull request May 16, 2019

Merged

Update cargo #60874

bors added a commit to rust-lang/rust that referenced this pull request May 16, 2019

Auto merge of #60874 - ehuss:update-cargo, r=alexcrichton
Update cargo

17 commits in 759b6161a328db1d4863139e90875308ecd25a75..c4fcfb725b4be00c72eb9cf30c7d8b095577c280
2019-05-06 20:47:49 +0000 to 2019-05-15 19:48:47 +0000
- tests: registry: revert readonly permission after running tests. (rust-lang/cargo#6947)
- Remove Candidate (rust-lang/cargo#6946)
- Fix for "Running cargo update without a Cargo.lock ignores arguments" rust-lang/cargo#6872 (rust-lang/cargo#6904)
- Fix a minor mistake in the changelog. (rust-lang/cargo#6944)
- Give a better error message when crates.io requests time out (rust-lang/cargo#6936)
- Re-enable compatibility with readonly CARGO_HOME (rust-lang/cargo#6940)
- Fix version of `ignore`. (rust-lang/cargo#6938)
- Stabilize offline mode. (rust-lang/cargo#6934)
- zsh: Add doc options to include non-public items documentation (rust-lang/cargo#6929)
- zsh: Suggest --lib option as binary template now the default (rust-lang/cargo#6926)
- Migrate package include/exclude to gitignore patterns. (rust-lang/cargo#6924)
- Implement the Cargo half of pipelined compilation (take 2) (rust-lang/cargo#6883)
- Always include `Cargo.toml` when packaging. (rust-lang/cargo#6925)
- Remove unnecessary calls to masquerade_as_nightly_cargo. (rust-lang/cargo#6923)
- download: fix "Downloaded 1 crates" message (crates -> crate) (rust-lang/cargo#6920)
- Changed RUST_LOG usage to CARGO_LOG to avoid confusion. (rust-lang/cargo#6918)
- crate download: don't print that a crate was the largest download if it was the only download (rust-lang/cargo#6916)
@alexcrichton (Member, Author) commented May 17, 2019

I've posted an internals topic for getting more information here

bors added a commit to rust-lang/rust-clippy that referenced this pull request May 19, 2019

Auto merge of #4115 - ehuss:fix-compiletest, r=Manishearth
Fix compile-test from forcing a rebuild.

If the `compile-test` test was run, then running a cargo build command immediately after caused everything to be rebuilt. This is because the `compile-test` test was deleting all `.rmeta` files in the target directory. Cargo recently switched to always generating `.rmeta` files (rust-lang/cargo#6883), and when the files are deleted, it thinks it needs to be rebuilt.

I am not very familiar with compiletest or clippy, so please take a close look and test this out (with the most recent nightly). In particular, make sure it doesn't revert the fixes from #3380 (it seems to work for me). Also @oli-obk mentioned something related in rust-lang/rust#60190 (comment), and I want to make sure that is addressed as well.

Fixes #4114