This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Implement support for out-of-process compilation #1536

Merged: 1 commit merged into rust-lang:master from the ipc-everything branch on Aug 28, 2019

Conversation

@Xanewok (Member) commented Aug 6, 2019

This is quite a lengthy patch, but the gist of it is as follows:

  • a new rls-ipc crate is introduced, which acts as the IPC interface along with a server/client implementation
  • rls-rustc is enhanced with optional support for the IPC
  • RLS can optionally use it by setting the RLS_OUT_OF_PROCESS env var (like rls-rustc, it needs to be compiled with the ipc feature)

The IPC is async JSON-RPC running on Tokio using parity-tokio-ipc (UDS on Unix and named pipes on Windows); a rough sketch of the message shape follows the list below:

  • Tokio because I wanted to more or less easily come up with a PoC
  • RPC because I needed a request->response model for VFS IPC function calls
  • uds/pipes because it's somewhat cross-platform and we don't have to worry about rustc potentially polluting stdio (maybe just capturing the output in run_compiler would be enough?)
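
A minimal sketch of the request/response shape for the VFS calls. This is purely illustrative and not the actual rls-ipc API; the type and field names are invented for the example:

```rust
use std::path::PathBuf;

use serde::{Deserialize, Serialize};

/// Requests the compiler-side client sends to the RLS-side server.
/// (Names are illustrative; the real rls-ipc interface may differ.)
#[derive(Serialize, Deserialize)]
#[serde(tag = "method", content = "params")]
enum VfsRequest {
    /// Ask the RLS for the (possibly unsaved) contents of a file.
    FileContents { path: PathBuf },
}

/// Responses the server sends back, matched to requests by the `id`
/// of the surrounding JSON-RPC envelope.
#[allow(dead_code)]
#[derive(Serialize, Deserialize)]
enum VfsResponse {
    FileContents(String),
    FileNotFound,
}

fn main() -> serde_json::Result<()> {
    // Round-trip one message to show the wire format.
    let req = VfsRequest::FileContents { path: PathBuf::from("src/lib.rs") };
    let wire = serde_json::to_string(&req)?;
    println!("{wire}"); // {"method":"FileContents","params":{"path":"src/lib.rs"}}
    let _back: VfsRequest = serde_json::from_str(&wire)?;
    Ok(())
}
```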

However, the implementation is far from efficient - it currently starts a thread per requested compilation, which in turn starts a single-threaded async runtime to drive the IPC server for a given compilation.

I imagine we could instead initiate the runtime globally, spawn the servers on it, and drive them to completion on each compilation to reduce the thread spawn/coordination overhead.
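
Roughly the pattern described above, sketched with current Tokio APIs (the PR predates Tokio 1.0, so the real code differs); serve_one_compilation is an invented stand-in for driving one compilation's IPC server:

```rust
use std::thread;

// Stand-in for driving one compilation's IPC server to completion.
async fn serve_one_compilation() { /* accept connections, answer VFS requests */ }

fn spawn_compilation_with_ipc() -> thread::JoinHandle<()> {
    // One thread per requested compilation...
    thread::spawn(|| {
        // ...each with its own single-threaded async runtime,
        // torn down again when the compilation finishes.
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("failed to build single-threaded runtime");
        rt.block_on(serve_one_compilation());
    })
}

fn main() {
    spawn_compilation_with_ipc().join().unwrap();
}
```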

While this gets rid of the global environment lock on each (previously) in-process crate compilation, the [sequential compilation](https://github.com/rust-lang/rls/blob/35eba227650eee482bedac7d691a69a8487b2135/rls/src/build/plan.rs#L122-L124) of the cached build plan still needs to be addressed for this implementation to truly benefit from the unlocked parallelization potential.

I did some rough test runs (~5) and on a warm cache had the following results:

  • integration test suite (release): 3.6 ± 0.2 s (in-process) vs 3.8 ± 0.3 s (out-of-process)
  • rustfmt master whitespace change (release): 6.4 ± 0.2 s (in-process) vs 6.6 ± 0.3 s (out-of-process)

which at least initially suggests that the performance overhead is small enough to be negligible if we can really parallelize the work and leverage process isolation for increased stability.

cc #1307

(I'll squash the commits in the final merge, 30+ commits is a tad too much 😅 )

If possible I'd like to get more eyes on the patch to see if it's a good approach and what might be directly improved:

  • @matklad for potentially shared rustc-with-patched-filesystem
  • @alexheretic for the RLS/implementation itself
  • @alexcrichton @nrc do you have thoughts on whether we can share the parallel graph compilation logic with Cargo somehow? For now we just rolled our own linear queue here because we didn't need much more, but maybe it would be worthwhile to extract the pure execution bits somehow?

@matklad (Member) commented Aug 7, 2019

Excellent work, I am super excited about this approach!

> I imagine we could instead initiate the runtime globally, spawn the servers on it, and drive them to completion on each compilation to reduce the thread spawn/coordination overhead.

My gut feeling would be that optimizing here is not really required: we shouldn't be running many compilations concurrently anyway. I am more worried about depending on tokio and the rest of the async ecosystem: it certainly is a good choice for getting things up and running, but, longer term, I think that we are unfortunately buying into a lot of complexity here. An alternative would be to roll our own blocking JSON-per-line API on top of mkfifo/Windows named pipes.
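
For illustration, the Unix half of such a roll-our-own protocol could be as small as the following. Everything here (message shape, field names) is invented for the example; it only uses std plus serde/serde_json, and the Windows named-pipe side is omitted:

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};

use serde::{Deserialize, Serialize};

// Message shape invented for illustration only.
#[derive(Serialize, Deserialize)]
struct Request { id: u64, method: String, path: String }

#[derive(Serialize, Deserialize)]
struct Response { id: u64, contents: Option<String> }

// Server side: one blocking loop, one JSON message per line.
fn serve(listener: UnixListener) -> std::io::Result<()> {
    for stream in listener.incoming() {
        let stream = stream?;
        let mut reader = BufReader::new(stream.try_clone()?);
        let mut writer = stream;
        let mut line = String::new();
        while reader.read_line(&mut line)? > 0 {
            let req: Request = serde_json::from_str(&line).expect("malformed request");
            // Look the file up in the VFS (stubbed out here).
            let resp = Response { id: req.id, contents: Some(String::new()) };
            writeln!(writer, "{}", serde_json::to_string(&resp).unwrap())?;
            line.clear();
        }
    }
    Ok(())
}

// Client side: write a line, block until the reply line comes back.
// (A real client would keep one persistent BufReader instead of making
// a fresh one per call.)
fn request_file(stream: &mut UnixStream, id: u64, path: &str) -> std::io::Result<Response> {
    let req = Request { id, method: "file_contents".into(), path: path.into() };
    writeln!(stream, "{}", serde_json::to_string(&req).unwrap())?;
    let mut line = String::new();
    BufReader::new(stream.try_clone()?).read_line(&mut line)?;
    Ok(serde_json::from_str(&line).expect("malformed response"))
}
```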

> do you have thoughts on whether we can share the parallel graph compilation logic with Cargo somehow?

That's perhaps a naive question, but why can't we just do `env RLS_IPC_ENDPOINT=... cargo check`? I.e., shell out to Cargo, setting up the required env vars in such a way that Cargo calls our shim IPC-enabled rustc?
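
For concreteness, shelling out like that might look as follows. `RLS_IPC_ENDPOINT` is the hypothetical variable from the question above, while `RUSTC` is Cargo's standard override for which compiler binary to invoke:

```rust
use std::process::Command;

// Shell out to Cargo, pointing RUSTC at the IPC-enabled shim and telling the
// shim where to reach the RLS over IPC. RLS_IPC_ENDPOINT is illustrative.
fn run_cargo_check(ipc_endpoint: &str, shim_rustc: &str) -> std::io::Result<std::process::ExitStatus> {
    Command::new("cargo")
        .arg("check")
        .env("RUSTC", shim_rustc)              // make Cargo invoke the shim rustc
        .env("RLS_IPC_ENDPOINT", ipc_endpoint) // tell the shim where the RLS listens
        .status()
}
```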

@matklad (Member) commented Aug 8, 2019 via email

@alexcrichton (Member) commented

> do you have thoughts on whether we can share the parallel graph compilation logic with Cargo somehow?

This can be done but it'll likely be somewhat tricky. There's a lot to handle here like:

  • The dependency graph itself
  • Limiting parallelism with a jobserver and -j
  • Handling when one job exits in parallel
  • Lots of concurrent output that needs to be woven together correctly

FWIW though I don't have a good understanding of what this PR is doing, so I'm not quite sure exactly how hard this would be! In any case if you're interested we could try to see if logic could be extracted to crates on crates.io or something like that?
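
For the second bullet above, the core idea is the same token scheme the make jobserver uses: a fixed pool of tokens, and a job may only start once it holds one. A minimal std-only sketch of that idea (not the actual jobserver protocol, nor Cargo's implementation):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// A fixed pool of `limit` tokens; a job must take one before starting and
// returns it when done, capping how many jobs run concurrently.
fn run_with_job_limit(jobs: Vec<Box<dyn FnOnce() + Send>>, limit: usize) {
    let (tokens_tx, tokens_rx): (Sender<()>, Receiver<()>) = channel();
    for _ in 0..limit {
        tokens_tx.send(()).unwrap();
    }

    let mut handles = Vec::new();
    for job in jobs {
        tokens_rx.recv().unwrap(); // block until a token is free
        let tokens_tx = tokens_tx.clone();
        handles.push(thread::spawn(move || {
            job();
            tokens_tx.send(()).unwrap(); // hand the token back
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    let jobs: Vec<Box<dyn FnOnce() + Send>> = (0..8)
        .map(|i| Box::new(move || println!("built crate {i}")) as Box<dyn FnOnce() + Send>)
        .collect();
    run_with_job_limit(jobs, 2); // at most two "compilations" at a time
}
```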

@Xanewok Xanewok force-pushed the ipc-everything branch 5 times, most recently from e293bb8 to 4fb6446 on August 13, 2019 15:15
@bors (Contributor) commented Aug 13, 2019

☔ The latest upstream changes (presumably #1541) made this pull request unmergeable. Please resolve the merge conflicts.

@Xanewok Xanewok force-pushed the ipc-everything branch 2 times, most recently from ccb979d to a6e0304 on August 13, 2019 15:52
@alexheretic (Member) commented

I'm looking forward to the improvements compiling out-of-process can bring. This implementation looks quite clean. I wonder if we could make this simpler though.

On that note, I wouldn't have bothered with the "ipc" feature separation; it adds complexity. Is this just to avoid compiling tokio when not used? Having both in-process and out-of-process behind a runtime switch does make sense, though.

I'd like to have diagnostic streaming instead of after-compile-reporting, and for active processes to be killed/cancelled when a new compile is required. But I'm getting ahead of myself a bit.

@bors (Contributor) commented Aug 26, 2019

☔ The latest upstream changes (presumably #1545) made this pull request unmergeable. Please resolve the merge conflicts.

@Xanewok (Member, Author) commented Aug 28, 2019

@matklad

> I am more worried about depending on tokio and the rest of the async ecosystem

Yeah, that's fair - ideally I'd like not to depend on all of it just for the sake of IPC. FWIW, we already use Tokio in the development setting for the integration tests, and right now the IPC is conditionally compiled under a feature flag (roughly as sketched below), so for now I'd continue with this approach and polish it as we go in order to ship it in the RLS for other users.
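
A tiny sketch of what the feature gating looks like in principle; the ipc feature name comes from the PR, while the module and function below are invented stand-ins:

```rust
// In Cargo.toml the crate would declare an optional feature, e.g.
//   [features]
//   ipc = ["tokio", "parity-tokio-ipc"]
// and the IPC-only code is then gated on it:

#[cfg(feature = "ipc")]
mod ipc {
    /// Connect back to the RLS over the IPC endpoint (stubbed out here).
    pub fn connect(endpoint: &str) {
        let _ = endpoint;
    }
}

fn main() {
    // Only attempted when built with `--features ipc`; otherwise the
    // in-process path is used and tokio is never compiled in.
    #[cfg(feature = "ipc")]
    ipc::connect("placeholder-endpoint");

    #[cfg(not(feature = "ipc"))]
    println!("built without the ipc feature");
}
```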

> That's perhaps a naive question, but why can't we just do `env RLS_IPC_ENDPOINT=... cargo check`?

That's the hottest operation in the RLS right now, so I'd like to cut down the overhead as much as possible. I'd prefer not to re-run cargo check in a cached-build scenario, potentially on every keystroke (and ideally, in the long run, we shouldn't depend on the adaptive delay, so results reach the user as soon as possible) - it might do disk I/O, and most of the time we probably won't build more than 2 crate targets, short of workspaces with a lot of binary/test crates or with a lot of transitive primary dependencies.

> I think that we are unfortunately buying into a lot of complexity here. An alternative would be to roll our own blocking JSON-per-line API on top of mkfifo/Windows named pipes.

Oh, this looks interesting:
https://users.rust-lang.org/t/recommended-way-of-ipc-in-rust/31116/9?u=matklad

Time to shave some more yaks it seems 😈

@alexcrichton

> In any case if you're interested we could try to see if logic could be extracted to crates on crates.io or something like that?

That'd be interesting to pursue! In general it'd be great if we could just throw a computation DAG/build plan at an 'execution engine' of sorts and handle the output. As I understand it, Rayon (which also has jobserver support) is meant to process data within a single process rather than coordinate multiple processes?

At first glance this sounds like a useful thing to have in the ecosystem, so maybe exposing it wouldn't be wasted effort even if the RLS were its only consumer at first?
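
To make the "execution engine" idea concrete: given a dependency graph of crate builds, run each unit once all of its dependencies are done, which is exactly where the linear queue mentioned earlier could be swapped for parallel dispatch. A deliberately naive, single-threaded sketch of just the ordering part (not Cargo's or the RLS's actual scheduler; crate names are made up):

```rust
use std::collections::{HashMap, HashSet};

// Each unit names the units it depends on; run every unit exactly once,
// never before its dependencies. A real engine would dispatch each "wave"
// of ready units in parallel instead of the plain loop below.
fn execute_plan(deps: &HashMap<&str, Vec<&str>>, mut run: impl FnMut(&str)) {
    let mut done: HashSet<&str> = HashSet::new();
    while done.len() < deps.len() {
        // Everything whose dependencies are already built is ready now.
        let mut ready: Vec<&str> = Vec::new();
        for (unit, unit_deps) in deps {
            if !done.contains(unit) && unit_deps.iter().all(|d| done.contains(d)) {
                ready.push(*unit);
            }
        }
        assert!(!ready.is_empty(), "cycle in the build plan");
        for unit in ready {
            run(unit);
            done.insert(unit);
        }
    }
}

fn main() {
    // Made-up build plan for illustration.
    let mut deps: HashMap<&str, Vec<&str>> = HashMap::new();
    deps.insert("serde", vec![]);
    deps.insert("serde_json", vec!["serde"]);
    deps.insert("my-crate", vec!["serde", "serde_json"]);
    execute_plan(&deps, |unit| println!("building {unit}"));
}
```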

@alexheretic

> On that note, I wouldn't have bothered with the "ipc" feature separation; it adds complexity. Is this just to avoid compiling tokio when not used?

Yep! (see above)

> I'd like to have diagnostic streaming instead of after-compile-reporting, and for active processes to be killed/cancelled when a new compile is required. But I'm getting ahead of myself a bit.

That's a great idea! Let me write that on a to-do list - we should definitely explore it further.

Since the implementation looks okay, as you say, I'll merge this and we'll hopefully iterate and polish it further as we go. Thanks for the review ❤️

@Xanewok (Member, Author) commented Aug 28, 2019

@bors r+

@bors (Contributor) commented Aug 28, 2019

📌 Commit 3bcfa0f has been approved by Xanewok

@bors (Contributor) commented Aug 28, 2019

⌛ Testing commit 3bcfa0f with merge 00e4f29...

bors added a commit that referenced this pull request Aug 28, 2019
Implement support for out-of-process compilation

@bors (Contributor) commented Aug 28, 2019

☀️ Test successful - checks-azure
Approved by: Xanewok
Pushing 00e4f29 to master...

@bors bors merged commit 3bcfa0f into rust-lang:master Aug 28, 2019
@Xanewok Xanewok deleted the ipc-everything branch August 28, 2019 13:07