This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

RLS should compile in a separate process #1307

Open
alexheretic opened this issue Feb 12, 2019 · 11 comments
Labels
enhancement Indicates new feature requests latency P-high

Comments

@alexheretic
Member

alexheretic commented Feb 12, 2019

RLS currently calls into the compiler within its own process. As I understand it, this is done to allow compiling using Vfs, or more plainly, compiling a project with unsaved files.

Problems with in-process

There are, however, some drawbacks to in-process compilation. I can think of three good ones: static memory isolation, cancellation and stdout isolation.

  • Cargo/rustc have been designed as batch tasks run from the CLI, whereas RLS is a long-lived process. This means static memory in the compiler continues to consume memory while RLS is idle; that is fairly reasonable for the compiler but less so for RLS.

    Procedural macro authors assume compilation is a batch job / short-lived. We can see this in "conflicting implementation of trait" with enum_dispatch crate #1212, where a proc-macro's use of mutable statics means RLS encounters bugs that are not easily reproduced with cargo. It also means proc-macros may consume memory while RLS is idle.

  • Compiles cannot be cancelled. This is an old annoyance of mine with RLS; in this respect RLS is inferior to cargo watch -x check. Processes can be cancelled, and the compiler handles such cancellation pretty well. Threads cannot be cancelled in the same way. It would be a big performance improvement for RLS to be able to simply cancel the current compiler run after a source change, allowing the next compile to start immediately.

  • The compiler & procedural macros can very easily print to stdout. This basically kills RLS, because we use RLS stdout exclusively for LSP comms and the LSP client libraries cannot handle garbage.
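For illustration, a minimal Rust sketch of what process-level isolation buys: a spawned child can have its stdout piped away from the LSP channel, and can be killed mid-run (here a `sleep` command stands in for a real rustc/cargo invocation):

```rust
use std::process::{Command, Stdio};

/// Spawn a stand-in "compiler" process and cancel it, returning its exit code.
/// (`sleep 30` stands in for a real rustc/cargo invocation.)
fn cancel_compile() -> std::io::Result<Option<i32>> {
    let mut child = Command::new("sleep")
        .arg("30")
        .stdout(Stdio::piped()) // child stdout never reaches our LSP stdout
        .stderr(Stdio::piped())
        .spawn()?;

    // A new source change arrived: kill the in-flight compile.
    // Threads cannot be cancelled like this; processes can.
    child.kill()?;
    let status = child.wait()?;
    Ok(status.code()) // None on Unix when killed by a signal
}

fn main() {
    println!("cancelled compile, exit code: {:?}", cancel_compile().unwrap());
}
```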

Vfs cross-process

The main barrier, then, seems to be getting Vfs to work cross-process. If that is possible, we can start the compiler in a new process. It would also unlock further improvements.

  • Racer and other such expensive tasks could be run in a separate process, allowing cancellation and stdout/static-memory isolation.

So can we make Vfs work with the compiler in another process?

Are there other benefits of working in-process?

@lijinpei
Contributor

I think this depends on which model cargo/rustc will use in the future, multi-process or multi-threading, and whichever they choose, RLS will have to follow.

As I understand it this is done to allow compiling using Vfs, or more plainly compiling a project with unsaved files.

It seems like vfs has a small code-base; if needed, I think we can use something like a Linux shared memory map to make it work across processes.

  • Cargo/rustc have been designed as batch tasks run from the CLI, whereas RLS is a long-lived process. This means static memory in the compiler continues to consume memory while RLS is idle; that is fairly reasonable for the compiler but less so for RLS.
    Procedural macro authors assume compilation is a batch job / short-lived. We can see this in "conflicting implementation of trait" with enum_dispatch crate #1212, where a proc-macro's use of mutable statics means RLS encounters bugs that are not easily reproduced with cargo. It also means proc-macros may consume memory while RLS is idle.

Let's take llvm/clang as an example: although most llvm/clang tools are designed to run as separate processes, llvm's API can be thread-safe if each use has its own context (though I'm not quite sure whether clang's API can be called from separate threads), and I don't know what the status is (or will be) of rustc's support for multi-threading.
As for procedural macros, maybe in the future they will be forbidden from having static state, or will receive some context from the compiler, or get some notification at the beginning/end of the compilation process for a crate. Concerning enum_dispatch, I don't think it's quite hygienic/respects scoping, because even this cannot compile.
If we dlclose() the proc-macro's .so file, maybe we can handle the memory waste.

  • Compiles cannot be cancelled. This is an old annoyance of mine about RLS, in this way RLS is inferior to cargo watch -x check. Processes can be cancelled, and the compiler handles such cancellations pretty well. Threads cannot be cancelled in the same way. It would be a big performance improvement to RLS to be able to simply cancel the current compiler run after a source change, allowing the next compile run to start immediately.

I agree that thread cancellation is tricky.

  • Compiler & procedural macros can very easily print to stdout. This basically kills RLS because we use RLS stdout exclusively for LSP comms and the LSP client libs cannot handle garbage.

I think maybe we can implement something with tokio to listen on a unix-domain/tcp/udp socket and not communicate through stdin/stdout. I think io-multiplexing with tokio is easy, but we may have to settle RLS's threading model to fully make use of tokio.
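As a proof of concept (plain std only, no tokio), exchanging a message over a Unix domain socket instead of stdio might look like this; the socket path is made up for the example:

```rust
use std::io::{Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

/// Sketch: exchange one message over a Unix domain socket instead of stdio,
/// so stray prints to stdout can no longer corrupt the LSP channel.
fn roundtrip(msg: &[u8]) -> std::io::Result<String> {
    let path = std::env::temp_dir().join("rls-demo.sock");
    let _ = std::fs::remove_file(&path);
    let listener = UnixListener::bind(&path)?;

    // Server side: accept one connection and read everything sent on it.
    let server = thread::spawn(move || -> std::io::Result<String> {
        let (mut conn, _) = listener.accept()?;
        let mut buf = String::new();
        conn.read_to_string(&mut buf)?;
        Ok(buf)
    });

    // Client side: this channel is private to the two peers, unlike stdout.
    let mut client = UnixStream::connect(&path)?;
    client.write_all(msg)?;
    drop(client); // closing lets the server's read_to_string return

    server.join().unwrap()
}

fn main() {
    let got = roundtrip(br#"{"jsonrpc":"2.0","method":"initialize"}"#).unwrap();
    println!("server received: {got}");
}
```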

So can we make Vfs work with the compiler in another process?

Perhaps with shared memory?
http://man7.org/linux/man-pages/man7/shm_overview.7.html
https://docs.microsoft.com/en-us/windows/desktop/memory/creating-named-shared-memory

@lijinpei
Contributor

I think the real fix should be on the rustc side; two things need to be done:

  1. Macro writers should be aware that their procedural macros may be called multiple times (whether for the same crate or different crates, whether on the same thread or in parallel on multiple threads).
  2. rustc should expose an API to register callbacks at different stages of the compilation process, to name a few:
  • before/after the work for all the crates
  • for each crate, before/after parsing/macro-expansion/to-MIR/MIR-to-codegen

The above APIs cannot be stable because they depend on the details of rustc's internals, but a stable API could be provided for macro writers to register callbacks before/after macro expansion for each crate (perhaps guaranteed to run on the same thread as the macro expander for that crate?). This way, procedural macros have a chance to manage their mutable state.
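To make the proposal concrete, here is a sketch of what such a registration API might look like; every name here is invented for illustration, and nothing like it exists in rustc today:

```rust
/// Hypothetical compilation stages a callback could hook into.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    BeforeExpansion,
    AfterExpansion,
}

/// Hypothetical registry: proc-macro authors register callbacks so they can
/// reset mutable state between crates instead of assuming one batch run.
struct CallbackRegistry {
    callbacks: Vec<(Stage, Box<dyn Fn(&str)>)>,
}

impl CallbackRegistry {
    fn new() -> Self {
        Self { callbacks: Vec::new() }
    }

    fn register(&mut self, stage: Stage, f: impl Fn(&str) + 'static) {
        self.callbacks.push((stage, Box::new(f)));
    }

    /// Fire all callbacks registered for `stage`, passing the crate name.
    fn fire(&self, stage: Stage, krate: &str) {
        for (s, f) in &self.callbacks {
            if *s == stage {
                f(krate);
            }
        }
    }
}

fn main() {
    let mut registry = CallbackRegistry::new();
    // A proc-macro could clear its statics when expansion for a crate ends.
    registry.register(Stage::AfterExpansion, |krate| {
        println!("resetting proc-macro state after expanding `{krate}`");
    });
    registry.fire(Stage::AfterExpansion, "some_crate");
}
```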

@alexheretic
Member Author

Yeah, I think there are definitely upstream improvements that can be made for proc-macros. Also, we could solve all stdout leakage by wrapping RLS itself in an inner process and using explicit comms.

But both of these can happen in addition to wrapping the compiler in a process. Though I imagine the latter will become a non-issue if we do, as proc-macro printlns are the most problematic for RLS.

I'm personally very keen to have cancelling for compiles and to a lesser extent completion.

Cross-process Vfs could be achieved by implementing Serialize for vfs and using ipc-channel. However, that is currently missing a Windows implementation. We can try it out on Linux at least.
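ipc-channel's actual API differs, but a std-only sketch can show the shape of the idea: snapshot the unsaved-file overlay, serialize it to bytes, and rebuild it on the compiler side of the process boundary:

```rust
use std::collections::HashMap;

/// Toy stand-in for the Vfs overlay of unsaved files (path -> contents).
type Overlay = HashMap<String, String>;

/// Serialize the overlay into a simple length-prefixed byte format.
/// (A real implementation would derive serde::Serialize and send it over
/// ipc-channel; this hand-rolled format just illustrates the round trip.)
fn encode(overlay: &Overlay) -> Vec<u8> {
    let mut out = Vec::new();
    for (path, text) in overlay {
        for field in [path, text] {
            out.extend((field.len() as u32).to_le_bytes());
            out.extend(field.as_bytes());
        }
    }
    out
}

/// Rebuild the overlay on the other side of the process boundary.
fn decode(mut bytes: &[u8]) -> Overlay {
    let mut read_field = |bytes: &mut &[u8]| {
        let (len, rest) = bytes.split_at(4);
        let len = u32::from_le_bytes(len.try_into().unwrap()) as usize;
        let (field, rest) = rest.split_at(len);
        *bytes = rest;
        String::from_utf8(field.to_vec()).unwrap()
    };
    let mut overlay = Overlay::new();
    while !bytes.is_empty() {
        let path = read_field(&mut bytes);
        let text = read_field(&mut bytes);
        overlay.insert(path, text);
    }
    overlay
}

fn main() {
    let mut overlay = Overlay::new();
    overlay.insert("src/lib.rs".into(), "fn main() {} // unsaved edit".into());
    assert_eq!(decode(&encode(&overlay)), overlay);
    println!("overlay round-tripped across the (simulated) process boundary");
}
```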

@lijinpei
Contributor

I'm personally very keen to have cancelling for compiles and to a lesser extent completion.

Hi @alexheretic, I have thought about compile cancellation for a while, and I think perhaps it can be implemented in the following manner:

Currently, rustc has 4 phases when compiling a crate (search for phase_ here and you will find them), and it can be controlled by a CompileController to stop after each phase. Unfortunately, the decision to stop after a phase has to be made before compilation starts, which doesn't fit our needs. But you can see that compile_input is not very complex: it just delegates to the phase functions, handles errors, and calls the callbacks from CompileController. If we implement it ourselves, we can cancel compilation at each phase boundary. This is not as powerful as sending a kill signal to a process, which cancels at any point, but it can reduce the user's waiting time and wasted CPU cycles.

Besides, with this approach we can get the intermediate results of compilation, such as the AST/HIR/MIR, which may benefit future RLS features. We can also get dep-info directly from rustc, instead of having to pass a command-line option to rustc and parse its output. We could even use our own diagnostic handler instead of parsing rustc's JSON output (though strictly speaking, that does not require reimplementing compile_input ourselves). What do you think of this approach?
By the way, a build request may in fact involve calling cargo (especially the initial fresh build) besides running some cached rustc commands. We claim not to hog the CPU when doing a build, but cargo spawning new processes is out of our control (well, cargo has a jobs config, but we didn't make use of it). I think with cargo-provided Resolve and rustc-provided dep-info, in principle we can control the whole build ourselves (we also need to figure out the command used to build each package, but I don't expect that to be extremely difficult).
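A minimal sketch of the phase-boundary cancellation idea described above (the phase names are illustrative placeholders, not rustc's actual internals):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Result of a cancellable multi-phase "compilation".
#[derive(Debug, PartialEq)]
enum Outcome {
    Finished,
    CancelledAfter(&'static str),
}

/// Run phases in order, checking a shared flag between them -- the same
/// shape a hand-rolled compile_input replacement could use.
fn run_phases(cancel: &AtomicBool) -> Outcome {
    let phases: [(&'static str, fn()); 4] = [
        ("parse", || {}),
        ("expand", || {}),
        ("analyze", || {}),
        ("codegen", || {}),
    ];
    for (name, work) in phases {
        work();
        if cancel.load(Ordering::SeqCst) {
            // A newer edit arrived; stop here instead of finishing the run.
            return Outcome::CancelledAfter(name);
        }
    }
    Outcome::Finished
}

fn main() {
    let cancel = AtomicBool::new(false);
    // Simulate a source change landing mid-compile.
    cancel.store(true, Ordering::SeqCst);
    println!("{:?}", run_phases(&cancel));
}
```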

@alexheretic
Member Author

@lijinpei It's a clever idea. I actually experimented with this myself way back when I first started moaning about cancellation in RLS. From what I can remember it didn't seem to help much, probably because it didn't affect the cargo run. At the time I decided it was too much added complexity for too little benefit.

If something like this can work with cargo, it might be worth a try. I still think a separate process is the way to go, though.

@lijinpei
Contributor

@alexheretic OK, I will try to explore making servo/ipc-channel work on Windows with named pipes.

@Xanewok Xanewok added the enhancement Indicates new feature requests label Mar 3, 2019
@Xanewok
Member

Xanewok commented Mar 3, 2019

cc #1148 (comment)

@Xanewok Xanewok added the P-high label Apr 5, 2019
bors added a commit that referenced this issue Aug 28, 2019
Implement support for out-of-process compilation

This is quite a lengthy patch, but the gist of it is as follows:
- `rls-ipc` crate is introduced which acts as the IPC interface along with a server/client implementation
- `rls-rustc` is enhanced with optional support for the IPC
- RLS can optionally support it by setting the `RLS_OUT_OF_PROCESS` env var (like `rls-rustc`, it needs to be compiled with the `ipc` feature)

The IPC is async JSON-RPC running on Tokio using `parity-tokio-ipc` (UDS on unices and named pipes on Windows)
  - Tokio because I wanted to more or less easily come up with a PoC
  - RPC because I needed a request->response model for VFS IPC function calls
  - uds/pipes because it's somewhat cross-platform and we don't have to worry about `rustc` potentially polluting stdio (maybe just capturing the output in `run_compiler` would be enough?)

However, the implementation is far from efficient - it currently starts a thread per requested compilation, which in turn starts a single-threaded async runtime to drive the IPC server for a given compilation.

I imagine we could just initiate the runtime globally, spawn the servers on it, and drive them to completion on each compilation to reduce the thread spawn/coordination overhead.

While this gets rid of the global environment lock on each (previously) in-process crate compilation, what still needs to be addressed is the [sequential compilation](https://github.com/rust-lang/rls/blob/35eba227650eee482bedac7d691a69a8487b2135/rls/src/build/plan.rs#L122-L124) of cached build plan for this implementation to truly benefit from the unlocked parallelization potential.

I did some rough test runs (~5) and on a warm cache had the following results:
- integration test suite (release) 3.6 +- 0.2s (in-process) vs 3.8 +- 0.3s (out-of-process)
- rustfmt master whitespace change (release) 6.4 +- 0.2s (in-process) vs 6.6 +- 0.3s (out-of-process)

which at least initially suggests that the performance overhead is negligible, assuming we can really parallelize the work and leverage process isolation for increased stability.

cc #1307

(I'll squash the commits in the final merge, 30+ commits is a tad too much 😅 )

If possible I'd like to get more eyes on the patch to see if it's a good approach and what might be directly improved:
- @matklad for potentially shared rustc-with-patched-filesystem
- @alexheretic for the RLS/implementation itself
- @alexcrichton @nrc do you have thoughts on if we can share the parallel graph compilation logic with Cargo somehow? For now we just rolled our own linear queue here because we didn't need much more but maybe it might be worthwhile to extract the pure execution bits somehow?
@Xanewok
Member

Xanewok commented Aug 28, 2019

Support for compiling in a separate process has landed in the master branch, although you need to explicitly build with the ipc feature and run with RLS_OUT_OF_PROCESS set.

@jhpratt
Member

jhpratt commented Nov 8, 2019

@Xanewok What is needed for this to be on by default?

@programmerjake
Member

@Xanewok ping. Hoping this will be made the default soon, at least on Linux; I'm getting out-of-memory errors with RLS even though I have 32G of memory.

@JAicewizard

How do I build a version of rls with the ipc feature enabled?
rls is throwing a lot of errors at me when using enum_dispatch.
I tried building with feature flags using cargo and came up with env RUSTFLAGS="--cfg feature=\"ipc\"" cargo install rls --force, but that doesn't seem to work properly.
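For reference, a RUSTFLAGS --cfg hack does not enable a Cargo feature properly (it doesn't activate the feature's optional dependencies). A hedged, untested guess at the intended commands, based only on the feature name and env var mentioned earlier in this thread:

```shell
# Untested guess: enable the Cargo feature directly rather than via RUSTFLAGS
cargo install rls --force --features ipc
# then opt in to out-of-process compilation at runtime
RLS_OUT_OF_PROCESS=1 rls
```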
