Run translation and LLVM in parallel when compiling with multiple CGUs #43506

Merged
merged 29 commits into rust-lang:master from michaelwoerister:async-llvm on Aug 1, 2017

Conversation

michaelwoerister (Contributor) commented Jul 27, 2017

This is still a work in progress but the bulk of the implementation is done, so I thought it would be good to get it in front of more eyes.

This PR makes the compiler start running LLVM while translation is still in progress, effectively allowing for more parallelism towards the end of the compilation pipeline. It also allows the main thread to switch between translating and running LLVM, which helps reduce peak memory usage since not all LLVM modules have to be kept in memory until linking. This is especially good for incr. comp. but it works just as well when running with `-Ccodegen-units=N`.
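
To make the shape of this pipeline concrete, here is a minimal sketch (hypothetical names, not the code in this PR; only `Message::TranslationDone` loosely mirrors a variant quoted in the review below) of a coordinator that hands each translated codegen unit to an LLVM worker thread while translation keeps running:

```rust
use std::sync::mpsc::channel;
use std::thread;

enum Message {
    TranslationDone { module_name: String },
    TranslationFinished,
    LlvmDone { module_name: String },
}

fn main() {
    let (tx, rx) = channel();

    // "Translation": announce each codegen unit as soon as it has been translated.
    let trans_tx = tx.clone();
    thread::spawn(move || {
        for i in 0..4 {
            let msg = Message::TranslationDone { module_name: format!("cgu.{}", i) };
            trans_tx.send(msg).unwrap();
        }
        trans_tx.send(Message::TranslationFinished).unwrap();
    });

    // Coordinator: start an LLVM worker for every translated module without
    // waiting for translation of the whole crate to finish.
    let mut translation_finished = false;
    let mut llvm_in_flight = 0;
    while !translation_finished || llvm_in_flight > 0 {
        match rx.recv().unwrap() {
            Message::TranslationDone { module_name } => {
                llvm_in_flight += 1;
                let llvm_tx = tx.clone();
                thread::spawn(move || {
                    // ... run LLVM passes on the module here ...
                    llvm_tx.send(Message::LlvmDone { module_name }).unwrap();
                });
            }
            Message::TranslationFinished => translation_finished = true,
            Message::LlvmDone { module_name } => {
                llvm_in_flight -= 1;
                println!("LLVM done for {}", module_name);
            }
        }
    }
}
```

The real scheduler additionally juggles jobserver tokens and lets the main thread alternate between translating and running LLVM, which this sketch leaves out.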

In order to help tuning and debugging the work scheduler, the PR adds the `-Ztrans-time-graph` flag, which spits out HTML files that show how work packages were scheduled:
[Image: trans-time-graph while building regex]
(red is translation, green is LLVM)

One side effect here is that `-Ztime-passes` might show something not quite correct because trans and LLVM are not strictly separated anymore. I plan to have some special handling there that will try to produce useful output.

One open question is how to determine whether the trans-thread should switch to intermediate LLVM processing.

TODO:

  • Restore `-Z time-passes` output for LLVM.
  • Update documentation, esp. for work package scheduling.
  • Tune the scheduling algorithm.

cc @alexcrichton @rust-lang/compiler

rust-highfive (Collaborator) commented Jul 27, 2017

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

michaelwoerister (Author) commented Jul 27, 2017

r? @alexcrichton

Pre-assigning @alexcrichton for review, so he can already start reading :P

alexcrichton (Member) commented Jul 27, 2017

🎊

retep998 (Member) commented Jul 27, 2017

As a future thought for how `-Ztime-passes` could be implemented in a world where everything is threaded, we could use `GetThreadTimes` on Windows (and whatever the Linux equivalent is) to measure the CPU time used up by a given thread and track more precisely where time is being spent.
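
For illustration, a rough sketch of the Windows side using the `winapi` crate (an assumption for this example, not rustc code; the Linux counterpart would presumably be `clock_gettime` with `CLOCK_THREAD_CPUTIME_ID`):

```rust
// Sketch: per-thread CPU time on Windows via GetThreadTimes (winapi 0.3 crate).
// Returns (kernel_time, user_time) in 100-nanosecond units for the calling thread.
#[cfg(windows)]
fn current_thread_cpu_times() -> Option<(u64, u64)> {
    use std::mem;
    use winapi::shared::minwindef::FILETIME;
    use winapi::um::processthreadsapi::{GetCurrentThread, GetThreadTimes};

    let mut creation: FILETIME = unsafe { mem::zeroed() };
    let mut exit: FILETIME = unsafe { mem::zeroed() };
    let mut kernel: FILETIME = unsafe { mem::zeroed() };
    let mut user: FILETIME = unsafe { mem::zeroed() };

    let ok = unsafe {
        GetThreadTimes(GetCurrentThread(), &mut creation, &mut exit, &mut kernel, &mut user)
    };
    if ok == 0 {
        return None;
    }

    // Combine the FILETIME halves into a single 64-bit tick count.
    let to_u64 = |ft: &FILETIME| ((ft.dwHighDateTime as u64) << 32) | ft.dwLowDateTime as u64;
    Some((to_u64(&kernel), to_u64(&user)))
}
```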

alexcrichton (Member) left a review comment:

Looking great!

The general framework of who's running what when seemed a little confusing to follow, but I think a comment would go a long way towards helping that.

```rust
if let Ok(token) = token {
    tokens.push(token);
} else {
    shared_emitter.fatal("failed to acquire jobserver token");
```

alexcrichton (Member), Jul 27, 2017:

Could the error be emitted here as well?

michaelwoerister (Author), Jul 28, 2017:

What do you mean? Just panic?

alexcrichton (Member), Jul 28, 2017:

Oh just something like:

```rust
match token {
    Ok(token) => tokens.push(token),
    Err(e) => shared_emitter.fatal(&format!("failed to acquire jobserver token: {}", e)),
}
```

michaelwoerister (Author), Jul 28, 2017:

ah ok, yes.

```rust
}

Message::TranslationDone { llvm_work_item, is_last } => {
    work_items.insert(0, llvm_work_item);
```

alexcrichton (Member), Jul 27, 2017:

How come this is inserted at the front?

michaelwoerister (Author), Jul 28, 2017:

No reason. I changed that during a debugging session. I'll switch it back to a push.

```rust
assert_eq!(trans_worker_state, TransWorkerState::LLVMing);
trans_worker_state = TransWorkerState::Idle;
} else {
    drop(tokens.pop());
```

alexcrichton (Member), Jul 27, 2017:

I think we discussed this a while ago, but should we perhaps not drop the token here? If we greedily hold on to our tokens, then that means we can more quickly finish this compilation, which in theory may be desirable to reduce overall memory usage?

michaelwoerister (Author), Jul 28, 2017:

Yes, good idea. The truncate above should take care of dropping the Token if we actually don't need it.

```rust
}
}
} else {
    match trans_worker_state {
```

alexcrichton (Member), Jul 27, 2017:

I'm currently finding the logic here somewhat hard to follow in terms of what this "trans worker" is doing. Could you add a comment describing the high-level architecture, i.e. the relationship between this coordinator thread, the main translation thread, and the worker codegen threads?

alexcrichton (Member), Jul 27, 2017:

Ok so to see if I understand this:

  • The documentation above refers to a "translation worker"
  • This trans worker seems to represent the "ephemeral token" that we inherently have for running work under a jobserver
  • The literal thread representing the trans worker can change, I think? Sometimes it's literally the thread doing translation, sometimes it's a spawned worker here to translate an existing module.
  • If we happen to reach 4 translated but not yet codegen'd codegen units, then we request the translating main thread to stop, and continue its work with a freshly spawned thread to codegen a module.

Does that sound roughly right?

Ah, I see now there's also a crucial piece where, when any codegen thread finishes, we consider the "trans worker" thread available again. This means that after any codegen thread finishes, we may be a candidate to start translation of another unit of work, I think, right?

michaelwoerister (Author), Jul 28, 2017:

> Ah, I see now there's also a crucial piece where, when any codegen thread finishes, we consider the "trans worker" thread available again. This means that after any codegen thread finishes, we may be a candidate to start translation of another unit of work, I think, right?

No, but it's an excellent idea :D

michaelwoerister force-pushed the michaelwoerister:async-llvm branch from d2cdb54 to b6c9e69 on Jul 28, 2017

alexcrichton (Member) commented Jul 28, 2017

Docs look great! One overall meta-comment now as well. I'm having some difficulty articulating this, but I'm slightly worried about a situation like:

  • We've got N codegen units left to translate, but the main thread is stopped
  • Our heuristic blocks the main thread and spins up a worker for an existing codegen unit.
  • All of a sudden we get an influx of jobserver tokens, but the main thread is still stopped.

I haven't convinced myself this is possible and I think that the various handling makes it ok? I've found the interaction between all these threads and the implicit token a little difficult to follow, but does this make sense to you? Can you think of a case where we accidentally starve the translation thread due to our heuristic?

michaelwoerister force-pushed the michaelwoerister:async-llvm branch from b6c9e69 to dbaee99 on Jul 28, 2017

michaelwoerister (Author) commented Jul 28, 2017

> I haven't convinced myself this is possible and I think that the various handling makes it ok? I've found the interaction between all these threads and the implicit token a little difficult to follow, but does this make sense to you? Can you think of a case where we accidentally starve the translation thread due to our heuristic?

So with the latest change, the main thread will be free again as soon as the first LLVM worker is done (thanks to your suggestion). It's still possible to run into an LLVM work shortage the way you describe, though.

I just adapted the strategy to estimate the cost of an LLVM `WorkItem`. Maybe the main thread should always start the cheapest one available, so it can get back to translating sooner, if needed? I don't think this is much of a problem, though (with no data to back this claim up in any way :))
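
For illustration, a cheapest-first pick could look roughly like this (hypothetical `WorkItem` type and cost field; the actual cost estimation in the PR may differ):

```rust
// Hypothetical illustration: always hand the main thread the cheapest
// pending LLVM work item so it can return to translation sooner.
struct WorkItem {
    estimated_cost: u64,
    // ... module, configuration, etc.
}

fn pick_cheapest(work_items: &mut Vec<WorkItem>) -> Option<WorkItem> {
    // Find the index of the work item with the smallest estimated cost.
    let idx = work_items
        .iter()
        .enumerate()
        .min_by_key(|&(_, item)| item.estimated_cost)
        .map(|(idx, _)| idx)?;
    Some(work_items.remove(idx))
}
```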

alexcrichton (Member) commented Jul 28, 2017

Thinking about this a bit more, I think this is my mental model for what's happening: on each turn of the loop we'll have N slots of work to fill, depending on what's currently running and how many tokens we have. Given the choice of whether to translate a new unit or codegen an existing unit, it seems fine to have a heuristic. Whenever something happens, though, it'll turn the loop and cause everything to start over.
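
As a rough illustration of that slot model (a hypothetical helper, not code from the PR):

```rust
// The implicit token the process always owns counts as one slot; every
// jobserver token adds another. Slots not occupied by running LLVM
// workers can be filled with either translation or LLVM work.
fn free_slots(jobserver_tokens: usize, running_llvm_workers: usize) -> usize {
    let total_slots = 1 + jobserver_tokens;
    total_slots.saturating_sub(running_llvm_workers)
}
```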

I think that's roughly what's implemented right now, but I think it means that we should consider the translation thread idle as soon as we've acquired a new token? That way, if the translation thread was blocked and we get a token, it should get unblocked? (Which I don't think happens today?) I may not be following the code quite right, though...

> I just adapted the strategy to estimate the cost of an LLVM `WorkItem`

Neat!

michaelwoerister (Author) commented Jul 31, 2017

> but I think it means that we should consider the translation thread idle as soon as we've acquired a new token?

Yes, that makes sense. It's a variation of considering the translation thread idle when a package is finished (another way of getting an additional token).

michaelwoerister force-pushed the michaelwoerister:async-llvm branch from dbaee99 to 29989ef on Jul 31, 2017

michaelwoerister (Author) commented Aug 1, 2017

Thanks, @kennytm. Compiling with LLVM 3.7 right now...

michaelwoerister (Author) commented Aug 1, 2017

Can't reproduce with LLVM 3.7 either :(

michaelwoerister force-pushed the michaelwoerister:async-llvm branch from 3b2af87 to b8d4413 on Aug 1, 2017

michaelwoerister (Author) commented Aug 1, 2017

I pushed another change that should give a sensible error message at a likely point of failure. Also adapted the scheduler heuristic slightly. Let's see.

michaelwoerister (Author) commented Aug 1, 2017

New error, excellent :D

michaelwoerister (Author) commented Aug 1, 2017

@bors r=alexcrichton

OK, passes travis now.

bors (Contributor) commented Aug 1, 2017

📌 Commit 6468cad has been approved by alexcrichton

bors added a commit that referenced this pull request Aug 1, 2017

Auto merge of #43506 - michaelwoerister:async-llvm, r=alexcrichton
Run translation and LLVM in parallel when compiling with multiple CGUs

This is still a work in progress but the bulk of the implementation is done, so I thought it would be good to get it in front of more eyes.

This PR makes the compiler start running LLVM while translation is still in progress, effectively allowing for more parallelism towards the end of the compilation pipeline. It also allows the main thread to switch between translating and running LLVM, which helps reduce peak memory usage since not all LLVM modules have to be kept in memory until linking. This is especially good for incr. comp. but it works just as well when running with `-Ccodegen-units=N`.

In order to help tuning and debugging the work scheduler, the PR adds the `-Ztrans-time-graph` flag, which spits out HTML files that show how work packages were scheduled:
![Building regex](https://user-images.githubusercontent.com/1825894/28679272-f6752bd8-72f2-11e7-8a6c-56207855ce95.png)
(red is translation, green is llvm)

One side effect here is that `-Ztime-passes` might show something not quite correct because trans and LLVM are not strictly separated anymore. I plan to have some special handling there that will try to produce useful output.

One open question is how to determine whether the trans-thread should switch to intermediate LLVM processing.

TODO:
- [x] Restore `-Z time-passes` output for LLVM.
- [x] Update documentation, esp. for work package scheduling.
- [x] Tune the scheduling algorithm.

cc @alexcrichton @rust-lang/compiler
bors (Contributor) commented Aug 1, 2017

⌛️ Testing commit 6468cad with merge e772c28...

bors (Contributor) commented Aug 1, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing e772c28 to master...

bors merged commit 6468cad into rust-lang:master on Aug 1, 2017

2 checks passed:

  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • homu: Test successful

bors referenced this pull request on Aug 1, 2017

Merged

Profile queries #43345

bors added a commit that referenced this pull request Oct 5, 2017

Auto merge of #45019 - aidanhs:aphs-no-trans-worker-panic, r=alexcrichton

Don't unwrap work item results as the panic trace is useless

Fixes #43402 now that there are no multithreaded panic printouts

Also update a comment

--------

Likely regressed in #43506, where the code was changed to panic in worker threads on error.

Unwrapping gives zero extra information since the stack trace is so short, so we may as well just surface that there was an error and exit the thread properly. Because there are then no multithreaded printouts, I think it should mean the output of the test for #26199 is deterministic and not interleaved (thanks to @philipc #43402 (comment) for a hint).

Sadly the output is now:
```
thread '<unnamed>' panicked at 'aborting due to worker thread panic', src/librustc_trans/back/write.rs:1643:20
note: Run with `RUST_BACKTRACE=1` for a backtrace.
error: could not write output to : No such file or directory

error: aborting due to previous error
```
but it's an improvement over the multi-panic situation before.

r? @alexcrichton

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Nov 3, 2017

Update to 1.21.0
Changelog:
Version 1.21.0 (2017-10-12)
==========================

Language
--------
- [You can now use static references for literals.][43838]
  Example:
  ```rust
  fn main() {
      let x: &'static u32 = &0;
  }
  ```
- [Relaxed path syntax. Optional `::` before `<` is now allowed in all contexts.][43540]
  Example:
  ```rust
  my_macro!(Vec<i32>::new); // Always worked
  my_macro!(Vec::<i32>::new); // Now works
  ```

Compiler
--------
- [Upgraded jemalloc to 4.5.0][43911]
- [Enabled unwinding panics on Redox][43917]
- [Now runs LLVM in parallel during translation phase.][43506]
  This should reduce peak memory usage.

Libraries
---------
- [Generate builtin impls for `Clone` for all arrays and tuples that
  are `T: Clone`][43690]
- [`Stdin`, `Stdout`, and `Stderr` now implement `AsRawFd`.][43459]
- [`Rc` and `Arc` now implement `From<&[T]> where T: Clone`, `From<str>`,
  `From<String>`, `From<Box<T>> where T: ?Sized`, and `From<Vec<T>>`.][42565]

Stabilized APIs
---------------

[`std::mem::discriminant`]

Cargo
-----
- [You can now call `cargo install` with multiple package names][cargo/4216]
- [Cargo commands inside a virtual workspace will now implicitly
  pass `--all`][cargo/4335]
- [Added a `[patch]` section to `Cargo.toml` to handle
  prepublication dependencies][cargo/4123] [RFC 1969]
- [`include` & `exclude` fields in `Cargo.toml` now accept gitignore
  like patterns][cargo/4270]
- [Added the `--all-targets` option][cargo/4400]
- [Using required dependencies as a feature is now deprecated and emits
  a warning][cargo/4364]


Misc
----
- [Cargo docs are moving][43916]
  to [doc.rust-lang.org/cargo](https://doc.rust-lang.org/cargo)
- [The rustdoc book is now available][43863]
  at [doc.rust-lang.org/rustdoc](https://doc.rust-lang.org/rustdoc)
- [Added a preview of RLS has been made available through rustup][44204]
  Install with `rustup component add rls-preview`
- [`std::os` documentation for Unix, Linux, and Windows now appears on doc.rust-lang.org][43348]
  Previously only showed `std::os::unix`.

Compatibility Notes
-------------------
- [Changes in method matching against higher-ranked types][43880] This may cause
  breakage in subtyping corner cases. [A more in-depth explanation is available.][info/43880]
- [rustc's JSON error output's byte position start at top of file.][42973]
  Was previously relative to the rustc's internal `CodeMap` struct which
  required the unstable library `libsyntax` to correctly use.
- [`unused_results` lint no longer ignores booleans][43728]

[42565]: rust-lang/rust#42565
[42973]: rust-lang/rust#42973
[43348]: rust-lang/rust#43348
[43459]: rust-lang/rust#43459
[43506]: rust-lang/rust#43506
[43540]: rust-lang/rust#43540
[43690]: rust-lang/rust#43690
[43728]: rust-lang/rust#43728
[43838]: rust-lang/rust#43838
[43863]: rust-lang/rust#43863
[43880]: rust-lang/rust#43880
[43911]: rust-lang/rust#43911
[43916]: rust-lang/rust#43916
[43917]: rust-lang/rust#43917
[44204]: rust-lang/rust#44204
[cargo/4123]: rust-lang/cargo#4123
[cargo/4216]: rust-lang/cargo#4216
[cargo/4270]: rust-lang/cargo#4270
[cargo/4335]: rust-lang/cargo#4335
[cargo/4364]: rust-lang/cargo#4364
[cargo/4400]: rust-lang/cargo#4400
[RFC 1969]: rust-lang/rfcs#1969
[info/43880]: rust-lang/rust#44224 (comment)
[`std::mem::discriminant`]: https://doc.rust-lang.org/std/mem/fn.discriminant.html

andjo403 referenced this pull request on Jan 9, 2018

Merged

fix faulty comment #47302

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 16, 2018

Rollup merge of rust-lang#47302 - andjo403:commentfix, r=michaelwoerister

fix faulty comment

after rust-lang#43506 there is no fixed number of requests sent.

kennytm added a commit to kennytm/rust that referenced this pull request Jan 17, 2018

Rollup merge of rust-lang#47302 - andjo403:commentfix, r=michaelwoerister

fix faulty comment

after rust-lang#43506 there is no fixed number of requests sent.
