Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for re-using monomorphizations in upstream crates. #48779

Merged
merged 16 commits into from Apr 6, 2018

Conversation

Projects
None yet
@michaelwoerister
Copy link
Contributor

michaelwoerister commented Mar 6, 2018

Followup to #48611. This implementation is pretty much finished modulo failing tests if there are any. Not quite ready for review yet though.

DESCRIPTION

This PR introduces a share-generics mode for RLIBs and Rust dylibs. When a crate is compiled in this mode, two things will happen:

  • before instantiating a monomorphization in the current crate, the compiler will look for that monomorphization in all upstream crates and link to it, if possible.
  • monomorphizations are not internalized during partitioning. Instead they are added to the list of symbols exported from the crate.

This results in less code being translated and LLVMed. However, there are also downsides:

  • it will impede optimization somewhat, since fewer functions can be internalized, and
  • Rust dylibs will have bigger symbol tables since they'll also export monomorphizations.

Consequently, this PR only enables the shared-generics mode for opt-levels No, Less, Size, and MinSize, and for when incremental compilation is activated. -O2 and -O3 will still generate generic functions per-crate.

Another thing to note is that this has a somewhat similar effect as MIR-only RLIBs, in that monomorphizations are shared, but it is less effective because it cannot share monomorphizations between sibling crates:

         A        <--- defines `fn foo<T>() { .. }`
       /   \
      /     \
     B       C    <--- both call `foo<u32>()`
      \     /
       \   /
         D        <--- calls `foo<u32>()` too

With share-generics, both B and C have to instantiate foo<u32> and only D can re-use it (from either B or C). With MIR-only RLIBs, B and C would not instantiate anything, and in D we would then only instantiate foo<u32> once.
On the other hand, when there are many leaf crates in the graph (e.g. when compiling many individual test binaries) then the share-generics approach will often be more effective.

TODO

  • Add codegen test that makes sure monomorphizations can be internalized in non-Rust binaries.
  • Add codegen-units test that makes sure we share generics.
  • Add run-make test that makes sure we don't export any monomorphizations from non-Rust binaries.
  • Review for reproducible-builds implications.
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Mar 6, 2018

r? @estebank

(rust_highfive has picked a reviewer for you, use r? to override)

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 6, 2018

@bors try

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 6, 2018

⌛️ Trying commit 35c9b1f with merge 55221f5...

bors added a commit that referenced this pull request Mar 6, 2018

Auto merge of #48779 - michaelwoerister:share-generics4, r=<try>
WIP: Allow for re-using monomorphizations in upstream crates.

Followup to #48611. This implementation is pretty much finished modulo failing tests if there are any. Not quite ready for review yet though.

### TODO
 - [ ] Write tests.
 - [ ] Review for reproducible-builds implications.
 - [ ] Cleanup `is_translated_item` query.
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 6, 2018

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 6, 2018

@Mark-Simulacrum, could you do a perf run for this too, please?

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Mar 7, 2018

Perf queued. Probably about 40-45 minutes until it starts.

@michaelwoerister

This comment has been minimized.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 7, 2018

@Mark-Simulacrum, the results don't seem to be available yet. Is it still in the queue or has something gone wrong?

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:share-generics4 branch from 9a1af56 to 710e4d6 Mar 7, 2018

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Mar 7, 2018

Hm, it does look like something went wrong -- I've restarted the build.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 7, 2018

Will the link be the same?

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Mar 7, 2018

Hm, it failed again -- I'm going to try and keep an eye on it and hopefully diagnose why, it also turns out we weren't properly logging the failures for try builds previously so I've now corrected that as well.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Mar 7, 2018

URL works now!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 8, 2018

Thanks, Mark!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 8, 2018

OK, so those numbers look good. I had hoped that they would be even better though. It looks like it's mostly small functions that get re-used. But yeah, -15.9% for tokio-web-push, I'll take it :)

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 8, 2018

@rust-lang/compiler & @alexcrichton, do you have any objections to pursuing this further? There's a description at the top and performance numbers are here: http://perf.rust-lang.org/compare.html?start=6f2100b92cb14fbea2102701af6a3ac5814bd06c&end=55221f5e2d6d5c71f6b89674eb29f2a213f415ef&stat=instructions%3Au

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Mar 8, 2018

Awesome work here @michaelwoerister! It's pretty neat how it's not to difficult to play around with various schemes like this these days :)

One concern I might have here is the size of binaries but given that this only affects debug mode rather than optimized then I guess it doesn't matter too much? We rely on -ffunction-sections to pretty aggressively prune dead code but if the symbols are all exported then I think even for binaries they'll stick around?

In general though seems like a great idea to me to keep pursuing, any bugs or surprises along the way we can probably smooth over!

For the diamond problem you gisted above, is this what all that "link once ODR" stuff is for in LLVM? I feel like that's all basically intended for optimized binaries linking only one copy rather than for debug mode, so it may not benefit us much if we don't turn this on in optimized mode. Speaking of optimized mode though, we may actually be able to get some nice wins here with available_externally in LLVM. That'd allow optimized mode to inline all the upstream copies but we get to completely skip codegen for them. Perhaps something to pursue down the road!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 8, 2018

We rely on -ffunction-sections to pretty aggressively prune dead code but if the symbols are all exported then I think even for binaries they'll stick around?

The monomorphizations are still assigned SymbolExportLevel::Rust, so for executables, cdylibs, and staticlibs, the linker script should hide them. I'll better add a test for that though.

We could look into LinkOnceODR but I remember it causing problems for MingGW. It would only reduce the size of linked artifacts, not compile times though, I think, as we'd still have to translate and optimize twice in the example above. Otoh, it might make symbol naming a bit less complicated.

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 8, 2018

Also, thanks for the feedback, @alexcrichton! :)

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Mar 8, 2018

Oh right that's true, I'd sort of doubt that LinkOnceODR is portable enough for us to effectively leverage it...

I think that for executables we don't currently pass linker scripts/symbol whitelists, but AFAIK that's because we just never have before. We could likely start now!

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Mar 8, 2018

I think that for executables we don't currently pass linker scripts/symbol whitelists

OK, I'll make sure we do as part of the PR.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Mar 8, 2018

We discussed this in the @rust-lang/compiler meeting today. Everybody felt pretty good about it. It'd be nice to land this and possibly do further experimentation to see if we can enable in optimized builds without hurting perf.

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:share-generics4 branch 2 times, most recently from be1e8f6 to d4264dc Mar 13, 2018

@michaelwoerister michaelwoerister changed the title WIP: Allow for re-using monomorphizations in upstream crates. Allow for re-using monomorphizations in upstream crates. Mar 13, 2018

@michaelwoerister michaelwoerister force-pushed the michaelwoerister:share-generics4 branch from 679ba55 to 61991a5 Apr 6, 2018

@michaelwoerister

This comment has been minimized.

Copy link
Contributor Author

michaelwoerister commented Apr 6, 2018

@bors r=alexcrichton

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 6, 2018

📌 Commit 61991a5 has been approved by alexcrichton

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 6, 2018

⌛️ Testing commit 61991a5 with merge 7678d50...

bors added a commit that referenced this pull request Apr 6, 2018

Auto merge of #48779 - michaelwoerister:share-generics4, r=alexcrichton
Allow for re-using monomorphizations in upstream crates.

Followup to #48611. This implementation is pretty much finished modulo failing tests if there are any. Not quite ready for review yet though.

### DESCRIPTION

This PR introduces a `share-generics` mode for RLIBs and Rust dylibs. When a crate is compiled in this mode, two things will happen:
- before instantiating a monomorphization in the current crate, the compiler will look for that monomorphization in all upstream crates and link to it, if possible.
- monomorphizations are not internalized during partitioning. Instead they are added to the list of symbols exported from the crate.

This results in less code being translated and LLVMed. However, there are also downsides:
- it will impede optimization somewhat, since fewer functions can be internalized, and
- Rust dylibs will have bigger symbol tables since they'll also export monomorphizations.

Consequently, this PR only enables the `shared-generics` mode for opt-levels `No`, `Less`, `Size`, and `MinSize`, and for when incremental compilation is activated. `-O2` and `-O3` will still generate generic functions per-crate.

Another thing to note is that this has a somewhat similar effect as MIR-only RLIBs, in that monomorphizations are shared, but it is less effective because it cannot share monomorphizations between sibling crates:

```
         A        <--- defines `fn foo<T>() { .. }`
       /   \
      /     \
     B       C    <--- both call `foo<u32>()`
      \     /
       \   /
         D        <--- calls `foo<u32>()` too
```

With `share-generics`, both `B` and `C` have to instantiate `foo<u32>` and only `D` can re-use it (from either `B` or `C`). With MIR-only RLIBs, `B` and `C` would not instantiate anything, and in `D` we would then only instantiate `foo<u32>` once.
On the other hand, when there are many leaf crates in the graph (e.g. when compiling many individual test binaries) then the `share-generics` approach will often be more effective.

### TODO
 - [x] Add codegen test that makes sure monomorphizations can be internalized in non-Rust binaries.
 - [x] Add codegen-units test that makes sure we share generics.
 - [x] Add run-make test that makes sure we don't export any monomorphizations from non-Rust binaries.
 - [x] Review for reproducible-builds implications.
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 6, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 7678d50 to master...

@bors bors merged commit 61991a5 into rust-lang:master Apr 6, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.