Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reenable the MergeFunctions pass #49479

Merged
merged 2 commits into from
May 16, 2018
Merged

Reenable the MergeFunctions pass #49479

merged 2 commits into from
May 16, 2018

Conversation

nox
Copy link
Contributor

@nox nox commented Mar 29, 2018

The crash that happened in #23566 doesn't happen anymore with the LLVM mergefunc
pass enabled and it hugely reduces code size (for example it shaves off 10% of the
final Servo executable). This patch reenables it.

For those wondering, here are the docs from LLVM about this pass.

@nox
Copy link
Contributor Author

nox commented Mar 29, 2018

…Aaaand I forgot the obvious: @rust-lang/wg-codegen

@@ -431,7 +431,7 @@ extern "C" void LLVMRustConfigurePassManagerBuilder(
bool MergeFunctions, bool SLPVectorize, bool LoopVectorize,
const char* PGOGenPath, const char* PGOUsePath) {
// Ignore mergefunc for now as enabling it causes crashes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this outdated comment, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes… I'll wait for the try to finish and do the amend.

@nagisa
Copy link
Member

nagisa commented Mar 29, 2018 via email

@nagisa
Copy link
Member

nagisa commented Mar 29, 2018 via email

@nagisa
Copy link
Member

nagisa commented Mar 29, 2018

cc @rust-lang/infra for the crater run. I know there’s a bot for this, but no idea what the bot is :)

@kennytm
Copy link
Member

kennytm commented Mar 29, 2018

@bors try

There's no bot running yet, all manual 🙃

@bors
Copy link
Contributor

bors commented Mar 29, 2018

⌛ Trying commit 84135bf862a958722fac2aefffd595f1fa62943b with merge 0dc6a348a2de3433c52166e654ee49d9aef2ca54...

@kennytm kennytm added the S-waiting-on-crater Status: Waiting on a crater run to be completed. label Mar 29, 2018
@bors
Copy link
Contributor

bors commented Mar 29, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nox
Copy link
Contributor Author

nox commented Mar 29, 2018

I removed the obsolete comment.

@aidanhs
Copy link
Member

aidanhs commented Mar 31, 2018

Crater run started.

@aidanhs
Copy link
Member

aidanhs commented Apr 8, 2018

(apologies for the delay, I fell ill last week)

Hi @nagisa (crater requester), @alexcrichton (PR reviewer)! Crater results are at: http://cargobomb-reports.s3.amazonaws.com/pr-49479/index.html. 'Blacklisted' crates (spurious failures etc) can be found here. If you see any spurious failures not on the list, please make a PR against that file.

(interested observers: Crater is a tool for testing the impact of changes on the crates.io ecosystem. You can find out more at the repo if you're curious)

@nox
Copy link
Contributor Author

nox commented Apr 8, 2018

@nagisa None of those regressions seem related to the PR. Am I wrong about that?

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Apr 8, 2018
@nox
Copy link
Contributor Author

nox commented Apr 8, 2018

Actually as @aidanhs points out on IRC, the addr2line 0.5.0 failure is probably related:

Apr 01 16:15:13.014 INFO blam! ---- identity_map stdout ----
Apr 01 16:15:13.014 INFO blam! 	we missed 0x00254c40: (Some("dwarf.c"), None)
Apr 01 16:15:13.014 INFO blam! we missed 0x0024de20: (Some("elf.c"), None)
Apr 01 16:15:13.014 INFO blam! we missed 0x00018320: (Some("crtstuff.c"), None)
Apr 01 16:15:13.014 INFO blam! we missed 0x00252db0: (Some("dwarf.c"), None)
Apr 01 16:15:13.014 INFO blam! thread 'identity_map' panicked at 'assertion failed: `(left == right)`
Apr 01 16:15:13.014 INFO blam!   left: `(Some("/checkout/src/liballoc_jemalloc/../jemalloc/src/prof.c"), Some(2164))`,
Apr 01 16:15:13.014 INFO blam!  right: `(Some("/checkout/src/liballoc_jemalloc/../jemalloc/src/prof.c"), Some(1984))`', tests/integration_test.rs:61:13

That test checks some symbols from the test executable itself, which makes me believe that the discrepancy with the system addr2line tool comes from MergeFunctions doing its job.

Given the executable manages to run (barring the test failure), this can mean various things:

  • MergeFunctions is breaking the executable in a way that the test manages to run and fail;
  • we didn't account for MergeFunctions doing its job in debuginfo stuff;
  • addr2line 0.5.0 has a bug and MergeFunctions running made it apparent;
  • none of the above and I'm out of clue.

@nox
Copy link
Contributor Author

nox commented Apr 8, 2018

Also tasks-framework 0.1.0:

Apr 04 07:47:17.638 INFO blam! test actor_runner_schedule_with_a_primitive_actor_and_a_threadpool_executes_correctly ... FAILED
Apr 04 07:47:17.638 INFO blam! 
Apr 04 07:47:17.638 INFO blam! failures:
Apr 04 07:47:17.638 INFO blam! 
Apr 04 07:47:17.638 INFO blam! ---- actor_runner_schedule_with_a_primitive_actor_and_a_threadpool_executes_correctly stdout ----
Apr 04 07:47:17.638 INFO blam! 	thread 'actor_runner_schedule_with_a_primitive_actor_and_a_threadpool_executes_correctly' panicked at 'assertion failed: `(left == right)`
Apr 04 07:47:17.638 INFO blam!   left: `9995`,
Apr 04 07:47:17.638 INFO blam!  right: `10000`', tests/single_queue_actor.rs:47:5

I'll try to reproduce that locally, though I'm not on Linux.

@nagisa
Copy link
Member

nagisa commented Apr 8, 2018

The glib-0.5.0, nanopow-rs-0.3.3, simple-signal-1.1.1 timeouts seem scary.

pushover-0.3.0 seems to be in the same bucket as tasks-framework in that it seems to return a wrong result?

@nox
Copy link
Contributor Author

nox commented Apr 8, 2018

@nagisa I guess I can't read cargobomb reports correctly, hah! I'll run all of that on macOS and see what I can reproduce during next week.

@aidanhs
Copy link
Member

aidanhs commented Apr 11, 2018

I've seen glib and nanopow timeouts in recent memory on other crater runs so wouldn't worry about those.

@pietroalbini
Copy link
Member

Ping from triage! The crater run on this is now complete, how should we move forward?

@nagisa
Copy link
Member

nagisa commented Apr 16, 2018

This is pending review of issues by @nox. I believe they couldn’t reproduce issues on OS X and were going to try to reproduce on a Linux box.

@Eijebong
Copy link
Contributor

So I tried reproducing those failures on a linux machine.

nightly mergefunc comment
addr2line X X Ran into #49942
borrow-bag V V
glib V V
nanopow V V
net-utils V V
pushover X X Fails on nightly with the same failure as the crater one
serenity V V
simple-signal V V
rust-tasks V V
verilog V V

X: Build or tests failed
V: Tests passed

I also tried the build on some unreleased projects I have, it brought the size down from 2.3MB to 1.5MB 😍

@nox
Copy link
Contributor Author

nox commented Apr 22, 2018

So… LGTM?

@nox
Copy link
Contributor Author

nox commented May 15, 2018

I amended the second commit with an #if LLVM_RUSTLLVM check.

@nagisa
Copy link
Member

nagisa commented May 15, 2018

@bors r+

@bors
Copy link
Contributor

bors commented May 15, 2018

📌 Commit 5701779 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 15, 2018
@bors
Copy link
Contributor

bors commented May 16, 2018

⌛ Testing commit 5701779 with merge b92f6a99452ee0857942658b632fd2e6e351d49b...

@bors
Copy link
Contributor

bors commented May 16, 2018

💔 Test failed - status-travis

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 16, 2018
@rust-highfive
Copy link
Collaborator

The job i686-gnu of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[00:15:11] [ 89%] Building CXX object tools/lto/CMakeFiles/LTO.dir/LTODisassembler.cpp.o
[00:15:11] [ 89%] Building CXX object tools/lto/CMakeFiles/LTO.dir/lto.cpp.o
[00:15:11] [ 89%] Linking CXX executable ../../bin/llvm-config
[00:15:12] [ 89%] Linking CXX executable ../../bin/llvm-ar
No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

1 similar comment
@rust-highfive
Copy link
Collaborator

The job i686-gnu of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[00:15:11] [ 89%] Building CXX object tools/lto/CMakeFiles/LTO.dir/LTODisassembler.cpp.o
[00:15:11] [ 89%] Building CXX object tools/lto/CMakeFiles/LTO.dir/lto.cpp.o
[00:15:11] [ 89%] Linking CXX executable ../../bin/llvm-config
[00:15:12] [ 89%] Linking CXX executable ../../bin/llvm-ar
No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@pietroalbini
Copy link
Member

Timeout building LLVM.

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 16, 2018
@kennytm
Copy link
Member

kennytm commented May 16, 2018

@bors p=12

@bors
Copy link
Contributor

bors commented May 16, 2018

⌛ Testing commit 5701779 with merge e1151c9...

bors added a commit that referenced this pull request May 16, 2018
Reenable the MergeFunctions pass

The crash that happened in #23566 doesn't happen anymore with the LLVM mergefunc
pass enabled and it hugely reduces code size (for example it shaves off 10% of the
final Servo executable). This patch reenables it.

For those wondering, [here are the docs from LLVM about this pass](http://llvm.org/docs/MergeFunctions.html).
@bors
Copy link
Contributor

bors commented May 16, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nagisa
Pushing e1151c9 to master...

@bors bors merged commit 5701779 into rust-lang:master May 16, 2018
@nikic
Copy link
Contributor

nikic commented May 18, 2018

FTR llvm-mirror/llvm@be07815 and llvm-mirror/llvm@63bcfe3 will be part of LLVM 6.0.1, at which point this could work with system llvm as well.

Centril added a commit to Centril/rust that referenced this pull request Jan 15, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Centril added a commit to Centril/rust that referenced this pull request Jan 19, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet