Add a target option "merge-functions", and a corresponding -Z flag (works around #57356) #57268

Merged
merged 1 commit into from Jan 19, 2019

peterhj commented Jan 2, 2019

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around #57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and extern "ptx-kernel" functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler ptxas). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

Examples/more details

Consider an example Rust lib using extern "ptx-kernel" functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions foo and bar have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with ptxas (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the bar function to call foo. However, an extern "ptx-kernel" function is a kernel entry point, and ptxas rejects a direct call from one extern "ptx-kernel" function to another: the call target must be a device function.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

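[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155

    self.merge_functions = sess.opts.optimize == config::OptLevel::Default ||

[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64

    add("-mergefunc-use-aliases");

[3] #56358
[4] #49479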

rust-highfive commented Jan 2, 2019

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @zackmdavis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

peterhj force-pushed the peterhj:peterhj-optmergefunc branch from 7e053a8 to 4aba371 on Jan 2, 2019

gnzlbg approved these changes Jan 2, 2019

LGTM.

    /// Whether the MergeFunctions LLVM pass should run for this target.
    /// The MergeFunctions pass is generally useful, but some targets may need
    /// to opt out. Defaults to `true`.
    pub merge_functions: bool

nagisa Jan 2, 2019

The target option could probably be forward looking and allow specifying any of merge-functions = disabled, merge-functions = trampolines and merge-functions = aliases.

peterhj Jan 3, 2019

The latest commit adds "merge-functions" options for "aliases", "trampolines", and "disabled", as well as a matching -Z flag.

nagisa commented Jan 2, 2019

It would also be nice to gain a -Z flag that controls merge-functions. I wanted it at least once for tests & it seems that it is applicable more widely than I initially expected.

> MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and extern "ptx-kernel" functions are currently not compatible with each other.

Does the NVPTX target, perchance, support the non-extern "ptx-kernel" functions at all? In that case this solution is not great as it would disable the optimisation (the trampoline version could still be used, right?) target-wide even for internal functions that use the compatible ABI.

alexcrichton commented Jan 2, 2019

@bors: r+

This seems like a great start! We can always add more bells and whistles in the future too. I'd arguably say that this is an LLVM bug, but I'm fine fixing it on our end.

bors commented Jan 2, 2019

📌 Commit 4aba371 has been approved by alexcrichton

nikic commented Jan 2, 2019

Yes, this is definitely an LLVM bug and it would be good to fix it there in the long term.

Would it be possible to do something similar to how weak functions are handled? I.e., when merging functions with the ptx-kernel CC, instead of simply calling one function from the other, we would instead create a new function (with default CC) and make both functions call the new function. Would NVPTX support this kind of scheme?
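A rough source-level analogy of the scheme proposed above, written in plain Rust rather than LLVM IR: instead of rewriting `bar` into a call to `foo`, both entry points forward to one shared helper. The names `shared_body`, `foo`, and `bar` and the arithmetic are illustrative assumptions; on NVPTX the two entry points would be `extern "ptx-kernel"` functions and the helper an ordinary function with the default calling convention.

```rust
// Shared implementation with the default calling convention.
// Assumption: on NVPTX this would be emitted as a regular device function.
fn shared_body(x: u32) -> u32 {
    x.wrapping_mul(2).wrapping_add(1)
}

// Two entry points with identical behavior. Neither calls the other;
// each forwards to the shared helper instead.
pub fn foo(x: u32) -> u32 {
    shared_body(x)
}

pub fn bar(x: u32) -> u32 {
    shared_body(x)
}
```

Whether the NVPTX backend accepts this shape is exactly the open question in the comment; the sketch only shows the intended call structure.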

nagisa commented Jan 2, 2019

> Yes, this is definitely an LLVM bug

If we are landing this, then at least an LLVM issue should be filed and cross-referenced to a corresponding Rust issue.

peterhj commented Jan 3, 2019

> It would also be nice to gain a -Z flag that controls merge-functions. I wanted it at least once for tests & it seems that it is applicable more widely than I initially expected.

The latest commit will add a -Z merge-functions as well, and both the -Z flag and the target option have the same options ("disabled", "trampolines", "aliases"). The -Z flag has precedence over the target option.
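A minimal sketch of the precedence rule just described, assuming a hypothetical `MergeFunctions` enum and a hypothetical helper `effective_merge_functions`; this is not the code from the PR, only an illustration of "the -Z flag wins when both are given".

```rust
/// Hypothetical stand-in for the three values named in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MergeFunctions {
    Disabled,
    Trampolines,
    Aliases,
}

/// Picks the setting to use: the -Z flag if it was passed,
/// otherwise whatever the target specification asks for.
pub fn effective_merge_functions(
    z_flag: Option<MergeFunctions>,
    target_option: MergeFunctions,
) -> MergeFunctions {
    z_flag.unwrap_or(target_option)
}
```

With no -Z flag, a target that sets "disabled" keeps the pass off; passing the flag with "trampolines" would override that for a single compilation.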

> Does the NVPTX target, perchance, support the non-extern "ptx-kernel" functions at all? In that case this solution is not great as it would disable the optimisation (the trampoline version could still be used, right?) target-wide even for internal functions that use the compatible ABI.
>
> ...
>
> Would it be possible to do something similar to how weak functions are handled? I.e., when merging functions with the ptx-kernel CC, instead of simply calling one function from the other, we would instead create a new function (with default CC) and make both functions call the new function. Would NVPTX support this kind of scheme?

Non-extern "ptx-kernel" functions are "device" functions which can call each other and be called by extern "ptx-kernel" functions. My observation though is that the NVPTX backend prefers to inline those functions rather than emit calls, so there should not be much fallout on device functions when disabling MergeFunctions. It definitely warrants some investigation along with the general LLVM issue for MergeFunctions+NVPTX and the potential fix.

peterhj changed the title from `Add a target option "merge-functions"` to `Add a target option "merge-functions", and a corresponding -Z flag` on Jan 3, 2019

    "aliases" => {
        add("-mergefunc-use-aliases");
    }
    k => panic!("unknown merge-functions kind: {}", k),

nagisa Jan 3, 2019

Use bug! instead of panic!.

peterhj Jan 3, 2019

using a MergeFunctions enum now, so no more wildcard arm
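To illustrate why an enum removes the need for a wildcard arm, here is a hedged sketch of an exhaustive match that maps each variant to pass configuration. The enum, the function name `merge_functions_config`, and the tuple return shape are assumptions for illustration; only the `-mergefunc-use-aliases` argument comes from the discussion above. The sketch also ignores the optimization-level check that decides whether the pass runs at all.

```rust
/// Hypothetical stand-in for the enum mentioned in the comment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MergeFunctions {
    Disabled,
    Trampolines,
    Aliases,
}

/// Returns (should the pass run, extra LLVM arguments).
/// The match is exhaustive, so no `_` or panic arm is needed:
/// adding a new variant becomes a compile error here.
pub fn merge_functions_config(kind: MergeFunctions) -> (bool, Vec<&'static str>) {
    match kind {
        MergeFunctions::Disabled => (false, vec![]),
        MergeFunctions::Trampolines => (true, vec![]),
        MergeFunctions::Aliases => (true, vec!["-mergefunc-use-aliases"]),
    }
}
```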

    @@ -1380,6 +1380,9 @@ options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
             "whether to use the PLT when calling into shared libraries;
             only has effect for PIC code on systems with ELF binaries
             (default: PLT is disabled if full relro is enabled)"),
         merge_functions: Option<String> = (None, parse_opt_string, [TRACKED],
             "control the operation of the MergeFunctions LLVM pass, taking
             the same values as the target option of the same name"),

nagisa Jan 3, 2019

This is going to cause an ICE later when invalid values are provided as an argument, because argument parsing does not verify the valid values. Consider validating this in argument parsing.

    fn parse_panic_strategy(slot: &mut Option<PanicStrategy>, v: Option<&str>) -> bool {
        match v {
            Some("unwind") => *slot = Some(PanicStrategy::Unwind),
            Some("abort") => *slot = Some(PanicStrategy::Abort),
            _ => return false
        }
        true
    }

is a good example of how arguments should be handled.

peterhj Jan 3, 2019

Fixed with a MergeFunctions enum
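A sketch of what validating the value at argument-parsing time could look like, modeled on the `parse_panic_strategy` example quoted above. The enum, its `FromStr` impl, and the name `parse_merge_functions` are assumptions for illustration and may differ from what the PR actually landed.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the enum mentioned in the comment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MergeFunctions {
    Disabled,
    Trampolines,
    Aliases,
}

impl FromStr for MergeFunctions {
    type Err = ();

    // Accept exactly the three spellings used in this PR.
    fn from_str(s: &str) -> Result<Self, ()> {
        match s {
            "disabled" => Ok(MergeFunctions::Disabled),
            "trampolines" => Ok(MergeFunctions::Trampolines),
            "aliases" => Ok(MergeFunctions::Aliases),
            _ => Err(()),
        }
    }
}

/// Same shape as `parse_panic_strategy`: returns false for a missing or
/// unknown value so the caller can report a normal error instead of an ICE.
pub fn parse_merge_functions(slot: &mut Option<MergeFunctions>, v: Option<&str>) -> bool {
    match v.and_then(|s| s.parse::<MergeFunctions>().ok()) {
        Some(kind) => {
            *slot = Some(kind);
            true
        }
        None => false,
    }
}
```

Because the slot is only written on success, an invalid value leaves any previously parsed setting untouched.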

nagisa commented Jan 3, 2019

@bors r+ Thanks!

bors commented Jan 3, 2019

📌 Commit 3cd4013 has been approved by nagisa

nikic commented Jan 5, 2019

I've reported https://bugs.llvm.org/show_bug.cgi?id=40232 upstream to track this issue.

Add a target option "merge-functions" taking values in ("disabled",
"trampolines", or "aliases" (the default)) to allow targets to opt out of
the MergeFunctions LLVM pass. Also add a corresponding -Z option with
the same name and values.

This works around: #57356

Motivation:

Basically, the problem is that the MergeFunctions pass, which rustc
currently enables by default at -O2 and -O3, and `extern "ptx-kernel"`
functions (specific to the NVPTX target) are currently not compatible
with each other. If the MergeFunctions pass is allowed to run, rustc can
generate invalid PTX assembly (i.e. a PTX file that is not accepted by
the native PTX assembler ptxas). Therefore we would like a way to opt
out of the MergeFunctions pass, which is what our target option does.

Related work:

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3,
and also to enable the use of function aliases within MergeFunctions.
MergeFunctions both with and without function aliases is incompatible with
the NVPTX target.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to
the MergeFunctions pass, but it is not enabled by default.

peterhj force-pushed the peterhj:peterhj-optmergefunc branch from 3cd4013 to b91d211 on Jan 5, 2019

peterhj changed the title from `Add a target option "merge-functions", and a corresponding -Z flag` to `Add a target option "merge-functions", and a corresponding -Z flag (works around #57356)` on Jan 5, 2019

Centril commented Jan 15, 2019

@bors r=nagisa

bors commented Jan 15, 2019

📌 Commit b91d211 has been approved by nagisa

Centril added a commit to Centril/rust that referenced this pull request Jan 15, 2019

Rollup merge of rust-lang#57268 - peterhj:peterhj-optmergefunc, r=nagisa
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479

bors added a commit that referenced this pull request Jan 16, 2019

Auto merge of #57637 - Centril:rollup, r=Centril
Rollup of 6 pull requests

Successful merges:

 - #56884 (rustdoc: overhaul code block lexing errors)
 - #57065 (Optimize try_mark_green and eliminate the lock on dep node colors)
 - #57107 (Add a regression test for mutating a non-mut #[thread_local])
 - #57268 (Add a target option "merge-functions", and a corresponding -Z flag (works around #57356))
 - #57551 (resolve: Add a test for issue #57539)
 - #57598 (Add missing unpretty option help message)

Failed merges:

r? @ghost

Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019

Rollup merge of rust-lang#57268 - peterhj:peterhj-optmergefunc, r=nagisa
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

bors added a commit that referenced this pull request Jan 17, 2019

Auto merge of #57690 - Centril:rollup, r=Centril
Rollup of 18 pull requests

Successful merges:

 - #56594 (Remove confusing comment about ideally using `!` for `c_void`)
 - #56996 (Move spin_loop_hint to core::hint module)
 - #57065 (Optimize try_mark_green and eliminate the lock on dep node colors)
 - #57107 (Add a regression test for mutating a non-mut #[thread_local])
 - #57253 (Make privacy checking, intrinsic checking and liveness checking incremental)
 - #57268 (Add a target option "merge-functions", and a corresponding -Z flag (works around #57356))
 - #57340 (Use correct tracking issue for c_variadic)
 - #57357 (Cleanup PartialEq docs.)
 - #57370 (Support passing cflags/cxxflags/ldflags to LLVM build)
 - #57501 (High priority resolutions for associated variants)
 - #57551 (resolve: Add a test for issue #57539)
 - #57610 (Fix nested `?` matchers)
 - #57635 (use structured macro and path resolve suggestions)
 - #57636 (Fix sources sidebar not showing up)
 - #57646 (Fixes text becoming invisible when element targetted)
 - #57654 (Add some links in std::fs.)
 - #57655 (OSX: fix #57534 registering thread dtors while running thread dtors)
 - #57659 (Fix release manifest generation)

Failed merges:

r? @ghost

Centril added a commit to Centril/rust that referenced this pull request Jan 19, 2019

Rollup merge of rust-lang#57268 - peterhj:peterhj-optmergefunc, r=nagisa
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes one of the values "disabled", "trampolines", or "aliases" (the default is "aliases"), to allow targets to opt out of the MergeFunctions LLVM pass. The latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and takes precedence over the target option when both are specified.
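As a rough illustration of the precedence rule described above, here is a minimal sketch in plain Rust. It is not the code from this PR: the enum `MergeFunctions`, its `FromStr` impl, and the helper `effective_merge_functions` are hypothetical names chosen for this example; only the three string values and the "-Z wins over the target option" rule come from the description.

```rust
use std::str::FromStr;

/// The three values the option accepts, per the description above.
/// (Hypothetical type for illustration; not taken from the rustc sources.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MergeFunctions {
    Disabled,
    Trampolines,
    Aliases,
}

impl FromStr for MergeFunctions {
    type Err = String;

    /// Parse the textual form used by both the target option and the -Z flag.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "disabled" => Ok(MergeFunctions::Disabled),
            "trampolines" => Ok(MergeFunctions::Trampolines),
            "aliases" => Ok(MergeFunctions::Aliases),
            other => Err(format!("unknown merge-functions value: {}", other)),
        }
    }
}

/// Hypothetical helper: the -Z flag, when present, overrides the target option.
pub fn effective_merge_functions(
    z_flag: Option<MergeFunctions>,
    target_option: MergeFunctions,
) -> MergeFunctions {
    z_flag.unwrap_or(target_option)
}
```

With this sketch, `effective_merge_functions(None, MergeFunctions::Aliases)` yields the target's own setting, while passing `Some(MergeFunctions::Disabled)` models a user overriding the target from the command line.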

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

The problem is that the MergeFunctions pass, which rustc enables by default at -O2 and -O3 [1], is currently incompatible with `extern "ptx-kernel"` functions (specific to the NVPTX target). If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). We therefore want a way to opt out of the MergeFunctions pass, which is what this target option provides.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.
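To make the interaction between optimization level and the new setting concrete, the following sketch models only what the text above states: the pass runs at -O2 and -O3, and aliases are used only in "aliases" mode. All names here (`OptLevel`, `MergeFunctionsConfig`, `merge_functions_config`) are assumptions for illustration, and the real rustc logic may treat other optimization levels differently.

```rust
/// Hypothetical mirror of the three option values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MergeFunctions {
    Disabled,
    Trampolines,
    Aliases,
}

/// Simplified optimization levels; size-oriented levels are left out on purpose.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OptLevel {
    O0,
    O1,
    O2,
    O3,
}

/// What the backend would be asked to do (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MergeFunctionsConfig {
    /// Whether the MergeFunctions pass runs at all.
    pub run_pass: bool,
    /// Whether merged functions are emitted as aliases instead of trampolines.
    pub use_aliases: bool,
}

pub fn merge_functions_config(opt: OptLevel, setting: MergeFunctions) -> MergeFunctionsConfig {
    // The pass is only considered at -O2 and -O3, and never when disabled.
    let run_pass = setting != MergeFunctions::Disabled
        && matches!(opt, OptLevel::O2 | OptLevel::O3);
    // Aliases only make sense when the pass actually runs in "aliases" mode.
    let use_aliases = run_pass && setting == MergeFunctions::Aliases;
    MergeFunctionsConfig { run_pass, use_aliases }
}
```

Under this model, a target such as NVPTX that sets "disabled" never runs the pass regardless of optimization level, which is the opt-out the PR is after.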

clang's "solution" is a "-fmerge-functions" flag that opts in to the MergeFunctions pass; the pass is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` function is wrong: as the error above says, `ptxas` expects the call target to be a device function.
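The rewrite can be pictured at the source level with ordinary Rust functions. This is only an analogy under stated assumptions: the function bodies below are invented (the linked `nocore.rs` is not reproduced here), `bar_after_merge` is a hypothetical name for what `bar` effectively becomes, and plain functions are used so the sketch compiles on stable Rust without the NVPTX target.

```rust
/// Two functions with identical bodies: candidates for MergeFunctions.
pub fn foo(x: u32) -> u32 {
    x.wrapping_mul(2).wrapping_add(1)
}

pub fn bar(x: u32) -> u32 {
    x.wrapping_mul(2).wrapping_add(1)
}

/// Roughly what `bar` turns into when merged without aliases:
/// a thin trampoline that forwards to `foo`.
pub fn bar_after_merge(x: u32) -> u32 {
    foo(x)
}
```

For ordinary functions the trampoline behaves the same as the original `bar`, which is why the pass is normally safe. When both functions are kernel entry points, the forwarding call itself is what `ptxas` rejects.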

If we prevent the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479

bors added a commit that referenced this pull request Jan 19, 2019

Auto merge of #57752 - Centril:rollup, r=Centril
Rollup of 10 pull requests

Successful merges:

 - #57268 (Add a target option "merge-functions", and a corresponding -Z flag (works around #57356))
 - #57476 (Move glob map use to query and get rid of CrateAnalysis)
 - #57501 (High priority resolutions for associated variants)
 - #57573 (Querify `entry_fn`)
 - #57610 (Fix nested `?` matchers)
 - #57634 (Remove an unused function argument)
 - #57653 (Make the contribution doc reference the guide more)
 - #57666 (Generalize `huge-enum.rs` test and expected stderr for more cross platform cases)
 - #57698 (Fix typo bug in DepGraph::try_mark_green().)
 - #57746 (Update README.md)

Failed merges:

r? @ghost

bors commented Jan 19, 2019

⌛️ Testing commit b91d211 with merge c87144f...

@bors bors merged commit b91d211 into rust-lang:master Jan 19, 2019

1 of 2 checks passed

homu Testing commit b91d211b40300a3c026b330e50a6e3e19d71351c with merge c87144f3caf9a1580e8734d4d1604e723a5bd6e6...
continuous-integration/travis-ci/pr The Travis CI build passed