Support for disabling PLT for better function call performance #54592

Merged
merged 1 commit into from Oct 11, 2018

Contributor

GabrielMajeri commented Sep 26, 2018

This PR gives rustc the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already enables full relro for security, lazy binding was disabled anyway.

This is a little-known feature which is supported by GCC and Clang as -fno-plt (some Linux distros enable it by default for all builds).

Implementation inspired by this patch which adds -fno-plt support to Clang.

Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a clap benchmark:

 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup 
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03 
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03 
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02 
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03 
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03 
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02 
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02 

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. This comment suggests that, in some cases, -fno-plt could improve PIC/PIE code performance by 10%.
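For readers who want to check the last two columns of the table, here is a minimal sketch of the arithmetic. The function names are made up for illustration; the formulas are simply the usual relative-difference and speedup definitions, which reproduce the figures above when rounded to two decimals.

```rust
/// Relative change of the no-PLT timing versus the control timing,
/// in percent. A negative value means the no-PLT build was faster.
fn diff_percent(control_ns: f64, no_plt_ns: f64) -> f64 {
    (no_plt_ns - control_ns) / control_ns * 100.0
}

/// Speedup factor: control time divided by no-PLT time.
fn speedup(control_ns: f64, no_plt_ns: f64) -> f64 {
    control_ns / no_plt_ns
}

/// Round to two decimal places, matching the precision used in the table.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For example, `round2(diff_percent(11_097.0, 10_733.0))` corresponds to the `build_app_long` row, and `round2(speedup(431_493.0, 421_421.0))` to the `parse_lots` row.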

Security benefits

Bonus: some speculative execution attacks rely on the PLT, so disabling it removes a large attack surface and reduces the need for retpoline.

Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with CFLAGS=-fno-plt CXXFLAGS=-fno-plt removes them.

Collaborator

rust-highfive commented Sep 26, 2018

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@GabrielMajeri GabrielMajeri changed the title from Support for disabling PLT for better function call performance to [WIP] Support for disabling PLT for better function call performance Sep 26, 2018

Contributor

nikomatsakis commented Sep 26, 2018

cc @rust-lang/compiler: I'm not an expert on this, but based on the description, this seems like a "no brainer". Is there a catch?

Contributor

nikomatsakis commented Sep 26, 2018

@rfcbot fcp merge

I move that we merge this PR. As I wrote before, I'm not an expert on this stuff, but the fact that some distros enable the flag by default suggests we might as well do it. I'm curious whether anyone knows of any downsides or reasons not to do it.

rfcbot commented Sep 26, 2018

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Contributor

nikomatsakis commented Sep 26, 2018

cc @cuviper — seems like something you might know about :)

Member

eddyb commented Sep 26, 2018

Member

cuviper commented Sep 26, 2018

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already enables full relro for security, lazy binding was disabled anyway.

Note that PPC64 is only defaulting to partial relro due to an old ld.so bug in bind-now.
https://github.com/rust-lang/rust/pull/43170/files#diff-b2d51315427bd679ca33d47167e82171R20

There's also an option for -Z relro-level={full,partial,off}. I'll try to see if similar PPC64 issues arise with -fno-plt, but my initial feeling is that we should only enable this in conjunction with relro-level=full.
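The policy suggested here (skip the PLT only when full relro is in effect, with an explicit flag able to override either way) can be sketched as follows. This is only an illustration under stated assumptions: `RelroLevel`, `needs_plt`, and the `plt_override` parameter are hypothetical names, not taken from the PR's diff.

```rust
/// Mirrors the three values accepted by `-Z relro-level`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RelroLevel {
    Full,
    Partial,
    Off,
}

/// Hypothetical helper: decide whether calls should still go through the PLT.
///
/// * `plt_override` models an explicit user flag; `None` means "not given".
/// * Without an override, the PLT is skipped only under full relro, because
///   that is the case where lazy binding is already disabled.
fn needs_plt(relro: RelroLevel, plt_override: Option<bool>) -> bool {
    match plt_override {
        Some(explicit) => explicit,
        None => relro != RelroLevel::Full,
    }
}
```

Under this sketch a target that defaults to partial relro, as described above for PPC64, would keep using the PLT unless the user opts out explicitly.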

Collaborator

rust-highfive commented Sep 26, 2018

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:55:36] ....................................................................................................
[00:55:39] ..............................................................i.....................................
[00:55:42] ....................................................................................................
[00:55:45] ....................................................................................................
[00:55:48] ...........iiiiiiiii................................................................................
[00:55:53] ....................................................................................................
[00:55:57] ...............................................................................................i....
[00:56:00] ....................................................................................................
[00:56:03] .......................................................i.i..ii......................................
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:04:25] 
[01:04:25] running 107 tests
[01:04:28] i..ii...iii....i...i............iii...........i....Fi....ii...i.i.ii..............i...ii..ii.i....ii
[01:04:28] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:496:22
[01:04:28] failures:
[01:04:28] 
[01:04:28] ---- [codegen] codegen/naked-functions.rs stdout ----
[01:04:28] 
[01:04:28] 
[01:04:28] error: verification with 'FileCheck' failed
[01:04:28] status: exit code: 1
[01:04:28] command: "/usr/lib/llvm-5.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll" "/checkout/src/test/codegen/naked-functions.rs"
[01:04:28] ------------------------------------------
[01:04:28] 
[01:04:28] ------------------------------------------
[01:04:28] stderr:
[01:04:28] stderr:
[01:04:28] ------------------------------------------
[01:04:28] /checkout/src/test/codegen/naked-functions.rs:18:11: error: expected string not found in input
[01:04:28] // CHECK: Function Attrs: naked uwtable
[01:04:28]           ^
[01:04:28] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:1:1: note: scanning from here
[01:04:28] ; ModuleID = 'naked_functions.3a1fbbbh-cgu.0'
[01:04:28] ^
[01:04:28] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:6:3: note: possible intended match here
[01:04:28] ; Function Attrs: naked nonlazybind uwtable
[01:04:28] 
[01:04:28] ------------------------------------------
[01:04:28] 
[01:04:28] thread '[codegen] codegen/naked-functions.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3238:9
---
[01:04:28] test result: FAILED. 77 passed; 1 failed; 29 ignored; 0 measured; 0 filtered out
[01:04:28] 
[01:04:28] 
[01:04:28] 
[01:04:28] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-5.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "5.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:04:28] 
[01:04:28] 
[01:04:28] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:04:28] Build completed unsuccessfully in 0:17:44
[01:04:28] Build completed unsuccessfully in 0:17:44
[01:04:28] Makefile:58: recipe for target 'check' failed
[01:04:28] make: *** [check] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0d3ba4a0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:0204f06f:start=1537985051202083688,finish=1537985051358059111,duration=155975423
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:1243b65a
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:23d227fe
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

Contributor

GabrielMajeri commented Sep 26, 2018

I'm not sure if this is the right place in the codegen to enable this attribute, or if we'd better enable it somewhere else.

objdump says that the number of function calls using the PLT goes down, but a lot of function calls are still using the PLT. So I'm guessing I have to add this in other places too, but I'm not very familiar with the code.

Also, I'm not sure how to fix the failing test. Is it possible for CHECK: Function Attrs to allow extra attributes in the output, besides the ones being tested?

Contributor

nagisa commented Sep 26, 2018

Please add a flag that controls this behaviour. For now it can be a debug -Z flag, similar to other such flags (e.g. -Zmutable-noalias).

Also, I'm not sure how to fix the failing test. Is it possible for CHECK: Function Attrs to allow extra attributes in the output, besides the ones being tested?

CHECK lines only check for matching line prefix. That test in particular seems to be testing for naked only, in which case you can probably remove the other attribute from the CHECK line to have it pass. Alternatively, you might have some success with pattern matching syntax.
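To see why the existing directive stopped matching, here is a toy model of a plain `CHECK:` line, assuming it behaves like a fixed-string search within a line of input. This is a simplification for illustration, not FileCheck itself: it ignores whitespace canonicalization, directive ordering, and the `{{...}}` regex syntax. `check_matches` and `IR_LINE` are names invented for this sketch; the strings are copied from the failure log above.

```rust
/// Toy model of a plain `CHECK:` directive: the pattern must appear as a
/// contiguous substring of at least one input line.
fn check_matches(pattern: &str, input: &str) -> bool {
    input.lines().any(|line| line.contains(pattern))
}

/// The attribute comment emitted after this PR, as shown in the CI log.
const IR_LINE: &str = "; Function Attrs: naked nonlazybind uwtable";
```

With this model, `check_matches("Function Attrs: naked uwtable", IR_LINE)` is false, because `nonlazybind` now sits between the two attributes, while `check_matches("Function Attrs: naked", IR_LINE)` is true. That is consistent with the advice to drop the extra attribute from the CHECK line.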

Contributor

nagisa commented Sep 26, 2018

objdump says that the number of function calls using the PLT goes down, but a lot of function calls are still using the PLT. So I'm guessing I have to add this in other places too, but I'm not very familiar with the code.

A large number of @PLT symbols likely come from outside the Rust ecosystem (e.g. glibc, LLVM, etc.). Those might need to be taken care of independently (by changing build system configuration, perhaps?). You might want to submit a similar patch to the cc crate.


(Addressed not to author, but somebody who knows how to do perf runs) I also think a perf run would be great, but not sure how to start it.

Contributor

varkor commented Sep 26, 2018

@bors try

bors added a commit that referenced this pull request Sep 26, 2018

Auto merge of #54592 - GabrielMajeri:no-plt, r=<try>
[WIP] Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little-known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## To do

- [ ] Do a perf run to see the effect this has on the compiler (cc @michaelwoerister),
  and possibly run benchmarks on some more crates

- [ ] Add a code gen test

- [ ] Should this be always enabled or should it be behind a command line option?
  If so, what should it be called? `-Z no-plt`? `-Z plt=no`?

Contributor

bors commented Sep 26, 2018

⌛️ Trying commit ddf98c1 with merge 5747631...

Collaborator

rust-highfive commented Sep 26, 2018

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[00:58:24] ....................................................................................................
[00:58:27] ..............................................................i.....................................
[00:58:30] ....................................................................................................
[00:58:33] ....................................................................................................
[00:58:36] ............iiiiiiiii...............................................................................
[00:58:42] ....................................................................................................
[00:58:46] ...............................................................................................i....
[00:58:49] ....................................................................................................
[00:58:52] .......................................................i.i..ii......................................
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:07:32] 
[01:07:32] running 107 tests
[01:07:35] i..ii...iii....i...i............iii...........i.....iF...ii...i.i.ii..............i...ii..ii.i....ii
[01:07:35] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:496:22
[01:07:35] failures:
[01:07:35] 
[01:07:35] ---- [codegen] codegen/naked-functions.rs stdout ----
[01:07:35] 
[01:07:35] 
[01:07:35] error: verification with 'FileCheck' failed
[01:07:35] status: exit code: 1
[01:07:35] command: "/usr/lib/llvm-5.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll" "/checkout/src/test/codegen/naked-functions.rs"
[01:07:35] ------------------------------------------
[01:07:35] 
[01:07:35] ------------------------------------------
[01:07:35] stderr:
[01:07:35] stderr:
[01:07:35] ------------------------------------------
[01:07:35] /checkout/src/test/codegen/naked-functions.rs:18:11: error: expected string not found in input
[01:07:35] // CHECK: Function Attrs: naked uwtable
[01:07:35]           ^
[01:07:35] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:1:1: note: scanning from here
[01:07:35] ; ModuleID = 'naked_functions.3a1fbbbh-cgu.0'
[01:07:35] ^
[01:07:35] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:6:3: note: possible intended match here
[01:07:35] ; Function Attrs: naked nonlazybind uwtable
[01:07:35] 
[01:07:35] ------------------------------------------
[01:07:35] 
[01:07:35] thread '[codegen] codegen/naked-functions.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3238:9
---
[01:07:35] test result: FAILED. 77 passed; 1 failed; 29 ignored; 0 measured; 0 filtered out
[01:07:35] 
[01:07:35] 
[01:07:35] 
[01:07:35] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-5.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "5.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:07:35] 
[01:07:35] 
[01:07:35] failed to run: /checkout/obj/build/756 ./src/tools/lldb/www
37080 ./obj/build/x86_64-unknown-linux-gnu/stage0-std/release
---
travis_time:end:0c6f72ba:start=1537990447136677365,finish=1537990447141035866,duration=4358501
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:017849be
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then print

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

Contributor

bors commented Sep 26, 2018

☀️ Test successful - status-travis
State: approved= try=True

Contributor

varkor commented Sep 26, 2018

rust-timer commented Sep 26, 2018

Please provide the full 40 character commit hash.

rust-timer commented Sep 26, 2018

Success: Queued 5747631 with parent 6846f22, comparison URL.

Contributor

GabrielMajeri commented Sep 27, 2018

Perf results are in, with nice improvements on wall time. From what I've seen, the patch currently removes only about 20% of the total PLT calls, so there's probably still some more performance to be gained.

@nagisa

> Please add a flag that controls this behaviour.

Alright, I've implemented a -Z no-plt flag, but should this option be enabled or disabled by default?

Also, what should we do if the user specifies -Z no-plt, but the flag is then ignored because full relro isn't supported (for example, as cuviper mentioned, linker issues on PowerPC64)?

> symbols likely come from outside the rust ecosystem

Thanks for the tip, but I'm still unable to get rid of the PLT. I've added CFLAGS=-fno-plt to my system and rebuilt the compiler from source, but rustc still generates lots of calls which use the PLT.

If I build C binaries on my system, the final binary doesn't even have a .plt section; it is completely removed.

EDIT: it seems we need to set some module-level metadata to ensure this also works for intrinsics.

Contributor

nagisa commented Sep 27, 2018

Member

cuviper commented Sep 27, 2018

> Alright, I've implemented a -Z no-plt flag, but should this option be enabled or disabled by default?

I think the way you've documented it is fine, "(default: PLT is disabled if full relro is enabled)".

> Also, what should we do if the user specified -Z no-plt, but the flag is then ignored, because full relro isn't supported (for example, as cuviper mentioned, linker issues on PowerPC64)?

Don't ignore it. These are advanced options -- if the user asks for plt=off without full relro, let them deal with the implications. So the check is something like plt.unwrap_or(relro != Full).
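
The check suggested above can be sketched as follows. This is a minimal illustration, not the code from the patch: `RelroLevel` and `use_plt` are hypothetical names, and the user's explicit PLT choice is assumed to arrive as an `Option<bool>`.

```rust
// Hypothetical stand-in for the compiler's relro setting.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RelroLevel {
    Full,
    Partial,
    Off,
}

/// Returns `true` when calls should go through the PLT.
///
/// `plt` is the user's explicit choice (`None` if no flag was passed).
/// Without an explicit choice, the PLT is kept unless full relro is enabled.
fn use_plt(plt: Option<bool>, relro: RelroLevel) -> bool {
    plt.unwrap_or(relro != RelroLevel::Full)
}
```

With this shape, an explicit user choice always wins (`use_plt(Some(false), RelroLevel::Off)` is `false`), and only the unset case falls back to the relro level.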

Contributor

bors commented Oct 10, 2018

⌛️ Testing commit 7a190e0 with merge 7287f9d...

bors added a commit that referenced this pull request Oct 10, 2018

Auto merge of #54592 - GabrielMajeri:no-plt, r=nagisa
Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```
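
The derived columns can be recomputed from the two timing columns. A small sketch, assuming `diff %` is the change relative to the control timing and `speedup` is the control timing divided by the no-plt timing (the `Row` type and its method names are made up for illustration):

```rust
/// One row of the benchmark table, with timings in ns/iter.
struct Row {
    control: u64,
    no_plt: u64,
}

impl Row {
    /// Absolute change in ns/iter (negative means the no-plt build is faster).
    fn diff(&self) -> i64 {
        self.no_plt as i64 - self.control as i64
    }

    /// Change as a percentage of the control timing.
    fn diff_percent(&self) -> f64 {
        self.diff() as f64 / self.control as f64 * 100.0
    }

    /// Speedup factor: control timing divided by no-plt timing.
    fn speedup(&self) -> f64 {
        self.control as f64 / self.no_plt as f64
    }
}
```

For the `build_app_long` row, `Row { control: 11_097, no_plt: 10_733 }` gives a diff of -364 ns/iter, which matches the table.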

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Security benefits

**Bonus**: some of the speculative execution attacks rely on the PLT; by disabling it, we remove a large attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.

Contributor

bors commented Oct 10, 2018

💔 Test failed - status-travis

Collaborator

rust-highfive commented Oct 10, 2018

The job dist-various-2 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:53:50]    Compiling unwind v0.0.0 (/checkout/src/libunwind)
[00:53:50]    Compiling compiler_builtins v0.0.0 (/checkout/src/rustc/compiler_builtins_shim)
[00:53:50]    Compiling alloc_jemalloc v0.0.0 (/checkout/src/liballoc_jemalloc)
[00:54:14]    Compiling std v0.0.0 (/checkout/src/libstd)
[00:54:21] LLVM ERROR: Cannot select: 0x7fe8e97384e0: ch,glue = X86ISD::CALL 0x7fe8e9502958, 0x7fe8e95022d8, Register:i32 $edi, RegisterMask:Untyped, 0x7fe8e9502958:1, libcore/ptr.rs:1563:12
[00:54:21]   0x7fe8e95022d8: i32,ch = load<(load 4 from got)> 0x7fe8e9614758, 0x7fe8e9502680, undef:i32, libcore/ptr.rs:1563:12
[00:54:21]     0x7fe8e9502680: i32 = X86ISD::WrapperRIP TargetGlobalAddress:i32<void ({ [0 x i32], { [0 x i8]*, i32 }, [0 x i32], { [0 x i8]*, i32 }, [0 x i32], i32, [0 x i32], i32, [0 x i32] }*)* @_ZN4core9panicking5panic17he0a0446e9b6bf934E> 0 [TF=5], libcore/ptr.rs:1563:12
[00:54:21]       0x7fe8e9502138: i32 = TargetGlobalAddress<void ({ [0 x i32], { [0 x i8]*, i32 }, [0 x i32], { [0 x i8]*, i32 }, [0 x i32], i32, [0 x i32], i32, [0 x i32] }*)* @_ZN4core9panicking5panic17he0a0446e9b6bf934E> 0 [TF=5], libcore/ptr.rs:1563:12
[00:54:21]     0x7fe8e95020d0: i32 = undef
[00:54:21]   0x7fe8e95027b8: i32 = Register $edi
[00:54:21]   0x7fe8e95021a0: Untyped = RegisterMask
[00:54:21]   0x7fe8e9502958: ch,glue = CopyToReg 0x7fe8e95025b0, Register:i32 $edi, 0x7fe8e95026e8, libcore/ptr.rs:1563:12
[00:54:21]     0x7fe8e95027b8: i32 = Register $edi
[00:54:21]     0x7fe8e95026e8: i32 = X86ISD::WrapperRIP TargetGlobalAddress:i32<<{ i8*, [4 x i8], i8*, [12 x i8] }>* @anon.edb5cb93134b159a66f2de0e3c5659b0.2.llvm.10114106681850442039> 0, libcore/ptr.rs:1563:12
[00:54:21]       0x7fe8e9502af8: i32 = TargetGlobalAddress<<{ i8*, [4 x i8], i8*, [12 x i8] }>* @anon.edb5cb93134b159a66f2de0e3c5659b0.2.llvm.10114106681850442039> 0, libcore/ptr.rs:1563:12
[00:54:21] In function: _ZN4core3ptr33_$LT$impl$u20$$BP$const$u20$T$GT$12align_offset17h88359f35680ecfe2E
[00:54:21] error: Could not compile `core`.
[00:54:21] 
[00:54:21] To learn more, run the command again with --verbose.
[00:54:21] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnux32" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json"
[00:54:21] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnux32" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json"
[00:54:21] expected success, got: exit code: 101
[00:54:21] thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1112:9
[00:54:21] travis_fold:end:stage2-std

[00:54:21] travis_time:end:stage2-std:start=1539196004625854356,finish=1539196036058284933,duration=31432430577

---
travis_time:end:0bda4d28:start=1539196037257181001,finish=1539196037264406946,duration=7225945
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:01ab913a
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:065694e7
travis_time:start:065694e7
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:115ee313
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

Contributor

GabrielMajeri commented Oct 10, 2018

Well, I don't know what's wrong with gnux32, but in my defense, Clang also fails to compile when I pass -fPIC -fno-plt on the gnux32 target.
And it's apparently a known LLVM bug.

Should we just blacklist gnux32 for the -Z plt option?

Contributor

nagisa commented Oct 10, 2018

Simply ignoring the test for the target should be fine.

Member

cuviper commented Oct 10, 2018

> Simply ignoring the test for the target should be fine.

Wouldn't that still mean the default no-plt is broken on gnux32?

Contributor

nagisa commented Oct 10, 2018

Oh right, I forgot that we’re making -Zplt=off the default.

In that case the default should be indicated in the target specification, and the gnux32 target should explicitly enable the PLT by default.

Thinking about it now, indicating the PLT default in the target specification is a good idea regardless, but unlike the flag, target specification options aren’t covered by stability gating, so this one would become insta-stable…

Contributor

GabrielMajeri commented Oct 11, 2018

The way I see it, -Z plt=off is an optimization; we don't guarantee it does anything (it's a best-effort kind of thing). For now, I changed the code to unconditionally disable the optimization on gnux32 and always enable the PLT on that target, at least until LLVM gets fixed.

Contributor

nagisa commented Oct 11, 2018

Contributor

GabrielMajeri commented Oct 11, 2018

@nagisa As far as I understand, even if somebody defines some external target with this ABI, there shouldn't be an issue.

The code checks the (custom) target's llvm-target attribute to see if it contains gnux32. This is, as far as we know, the only ABI where LLVM currently has an issue (due to a bug).

For example, Clang accepts this option for all targets and ABIs (even Windows). On targets where it doesn't do anything, it emits the attributes, and LLVM just ignores them (except for this buggy ABI, where LLVM crashes).
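
The gnux32 special case described above amounts to a substring test on the target triple. A minimal sketch, where `is_gnux32` is a hypothetical helper name and the triples are only example inputs:

```rust
/// Reports whether an LLVM target triple names the x32 ABI.
///
/// Hypothetical helper: it only looks for the `gnux32` substring,
/// so it also matches custom targets that reuse this ABI.
fn is_gnux32(llvm_target: &str) -> bool {
    llvm_target.contains("gnux32")
}
```

A custom target whose `llvm-target` is, say, `x86_64-unknown-linux-gnux32` would be caught by the same test, while `x86_64-unknown-linux-gnu` would not.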

Contributor

nagisa commented Oct 11, 2018

@pnkfelix pnkfelix removed the I-nominated label Oct 11, 2018

Member

pnkfelix commented Oct 11, 2018

(@nagisa said at T-compiler meeting that we can un-nominate this)

Support for disabling the PLT on ELF targets
Disable the PLT where possible to improve performance
for indirect calls into shared libraries.

This optimization is enabled by default where possible.

- Add the `NonLazyBind` attribute to `rustllvm`:
  This attribute informs LLVM to skip PLT calls in codegen.

- Disable PLT unconditionally:
  Apply the `NonLazyBind` attribute on every function.

- Only enable no-plt when full relro is enabled:
  Ensures we only enable it when we have linker support.

- Add `-Z plt` as a compiler option

Contributor

GabrielMajeri commented Oct 11, 2018

@nagisa ok, I've added a needs_plt target option which can be customized for each target. It is used to help determine a default for the PLT option (and -Z plt always overrides the setting).
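
Putting the pieces of this thread together, the default could be resolved roughly as sketched below. The type and function names are hypothetical, and the exact condition is an assumption based on the discussion (an explicit `-Z plt` value wins; otherwise the PLT stays on if the target needs it or full relro is unavailable), not a copy of the merged code.

```rust
// Hypothetical stand-in for the compiler's relro setting.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RelroLevel {
    Full,
    Partial,
    Off,
}

/// Hypothetical subset of a target specification.
struct TargetOptions {
    /// Set for targets where skipping the PLT is known to break (e.g. gnux32).
    needs_plt: bool,
    /// Default relro level for the target.
    relro_level: RelroLevel,
}

/// Decides whether calls should go through the PLT.
///
/// `user_plt` models an explicit `-Z plt` choice; `None` means the
/// flag was not given and the target defaults apply.
fn resolve_plt(user_plt: Option<bool>, target: &TargetOptions) -> bool {
    let full_relro = target.relro_level == RelroLevel::Full;
    user_plt.unwrap_or(target.needs_plt || !full_relro)
}
```

Under these assumptions, a target with `needs_plt: true` keeps the PLT by default even with full relro, but passing an explicit `Some(false)` still turns it off.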

Contributor

nagisa commented Oct 11, 2018

Perfect. Thanks!

@bors r+

Contributor

bors commented Oct 11, 2018

📌 Commit 6009da0 has been approved by nagisa

Contributor

bors commented Oct 11, 2018

⌛️ Testing commit 6009da0 with merge 77af314...

bors added a commit that referenced this pull request Oct 11, 2018

Auto merge of #54592 - GabrielMajeri:no-plt, r=nagisa
Support for disabling PLT for better function call performance


Contributor

bors commented Oct 11, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nagisa
Pushing 77af314 to master...

@bors bors merged commit 6009da0 into rust-lang:master Oct 11, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@bors bors referenced this pull request Oct 11, 2018

Merged

Cleanup rustc/session #54963

@GabrielMajeri GabrielMajeri deleted the GabrielMajeri:no-plt branch Oct 12, 2018
