Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for full RELRO #43170

Merged
merged 5 commits into from Jul 19, 2017
Merged

Add support for full RELRO #43170

merged 5 commits into from Jul 19, 2017

Conversation

kyrias
Copy link
Contributor

@kyrias kyrias commented Jul 11, 2017

This commit adds support for full RELRO, and enables it for the
platforms I know have support for it.

Full RELRO makes the PLT+GOT data read-only on startup, preventing it
from being overwritten.

http://tk-blog.blogspot.com/2009/02/relro-not-so-well-known-memory.html

Fixes #29877.


I'm not entirely certain if this is the best way to do it, but I figured mimicking the way it's done for PIE seemed like a good start at least. I'm not sure whether we want to have it enabled by default globally and then disabling it explicitly for targets that don't support it though. I'm also not sure whether the full_relro function should call bug!() or something like it for linkers that don't support it rather than no-opping.

This commit adds support for full RELRO, and enables it for the
platforms I know have support for it.

Full RELRO makes the PLT+GOT data read-only on startup, preventing it
from being overwritten.

http://tk-blog.blogspot.com/2009/02/relro-not-so-well-known-memory.html

Fixes rust-lang#29877.

Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
@rust-highfive
Copy link
Collaborator

rust-highfive commented Jul 11, 2017

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @eddyb (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@eddyb
Copy link
Member

eddyb commented Jul 11, 2017

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned eddyb Jul 11, 2017
@cuviper
Copy link
Member

cuviper commented Jul 11, 2017

As I noted in #29877 (comment), I do add these flags on Fedora and EPEL7 (for RHEL7). However, I encountered bizarre linking bugs with this on RHEL6 for a few architectures -- I think that was on powerpc64 specifically. So that makes me really hesitate whether this should be the default behavior of rustc.

@alexcrichton
Copy link
Member

alexcrichton commented Jul 11, 2017

Thanks for the PR @kyrias! This seems reasonable to me to try to set these by default, but @cuviper's concerns worry me! @cuviper you wouldn't happen still have any of those link error logs on hand, would you?

@kyrias do you know when these linker flags were introduced? One concern on the related issue is that not all linkers may support this, but my guess is that all recent-ish linkers should probably support it, just curious how far back it goes!

@cuviper
Copy link
Member

cuviper commented Jul 11, 2017

I don't have those old logs anymore. I'm trying to reproduce it anew...

@arielb1 arielb1 added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 11, 2017
@kyrias
Copy link
Contributor Author

kyrias commented Jul 12, 2017

GNU binutils seems to have added them in 2.16 which was released in 2005, and I'm not sure if it'd be worth it to support 12+ year old linkers. All Linux and BSD targets we want to support should be fine with this as it is. It's not enabled on e.g. Solaris though, since they use their own linker which doesn't seem to have support for RELRO.

@alexcrichton
Copy link
Member

alexcrichton commented Jul 12, 2017

For comparison glibc 2.5 (the oldest one we support) was released on 2006-09-29, so I think we're probably covered here! Let's just wait and see if @cuviper's reproduction brings up something new, otherwise this seems fine to merge to me.

@parched
Copy link
Contributor

parched commented Jul 12, 2017

Is there any worry about increased program load times with this?

@kyrias
Copy link
Contributor Author

kyrias commented Jul 12, 2017

So, with full RELRO there is a slight performance impact when it resolves all the symbols, but in practice it's pretty much not noticeable unless the program has a very large number of symbols.

@kyrias
Copy link
Contributor Author

kyrias commented Jul 12, 2017

It would possibly make sense to add a compiler flag to disable full BIND_NOW for specific packages though, so you'd end up with partial RELRO without the performance impact of BIND_NOW.

@cuviper
Copy link
Member

cuviper commented Jul 12, 2017

Since Rust's kernel/glibc baseline is basically RHEL5, we can note that has binutils-2.17 too.

Also notable, since 2.27 ld defaults to enabling RELRO "for all Linux targets except FRV, HPPA, IA64 and MIPS." I don't think it defaults BIND_NOW though.

I'm still working on the reproducing my earlier issues. I couldn't get it manually, but I just found the git history of my package scripts where I turned it off, so I'm trying to build right before that again now.

@cuviper
Copy link
Member

cuviper commented Jul 13, 2017

OK, not a link error, "just" a segfault:

running: "/builddir/build/BUILD/rust-1.17.0-powerpc64-unknown-linux-gnu/usr/bin/cargo" "build" "-j" "32" "--target" "powerpc64-unknown-linux-gnu" "-v" "--release" "--frozen" "--features" "panic-unwind backtrace" "--manifest-path" "/builddir/build/BUILD/rustc-1.17.0-src/src/libstd/Cargo.toml"
error: process didn't exit successfully: `/builddir/build/BUILD/rustc-1.17.0-src/build/bootstrap/debug/rustc -vV` (signal: 11, SIGSEGV: invalid memory reference)

I don't remember how I connected that to RELRO/BIND_NOW, and this attempt now was in the build farm so I don't actually have access to those failed artifacts. I'll start another build where I actually have a shell, using current master.

@alexcrichton
Copy link
Member

alexcrichton commented Jul 13, 2017

@cuviper oh thanks for that! I wonder, does that mean that we should perhaps pass -z,relro but not -z,now? And maybe we shouldn't pass that on mips?

@cuviper
Copy link
Member

cuviper commented Jul 14, 2017

I finally found my notes from before! We determined this to be a bug in the dynamic loader (ld.so), but sadly that bugzilla got marked private. I'll see if we can get a public bug opened...

I can still reproduce the issue while building 1.18, but 1.19-beta and later seem to run OK. I'm not especially confident in that though, given that the bug wasn't really rust's fault, so that "fix" may be a fluke.

@kyrias
Copy link
Contributor Author

kyrias commented Jul 14, 2017

I think it would make sense to enable both for everything other than the ones binutils don't have it enabled for, and ppc64.

@cuviper Does it reproduce with partial RELRO as well, or just full RELRO?

@cuviper
Copy link
Member

cuviper commented Jul 14, 2017

My bug just got closed as a duplicate of this one which is public:
https://bugzilla.redhat.com/show_bug.cgi?id=1398716

And indeed in rustc's case, it's a similar resolve_ifunc crash hitting gettimeofday before libc.so is relocated. I've asked for clarification about affected arches, as we may just be getting lucky that others are working.

@cuviper Does it reproduce with partial RELRO as well, or just full RELRO?

Using -z,relro alone works for me. It appears to be a problem only with -z,now / BIND_NOW. You can even induce it on existing rustc binaries by setting the environment LD_BIND_NOW=1.

@kyrias
Copy link
Contributor Author

kyrias commented Jul 14, 2017

Any chance you could backport that commit to the glibc version you're using and seeing if that actually fixes it or not? If it does in fact fix it, I'm not sure what the best way forward would be. It would sort of suck to not be able to enable it due to a bug that got fixed 6 years ago, but if rustc wants to support RHEL5 as a baseline, which is 10 years old by now, I'm sort of uncertain.

I think it might be a good idea to have it on by default, and then have a built-time option for very out-of-date operating systems to disable it in that case.


Anyway, I could probably start by reworking this PR so that it treats partial and full RELRO separately, so that it would be easy to selectively disable full RELRO.

@cuviper
Copy link
Member

cuviper commented Jul 14, 2017

Any chance you could backport that commit to the glibc version you're using and seeing if that actually fixes it or not? If it does in fact fix it, I'm not sure what the best way forward would be.

I have tested it, but I can't predict how soon an update may be released. Of course we have to be careful that it doesn't cause regressions elsewhere...

It would sort of suck to not be able to enable it due to a bug that got fixed 6 years ago, but if rustc wants to support RHEL5 as a baseline, which is 10 years old by now, I'm sort of uncertain.

FWIW, I don't know if this affects RHEL5 too. Comment 25 indicates that this might have regressed from 6.6 to 6.7, but that could have been some happenstance change in load order. Also, RHEL5 is in the extended life phase, so I doubt this would be considered severe enough for an update.

I think it might be a good idea to have it on by default, and then have a built-time option for very out-of-date operating systems to disable it in that case.

You'd have to disable it on rustc/etc. for those systems to run upstream compiler binaries at all.

I think relro alone is OK, and perhaps -z,relro and -z,now could both get nice -C options to enable them, rather than having to use raw -Clink-arg.

@cuviper
Copy link
Member

cuviper commented Jul 14, 2017

FWIW, I don't know if this affects RHEL5 too.

Actually, rust's dist-powerpc64-linux is only configured for ~RHEL6, so it would have problems with glibc symbol versions if you tried to run on RHEL5-ppc64 at all.

I think only the x86 builders are older, and they're running on native CentOS 5 images, so they would probably fail directly while bootstrapping if that glibc had BIND_NOW problems.

@alexcrichton
Copy link
Member

alexcrichton commented Jul 14, 2017

If this ends up being an powerpc64-specific bug then I think it's also fine to just disable it there with a comment indicating why.

It also seems like the risk of breakage from this is pretty low, so we could always land it and see what happens? Backing out the change would be pretty easy as well, so I think it may not cause too much of a splash to implement.

If we end up having problems during the x86 bootstrap within CentOS 5 the bors will discover that anyway as this won't be able to land!

@cuviper what do you think of the "land and let's see" approach? Should we disable full relro on powerpc64 for now but leave partial relro enabled?

I think regardless we'll hold off on this for a few days to land this after the beta branching on this coming Monday just to make sure we have a maximal amount of time to detect regressions.

@cuviper
Copy link
Member

cuviper commented Jul 14, 2017

I'm not positive that bootstrap will exercise this in the CentOS 5 builders. In stage0 we'll build a stage1 without relro, then stage1 will build stage2 with relro. But then I think we don't actually execute stage2 for anything unless you run tests, right? (Even just rustc -V will do...)

Anyway, if we want to "land and let's see", I can make a point of trying the first relro nightly on the various arches that Red Hat and Fedora support. I would prefer we avoid BIND_NOW on powerpc64 though, since we already know that's problematic.

Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
@kyrias
Copy link
Contributor Author

kyrias commented Jul 14, 2017

Hokay, so I've now:

  • Added support for both full and partial RELRO
  • Added a -C flag called relro-level that lets you select between full, partial, or off
  • Made ppc64 targets use partial RELRO by default rather than full.


pub fn target() -> TargetResult {
let mut base = super::linux_base::opts();
base.cpu = "ppc64le".to_string();
base.pre_link_args.get_mut(&LinkerFlavor::Gcc).unwrap().push("-m64".to_string());
base.max_atomic_width = Some(64);

// ld.so in at least RHEL6 on ppc64 has a bug related to BIND_NOW, so only enable partial RELRO
// for now. https://github.com/rust-lang/rust/pull/43170#issuecomment-315411474
base.relro_level = RelroLevel::Partial;
Copy link
Member

@cuviper cuviper Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ppc64le didn't exist until RHEL7, so I think we can leave that full.

Copy link
Contributor Author

@kyrias kyrias Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, will undo that part, thanks.

On at least RHEL6 there is a segfault caused by the older ld.so version
when BIND_NOW is used, so use partial RELRO by default on ppc64
architectures for now.

Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
match s {
"full" => base.options.$key_name = RelroLevel::Full,
"partial" => base.options.$key_name = RelroLevel::Partial,
"off" => base.options.$key_name = RelroLevel::Off,
Copy link
Member

@alexcrichton alexcrichton Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be deduplicated with parse_relro_level above?

Copy link
Contributor Author

@kyrias kyrias Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just duplicating the way it was done for PanicStrategy, but I can look into deduplicating it later this week.

Copy link
Member

@alexcrichton alexcrichton Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh heh good point! We should deduplicate that one as well at some point...

kyrias added 2 commits Jul 17, 2017
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
@alexcrichton
Copy link
Member

alexcrichton commented Jul 18, 2017

@bors: r+

Thanks @kyrias!

@bors
Copy link
Contributor

bors commented Jul 18, 2017

📌 Commit 2161fb2 has been approved by alexcrichton

@arielb1 arielb1 added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 18, 2017
@bors
Copy link
Contributor

bors commented Jul 19, 2017

Testing commit 2161fb2 with merge 4e56bbe...

bors added a commit that referenced this pull request Jul 19, 2017
Add support for full RELRO

This commit adds support for full RELRO, and enables it for the
platforms I know have support for it.

Full RELRO makes the PLT+GOT data read-only on startup, preventing it
from being overwritten.

http://tk-blog.blogspot.com/2009/02/relro-not-so-well-known-memory.html

Fixes #29877.

---

I'm not entirely certain if this is the best way to do it, but I figured mimicking the way it's done for PIE seemed like a good start at least.  I'm not sure whether we want to have it enabled by default globally and then disabling it explicitly for targets that don't support it though.  I'm also not sure whether the `full_relro` function should call `bug!()` or something like it for linkers that don't support it rather than no-opping.
@bors
Copy link
Contributor

bors commented Jul 19, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 4e56bbe to master...

@bors bors merged commit 2161fb2 into rust-lang:master Jul 19, 2017
@kyrias kyrias deleted the full-relro branch Jul 20, 2017
@alexcrichton alexcrichton added the relnotes Marks issues that should be documented in the release notes of the next release. label Jul 20, 2017
@lilianmoraru
Copy link

lilianmoraru commented Aug 19, 2017

@kyrias I try to use this and it doesn't seem to work for me.
RUSTFLAGS="-C relro-level=full" cargo +nightly build
or
rustc +nightly -C relro-level=help

rustc version: rustc 1.21.0-nightly (df511d554 2017-08-14)
Am I using the flags incorrectly?

@kyrias
Copy link
Contributor Author

kyrias commented Aug 19, 2017

@lilianmoraru It's currently under -Z to mark it as an unstable flag.

@kyrias kyrias mentioned this pull request Feb 20, 2018
bors added a commit that referenced this pull request Mar 8, 2018
Add relro-level tests

The `relro-level` debugging flag was added in #43170 which was merged in July 2017.  This PR moves this flag to be a proper codegen flag.
bors added a commit that referenced this pull request Mar 9, 2018
Add relro-level tests

The `relro-level` debugging flag was added in #43170 which was merged in July 2017.  This PR moves this flag to be a proper codegen flag.
bors added a commit that referenced this pull request Mar 10, 2018
Add relro-level tests

The `relro-level` debugging flag was added in #43170 which was merged in July 2017.  This PR moves this flag to be a proper codegen flag.
bors added a commit that referenced this pull request Mar 10, 2018
Add relro-level tests

The `relro-level` debugging flag was added in #43170 which was merged in July 2017.  This PR moves this flag to be a proper codegen flag.
bors added a commit that referenced this pull request Sep 26, 2018
[WIP] Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## To do

- [ ] Do a perf run to see the effect this has on the compiler (cc @michaelwoerister),
  and possibly run benchmarks on some more crates

- [ ] Add a code gen test

- [ ] Should this be always enabled or should it be behind a command line option?
  If so, what should it be called? `-Z no-plt`? `-Z plt=no`?
bors added a commit that referenced this pull request Sep 28, 2018
Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
bors added a commit that referenced this pull request Sep 28, 2018
Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
bors added a commit that referenced this pull request Sep 29, 2018
Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Security benefits

**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
bors added a commit that referenced this pull request Oct 10, 2018
Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Security benefits

**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
bors added a commit that referenced this pull request Oct 11, 2018
Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Security benefits

**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants