Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ballooning compile time with LVI mitigations #74632

Closed
jberci opened this issue Jul 22, 2020 · 20 comments
Closed

Ballooning compile time with LVI mitigations #74632

jberci opened this issue Jul 22, 2020 · 20 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times.

Comments

@jberci
Copy link

jberci commented Jul 22, 2020

There seems to be a significant compile time regression when using the LVI mitigation passes on some crates. The smallest case I could come up with is:

#[macro_use] extern crate crunchy;
#[macro_use] extern crate uint;
//construct_uint!(UINT, 1);
//construct_uint!(UINT, 2);
//construct_uint!(UINT, 4);
//construct_uint!(UINT, 8);
construct_uint!(UINT, 16);
fn main() {
	let a = UINT::from_dec_str("10").unwrap();
	let b = UINT::from_dec_str("20").unwrap();
	println!("a * b = {}", ((a * b - b ) / a).as_u32());
}

TOML:

[package]
name = "lvitest"
version = "0.1.0"
edition = "2018"
[dependencies]
crunchy = { version = "0.1" }
uint = { version = "0.2.1" }

This happens with the x86_64-fortanix-unknown-sgx target, but also without it with the following codegen options (.cargo/config.toml):

[build]
rustflags = ["-C", "link-dead-code", "-C", "llvm-args=--x86-experimental-lvi-inline-asm-hardening", "-C", "target-feature=+lvi-cfi,+lvi-load-hardening"]

Expected:
Without the config.toml on a vanilla rustup-installed nightly, a cargo clean && cargo build takes a second or two.

Instead:
With LVI mitigations enabled, compile time explodes to about 35 minutes on my machine.

If I disable linking dead code, it drops to 2 minutes, although that's mostly because the sample above is simple. Using shorter uints (e.g. the ones commented out above) makes it drop further still, but compile time still noticeably depends on the uint length.

Using a newer version of the uint crate (e.g. 0.4.1) also seems to solve the issue. The main difference that I could find is that the older version uses inline assembly. Not sure if that's a red herring or not.

Meta

rustc --version --verbose:

rustc 1.47.0-nightly (8ad7bc3f4 2020-07-21)
binary: rustc
commit-hash: 8ad7bc3f428300aee6764f6e23527e19eb235e81
commit-date: 2020-07-21
host: x86_64-unknown-linux-gnu
release: 1.47.0-nightly
LLVM version: 10.0
@jberci jberci added the C-bug Category: This is a bug. label Jul 22, 2020
@jonas-schievink
Copy link
Contributor

These are LLVM passes. Is there any reason to believe this is a rustc bug and not an LLVM bug?

@jonas-schievink jonas-schievink added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jul 22, 2020
@jberci
Copy link
Author

jberci commented Jul 22, 2020

I wouldn't say so, but a rust reproducer is the only one I have and I couldn't find a way to open an issue on the rust-lang/llvm-project fork. Apologies if I missed it.

@nikic
Copy link
Contributor

nikic commented Jul 22, 2020

cc @jethrogb

@jethrogb
Copy link
Contributor

cc @scottconstable @raoulstrackx

@scottconstable
Copy link

Are these crates being compiled with the MIP optimization plugin?

https://github.com/intel/lvi-llvm-optimization-plugin

@jethrogb
Copy link
Contributor

Probably not

@scottconstable
Copy link

Then I think the culprit is probably this issue in CodeGen/RDF that I reported back in April: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141332.html. Looks like it hasn't been fixed yet. I'll open up a bug report and ping Krzysztof.

@scottconstable
Copy link

Bug report created: https://bugs.llvm.org/show_bug.cgi?id=46808. Hopefully this will address the issue reported by @jberci

@jethrogb
Copy link
Contributor

I've reduced the example from the initial post to one file without external dependencies: main.rs

$ time rustc main.rs
real	0m2.223s
user	0m2.115s
sys	0m0.107s
$ time rustc -C llvm-args=--x86-experimental-lvi-inline-asm-hardening -C target-feature=+lvi-cfi,+lvi-load-hardening main.rs
real	1m47.199s
user	1m47.015s
sys	0m0.158s

@scottconstable
Copy link

@jethrogb Would it be possible to run that example with Linux perf to see which function is responsible for adding all those cycles?

@jethrogb
Copy link
Contributor

jethrogb commented Jul 22, 2020

    63.13%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] std::_Function_handler<void (llvm::ImmutableGraph<llvm::MachineInstr*, int>::Node const*, bool), (anonymous namespace)::X86LoadValueInjectionLoadHardeningPass::elimMitigatedEdgesAndNodes((anonymous namespace)::MachineGadgetGraph&, llvm::ImmutableGraph<llvm::MachineInstr*, int>::EdgeSet&, llvm::ImmutableGraph<llvm::MachineInstr*, int>::NodeSet&) const::$_3>::_M_invoke
     7.65%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] std::_Rb_tree<std::pair<unsigned int, llvm::LaneBitmask>, std::pair<unsigned int, llvm::LaneBitmask>, std::_Identity<std::pair<unsigned int, llvm::LaneBitmask> >, std::less<std::pair<unsigned int, llvm::LaneBitmask> >, std::allocator<std::pair<unsigned int, llvm::LaneBitmask> > >::_M_insert_unique<std::pair<unsigned int, llvm::LaneBitmask> >
     6.35%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] llvm::rdf::RegisterAggr::makeRegRef
     5.97%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] llvm::rdf::Liveness::computePhiInfo
     2.84%  rustc     rustc                                [.] malloc
     2.55%  rustc     rustc                                [.] free
     1.46%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] std::_Rb_tree_increment
     1.42%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] llvm::rdf::RegisterAggr::insert
     1.32%  rustc     libc-2.23.so                         [.] __memset_avx2
     1.19%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] llvm::rdf::RegisterAggr::clearIn
     0.96%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] (anonymous namespace)::X86LoadValueInjectionLoadHardeningPass::runOnMachineFunction
     0.85%  rustc     librustc_driver-582725d49f41b219.so  [.] rustc_data_structures::obligation_forest::ObligationForest<O>::process_obligations
     0.61%  rustc     libLLVM-10-rust-1.46.0-nightly.so    [.] (anonymous namespace)::X86LoadValueInjectionLoadHardeningPass::elimMitigatedEdgesAndNodes

@scottconstable
Copy link

So it looks like ~63% is due to the LVI load hardening pass (specifically, the greedy heuristic for inserting LFENCEs), and another ~24% comes from RDF. I take it that Rust probably does a lot of aggressive inlining and LTO, and the function being mitigated is just absolutely huge?

@jethrogb
Copy link
Contributor

I take it that Rust probably does a lot of aggressive inlining and LTO, and the function being mitigated is just absolutely huge?

Yes to all of the above. Most of the 5000-line file is one function. Is it unreasonable to expect such a case to ever be fast with LVI mitigations?

@scottconstable
Copy link

I think it should still be possible to have fast compilation for very large functions. I'm curious about the observation that a newer uint crate seems to solve the issue, as @jberci had observed. Does the updated crate also contain a 5,000 LoC function? There might be something more subtle going on.

@scottconstable
Copy link

I created a patch that should fix the overhead caused by X86LoadValueInjectionLoadHardeningPass::elimMitigatedEdgesAndNodes()
https://reviews.llvm.org/D84471

A complete solution will also require a patch to address the complexity issues in RDF.

@jethrogb
Copy link
Contributor

@jberci would you be able to test your build with Scott's patch?

@jyn514 jyn514 added the I-compiletime Issue: Problems and improvements with respect to compile times. label Jul 27, 2020
@jberci
Copy link
Author

jberci commented Jul 27, 2020

Definitely. I built a plain rustc from commit 8ad7bc3 (mostly to check if I'm even doing it right) and another one with the patch from D84471 applied on top of that.

For uint 0.2.1, the plain one took around half an hour to compile my sample, as before, and the one with D84471 applied took just under 4 minutes. (And disabling the LVI passes brings that down to 10s). With the newer 0.4 version of uint, both rustc builds behave about the same (~45s with LVI passes, ~25s without).
I'm still working on building the original project that surfaced this with the patched rustc, but from the rest of what I could test so far locally, the effect should be the same as for the sample here.

topperc pushed a commit to llvm/llvm-project that referenced this issue Jul 31, 2020
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
hanswinderix pushed a commit to hanswinderix/llvm-project that referenced this issue Aug 5, 2020
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
@scottconstable
Copy link

To summarize where we stand now, we have committed a patch to address the slowdown in the LVI LFENCE insertion algorithm. Krzysztof has committed several patches on the RDF side that reduce but may not completely eliminate the overhead for live variables analysis (see https://bugs.llvm.org/show_bug.cgi?id=46808). Maybe it would be good to take stock of where we stand right now with the compile time overhead. If the overhead remains too high, then we may need to do some more involved reworking of some of the RDF algorithms to make them more efficient.

jethrogb pushed a commit to fortanix/llvm-project that referenced this issue Sep 8, 2020
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
@jethrogb jethrogb mentioned this issue Sep 16, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Sep 19, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Sep 19, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Sep 19, 2020
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 21, 2020
Update LLVM

This (partially?) addresses rust-lang#74632

r? `@cuviper`
cuviper pushed a commit to rust-lang/llvm-project that referenced this issue Sep 22, 2020
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
@jethrogb
Copy link
Contributor

Ok we now have a Rust nightly that includes the patches from @scottconstable and Krzysztof. @jberci are you happy with the compile times now?

cuviper pushed a commit to rust-lang/llvm-project that referenced this issue Oct 14, 2020
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
cuviper pushed a commit to rust-lang/llvm-project that referenced this issue Jan 7, 2021
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
arichardson pushed a commit to arichardson/llvm-project that referenced this issue Mar 21, 2021
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
@nikic
Copy link
Contributor

nikic commented Feb 4, 2022

As far as I can tell, this has been resolved, so closing the issue. Please let me know if this is not the case.

@nikic nikic closed this as completed Feb 4, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times.
Projects
None yet
Development

No branches or pull requests

6 participants