Compiling small program with -C opt-level=2 takes hours or never finishes without -Cinline-threshold flag #86870

ajdust · 2021-07-04T19:04:31Z

Hi!

Background

I generated a small identical program in Rust and C++. See for instance:

Benchmarking the compilation of this code (using hyperfine) I find:

# equivalent C++ code compiles in under 4 seconds to compile 500 functions with clang
test> hyperfine 'clang++ -O2 test_500.cpp -Wno-xor-used-as-pow' --warmup 1
Benchmark 1: clang++ -O2 test_500.cpp -Wno-xor-used-as-pow
  Time (mean ± σ):      3.224 s ±  0.066 s    [User: 2.8 ms, System: 7.3 ms]
  Range (min … max):    3.181 s …  3.348 s    10 runs

# trying to compile with equivalent Rust with 500 functions goes nowhere... cancelled after leaving this running for 12 hours
test> hyperfine 'rustc -C opt-level=2 test_500.rs' --min-runs 2
Benchmark 1: rustc -C opt-level=2 test_500.rs
 | Initial time measurement       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:00

# code compiles quickly with -Cinline-threshold passed, back to under 4 seconds like clang
test> hyperfine 'rustc -C opt-level=2 test_500.rs -Cinline-threshold=1000' --min-runs 2
Benchmark 1: rustc -C opt-level=2 test_500.rs -Cinline-threshold=1000
  Time (mean ± σ):      3.888 s ±  0.246 s    [User: 0.0 ms, System: 5.6 ms]
  Range (min … max):    3.714 s …  4.062 s    2 runs

# smaller sample for 20 functions takes a second to compile with clang
test> hyperfine 'clang++ -O2 test_20.cpp -Wno-xor-used-as-pow' --warmup 1
Benchmark 1: clang++ -O2 test_20.cpp -Wno-xor-used-as-pow
  Time (mean ± σ):      1.143 s ±  0.022 s    [User: 2.7 ms, System: 6.6 ms]
  Range (min … max):    1.122 s …  1.201 s    10 runs

# smaller sample with 20 functions takes nearly two minutes to compile with rustc
test> hyperfine 'rustc -C opt-level=2 test_20.rs' --min-runs 2
Benchmark 1: rustc -C opt-level=2 test_20.rs
  Time (mean ± σ):     100.216 s ±  0.008 s    [User: 0.0 ms, System: 11.6 ms]
  Range (min … max):   100.210 s … 100.221 s    2 runs

Expectation vs. reality

I would expect Rust to have similar optimizing performance to clang in this case. That is, compiling in seconds without an obscure compiler flag. Instead, it effectively fails to compile by never finishing. As a Rust newb this behavior is not very friendly. Is there a reasonable limit like comes into play with the -Cinline-threshold flag that should be included in the optimization flags (such opt-level=2) by default, or a way to otherwise stop the compiler from falling down an endless or futile pit?

Thank you

Thank you for you time, I hope this issue can improve Rust. There is the accompanying StackOverflow posting where I learned about the -Cinline-threshold flag and user bluss recommended a bug report here.

rustc versions used; I previously replicated this on Linux and OS X as well.

rustc 1.53.0 (53cb7b09b 2021-06-17)
binary: rustc
commit-hash: 53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b
commit-date: 2021-06-17
host: x86_64-pc-windows-msvc
release: 1.53.0
LLVM version: 12.0.1

rustc 1.55.0-nightly (71b8742bb 2021-07-03)
binary: rustc
commit-hash: 71b8742bbcbed2cd908dbc031d6552d8b528c037
commit-date: 2021-07-03
host: x86_64-pc-windows-msvc
release: 1.55.0-nightly
LLVM version: 12.0.1

Here is the 20 function Rust code for reference if the gists ever stops working:

//$ rustc -C opt-level=2 test_20.rs
//  takes over a minute to compile, rustc 1.55.0
//  see https://gist.github.com/johnsabr/5e92cab52ffab5ea2a52edbd47aa348a
#![allow(unused_parens)]

fn f0(p: i32) -> i32 {
    let x1: i32 = (p - ((((21 | 1) | p) ^ 84) & ((48 ^ (52 | (p & (2 ^ 61)))) - 67)));
    let x2: i32 = x1;
    let mut x3: i32 = 54;
    let x4: i32 = 75;
    let x5: i32 = (77 & 39);
    let x6: i32 = (x2 * x5);
    let x7: i32 = (88 * (8 + x1));
    x3 = (x3 + 60);
    ((((((((32 * p) & x1) ^ x2) - x3) ^ x4) & x5) | x6) | x7)
}

fn f1(p: i32) -> i32 {
    let mut x1: i32 = f0(78);
    x1 = (x1 ^ p);
    let mut x2: i32 = f0(x1);
    x2 = (x2 * 3);
    let x3: i32 = f0(x1);
    let x4: i32 = ((21 & (x3 - ((93 * (x3 - (f0(x3) - (x2 - (f0(x1) | 43))))) | (f0(p) - f0(x1))))) * 41);
    ((((((f0(p) | x2) ^ p) & x1) ^ x2) | x3) - x4)
}

fn f2(p: i32) -> i32 {
    let mut x1: i32 = f1(50);
    x1 = (x1 * p);
    x1 = (x1 | f0(p));
    let mut x2: i32 = f1(x1);
    x2 = (x2 | f1(x2));
    let mut x3: i32 = (24 * f0(x1));
    x2 = (x2 & f0(p));
    x3 = (x3 ^ x1);
    let x4: i32 = x1;
    (((((x4 ^ p) | x1) * x2) + x3) | x4)
}

fn f3(p: i32) -> i32 {
    let mut x1: i32 = f2(75);
    let x2: i32 = x1;
    x1 = (x1 & x2);
    let x3: i32 = f0(p);
    let x4: i32 = ((f1(x3) ^ f1(x2)) + 92);
    x1 = (x1 | (x2 ^ 94));
    x1 = (x1 * x2);
    let x5: i32 = (f0(x1) & (3 ^ (f0(x1) * f2(x4))));
    x1 = (x1 + x2);
    (((((((x1 * x5) * p) - x1) | x2) * x3) - x4) - x5)
}

fn f4(p: i32) -> i32 {
    let mut x1: i32 = f3(14);
    x1 = (x1 + f0(p));
    let mut x2: i32 = f1(x1);
    x1 = (x1 - 41);
    x2 = (x2 ^ 61);
    let x3: i32 = f2(p);
    x2 = (x2 ^ p);
    let x4: i32 = x2;
    x1 = (x1 - p);
    x1 = (x1 * x4);
    ((((((88 & 11) & p) - x1) * x2) ^ x3) | x4)
}

fn f5(p: i32) -> i32 {
    let mut x1: i32 = f4(50);
    x1 = (x1 ^ 13);
    ((35 + p) | x1)
}

fn f6(p: i32) -> i32 {
    let mut x1: i32 = f5(51);
    x1 = (x1 + 27);
    let x2: i32 = (p + (p | f1(x1)));
    x1 = (x1 + f0(x2));
    let x3: i32 = f0(x1);
    let mut x4: i32 = 48;
    x1 = (x1 ^ f0(p));
    x1 = (x1 & 26);
    x4 = (x4 * f1(x4));
    (((((99 - p) * x1) ^ x2) & x3) + x4)
}

fn f7(p: i32) -> i32 {
    let mut x1: i32 = f6(71);
    x1 = (x1 & 66);
    x1 = (x1 & p);
    let x2: i32 = 57;
    x1 = (x1 * 26);
    let x3: i32 = (21 & p);
    let x4: i32 = (f0(x1) & (f3(p) * f2(p)));
    let x5: i32 = f6(x3);
    ((((((x5 + p) | x1) + x2) - x3) & x4) * x5)
}

fn f8(p: i32) -> i32 {
    let mut x1: i32 = f7(57);
    x1 = (x1 & f5(p));
    x1 = (x1 ^ (x1 & f1(p)));
    let x2: i32 = 25;
    let x3: i32 = f5(x1);
    ((((x1 - p) * x1) & x2) ^ x3)
}

fn f9(p: i32) -> i32 {
    let mut x1: i32 = f8(23);
    x1 = (x1 | (((26 | f4(x1)) - f0(p)) | f8(p)));
    let x2: i32 = x1;
    let mut x3: i32 = 58;
    x3 = (x3 - p);
    let x4: i32 = f7(x1);
    let x5: i32 = f7(x2);
    let x6: i32 = (f7(x1) & 79);
    (((((((33 | p) - x1) + x2) + x3) * x4) ^ x5) + x6)
}

fn f10(p: i32) -> i32 {
    let mut x1: i32 = f9(75);
    x1 = (x1 | 37);
    (((f8(x1) + f3(x1)) | p) * x1)
}

fn f11(p: i32) -> i32 {
    let mut x1: i32 = f10(8);
    x1 = (x1 ^ f6(x1));
    let mut x2: i32 = p;
    x2 = (x2 ^ 84);
    let x3: i32 = (f5(p) ^ f5(p));
    x1 = (x1 * f5(p));
    x1 = (x1 | f1(x2));
    x1 = (x1 * f8(p));
    ((((((f0(x3) | f9(p)) - f4(x1)) + p) & x1) & x2) - x3)
}

fn f12(p: i32) -> i32 {
    let mut x1: i32 = f11(33);
    x1 = (x1 * 84);
    let mut x2: i32 = (67 - f0(p));
    x2 = (x2 | x1);
    x1 = (x1 - 67);
    x2 = (x2 - f6(p));
    (((p - p) * x1) | x2)
}

fn f13(p: i32) -> i32 {
    let mut x1: i32 = f12(90);
    x1 = (x1 + (f6(x1) - f4(p)));
    x1 = (x1 - 19);
    let x2: i32 = 92;
    let mut x3: i32 = f9(x1);
    let mut x4: i32 = x3;
    x4 = (x4 - (87 | f5(x3)));
    x3 = (x3 | 49);
    let x5: i32 = 25;
    let x6: i32 = x3;
    (((((((2 & p) - x1) ^ x2) ^ x3) ^ x4) | x5) | x6)
}

fn f14(p: i32) -> i32 {
    let mut x1: i32 = f13(66);
    let x2: i32 = f2(p);
    x1 = (x1 - 11);
    let mut x3: i32 = 69;
    x3 = (x3 * x2);
    let x4: i32 = 91;
    (((((19 * p) + x1) | x2) ^ x3) & x4)
}

fn f15(p: i32) -> i32 {
    let mut x1: i32 = f14(79);
    x1 = (x1 + (f8(p) & p));
    let x2: i32 = p;
    x1 = (x1 | ((f5(p) & x2) ^ x2));
    let mut x3: i32 = x1;
    x1 = (x1 - p);
    x3 = (x3 * p);
    ((((40 * p) ^ x1) + x2) + x3)
}

fn f16(p: i32) -> i32 {
    let x1: i32 = f15(77);
    let mut x2: i32 = 5;
    let mut x3: i32 = x1;
    let x4: i32 = p;
    x2 = (x2 + p);
    let x5: i32 = x4;
    x3 = (x3 | f9(x4));
    let x6: i32 = (68 ^ (61 ^ (24 * f14(x4))));
    (((((((88 + p) - x1) & x2) | x3) & x4) ^ x5) | x6)
}

fn f17(p: i32) -> i32 {
    let mut x1: i32 = f16(41);
    x1 = (x1 | 4);
    let mut x2: i32 = x1;
    x1 = (x1 | 52);
    x1 = (x1 & 49);
    x2 = (x2 & (f8(x2) ^ p));
    let mut x3: i32 = x2;
    x3 = (x3 ^ ((x1 ^ x2) + f15(x2)));
    let mut x4: i32 = (f13(x2) ^ 73);
    x4 = (x4 - f12(x1));
    (((((x3 - p) + x1) ^ x2) + x3) | x4)
}

fn f18(p: i32) -> i32 {
    let mut x1: i32 = f17(3);
    x1 = (x1 & (p - ((33 * (95 | 87)) | (9 - f1(x1)))));
    x1 = (x1 & (80 - f16(x1)));
    x1 = (x1 & p);
    x1 = (x1 + p);
    x1 = (x1 | (82 - ((81 ^ p) - 97)));
    ((20 - p) * x1)
}

fn f19(p: i32) -> i32 {
    let x1: i32 = f18(24);
    let x2: i32 = (p & p);
    let mut x3: i32 = 82;
    let x4: i32 = (4 + x1);
    x3 = (x3 | ((f10(p) + (f16(x3) - 34)) - f10(x1)));
    let x5: i32 = (x4 | (x1 * (((f16(x1) + f4(x4)) - 43) & f7(x3))));
    (((((((f14(x3) | f9(x5)) - p) & x1) * x2) & x3) * x4) + x5)
}

fn f20(p: i32) -> i32 {
    let x1: i32 = f19(78);
    let x2: i32 = 81;
    let x3: i32 = (x2 + (59 & x1));
    (((((f9(x3) ^ f11(x3)) * p) * x1) - x2) ^ x3)
}

fn main() {
    let mut x0: i32 = f20(65);
    x0 = (x0 * (53 + 37));
    let mut x1: i32 = (x0 - ((41 | ((f20(x0) * f9(x0)) + ((((f20(x0) + (77 + (f14(x0) ^ 60))) * 27) & 62) + x0))) & f20(x0)));
    let x2: i32 = f15(x1);
    x1 = (x1 | (x0 * (4 ^ 37)));
    let m: i32 = (((x2 | x0) | x1) | x2);
    println!("{}", m);

}

The text was updated successfully, but these errors were encountered:

klensy · 2021-07-04T22:59:04Z

With opt-level 0 or 1 it compiles instant both versions. With 0 level it reports

thread 'main' panicked at 'attempt to multiply with overflow', test_500.rs:100:7
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Is it expected for your code?

ajdust · 2021-07-05T00:06:45Z

Is it expected for your code?

That's right klensy. This generated code was more about stressing the compiler than actually running the resulting program. With low optimization or debug compilation the code triggers 'attempt to multiply with overflow'. The higher opt-level such as 2 turns off that check I believe. The 20 function versions - test_20.rs which takes a minute+ to compile by default and test_20.cpp which takes a second - when compiled with optimization both print the value "-148260" as the result.

Note the 500 function (or more) versions are scaled up version intended to stress a compiler.

If a larger version that still outputs a result is useful, Here's a 300 function version where the executable produced by rustc and clang++ when compiled under optimization prints "-1247309868": test_rs_300.rs and test_cpp_300.

Cryptjar · 2021-07-05T00:42:00Z

You know, I though if the arithmetic should wrap around, why not use std::num::Wrapping, right?

So, I, actually, try it out and replaced all the i32 with Wrapping<i32> (and wrapped the literals too). And what should I say, it just compiles 3.2 s on my machine, whereas the C++ code with clang++ takes me 2.3 s, so fairly comparable.

But I don't really understand, why this makes such a big difference for the compiler.

Rust version with Wrapping:

$ hyperfine 'rustc -C opt-level=2 main_500.rs'
Benchmark #1: rustc -C opt-level=2 main_500.rs
  Time (mean ± σ):      3.277 s ±  0.044 s    [User: 3.272 s, System: 0.067 s]
  Range (min … max):    3.212 s …  3.370 s    10 runs

Original C++ version:

$ hyperfine 'clang++ -O2 test_500.cpp -Wno-xor-used-as-pow'
Benchmark #1: clang++ -O2 test_500.cpp -Wno-xor-used-as-pow
  Time (mean ± σ):      2.347 s ±  0.079 s    [User: 2.271 s, System: 0.053 s]
  Range (min … max):    2.249 s …  2.462 s    10 runs

Edit: I get similar results for the 300 functions test, that is 2.0 s for wrapping-Rust compilation and 1.6 s for C++. But I could also test the resulting binaries, which give the same output of course, the Rust binary runs for 3.6 s whereas the C++ binary run for 8.7 s, on my machine.

ajdust · 2021-07-06T01:08:58Z

Nice Cryptjar! I hadn't thought about using Wrapping<i32>. Sure enough that brings the compile time back down just like using the -Cinline-threshold=1000 flag does.

To add to that, on my machine at least, compiling with -Cinline-threshold=1000 is a bit speedier than the Wrapped version, and Wrapped+-Cinline-threshold=1000 is the fastest.

Actually running the 300 functions test executable produced, the -Cinline-threshold=1000 version is nearly instant at 5ms, whereas the Wrapped executable and clang executable take times similar to what you got (3.5 / 8.5 seconds).

Stargateur · 2021-07-06T19:02:34Z

maybe compiler use bigint to compile time the result

nagisa · 2021-07-06T20:37:03Z

The code necessary to panic for each checked operation is going to be quite large, and introduce a fair number of additional basic blocks. Ideally compilation would still be linear wrt the amount of code (rather than superlinear).

kirill-dev-pro · 2023-06-27T15:43:10Z

So whats going on with this issue?

ajdust added the C-bug Category: This is a bug. label Jul 4, 2021

ChrisDenton added the needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. label Jul 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compiling small program with -C opt-level=2 takes hours or never finishes without -Cinline-threshold flag #86870

Compiling small program with -C opt-level=2 takes hours or never finishes without -Cinline-threshold flag #86870

ajdust commented Jul 4, 2021 •

edited

klensy commented Jul 4, 2021

ajdust commented Jul 5, 2021

Cryptjar commented Jul 5, 2021 •

edited

ajdust commented Jul 6, 2021

Stargateur commented Jul 6, 2021

nagisa commented Jul 6, 2021

kirill-dev-pro commented Jun 27, 2023 •

edited

Compiling small program with -C opt-level=2 takes hours or never finishes without -Cinline-threshold flag #86870

Compiling small program with -C opt-level=2 takes hours or never finishes without -Cinline-threshold flag #86870

Comments

ajdust commented Jul 4, 2021 • edited

Background

Expectation vs. reality

Thank you

klensy commented Jul 4, 2021

ajdust commented Jul 5, 2021

Cryptjar commented Jul 5, 2021 • edited

ajdust commented Jul 6, 2021

Stargateur commented Jul 6, 2021

nagisa commented Jul 6, 2021

kirill-dev-pro commented Jun 27, 2023 • edited

ajdust commented Jul 4, 2021 •

edited

Cryptjar commented Jul 5, 2021 •

edited

kirill-dev-pro commented Jun 27, 2023 •

edited