Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiling small program with -C opt-level=2 takes hours or never finishes without -Cinline-threshold flag #86870

Open
ajdust opened this issue Jul 4, 2021 · 7 comments
Labels
C-bug Category: This is a bug. I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@ajdust
Copy link

ajdust commented Jul 4, 2021

Hi!

Background

I generated a small identical program in Rust and C++. See for instance:

Benchmarking the compilation of this code (using hyperfine) I find:

# equivalent C++ code compiles in under 4 seconds to compile 500 functions with clang
test> hyperfine 'clang++ -O2 test_500.cpp -Wno-xor-used-as-pow' --warmup 1
Benchmark 1: clang++ -O2 test_500.cpp -Wno-xor-used-as-pow
  Time (mean ± σ):      3.224 s ±  0.066 s    [User: 2.8 ms, System: 7.3 ms]
  Range (min … max):    3.181 s …  3.348 s    10 runs

# trying to compile with equivalent Rust with 500 functions goes nowhere... cancelled after leaving this running for 12 hours
test> hyperfine 'rustc -C opt-level=2 test_500.rs' --min-runs 2
Benchmark 1: rustc -C opt-level=2 test_500.rs
 | Initial time measurement       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:00

# code compiles quickly with -Cinline-threshold passed, back to under 4 seconds like clang
test> hyperfine 'rustc -C opt-level=2 test_500.rs -Cinline-threshold=1000' --min-runs 2
Benchmark 1: rustc -C opt-level=2 test_500.rs -Cinline-threshold=1000
  Time (mean ± σ):      3.888 s ±  0.246 s    [User: 0.0 ms, System: 5.6 ms]
  Range (min … max):    3.714 s …  4.062 s    2 runs

# smaller sample for 20 functions takes a second to compile with clang
test> hyperfine 'clang++ -O2 test_20.cpp -Wno-xor-used-as-pow' --warmup 1
Benchmark 1: clang++ -O2 test_20.cpp -Wno-xor-used-as-pow
  Time (mean ± σ):      1.143 s ±  0.022 s    [User: 2.7 ms, System: 6.6 ms]
  Range (min … max):    1.122 s …  1.201 s    10 runs

# smaller sample with 20 functions takes nearly two minutes to compile with rustc
test> hyperfine 'rustc -C opt-level=2 test_20.rs' --min-runs 2
Benchmark 1: rustc -C opt-level=2 test_20.rs
  Time (mean ± σ):     100.216 s ±  0.008 s    [User: 0.0 ms, System: 11.6 ms]
  Range (min … max):   100.210 s … 100.221 s    2 runs

Expectation vs. reality

I would expect Rust to have similar optimizing performance to clang in this case. That is, compiling in seconds without an obscure compiler flag. Instead, it effectively fails to compile by never finishing. As a Rust newb this behavior is not very friendly. Is there a reasonable limit like comes into play with the -Cinline-threshold flag that should be included in the optimization flags (such opt-level=2) by default, or a way to otherwise stop the compiler from falling down an endless or futile pit?

Thank you

Thank you for you time, I hope this issue can improve Rust. There is the accompanying StackOverflow posting where I learned about the -Cinline-threshold flag and user bluss recommended a bug report here.

rustc versions used; I previously replicated this on Linux and OS X as well.

rustc 1.53.0 (53cb7b09b 2021-06-17)
binary: rustc
commit-hash: 53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b
commit-date: 2021-06-17
host: x86_64-pc-windows-msvc
release: 1.53.0
LLVM version: 12.0.1

rustc 1.55.0-nightly (71b8742bb 2021-07-03)
binary: rustc
commit-hash: 71b8742bbcbed2cd908dbc031d6552d8b528c037
commit-date: 2021-07-03
host: x86_64-pc-windows-msvc
release: 1.55.0-nightly
LLVM version: 12.0.1

Here is the 20 function Rust code for reference if the gists ever stops working:

//$ rustc -C opt-level=2 test_20.rs
//  takes over a minute to compile, rustc 1.55.0
//  see https://gist.github.com/johnsabr/5e92cab52ffab5ea2a52edbd47aa348a
#![allow(unused_parens)]

fn f0(p: i32) -> i32 {
    let x1: i32 = (p - ((((21 | 1) | p) ^ 84) & ((48 ^ (52 | (p & (2 ^ 61)))) - 67)));
    let x2: i32 = x1;
    let mut x3: i32 = 54;
    let x4: i32 = 75;
    let x5: i32 = (77 & 39);
    let x6: i32 = (x2 * x5);
    let x7: i32 = (88 * (8 + x1));
    x3 = (x3 + 60);
    ((((((((32 * p) & x1) ^ x2) - x3) ^ x4) & x5) | x6) | x7)
}

fn f1(p: i32) -> i32 {
    let mut x1: i32 = f0(78);
    x1 = (x1 ^ p);
    let mut x2: i32 = f0(x1);
    x2 = (x2 * 3);
    let x3: i32 = f0(x1);
    let x4: i32 = ((21 & (x3 - ((93 * (x3 - (f0(x3) - (x2 - (f0(x1) | 43))))) | (f0(p) - f0(x1))))) * 41);
    ((((((f0(p) | x2) ^ p) & x1) ^ x2) | x3) - x4)
}

fn f2(p: i32) -> i32 {
    let mut x1: i32 = f1(50);
    x1 = (x1 * p);
    x1 = (x1 | f0(p));
    let mut x2: i32 = f1(x1);
    x2 = (x2 | f1(x2));
    let mut x3: i32 = (24 * f0(x1));
    x2 = (x2 & f0(p));
    x3 = (x3 ^ x1);
    let x4: i32 = x1;
    (((((x4 ^ p) | x1) * x2) + x3) | x4)
}

fn f3(p: i32) -> i32 {
    let mut x1: i32 = f2(75);
    let x2: i32 = x1;
    x1 = (x1 & x2);
    let x3: i32 = f0(p);
    let x4: i32 = ((f1(x3) ^ f1(x2)) + 92);
    x1 = (x1 | (x2 ^ 94));
    x1 = (x1 * x2);
    let x5: i32 = (f0(x1) & (3 ^ (f0(x1) * f2(x4))));
    x1 = (x1 + x2);
    (((((((x1 * x5) * p) - x1) | x2) * x3) - x4) - x5)
}

fn f4(p: i32) -> i32 {
    let mut x1: i32 = f3(14);
    x1 = (x1 + f0(p));
    let mut x2: i32 = f1(x1);
    x1 = (x1 - 41);
    x2 = (x2 ^ 61);
    let x3: i32 = f2(p);
    x2 = (x2 ^ p);
    let x4: i32 = x2;
    x1 = (x1 - p);
    x1 = (x1 * x4);
    ((((((88 & 11) & p) - x1) * x2) ^ x3) | x4)
}

fn f5(p: i32) -> i32 {
    let mut x1: i32 = f4(50);
    x1 = (x1 ^ 13);
    ((35 + p) | x1)
}

fn f6(p: i32) -> i32 {
    let mut x1: i32 = f5(51);
    x1 = (x1 + 27);
    let x2: i32 = (p + (p | f1(x1)));
    x1 = (x1 + f0(x2));
    let x3: i32 = f0(x1);
    let mut x4: i32 = 48;
    x1 = (x1 ^ f0(p));
    x1 = (x1 & 26);
    x4 = (x4 * f1(x4));
    (((((99 - p) * x1) ^ x2) & x3) + x4)
}

fn f7(p: i32) -> i32 {
    let mut x1: i32 = f6(71);
    x1 = (x1 & 66);
    x1 = (x1 & p);
    let x2: i32 = 57;
    x1 = (x1 * 26);
    let x3: i32 = (21 & p);
    let x4: i32 = (f0(x1) & (f3(p) * f2(p)));
    let x5: i32 = f6(x3);
    ((((((x5 + p) | x1) + x2) - x3) & x4) * x5)
}

fn f8(p: i32) -> i32 {
    let mut x1: i32 = f7(57);
    x1 = (x1 & f5(p));
    x1 = (x1 ^ (x1 & f1(p)));
    let x2: i32 = 25;
    let x3: i32 = f5(x1);
    ((((x1 - p) * x1) & x2) ^ x3)
}

fn f9(p: i32) -> i32 {
    let mut x1: i32 = f8(23);
    x1 = (x1 | (((26 | f4(x1)) - f0(p)) | f8(p)));
    let x2: i32 = x1;
    let mut x3: i32 = 58;
    x3 = (x3 - p);
    let x4: i32 = f7(x1);
    let x5: i32 = f7(x2);
    let x6: i32 = (f7(x1) & 79);
    (((((((33 | p) - x1) + x2) + x3) * x4) ^ x5) + x6)
}

fn f10(p: i32) -> i32 {
    let mut x1: i32 = f9(75);
    x1 = (x1 | 37);
    (((f8(x1) + f3(x1)) | p) * x1)
}

fn f11(p: i32) -> i32 {
    let mut x1: i32 = f10(8);
    x1 = (x1 ^ f6(x1));
    let mut x2: i32 = p;
    x2 = (x2 ^ 84);
    let x3: i32 = (f5(p) ^ f5(p));
    x1 = (x1 * f5(p));
    x1 = (x1 | f1(x2));
    x1 = (x1 * f8(p));
    ((((((f0(x3) | f9(p)) - f4(x1)) + p) & x1) & x2) - x3)
}

fn f12(p: i32) -> i32 {
    let mut x1: i32 = f11(33);
    x1 = (x1 * 84);
    let mut x2: i32 = (67 - f0(p));
    x2 = (x2 | x1);
    x1 = (x1 - 67);
    x2 = (x2 - f6(p));
    (((p - p) * x1) | x2)
}

fn f13(p: i32) -> i32 {
    let mut x1: i32 = f12(90);
    x1 = (x1 + (f6(x1) - f4(p)));
    x1 = (x1 - 19);
    let x2: i32 = 92;
    let mut x3: i32 = f9(x1);
    let mut x4: i32 = x3;
    x4 = (x4 - (87 | f5(x3)));
    x3 = (x3 | 49);
    let x5: i32 = 25;
    let x6: i32 = x3;
    (((((((2 & p) - x1) ^ x2) ^ x3) ^ x4) | x5) | x6)
}

fn f14(p: i32) -> i32 {
    let mut x1: i32 = f13(66);
    let x2: i32 = f2(p);
    x1 = (x1 - 11);
    let mut x3: i32 = 69;
    x3 = (x3 * x2);
    let x4: i32 = 91;
    (((((19 * p) + x1) | x2) ^ x3) & x4)
}

fn f15(p: i32) -> i32 {
    let mut x1: i32 = f14(79);
    x1 = (x1 + (f8(p) & p));
    let x2: i32 = p;
    x1 = (x1 | ((f5(p) & x2) ^ x2));
    let mut x3: i32 = x1;
    x1 = (x1 - p);
    x3 = (x3 * p);
    ((((40 * p) ^ x1) + x2) + x3)
}

fn f16(p: i32) -> i32 {
    let x1: i32 = f15(77);
    let mut x2: i32 = 5;
    let mut x3: i32 = x1;
    let x4: i32 = p;
    x2 = (x2 + p);
    let x5: i32 = x4;
    x3 = (x3 | f9(x4));
    let x6: i32 = (68 ^ (61 ^ (24 * f14(x4))));
    (((((((88 + p) - x1) & x2) | x3) & x4) ^ x5) | x6)
}

fn f17(p: i32) -> i32 {
    let mut x1: i32 = f16(41);
    x1 = (x1 | 4);
    let mut x2: i32 = x1;
    x1 = (x1 | 52);
    x1 = (x1 & 49);
    x2 = (x2 & (f8(x2) ^ p));
    let mut x3: i32 = x2;
    x3 = (x3 ^ ((x1 ^ x2) + f15(x2)));
    let mut x4: i32 = (f13(x2) ^ 73);
    x4 = (x4 - f12(x1));
    (((((x3 - p) + x1) ^ x2) + x3) | x4)
}

fn f18(p: i32) -> i32 {
    let mut x1: i32 = f17(3);
    x1 = (x1 & (p - ((33 * (95 | 87)) | (9 - f1(x1)))));
    x1 = (x1 & (80 - f16(x1)));
    x1 = (x1 & p);
    x1 = (x1 + p);
    x1 = (x1 | (82 - ((81 ^ p) - 97)));
    ((20 - p) * x1)
}

fn f19(p: i32) -> i32 {
    let x1: i32 = f18(24);
    let x2: i32 = (p & p);
    let mut x3: i32 = 82;
    let x4: i32 = (4 + x1);
    x3 = (x3 | ((f10(p) + (f16(x3) - 34)) - f10(x1)));
    let x5: i32 = (x4 | (x1 * (((f16(x1) + f4(x4)) - 43) & f7(x3))));
    (((((((f14(x3) | f9(x5)) - p) & x1) * x2) & x3) * x4) + x5)
}

fn f20(p: i32) -> i32 {
    let x1: i32 = f19(78);
    let x2: i32 = 81;
    let x3: i32 = (x2 + (59 & x1));
    (((((f9(x3) ^ f11(x3)) * p) * x1) - x2) ^ x3)
}

fn main() {
    let mut x0: i32 = f20(65);
    x0 = (x0 * (53 + 37));
    let mut x1: i32 = (x0 - ((41 | ((f20(x0) * f9(x0)) + ((((f20(x0) + (77 + (f14(x0) ^ 60))) * 27) & 62) + x0))) & f20(x0)));
    let x2: i32 = f15(x1);
    x1 = (x1 | (x0 * (4 ^ 37)));
    let m: i32 = (((x2 | x0) | x1) | x2);
    println!("{}", m);

}
@ajdust ajdust added the C-bug Category: This is a bug. label Jul 4, 2021
@klensy
Copy link
Contributor

klensy commented Jul 4, 2021

With opt-level 0 or 1 it compiles instant both versions. With 0 level it reports

thread 'main' panicked at 'attempt to multiply with overflow', test_500.rs:100:7
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Is it expected for your code?

@ajdust
Copy link
Author

ajdust commented Jul 5, 2021

Is it expected for your code?

That's right klensy. This generated code was more about stressing the compiler than actually running the resulting program. With low optimization or debug compilation the code triggers 'attempt to multiply with overflow'. The higher opt-level such as 2 turns off that check I believe. The 20 function versions - test_20.rs which takes a minute+ to compile by default and test_20.cpp which takes a second - when compiled with optimization both print the value "-148260" as the result.

Note the 500 function (or more) versions are scaled up version intended to stress a compiler.

If a larger version that still outputs a result is useful, Here's a 300 function version where the executable produced by rustc and clang++ when compiled under optimization prints "-1247309868": test_rs_300.rs and test_cpp_300.

@Cryptjar
Copy link

Cryptjar commented Jul 5, 2021

You know, I though if the arithmetic should wrap around, why not use std::num::Wrapping, right?

So, I, actually, try it out and replaced all the i32 with Wrapping<i32> (and wrapped the literals too). And what should I say, it just compiles 3.2 s on my machine, whereas the C++ code with clang++ takes me 2.3 s, so fairly comparable.

But I don't really understand, why this makes such a big difference for the compiler.

Rust version with Wrapping:

$ hyperfine 'rustc -C opt-level=2 main_500.rs'
Benchmark #1: rustc -C opt-level=2 main_500.rs
  Time (mean ± σ):      3.277 s ±  0.044 s    [User: 3.272 s, System: 0.067 s]
  Range (min … max):    3.212 s …  3.370 s    10 runs

Original C++ version:

$ hyperfine 'clang++ -O2 test_500.cpp -Wno-xor-used-as-pow'
Benchmark #1: clang++ -O2 test_500.cpp -Wno-xor-used-as-pow
  Time (mean ± σ):      2.347 s ±  0.079 s    [User: 2.271 s, System: 0.053 s]
  Range (min … max):    2.249 s …  2.462 s    10 runs

Edit: I get similar results for the 300 functions test, that is 2.0 s for wrapping-Rust compilation and 1.6 s for C++. But I could also test the resulting binaries, which give the same output of course, the Rust binary runs for 3.6 s whereas the C++ binary run for 8.7 s, on my machine.

@ajdust
Copy link
Author

ajdust commented Jul 6, 2021

Nice Cryptjar! I hadn't thought about using Wrapping<i32>. Sure enough that brings the compile time back down just like using the -Cinline-threshold=1000 flag does.

To add to that, on my machine at least, compiling with -Cinline-threshold=1000 is a bit speedier than the Wrapped version, and Wrapped+-Cinline-threshold=1000 is the fastest.

Actually running the 300 functions test executable produced, the -Cinline-threshold=1000 version is nearly instant at 5ms, whereas the Wrapped executable and clang executable take times similar to what you got (3.5 / 8.5 seconds).

@Stargateur
Copy link
Contributor

maybe compiler use bigint to compile time the result

@nagisa
Copy link
Member

nagisa commented Jul 6, 2021

The code necessary to panic for each checked operation is going to be quite large, and introduce a fair number of additional basic blocks. Ideally compilation would still be linear wrt the amount of code (rather than superlinear).

@kirill-dev-pro
Copy link

kirill-dev-pro commented Jun 27, 2023

So whats going on with this issue?

@ChrisDenton ChrisDenton added the needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. label Jul 16, 2023
@fmease fmease added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. and removed needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. labels Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants