Large memory usage and long time on compiling large number of println #86244

upsuper · 2021-06-12T11:11:40Z

Not sure whether it's a known bug.

I tried this code:

fn main() {
    println!("hello world!"); // repeat 16384 times
}

(To save you some time, you can download the file here directly rather than generating it yourself.)

I expected to see this happen: it should take reasonable amount of memory and reasonable time to compile.

Instead, this happened: on my machine, I saw it took up to 34GB memory and 2.5min of time just to compile this.

Meta

rustc --version --verbose:

rustc 1.52.1 (9bc8c42bb 2021-05-09)
binary: rustc
commit-hash: 9bc8c42bb2f19e745a63f3445f1ac248fb015e53
commit-date: 2021-05-09
host: x86_64-unknown-linux-gnu
release: 1.52.1
LLVM version: 12.0.0

The text was updated successfully, but these errors were encountered:

weiby3 · 2021-06-12T12:05:00Z

Another test, dbg! used more than 40G MEM.
test scripts:

echo 'fn main() {' > hw.rs
for i in {0..16384}
do
echo '    dbg!("Hello, world!");' >> hw.rs
done
echo '}' >> hw.rs
/usr/bin/time -f "%Uuser %Ssystem %Eelapsed %PCPU (%Xtext+%Ddata %Mmax)k\n%Iinputs+%Ooutputs (%Fmajor+%Rminor)pagefaults %Wswaps\nUsed %MKbytes Max Resident MEM" rustc hw.rs

log:

1044.19user 24.17system 18:08.63elapsed 98%CPU (0text+0data 39245668max)k
129736inputs+0outputs (2010major+17465633minor)pagefaults 0swaps
Used 39245668Kbytes Max Resident MEM

$ grep "Out of memory" /var/log/messages
Jun 12 19:58:32 centos kernel: Out of memory: Kill process 31123 (rustc) score 937 or sacrifice child

oxalica · 2021-06-12T14:07:56Z

I simplify the code to tons of &"foo", which is unexpectedly costly in both memory and time. (But not sure if it's the root cause of original code. The original code expands to &["hello"], and also have lifetime-generic function calls)

Generator script:

#!/usr/bin/env bash
exec >test.rs
echo 'fn main() {'
for i in $(seq 16384); do
    echo '&"foo";'
done
echo '}'

It costs 2.6GB memory and 50s to compile. While adding one more reference (&&"foo"; per line), it costs 4.3GB memory and 86s to compile.

For comparation, &&1i32; cost 300MB memory and 3.4s to compile.

inquisitivecrystal · 2021-06-12T20:46:15Z

@rustbot label I-compilemem I-compiletime T-compiler

nbdd0121 · 2021-06-12T20:59:17Z

Profiling shows that 97% of the time is spent in mir_borrowck.

nbdd0121 · 2021-06-12T22:27:00Z

Did some further analysis with perf, and get this call stack:

  Children      Self  Comma  Shared Object                        Symbol
-   97.02%     0.00%  rustc  librustc_driver-3894de7c87fd4580.so  [.] rustc_mir::borrow_check::do_mir_borrowck
   - 97.02% rustc_mir::borrow_check::do_mir_borrowck
      - 96.54% rustc_mir::borrow_check::nll::compute_regions
         - 95.54% rustc_mir::borrow_check::type_check::type_check
            - 95.38% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_body
               - 95.35% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_rvalue
                  - 95.26% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_constant
                       77.10% rustc_index::bit_set::HybridBitSet<T>::insert
                     + 6.18% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_body

Majority of the time is spent in <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_constant, and a quite significant portion is in rustc_index::bit_set::HybridBitSet<T>::insert, presumably when updating liveness_constraints.

visit_constant is highly inlined, so I am not very confident about my analysis, but it seems to me that it's sanitize_promoted that uses majority of the time. ~~There is an for live_region in liveness_constraints.rows(), perhaps means that the algorithm is O(N^2) for the MCVE above (N constants * N regions)?~~ EDIT: looked again and it doesn't seem to be O(N^2) since swap_constraints means that each "promoted" has its own liveness_constraints.

I don't understand the code well enough to make any conclusions or make any changes, so perhaps someone more familiar with this code could give some hints.

memoryruins · 2021-06-16T01:39:51Z

#50994 uses 5,000 println!("hello world!");s to benchmark the compiler and compares debug/opt builds (though note the profiles are a few years old). Should the investigation here continue in that issue?

nbdd0121 · 2021-06-16T01:52:49Z

#50994 uses 5,000 println!("hello world!");s to benchmark the compiler and compares debug/opt builds (though note the profiles are a few years old). Should the investigation here continue in that issue?

The example by @oxalica shows that this issue has nothing to do with println! but with references.

#50994 is related to LLVM performance issue while this one is related to borrow checking. Note that when that issue is posted, Rust does not yet have stable NLL, so it's certainly a different issue from this one.

memoryruins · 2021-06-16T02:30:09Z

Both issues were opened with the premise of benchmarking the compiler with the same println! test case. The other issue is not only about LLVM. Its debug profiles suggest something similar to @oxalica 's findings and your perf profile. As noted, that's happening in a different part of rustc now, which is relevant information to post on the other issue, but it does not mean the two issues are different. Edited for clarity.

perf: Don't track specific live points for promoteds We don't query this information out of the promoted (it's basically a single "unit" regardless of the complexity within it) and this saves on re-initializing the SparseIntervalMatrix's backing IndexVec with mostly empty rows for all of the leading regions in the function. Typical promoteds will only contain a few regions that need up be uplifted, while the parent function can have thousands. For a simple function repeating println!("Hello world"); 50,000 times this reduces compile times from 90 to 15 seconds in debug mode. The previous implementations re-initialization led to an overall roughly n^2 runtime as each promoted initialized slots for ~n regions, now we scale closer to linearly (5000 hello worlds takes 1.1 seconds). cc rust-lang#50994, rust-lang#86244

…dtwco perf: Don't track specific live points for promoteds We don't query this information out of the promoted (it's basically a single "unit" regardless of the complexity within it) and this saves on re-initializing the SparseIntervalMatrix's backing IndexVec with mostly empty rows for all of the leading regions in the function. Typical promoteds will only contain a few regions that need up be uplifted, while the parent function can have thousands. For a simple function repeating println!("Hello world"); 50,000 times this reduces compile times from 90 to 15 seconds in debug mode. The previous implementations re-initialization led to an overall roughly n^2 runtime as each promoted initialized slots for ~n regions, now we scale closer to linearly (5000 hello worlds takes 1.1 seconds). cc rust-lang#50994, rust-lang#86244

perf: Don't track specific live points for promoteds We don't query this information out of the promoted (it's basically a single "unit" regardless of the complexity within it) and this saves on re-initializing the SparseIntervalMatrix's backing IndexVec with mostly empty rows for all of the leading regions in the function. Typical promoteds will only contain a few regions that need up be uplifted, while the parent function can have thousands. For a simple function repeating println!("Hello world"); 50,000 times this reduces compile times from 90 to 15 seconds in debug mode. The previous implementations re-initialization led to an overall roughly n^2 runtime as each promoted initialized slots for ~n regions, now we scale closer to linearly (5000 hello worlds takes 1.1 seconds). cc rust-lang/rust#50994, rust-lang/rust#86244

upsuper added the C-bug Category: This is a bug. label Jun 12, 2021

rustbot added I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 12, 2021

nbdd0121 mentioned this issue Nov 1, 2021

Literally initializing [&str; N] (using proc macro) doesn't scale, rustc consumes huge amounts of time and RAM (stable and nightly) #90445

Closed

crlf0710 added the A-borrow-checker Area: The borrow checker label Nov 2, 2021

Mark-Simulacrum mentioned this issue Dec 2, 2021

Store liveness in interval sets for region inference #90637

Merged

Mark-Simulacrum mentioned this issue Jan 15, 2024

perf: Don't track specific live points for promoteds #120003

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Large memory usage and long time on compiling large number of println #86244

Large memory usage and long time on compiling large number of println #86244

upsuper commented Jun 12, 2021 •

edited

weiby3 commented Jun 12, 2021

oxalica commented Jun 12, 2021

inquisitivecrystal commented Jun 12, 2021 •

edited

nbdd0121 commented Jun 12, 2021

nbdd0121 commented Jun 12, 2021 •

edited

memoryruins commented Jun 16, 2021

nbdd0121 commented Jun 16, 2021

memoryruins commented Jun 16, 2021 •

edited

Large memory usage and long time on compiling large number of println #86244

Large memory usage and long time on compiling large number of println #86244

Comments

upsuper commented Jun 12, 2021 • edited

Meta

weiby3 commented Jun 12, 2021

oxalica commented Jun 12, 2021

inquisitivecrystal commented Jun 12, 2021 • edited

nbdd0121 commented Jun 12, 2021

nbdd0121 commented Jun 12, 2021 • edited

memoryruins commented Jun 16, 2021

nbdd0121 commented Jun 16, 2021

memoryruins commented Jun 16, 2021 • edited

upsuper commented Jun 12, 2021 •

edited

inquisitivecrystal commented Jun 12, 2021 •

edited

nbdd0121 commented Jun 12, 2021 •

edited

memoryruins commented Jun 16, 2021 •

edited