Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large memory usage and long time on compiling large number of println #86244

Open
upsuper opened this issue Jun 12, 2021 · 8 comments
Open

Large memory usage and long time on compiling large number of println #86244

upsuper opened this issue Jun 12, 2021 · 8 comments
Labels
A-borrow-checker Area: The borrow checker C-bug Category: This is a bug. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@upsuper
Copy link
Contributor

upsuper commented Jun 12, 2021

Not sure whether it's a known bug.

I tried this code:

fn main() {
    println!("hello world!"); // repeat 16384 times
}

(To save you some time, you can download the file here directly rather than generating it yourself.)

I expected to see this happen: it should take reasonable amount of memory and reasonable time to compile.

Instead, this happened: on my machine, I saw it took up to 34GB memory and 2.5min of time just to compile this.

Meta

rustc --version --verbose:

rustc 1.52.1 (9bc8c42bb 2021-05-09)
binary: rustc
commit-hash: 9bc8c42bb2f19e745a63f3445f1ac248fb015e53
commit-date: 2021-05-09
host: x86_64-unknown-linux-gnu
release: 1.52.1
LLVM version: 12.0.0
@upsuper upsuper added the C-bug Category: This is a bug. label Jun 12, 2021
@weiby3
Copy link

weiby3 commented Jun 12, 2021

Another test, dbg! used more than 40G MEM.
test scripts:

echo 'fn main() {' > hw.rs
for i in {0..16384}
do
echo '    dbg!("Hello, world!");' >> hw.rs
done
echo '}' >> hw.rs
/usr/bin/time -f "%Uuser %Ssystem %Eelapsed %PCPU (%Xtext+%Ddata %Mmax)k\n%Iinputs+%Ooutputs (%Fmajor+%Rminor)pagefaults %Wswaps\nUsed %MKbytes Max Resident MEM" rustc hw.rs

log:

1044.19user 24.17system 18:08.63elapsed 98%CPU (0text+0data 39245668max)k
129736inputs+0outputs (2010major+17465633minor)pagefaults 0swaps
Used 39245668Kbytes Max Resident MEM

$ grep "Out of memory" /var/log/messages
Jun 12 19:58:32 centos kernel: Out of memory: Kill process 31123 (rustc) score 937 or sacrifice child

@oxalica
Copy link
Contributor

oxalica commented Jun 12, 2021

I simplify the code to tons of &"foo", which is unexpectedly costly in both memory and time. (But not sure if it's the root cause of original code. The original code expands to &["hello"], and also have lifetime-generic function calls)

Generator script:

#!/usr/bin/env bash
exec >test.rs
echo 'fn main() {'
for i in $(seq 16384); do
    echo '&"foo";'
done
echo '}'

It costs 2.6GB memory and 50s to compile. While adding one more reference (&&"foo"; per line), it costs 4.3GB memory and 86s to compile.

For comparation, &&1i32; cost 300MB memory and 3.4s to compile.

@inquisitivecrystal
Copy link
Contributor

inquisitivecrystal commented Jun 12, 2021

@rustbot label I-compilemem I-compiletime T-compiler

@rustbot rustbot added I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 12, 2021
@nbdd0121
Copy link
Contributor

Profiling shows that 97% of the time is spent in mir_borrowck.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Jun 12, 2021

Did some further analysis with perf, and get this call stack:

  Children      Self  Comma  Shared Object                        Symbol
-   97.02%     0.00%  rustc  librustc_driver-3894de7c87fd4580.so  [.] rustc_mir::borrow_check::do_mir_borrowck
   - 97.02% rustc_mir::borrow_check::do_mir_borrowck
      - 96.54% rustc_mir::borrow_check::nll::compute_regions
         - 95.54% rustc_mir::borrow_check::type_check::type_check
            - 95.38% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_body
               - 95.35% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_rvalue
                  - 95.26% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_constant
                       77.10% rustc_index::bit_set::HybridBitSet<T>::insert
                     + 6.18% <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_body

Majority of the time is spent in <rustc_mir::borrow_check::type_check::TypeVerifier as rustc_middle::mir::visit::Visitor>::visit_constant, and a quite significant portion is in rustc_index::bit_set::HybridBitSet<T>::insert, presumably when updating liveness_constraints.

visit_constant is highly inlined, so I am not very confident about my analysis, but it seems to me that it's sanitize_promoted that uses majority of the time. There is an for live_region in liveness_constraints.rows(), perhaps means that the algorithm is O(N^2) for the MCVE above (N constants * N regions)? EDIT: looked again and it doesn't seem to be O(N^2) since swap_constraints means that each "promoted" has its own liveness_constraints.

I don't understand the code well enough to make any conclusions or make any changes, so perhaps someone more familiar with this code could give some hints.

@memoryruins
Copy link
Contributor

#50994 uses 5,000 println!("hello world!");s to benchmark the compiler and compares debug/opt builds (though note the profiles are a few years old). Should the investigation here continue in that issue?

@nbdd0121
Copy link
Contributor

#50994 uses 5,000 println!("hello world!");s to benchmark the compiler and compares debug/opt builds (though note the profiles are a few years old). Should the investigation here continue in that issue?

The example by @oxalica shows that this issue has nothing to do with println! but with references.

#50994 is related to LLVM performance issue while this one is related to borrow checking. Note that when that issue is posted, Rust does not yet have stable NLL, so it's certainly a different issue from this one.

@memoryruins
Copy link
Contributor

memoryruins commented Jun 16, 2021

Both issues were opened with the premise of benchmarking the compiler with the same println! test case. The other issue is not only about LLVM. Its debug profiles suggest something similar to @oxalica 's findings and your perf profile. As noted, that's happening in a different part of rustc now, which is relevant information to post on the other issue, but it does not mean the two issues are different. Edited for clarity.

@crlf0710 crlf0710 added the A-borrow-checker Area: The borrow checker label Nov 2, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 16, 2024
perf: Don't track specific live points for promoteds

We don't query this information out of the promoted (it's basically a single "unit" regardless of the complexity within it) and this saves on re-initializing the SparseIntervalMatrix's backing IndexVec with mostly empty rows for all of the leading regions in the function. Typical promoteds will only contain a few regions that need up be uplifted, while the parent function can have thousands.

For a simple function repeating println!("Hello world"); 50,000 times this reduces compile times from 90 to 15 seconds in debug mode. The previous implementations re-initialization led to an overall roughly n^2 runtime as each promoted initialized slots for ~n regions, now we scale closer to linearly (5000 hello worlds takes 1.1 seconds).

cc rust-lang#50994, rust-lang#86244
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 20, 2024
…dtwco

perf: Don't track specific live points for promoteds

We don't query this information out of the promoted (it's basically a single "unit" regardless of the complexity within it) and this saves on re-initializing the SparseIntervalMatrix's backing IndexVec with mostly empty rows for all of the leading regions in the function. Typical promoteds will only contain a few regions that need up be uplifted, while the parent function can have thousands.

For a simple function repeating println!("Hello world"); 50,000 times this reduces compile times from 90 to 15 seconds in debug mode. The previous implementations re-initialization led to an overall roughly n^2 runtime as each promoted initialized slots for ~n regions, now we scale closer to linearly (5000 hello worlds takes 1.1 seconds).

cc rust-lang#50994, rust-lang#86244
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Jan 21, 2024
perf: Don't track specific live points for promoteds

We don't query this information out of the promoted (it's basically a single "unit" regardless of the complexity within it) and this saves on re-initializing the SparseIntervalMatrix's backing IndexVec with mostly empty rows for all of the leading regions in the function. Typical promoteds will only contain a few regions that need up be uplifted, while the parent function can have thousands.

For a simple function repeating println!("Hello world"); 50,000 times this reduces compile times from 90 to 15 seconds in debug mode. The previous implementations re-initialization led to an overall roughly n^2 runtime as each promoted initialized slots for ~n regions, now we scale closer to linearly (5000 hello worlds takes 1.1 seconds).

cc rust-lang/rust#50994, rust-lang/rust#86244
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-borrow-checker Area: The borrow checker C-bug Category: This is a bug. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants