
Compiler takes 30 minutes + large (32GB+) amounts of ram to compile code with large amount of arguments + function-within-function #122944

Open
cheesycod opened this issue Mar 23, 2024 · 21 comments
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@cheesycod

cheesycod commented Mar 23, 2024

I do not have a simple reproducible example, since this bug goes over my head; perhaps someone else can pin it down, but it seems pretty bad:

This file takes over 30 minutes of time to compile in rustc 1.77.0 (rustc 1.77.0 (aedd173 2024-03-17)) and rustc nightly (rustc 1.78.0-nightly (2bf78d1 2024-02-18)): https://github.com/Anti-Raid/splashtail/blob/74a33ddebe045d2a9fb4acc2d65ef97f75314e84/botv2/src/modules/moderation/cmd.rs [this specific commit triggered the issue of 30 minutes though this was always exponentially slow from the very start]

Don't know why, but this seemingly innocent file takes a full 25-30 minutes to compile and hogs over 32GB of RAM, to the point that I needed to add 128GB of swap to stop Linux's Out Of Memory killer from SIGKILLing rustc. In CI, this leads to a SIGKILL within 20 minutes of compiling. Actual compilation seems to go through without error. Removing moderation from the module tree reduces compile times from 30 minutes to 5-6 minutes, and RAM usage returns to normal.

I expected to see this happen: compile times within 5-6 minutes (which is normal right now; I don't know whether the 5 minute baseline shares a root cause with this bug, though it could) and normal RAM usage (<32GB).

Instead, this happened:

warning: struct ConfigOption is never constructed
--> src/silverpelt/config_opt.rs:159:12
|
159 | pub struct ConfigOption {
| ^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default

warning: botv2 (bin "botv2") generated 1 warning
Finished release [optimized] target(s) in 30m 27s

[Ignore the warning, that's unrelated]

The Linux kernel's Out Of Memory killer activates and kills rustc with SIGKILL (signal 9) at seemingly random points.

Meta

Tested and reproduced across both Apple M1 and a Dedicated Server (Intel(R) Xeon(R) CPU E3-1240 v3 @ 3.40GHz) on both rust stable and nightly using both LLVM and Cranelift as codegen backends. Switching linkers to clang+mold/ld.lld from defaults did not help. Rust versions are given below:

rustc 1.77.0 (aedd173 2024-03-17) and rustc 1.78.0-nightly (2bf78d1 2024-02-18)

No backtrace is available (the compiler does not crash), and attempts to compile with --timing for timing information don't complete in any reasonable timespan. The issue occurs on both debug and release builds, so this is not an artifact of release-build optimizations. Typical options to speed up compile times, such as opt-level = 1, reduce neither the compile times nor the memory usage. Splitting the code into multiple crates does not help as long as this file is in the module tree and hence compiled.
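For concreteness, the kind of profile tweak meant here can be applied like this (values are illustrative, not the project's actual Cargo.toml; it made no difference in this case):

```shell
# Append an illustrative "faster compile" profile override; opt-level = 1
# is the kind of setting that did not help here.
cat >> Cargo.toml <<'EOF'

[profile.release]
opt-level = 1
EOF
grep -c 'opt-level = 1' Cargo.toml
```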

@cheesycod cheesycod added the C-bug Category: This is a bug. label Mar 23, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 23, 2024
@jieyouxu jieyouxu added I-slow Issue: Problems and improvements with respect to performance of generated code. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. labels Mar 23, 2024
@workingjubilee
Contributor

No backtrace is available, the compiler does not crash, attempts to compile the code using --timing for timing information don't seem to even complete in any reasonable timespan.

@cheesycod So this smells like heap accumulation instead of recursion, since recursion would stack overflow. Recursion would only be possible if the code inside rustc hitting this is an optimized tail call. Does that vibe with what you're experiencing?

I'm seeing some async fn, which are prone to runaway behaviors if a slight mistake in resource usage happens inside the state machine transformations inside the compiler, but the commit you mention mostly touches macro handling. Is there any chance you can create a slightly more minimized version of this? I don't mean that you have to eliminate all the dependencies (though that might be nice), I more mean:

  • Do all of these functions have to be present inside mod moderation; for this to get the runaway compilemem?
  • I'm also seeing what looks like a proc-macro that runs over all the arguments. Is that proc-macro optional to any extent? I am seeing some other often-naughty macros, but that's the only one that touches all the args.
  • Does the problem go away if you eliminate the "internal" async fn but keep the "external" async fn?

You seem to already have tried a few refactorings and know this code better than me.
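To make the requested minimization concrete, here is a sketch that generates a skeleton of the suspect shape: an outer async fn with many arguments plus an inner async fn. All names here are invented for illustration and are not from the real cmd.rs:

```shell
# Generate a hypothetical minimized repro: an outer async fn with many
# arguments that defines and awaits an inner ("function-within-function")
# async fn. All identifiers are made up for illustration.
cat > repro.rs <<'EOF'
async fn outer(
    a1: String, a2: String, a3: String, a4: String,
    a5: String, a6: String, a7: String, a8: String,
) -> Result<(), String> {
    // Inner async fn called with all the outer arguments.
    async fn inner(args: Vec<String>) -> Result<(), String> {
        if args.is_empty() { Err("no args".into()) } else { Ok(()) }
    }
    inner(vec![a1, a2, a3, a4, a5, a6, a7, a8]).await
}
EOF
grep -c 'async fn' repro.rs   # one outer, one inner
```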

@workingjubilee workingjubilee added the A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) label Mar 23, 2024
@kyleguarco

kyleguarco commented Mar 23, 2024

I tested this in a podman container I pulled and modified for this test. The behavior I got was close to what @cheesycod had. The postgres database in the container was seeded using the instructions specified in splashtail/README.

My aim was to build botv2. I did a checkout at the problem commit (74a33dde), then built it like so:

$ cd botv2; rustup override set nightly

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.79.0-nightly (85e449a32 2024-03-22)

$ PGUSER=kyleg RUSTFLAGS="-Zself-profile" cargo build --release

The build did hang for a while, but it did finish (likely because my system has ~64G of memory). Usage peaked around ~30G of memory.

I used the instructions provided at summarize/README.md to read the profiler data put together by the compiler. Here's what I got (click for gist).

Someone way smarter than me could probably understand what that means. Does taking the longest on mir_borrowck imply it's some issue between the borrow checker and sqlx?

Edit: The behavior I got wasn't exactly like the original behavior, but it was close.
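For anyone else wanting to reproduce the profiling step, the rough workflow is sketched below; the summarize install command and paths are assumptions from memory and may differ:

```shell
# Sketch of the -Zself-profile workflow. rustc writes .mm_profdata files
# into the given directory; the summarize tool (from rust-lang/measureme)
# then prints self-time per compiler query (e.g. mir_borrowck).
export RUSTFLAGS="-Zself-profile=./profdata"
mkdir -p profdata
# cargo +nightly build --release                                  # the (long) build
# cargo install --git https://github.com/rust-lang/measureme summarize   # assumed install path
# summarize summarize profdata/*.mm_profdata
echo "$RUSTFLAGS"
```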

@workingjubilee
Contributor

workingjubilee commented Mar 23, 2024

@kyleguarco Did you compare the behavior for you against the 1.76 stable behavior?

I can't actually reproduce a meaningful difference, honestly, in timing:

# 1.76: cargo clean && time cargo check
real    1m18.864s
user    4m9.207s
sys     0m22.751s

# 1.77: cargo clean && time cargo check

real    1m15.771s
user    4m5.529s
sys     0m22.082s

Glancing at memory usage via top doesn't suggest a huge difference either. If anything, 1.77 seems to be slightly leaner on that count.

# 1.76 ~peak usage
PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+    COMMAND
814954  jubilee   20   0    5.5g    3.8g   0.2g R  99.9   6.0   0:17.18  rustc
# 1.77 ~peak usage
PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+    COMMAND
807991  jubilee   20   0    5.6g    1.4g   0.2g R  99.9   2.3   0:26.18  rustc

However, while it is possible my machine is incredibly beastly, I set export SQLX_OFFLINE=true in order to get the code to build at all.

@cheesycod @kyleguarco Does this runaway behavior go away if you also set SQLX_OFFLINE=true?
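For reference, toggling that looks like the following (a sketch; the query-metadata cache file name depends on the sqlx version):

```shell
# Build sqlx macros against cached query metadata instead of a live database.
# Depending on the sqlx version, the cache is sqlx-data.json or a .sqlx/ dir.
export SQLX_OFFLINE=true
unset DATABASE_URL        # make sure no live connection is attempted
# cargo check              # then build as usual
echo "SQLX_OFFLINE=$SQLX_OFFLINE"
```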

@kyleguarco

kyleguarco commented Mar 24, 2024

Before I say anything, I'd like to mention that I was looking at overall system memory usage in my last comment. I'll only mention rustc this time. Apologies.


Did you compare the behavior for you against the 1.76 stable behavior?

No, but I can! I'll run the commands you ran. Checked out on splashtail commit 74a33dde.

The test with previous stable (rust 1.76.0). rustc memory usage peaked ~16G.

$ rustc --version
rustc 1.76.0 (07dca489a 2024-02-04)
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	8m58.225s
user	12m41.254s
sys	0m39.664s

Here's current stable (rust 1.77.0). rustc memory usage peaked ~18G.

$ rustc --version
rustc 1.77.0 (aedd173a2 2024-03-17)
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	8m13.534s
user	11m53.724s
sys	0m41.228s

There doesn't seem to be a difference between the two releases. Your assessment with 1.77 being slightly better tracks with me too.


Does this runaway behavior go away if you also set SQLX_OFFLINE=true?

Here's a test on current stable (rust 1.77.0) with that option enabled, and with DATABASE_URL unset (I didn't know what SQLX_OFFLINE did, so I looked it up). rustc memory usage peaked ~16G

$ rustc --version
rustc 1.77.0 (aedd173a2 2024-03-17)
$ export SQLX_OFFLINE=true
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	8m16.077s
user	11m54.983s
sys	0m41.437s

So, no, SQLX_OFFLINE=true didn't seem to make a difference.


@workingjubilee Did you by chance build on splashtail's latest development (commit d4d3e2ec)? It compiles just fine on rust 1.77.0 for me. rustc memory usage peaked ~4G.

$ unset SQLX_OFFLINE
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	1m11.929s
user	4m51.965s
sys	0m30.711s

I tried doing an actual cargo build on development to see if it was any different. The memory usage was slightly higher (and stressed all my cores for a minute). rustc memory usage peaked ~8G.

$ git reset --hard; git clean -f; cargo clean && time cargo build
...
real	3m53.000s
user	27m43.526s
sys	0m57.147s

Commit 74a33dde is the one giving the issue... A cargo build on that commit peaks memory usage ~25G and takes considerably longer.

$ git reset --hard; git clean -f; cargo clean && time cargo build
...
real	18m8.741s
user	25m0.119s
sys	1m22.509s

Is there a tool I could use to draw the usage over time?
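(One low-tech option: sample the compiler's resident-set size from /proc in a loop and plot the resulting CSV afterwards. A sketch, assuming Linux and processes literally named rustc:)

```shell
# Poll the resident-set size of every rustc process and append CSV rows
# (epoch_seconds,pid,rss_kb) for later plotting with gnuplot or similar.
sample_rss() {
    out="$1"
    for pid in $(pgrep -x rustc); do
        rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        [ -n "$rss_kb" ] && echo "$(date +%s),$pid,$rss_kb" >> "$out"
    done
}
# Run alongside the build:
# while true; do sample_rss rustc_mem.csv; sleep 1; done
```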

@workingjubilee
Contributor

Oh whoops. You're correct, that's much more exciting!

@workingjubilee workingjubilee added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. labels Mar 24, 2024
@workingjubilee
Contributor

workingjubilee commented Mar 24, 2024

PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+     COMMAND
884877  jubilee   20   0    38.9g   23.2g  0.3g R  99.9  37.4   10:46.54  rustc

amazing.

@kyleguarco

kyleguarco commented Mar 24, 2024

Looks like splashtail's had SQLX_OFFLINE enabled in botv2/.env since 7203d6702f55e3db5a4d19b9bf9e505cb6e55317 (after the problem commit). Whether it was set in the shell environment or not past that didn't matter.

@kyleguarco

I did a git bisect between 74a33dde and d4d3e2ec to see which one fixed the behavior.

Note the inverted labels: git bisect good marks a commit that still has the heap-accumulation behavior; I couldn't get it to work the other way around.

git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [74a33ddebe045d2a9fb4acc2d65ef97f75314e84] add unban support
git bisect good 74a33ddebe045d2a9fb4acc2d65ef97f75314e84
# status: waiting for bad commit, 1 good commit known
# bad: [d4d3e2eca4c7151130569f7e1cc2a5693b2bd978] begin implementing audit logs
git bisect bad d4d3e2eca4c7151130569f7e1cc2a5693b2bd978
# good: [ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647] add cranelift
git bisect good ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647
# good: [ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647] add cranelift
git bisect good ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647
# bad: [04f52e3d5faa12a0a34cdee18db3ec8177a3917f] add some extra perm checks
git bisect bad 04f52e3d5faa12a0a34cdee18db3ec8177a3917f
# bad: [e14655803f5f04a98e8f595d7743fe289f32faba] cleanup code
git bisect bad e14655803f5f04a98e8f595d7743fe289f32faba
# first bad commit: [e14655803f5f04a98e8f595d7743fe289f32faba] cleanup code

Commit e146558 seemed to have fixed the behavior. The diff is large though (as is the diff on 74a33dde). It'll take a bit of time to see exactly what line caused the behavior.
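If anyone wants to re-run this, the bisect could in principle be automated with git bisect run and a small script. The memory cap below is a made-up threshold, and it follows the inverted good/bad labels used above (exit 0 = "good" = still exhibits the runaway memory):

```shell
# Hypothetical bisect automation: cap the build's address space so a
# runaway-memory build fails, then map that failure to bisect's "good"
# (matching the inverted labels above). The 20 GB cap is illustrative.
cat > bisect_mem.sh <<'EOF'
#!/bin/sh
cargo clean
( ulimit -v 20000000; cargo build --release ) && exit 1 || exit 0
EOF
chmod +x bisect_mem.sh
sh -n bisect_mem.sh && echo "script parses"
# git bisect start
# git bisect good 74a33ddebe045d2a9fb4acc2d65ef97f75314e84
# git bisect bad  d4d3e2eca4c7151130569f7e1cc2a5693b2bd978
# git bisect run ./bisect_mem.sh
```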

@workingjubilee
Contributor

Probably the factoring-out of the pruning-messages logic allowing fewer arguments to be passed, and thus tracking logically smaller function bodies during compilation.

@cheesycod
Author

Probably the factoring-out of the pruning-messages logic allowing fewer arguments to be passed, and thus tracking logically smaller function bodies during compilation.

This tracks with what I've been seeing. Factoring prune out was the workaround I attempted, and it did improve compile times and memory usage, but there's still some overhead even with the prune function factored out.

@cheesycod
Author

I tried digging into this a bit more BTW (sorry I live in India and just woke up)

Removing the sqlx code did help (compile time drops from 33 min to 17 min; memory usage goes down from 33GB to 12-14GB), so sqlx is certainly influencing it.

@workingjubilee
Contributor

  • Out of curiosity, I tried disabling incremental. Predictably, this did not work.
  • I also tried removing most of the poise_macros::command "macro arguments", wondering if that would prompt the macro to do less work. It did not seem to have much effect.
  • On a hunch, I tried optimizing what seem to be the macro crates during the build! This mostly rewarded me with hitting the high memory usage part of the build faster:
+[profile.dev.package]
+poise_macros.opt-level = 3
+sqlx-macros.opt-level = 3
PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+    COMMAND
942247  jubilee   20   0    22.3g   2.5g   0.2g R  99.9   4.1   1:28.81  rustc

# ultimately...
real    10m18.717s
user    13m26.572s
sys     0m35.709s

The performance, it should be noted, is quite bad even on 1.76. I can build rustc itself within this kind of time frame.

real    10m52.876s
user    14m2.106s
sys     0m35.329s

Note, however, that without such opts, I still get this kind of timing on 1.77:

real    10m15.497s
user    12m59.488s
sys     0m35.516s

Mind, however: I have enough RAM to keep everything "in memory". Everything else is probably a result of aggressive disk swapping being pretty slow, even with NVMe drives. So yeah, this doesn't seem to be a huge regression so much as the code in question just makes all versions of rustc choke on it, with perhaps a slight fluctuation upward managing to clear the swap/cache thrashing threshold. I'm not 100% on that conclusion because I don't have a firm graph/stats yet, though, but this leads me to ask:

@cheesycod What is the RAM available to the M1 and Xeon?

@kyleguarco

kyleguarco commented Mar 24, 2024

So I tried splitting up 74a33dde into several smaller commits, and I was able to get the heap accumulation behavior just by adding these patches on top of commit 03ae98b:

A build on commit 03ae98b takes as long as expected, and memory peaks ~8G.

$ cargo clean; time cargo build
...
real	3m49.382s
user	9m54.360s
sys	0m42.650s

A build after this patch is applied takes about as long as the cargo check did earlier, with memory peaking ~25G

$ git apply diff_after_03ae98b.patch
$ cargo clean; time cargo build
...
real	9m37.130s
user	15m35.520s
sys	0m51.666s

@workingjubilee Can you confirm that these patches cause the behavior?

@cheesycod
Author

@cheesycod What is the RAM available to the M1 and Xeon?

Apple M1 has 8GB ram and 256GB disk, Xeon has 32GB ram and 1TB ssd (but CPU is much slower)

@workingjubilee
Contributor

workingjubilee commented Mar 24, 2024

8GB RAM

oh yeah no wonder, that's OOM like 5 times over

@kyleguarco The root commit 03ae98b

     Finished dev [unoptimized + debuginfo] target(s) in 2m 04s

real    2m4.386s
user    5m1.034s
sys     0m24.272s

The diff applied:

     Finished dev [unoptimized + debuginfo] target(s) in 5m 54s

real    5m54.150s
user    8m48.596s
sys     0m27.101s

The rustc version:

$ rustc --version --verbose
rustc 1.77.0 (aedd173a2 2024-03-17)
binary: rustc
commit-hash: aedd173a2c086e558c2b66d3743b344f977621a7
commit-date: 2024-03-17
host: x86_64-unknown-linux-gnu
release: 1.77.0
LLVM version: 17.0.6

@workingjubilee
Contributor

There is one other set of places that get anywhere near as much action as do_mir_borrowck:

  • dropck_outlives
  • needs_drop_raw
  • mir_coroutine_witnesses

However, the overwhelming amount of it is just in do_mir_borrowck.

@notriddle
Contributor

notriddle commented Mar 26, 2024

If you're on Linux and want to find its peak memory usage, you can cargo install cgmemtime and then use cgmemtime -- cargo WHATEVER.

@workingjubilee
Contributor

@notriddle oh I know what that number is actually.

This is my dhat profile, with a g-max of ~16GB; there's a viewer at https://nnethercote.github.io/dh_view/dh_view.html. The dhat file is basically JSON.

@workingjubilee
Contributor

Okay, so while do_mir_borrowck is where all the code is spending most of its runtime, its g-max is inside (and specifically because of) this backtrace:

0x5F_____: rustc_mir_transform::coroutine::locals_live_across_suspend_points (coroutine.rs:___)
0x5F9831B: {closure#0} (coroutine.rs:1579)
0x5F9831B: rustc_mir_transform::coroutine::mir_coroutine_witnesses (coroutine.rs:1558)
0x71FE656: {closure#0} (plumbing.rs:624)
0x71FE656: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::mir_coroutine_witnesses::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 8]>> (plumbing.rs:548)
0x6F05C6D: {closure#2} (plumbing.rs:620)
0x6F05C6D: <rustc_query_impl::query_impl::mir_coroutine_witnesses::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_span::def_id::DefId)>>::call_once (function.rs:250)

I omitted most of the numbers for the trace into locals_live_across_suspend_points because that has a few different points at which it allocates what becomes huge amounts. MaybeBorrowedLocals, MaybeRequiresStorage, MaybeStorageLive, and MaybeLiveLocals each account for about ~15% of the allocation, with compute_storage_conflicts being another 15%.

@workingjubilee
Contributor

workingjubilee commented Mar 26, 2024

I suspect we're adding some very nasty constant multipliers to what is necessarily a quadratic computation, making the quadratic case hit much faster and hurt a lot more, and that there might be a way to directly compute the quadratic part without first allocating enough space to fit an average Bethesda game.

@cheesycod
Author

I suspect we're adding some very nasty constant multipliers to what is necessarily a quadratic computation, making the quadratic case hit much faster and hurt a lot more, and that there might be a way to directly compute the quadratic part without first allocating enough space to fit an average Bethesda game.

Any progress on this?
