
Compiler takes 30 minutes + large (32GB+) amounts of ram to compile code with large amount of arguments + function-within-function #122944

Open
cheesycod opened this issue Mar 23, 2024 · 21 comments
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@cheesycod

cheesycod commented Mar 23, 2024

I do not have a simple reproducible example, since this bug goes over my head; perhaps someone else can pin it down, but it seems pretty bad:

This file takes over 30 minutes of time to compile in rustc 1.77.0 (rustc 1.77.0 (aedd173 2024-03-17)) and rustc nightly (rustc 1.78.0-nightly (2bf78d1 2024-02-18)): https://github.com/Anti-Raid/splashtail/blob/74a33ddebe045d2a9fb4acc2d65ef97f75314e84/botv2/src/modules/moderation/cmd.rs [this specific commit triggered the issue of 30 minutes though this was always exponentially slow from the very start]

Don't know why, but this seemingly innocent file takes a full 25-30 minutes to compile and hogs over 32GB of RAM, to the point that I needed to add 128GB of swap to stop Linux's Out Of Memory killer from SIGKILLing rustc. In CI, this leads to a SIGKILL within 20 minutes of compiling. Actual compilation seems to go through without error. Removing moderation from the module tree reduces compile times from 30 minutes to 5-6 minutes, and RAM usage returns to normal.

I expected to see this happen: compile times within 5-6 minutes (which is normal right now; I don't know whether the 5 minute baseline shares a root cause with this bug, though it could) and normal RAM usage (<32GB).

Instead, this happened:

warning: struct ConfigOption is never constructed
--> src/silverpelt/config_opt.rs:159:12
|
159 | pub struct ConfigOption {
| ^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default

warning: botv2 (bin "botv2") generated 1 warning
Finished release [optimized] target(s) in 30m 27s

[Ignore the warning, that's unrelated]

The Linux kernel's Out Of Memory killer activates and kills rustc with SIGKILL (signal 9) at seemingly random points.

Meta

Tested and reproduced across both Apple M1 and a Dedicated Server (Intel(R) Xeon(R) CPU E3-1240 v3 @ 3.40GHz) on both rust stable and nightly using both LLVM and Cranelift as codegen backends. Switching linkers to clang+mold/ld.lld from defaults did not help. Rust versions are given below:

rustc 1.77.0 (aedd173 2024-03-17) and rustc 1.78.0-nightly (2bf78d1 2024-02-18)

No backtrace is available (the compiler does not crash), and attempts to compile with --timing for timing information don't complete in any reasonable timespan. The issue occurs on both debug and release builds, so this is not an artifact of release-build optimizations. Typical options to speed up compile times, such as opt-level = 1, reduce neither the compile times nor the memory usage. Splitting the code into multiple crates does not help as long as this file is in the module tree and hence compiled.
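For concreteness, the kind of profile tweak meant here can be applied like this (values are illustrative, not the project's actual Cargo.toml; it made no difference in this case):

```shell
# Append an illustrative "faster compile" profile override; opt-level = 1
# is the kind of setting that did not help here.
cat >> Cargo.toml <<'EOF'

[profile.release]
opt-level = 1
EOF
grep -c 'opt-level = 1' Cargo.toml
```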

@cheesycod cheesycod added the C-bug Category: This is a bug. label Mar 23, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 23, 2024
@jieyouxu jieyouxu added I-slow Issue: Problems and improvements with respect to performance of generated code. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. labels Mar 23, 2024
@workingjubilee
Contributor

No backtrace is available, the compiler does not crash, attempts to compile the code using --timing for timing information don't seem to even complete in any reasonable timespan.

@cheesycod So this smells like heap accumulation instead of recursion, since recursion would stack overflow. Recursion would only be possible if the code inside rustc hitting this is an optimized tail call. Does that vibe with what you're experiencing?

I'm seeing some async fn, which are prone to runaway behaviors if a slight mistake in resource usage happens inside the state machine transformations inside the compiler, but the commit you mention mostly touches macro handling. Is there any chance you can create a slightly more minimized version of this? I don't mean that you have to eliminate all the dependencies (though that might be nice), I more mean:

  • Do all of these functions have to be present inside mod moderation; for this to get the runaway compilemem?
  • I'm also seeing what looks like a proc-macro that runs over all the arguments. Is that proc-macro optional to any extent? I am seeing some other often-naughty macros, but that's the only one that touches all the args.
  • Does the problem go away if you eliminate the "internal" async fn but keep the "external" async fn?

You seem to already have tried a few refactorings and know this code better than me.
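To make the requested minimization concrete, here is a sketch that generates a skeleton of the suspect shape: an outer async fn with many arguments plus an inner async fn. All names here are invented for illustration and are not from the real cmd.rs:

```shell
# Generate a hypothetical minimized repro: an outer async fn with many
# arguments that defines and awaits an inner ("function-within-function")
# async fn. All identifiers are made up for illustration.
cat > repro.rs <<'EOF'
async fn outer(
    a1: String, a2: String, a3: String, a4: String,
    a5: String, a6: String, a7: String, a8: String,
) -> Result<(), String> {
    // Inner async fn called with all the outer arguments.
    async fn inner(args: Vec<String>) -> Result<(), String> {
        if args.is_empty() { Err("no args".into()) } else { Ok(()) }
    }
    inner(vec![a1, a2, a3, a4, a5, a6, a7, a8]).await
}
EOF
grep -c 'async fn' repro.rs   # one outer, one inner
```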

@workingjubilee workingjubilee added the A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) label Mar 23, 2024
@kyleguarco

kyleguarco commented Mar 23, 2024

I tested this in a podman container I pulled and modified for this test. The behavior I got was close to what @cheesycod had. The postgres database in the container was seeded using the instructions specified in splashtail/README.

My aim was to build botv2. I did a checkout at the problem commit (74a33dde), then built it like so:

$ cd botv2; rustup override set nightly

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.79.0-nightly (85e449a32 2024-03-22)

$ PGUSER=kyleg RUSTFLAGS="-Zself-profile" cargo build --release

The build did hang for a while, but it did finish (likely because my system has ~64G of memory). Usage peaked around ~30G of memory.

I used the instructions provided at summarize/README.md to read the profiler data put together by the compiler. Here's what I got (click for gist).

Someone way smarter than me could probably understand what that means. Does taking the longest on mir_borrowck imply it's some issue between the borrow checker and sqlx?

Edit: The behavior I got wasn't exactly like the original behavior, but it was close.
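For anyone else wanting to reproduce the profiling step, the rough workflow is sketched below; the summarize install command and paths are assumptions from memory and may differ:

```shell
# Sketch of the -Zself-profile workflow. rustc writes .mm_profdata files
# into the given directory; the summarize tool (from rust-lang/measureme)
# then prints self-time per compiler query (e.g. mir_borrowck).
export RUSTFLAGS="-Zself-profile=./profdata"
mkdir -p profdata
# cargo +nightly build --release                                  # the (long) build
# cargo install --git https://github.com/rust-lang/measureme summarize   # assumed install path
# summarize summarize profdata/*.mm_profdata
echo "$RUSTFLAGS"
```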

@workingjubilee
Contributor

workingjubilee commented Mar 23, 2024

@kyleguarco Did you compare the behavior for you against the 1.76 stable behavior?

I can't actually reproduce a meaningful difference, honestly, in timing:

# 1.76: cargo clean && time cargo check
real    1m18.864s
user    4m9.207s
sys     0m22.751s

# 1.77: cargo clean && time cargo check

real    1m15.771s
user    4m5.529s
sys     0m22.082s

Glancing at memory usage via top doesn't suggest a huge difference either. If anything, 1.77 seems to be slightly leaner on that count.

# 1.76 ~peak usage
PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+    COMMAND
814954  jubilee   20   0    5.5g    3.8g   0.2g R  99.9   6.0   0:17.18  rustc
# 1.77 ~peak usage
PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+    COMMAND
807991  jubilee   20   0    5.6g    1.4g   0.2g R  99.9   2.3   0:26.18  rustc

However, while it is possible my machine is incredibly beastly, I set export SQLX_OFFLINE=true in order to get the code to build at all.

@cheesycod @kyleguarco Does this runaway behavior go away if you also set SQLX_OFFLINE=true?
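For reference, toggling that looks like the following (a sketch; the query-metadata cache file name depends on the sqlx version):

```shell
# Build sqlx macros against cached query metadata instead of a live database.
# Depending on the sqlx version, the cache is sqlx-data.json or a .sqlx/ dir.
export SQLX_OFFLINE=true
unset DATABASE_URL        # make sure no live connection is attempted
# cargo check              # then build as usual
echo "SQLX_OFFLINE=$SQLX_OFFLINE"
```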

@kyleguarco

kyleguarco commented Mar 24, 2024

Before I say anything, I'd like to mention that I was looking at overall system memory usage in my last comment. I'll only mention rustc this time. Apologies.


Did you compare the behavior for you against the 1.76 stable behavior?

No, but I can! I'll run the commands you ran. Checked out on splashtail commit 74a33dde.

The test with previous stable (rust 1.76.0). rustc memory usage peaked ~16G.

$ rustc --version
rustc 1.76.0 (07dca489a 2024-02-04)
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	8m58.225s
user	12m41.254s
sys	0m39.664s

Here's current stable (rust 1.77.0). rustc memory usage peaked ~18G.

$ rustc --version
rustc 1.77.0 (aedd173a2 2024-03-17)
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	8m13.534s
user	11m53.724s
sys	0m41.228s

There doesn't seem to be a difference between the two releases. Your assessment with 1.77 being slightly better tracks with me too.


Does this runaway behavior go away if you also set SQLX_OFFLINE=true?

Here's a test on current stable (rust 1.77.0) with that option enabled, and with DATABASE_URL unset (I didn't know what SQLX_OFFLINE did, so I looked it up). rustc memory usage peaked ~16G

$ rustc --version
rustc 1.77.0 (aedd173a2 2024-03-17)
$ export SQLX_OFFLINE=true
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	8m16.077s
user	11m54.983s
sys	0m41.437s

So, no, SQLX_OFFLINE=true didn't seem to make a difference.


@workingjubilee Did you by chance build on splashtail's latest development (commit d4d3e2ec)? It compiles just fine on rust 1.77.0 for me. rustc memory usage peaked ~4G.

$ unset SQLX_OFFLINE
$ git reset --hard; git clean -f; cargo clean && time cargo check
...
real	1m11.929s
user	4m51.965s
sys	0m30.711s

I tried doing an actual cargo build on development to see if it was any different. The memory usage was slightly higher (and stressed all my cores for a minute). rustc memory usage peaked ~8G.

$ git reset --hard; git clean -f; cargo clean && time cargo build
...
real	3m53.000s
user	27m43.526s
sys	0m57.147s

Commit 74a33dde is the one giving the issue... A cargo build on that commit peaks memory usage ~25G and takes considerably longer.

$ git reset --hard; git clean -f; cargo clean && time cargo build
...
real	18m8.741s
user	25m0.119s
sys	1m22.509s

Is there a tool I could use to draw the usage over time?
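(One low-tech option: sample the compiler's resident-set size from /proc in a loop and plot the resulting CSV afterwards. A sketch, assuming Linux and processes literally named rustc:)

```shell
# Poll the resident-set size of every rustc process and append CSV rows
# (epoch_seconds,pid,rss_kb) for later plotting with gnuplot or similar.
sample_rss() {
    out="$1"
    for pid in $(pgrep -x rustc); do
        rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        [ -n "$rss_kb" ] && echo "$(date +%s),$pid,$rss_kb" >> "$out"
    done
}
# Run alongside the build:
# while true; do sample_rss rustc_mem.csv; sleep 1; done
```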

@workingjubilee
Contributor

Oh whoops. You're correct, that's much more exciting!

@workingjubilee workingjubilee added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. labels Mar 24, 2024
@workingjubilee
Contributor

workingjubilee commented Mar 24, 2024

PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+     COMMAND
884877  jubilee   20   0    38.9g   23.2g  0.3g R  99.9  37.4   10:46.54  rustc

amazing.

@kyleguarco

kyleguarco commented Mar 24, 2024

Looks like splashtail's had SQLX_OFFLINE enabled in botv2/.env since 7203d6702f55e3db5a4d19b9bf9e505cb6e55317 (after the problem commit). Whether it was set in the shell environment or not past that didn't matter.

@kyleguarco

I did a git bisect between 74a33dde and d4d3e2ec to see which one fixed the behavior.

Note the inverted labels: git bisect good marks a commit that still has the heap-accumulation behavior; I couldn't get it to work the other way around.

git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [74a33ddebe045d2a9fb4acc2d65ef97f75314e84] add unban support
git bisect good 74a33ddebe045d2a9fb4acc2d65ef97f75314e84
# status: waiting for bad commit, 1 good commit known
# bad: [d4d3e2eca4c7151130569f7e1cc2a5693b2bd978] begin implementing audit logs
git bisect bad d4d3e2eca4c7151130569f7e1cc2a5693b2bd978
# good: [ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647] add cranelift
git bisect good ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647
# good: [ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647] add cranelift
git bisect good ac6cd4e46cd7e7491e5eab51e3566ec8daf2d647
# bad: [04f52e3d5faa12a0a34cdee18db3ec8177a3917f] add some extra perm checks
git bisect bad 04f52e3d5faa12a0a34cdee18db3ec8177a3917f
# bad: [e14655803f5f04a98e8f595d7743fe289f32faba] cleanup code
git bisect bad e14655803f5f04a98e8f595d7743fe289f32faba
# first bad commit: [e14655803f5f04a98e8f595d7743fe289f32faba] cleanup code

Commit e146558 seemed to have fixed the behavior. The diff is large though (as is the diff on 74a33dde). It'll take a bit of time to see exactly what line caused the behavior.
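If anyone wants to re-run this, the bisect could in principle be automated with git bisect run and a small script. The memory cap below is a made-up threshold, and it follows the inverted good/bad labels used above (exit 0 = "good" = still exhibits the runaway memory):

```shell
# Hypothetical bisect automation: cap the build's address space so a
# runaway-memory build fails, then map that failure to bisect's "good"
# (matching the inverted labels above). The 20 GB cap is illustrative.
cat > bisect_mem.sh <<'EOF'
#!/bin/sh
cargo clean
( ulimit -v 20000000; cargo build --release ) && exit 1 || exit 0
EOF
chmod +x bisect_mem.sh
sh -n bisect_mem.sh && echo "script parses"
# git bisect start
# git bisect good 74a33ddebe045d2a9fb4acc2d65ef97f75314e84
# git bisect bad  d4d3e2eca4c7151130569f7e1cc2a5693b2bd978
# git bisect run ./bisect_mem.sh
```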

@workingjubilee
Contributor

Probably the factoring-out of the pruning-messages logic allowing fewer arguments to be passed, and thus tracking logically smaller function bodies during compilation.

@cheesycod
Author

Probably the factoring-out of the pruning-messages logic allowing fewer arguments to be passed, and thus tracking logically smaller function bodies during compilation.

This tracks with what I've been seeing. Factoring prune out was the workaround I attempted, and it did improve compile times and memory usage, but there's still some overhead even with the prune function factored out.

@cheesycod
Author

I tried digging into this a bit more BTW (sorry I live in India and just woke up)

Removing the sqlx code did help (compile time drops from 33 min to 17 min; memory usage goes down from 33GB to 12-14GB), so sqlx is certainly influencing it.

@workingjubilee
Contributor

  • Out of curiosity, I tried disabling incremental. Predictably, this did not work.
  • I also tried removing most of the poise_macros::command "macro arguments", wondering if that would prompt the macro to do less work. It did not seem to have much effect.
  • On a hunch, I tried optimizing what seem to be the macro crates during the build! This mostly rewarded me with hitting the high memory usage part of the build faster:
+[profile.dev.package]
+poise_macros.opt-level = 3
+sqlx-macros.opt-level = 3
PID     USER      PR  NI    VIRT    RES    SHR  S  %CPU  %MEM   TIME+    COMMAND
942247  jubilee   20   0    22.3g   2.5g   0.2g R  99.9   4.1   1:28.81  rustc

# ultimately...
real    10m18.717s
user    13m26.572s
sys     0m35.709s

The performance, it should be noted, is quite bad even on 1.76. I can build rustc itself within this kind of time frame.

real    10m52.876s
user    14m2.106s
sys     0m35.329s

Note, however, that without such opts, I still get this kind of timing on 1.77:

real    10m15.497s
user    12m59.488s
sys     0m35.516s

Mind, however: I have enough RAM to keep everything "in memory". Everything else is probably a result of aggressive disk swapping being pretty slow, even with NVMe drives. So yeah, this doesn't seem to be a huge regression so much as the code in question just makes all versions of rustc choke on it, with perhaps a slight fluctuation upward managing to clear the swap/cache thrashing threshold. I'm not 100% on that conclusion because I don't have a firm graph/stats yet, though, but this leads me to ask:

@cheesycod What is the RAM available to the M1 and Xeon?

@kyleguarco

kyleguarco commented Mar 24, 2024

So I tried splitting up 74a33dde into several smaller commits, and I was able to get the heap accumulation behavior just by adding these patches on top of commit 03ae98b:

A build on commit 03ae98b takes as long as expected, and memory peaks ~8G.

$ cargo clean; time cargo build
...
real	3m49.382s
user	9m54.360s
sys	0m42.650s

A build after this patch is applied takes about as long as the cargo check did earlier, with memory peaking ~25G

$ git apply diff_after_03ae98b.patch
$ cargo clean; time cargo build
...
real	9m37.130s
user	15m35.520s
sys	0m51.666s

@workingjubilee Can you confirm that these patches cause the behavior?

@cheesycod
Author

@cheesycod What is the RAM available to the M1 and Xeon?

Apple M1 has 8GB ram and 256GB disk, Xeon has 32GB ram and 1TB ssd (but CPU is much slower)

@workingjubilee
Contributor

workingjubilee commented Mar 24, 2024

8GB RAM

oh yeah no wonder, that's OOM like 5 times over

@kyleguarco The root commit 03ae98b

     Finished dev [unoptimized + debuginfo] target(s) in 2m 04s

real    2m4.386s
user    5m1.034s
sys     0m24.272s

The diff applied:

     Finished dev [unoptimized + debuginfo] target(s) in 5m 54s

real    5m54.150s
user    8m48.596s
sys     0m27.101s

The rustc version:

$ rustc --version --verbose
rustc 1.77.0 (aedd173a2 2024-03-17)
binary: rustc
commit-hash: aedd173a2c086e558c2b66d3743b344f977621a7
commit-date: 2024-03-17
host: x86_64-unknown-linux-gnu
release: 1.77.0
LLVM version: 17.0.6

@workingjubilee
Contributor

There is one other set of places that get anywhere near as much action as do_mir_borrowck:

  • dropck_outlives
  • needs_drop_raw
  • mir_coroutine_witnesses

However, the overwhelming amount of it is just in do_mir_borrowck.

@notriddle
Contributor

notriddle commented Mar 26, 2024

If you're on Linux and want to find its peak memory usage, you can cargo install cgmemtime and then use cgmemtime -- cargo WHATEVER.

@workingjubilee
Contributor

@notriddle oh I know what that number is actually.

This is my dhat profile, with a g-max of ~16GB; there's a viewer at https://nnethercote.github.io/dh_view/dh_view.html. The dhat file is basically JSON.

@workingjubilee
Contributor

Okay, so while do_mir_borrowck is where all the code is spending most of its runtime, its g-max is inside (and specifically because of) this backtrace:

0x5F_____: rustc_mir_transform::coroutine::locals_live_across_suspend_points (coroutine.rs:___)
0x5F9831B: {closure#0} (coroutine.rs:1579)
0x5F9831B: rustc_mir_transform::coroutine::mir_coroutine_witnesses (coroutine.rs:1558)
0x71FE656: {closure#0} (plumbing.rs:624)
0x71FE656: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::mir_coroutine_witnesses::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 8]>> (plumbing.rs:548)
0x6F05C6D: {closure#2} (plumbing.rs:620)
0x6F05C6D: <rustc_query_impl::query_impl::mir_coroutine_witnesses::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_span::def_id::DefId)>>::call_once (function.rs:250)

I omitted most of the numbers for the trace into locals_live_across_suspend_points because that has a few different points at which it allocates what becomes huge amounts. MaybeBorrowedLocals, MaybeRequiresStorage, MaybeStorageLive, and MaybeLiveLocals each account for about ~15% of the allocation, with compute_storage_conflicts being another 15%.

@workingjubilee
Contributor

workingjubilee commented Mar 26, 2024

I suspect we're adding some very nasty constant multipliers to what is necessarily a quadratic computation, making the quadratic case hit much faster and hurt a lot more, and that there might be a way to directly compute the quadratic part without first allocating enough space to fit an average Bethesda game.

@cheesycod
Author

I suspect we're adding some very nasty constant multipliers to what is necessarily a quadratic computation, making the quadratic case hit much faster and hurt a lot more, and that there might be a way to directly compute the quadratic part without first allocating enough space to fit an average Bethesda game.

Any progress on this?
