Compiler takes 30 minutes + large (32GB+) amounts of ram to compile code with large amount of arguments + function-within-function #122944
Comments
@cheesycod So this smells like heap accumulation instead of recursion, since recursion would stack overflow. Recursion would only be possible if the code inside rustc hitting this is an optimized tail call. Does that vibe with what you're experiencing? I'm seeing some async fn, which are prone to runaway behaviors if a slight mistake in resource usage happens inside the state machine transformations inside the compiler, but the commit you mention mostly touches macro handling. Is there any chance you can create a slightly more minimized version of this? I don't mean that you have to eliminate all the dependencies (though that might be nice), I more mean:
You seem to already have tried a few refactorings and know this code better than me. |
I tested this in a podman container I pulled and modified for this test. The behavior I got was close to what @cheesycod had. The My aim was to build
The build did hang for a while, but it did finish (likely because my system has ~64G of memory). Usage peaked around ~30G of memory. I used the instructions provided at summarize/README.md to read the profiler data put together by the compiler. Here's what I got (click for gist). Someone way smarter than me could probably understand what that means. Does taking the longest on Edit: The behavior I got wasn't exactly like the original behavior, but it was close. |
@kyleguarco Did you compare the behavior for you against the 1.76 stable behavior? I can't actually reproduce a meaningful difference, honestly, in timing:
Glancing at memory usage via
However, while it is possible my machine is incredibly beastly, I set @cheesycod @kyleguarco Does this runaway behavior go away if you also set |
Before I say anything, I'd like to mention that I was looking at overall system memory usage in my last comment. I'll only mention
No, but I can! I'll run the commands you ran. Checked out on The test with previous stable (rust 1.76.0).
Here's current stable (rust 1.77.0).
There doesn't seem to be a difference between the two releases. Your assessment with 1.77 being slightly better tracks with me too.
Here's a test on current stable (rust 1.77.0) with that option enabled, and with
So, no, @workingjubilee Did you by chance build on
I tried doing an actual
Commit 74a33dde is the one giving the issue... A
Is there a tool I could use to draw the usage over time? |
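One rough way to answer this on Linux without extra tooling is to poll `/proc/<pid>/status` for the compiler process and record the `VmRSS` field, then plot the samples with whatever you like. The sketch below is only an illustration under that assumption. The helper names `parse_status_kib` and `sample_rss` are made up for this example, and the PID has to be that of the `rustc` process itself, since cargo spawns it as a child.

```rust
use std::fs;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Extract a `kB` field such as `VmRSS` or `VmHWM` from the text of
/// `/proc/<pid>/status`. Returns the value in KiB, or `None` if absent.
fn parse_status_kib(status: &str, key: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        // Lines look like "VmRSS:\t  123456 kB".
        let rest = line.strip_prefix(key)?.strip_prefix(':')?;
        rest.split_whitespace().next()?.parse().ok()
    })
}

/// Poll a process's resident set size until its /proc entry disappears
/// or stops reporting VmRSS (for example once it has become a zombie).
/// Returns (seconds since sampling started, RSS in KiB) pairs.
fn sample_rss(pid: u32, interval: Duration) -> Vec<(f64, u64)> {
    let start = Instant::now();
    let mut samples = Vec::new();
    while let Ok(status) = fs::read_to_string(format!("/proc/{pid}/status")) {
        match parse_status_kib(&status, "VmRSS") {
            Some(kib) => samples.push((start.elapsed().as_secs_f64(), kib)),
            None => break,
        }
        sleep(interval);
    }
    samples
}
```

Passing `"VmHWM"` instead of `"VmRSS"` to `parse_status_kib` reads the kernel's own high-water mark, which gives the peak without needing a dense sampling interval.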
Oh whoops. You're correct, that's much more exciting! |
amazing. |
Looks like |
I did a A
|
Probably the factoring-out of the pruning-messages logic allowing less arguments to be passed and thus tracking logically-smaller function bodies during compilation. |
This tracks with what I've been seeing. Factoring prune out was the workaround I attempted; it did improve compile times and memory usage, but there's still a smaller overhead even with the prune function factored out. |
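For readers following along, the shape of that workaround can be sketched roughly as below. Every name here (`PruneOptions`, `Message`, `prune_messages`, `prune_command`) is hypothetical and not taken from the splashtail code. The real command is an async fn with many more arguments, and this sketch is synchronous to stay self-contained. Whether this shape helps in a given case is something to measure rather than assume.

```rust
/// Hypothetical bundle of the options the pruning logic needs.
struct PruneOptions {
    max_messages: usize,
    skip_pinned: bool,
}

/// Hypothetical stand-in for a chat message.
struct Message {
    id: u64,
    pinned: bool,
}

/// Pruning logic factored out to module level: it sees only two parameters
/// instead of being nested inside the command function with all its arguments.
fn prune_messages(messages: &[Message], opts: &PruneOptions) -> Vec<u64> {
    messages
        .iter()
        .filter(|m| !(opts.skip_pinned && m.pinned))
        .take(opts.max_messages)
        .map(|m| m.id)
        .collect()
}

/// The command keeps its long argument list but forwards only what the
/// helper needs; the unused parameters stand in for the rest of the list.
fn prune_command(
    messages: &[Message],
    max_messages: usize,
    skip_pinned: bool,
    _reason: &str,
    _dry_run: bool,
) -> Vec<u64> {
    let opts = PruneOptions { max_messages, skip_pinned };
    prune_messages(messages, &opts)
}
```

The point of the design is only that the helper's body no longer lives inside the large function, so each function body the compiler has to track is smaller.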
I tried digging into this a bit more, BTW (sorry, I live in India and just woke up). Removing the sqlx code did help slightly (from 33min to 17min; memory usage goes down from 33GB to 12-14GB), so sqlx is certainly influencing it. |
+[profile.dev.package]
+poise_macros.opt-level = 3
+sqlx-macros.opt-level = 3
The performance, even on 1.76, it should be noted, is quite bad. I can build rustc within this kind of time frame.
Note, however, that without such opts, I still get this kind of timing on 1.77:
Mind, however: I have enough RAM to keep everything "in memory". Everything else is probably a result of aggressive disk swapping being pretty slow, even with NVMe drives. So yeah, this doesn't seem to be a huge regression so much as the code in question just makes all versions of rustc choke on it, with perhaps a slight fluctuation upward managing to clear the swap/cache thrashing threshold. I'm not 100% on that conclusion because I don't have a firm graph/stats yet, though, but this leads me to ask: @cheesycod What is the RAM available to the M1 and Xeon? |
So I tried splitting up 74a33dde into several smaller commits, and I was able to get the heap accumulation behavior just by adding these patches on top of commit 03ae98b: A build on commit 03ae98b takes as long as expected, and memory peaks ~8G.
A build after this patch is applied takes about as long as the
@workingjubilee Can you confirm that these patches cause the behavior? |
Apple M1 has 8GB ram and 256GB disk, Xeon has 32GB ram and 1TB ssd (but CPU is much slower) |
oh yeah no wonder, that's OOM like 5 times over @kyleguarco The root commit 03ae98b
The diff applied:
The rustc version:
|
There is one other set of places that get anywhere near as much action as
However, the overwhelming amount of it is just in |
If you're on Linux, and to find out what its peak memory usage is, you can |
@notriddle oh I know what that number is actually. this is my dhat with the g-max of ~16gb. there's a viewer at https://nnethercote.github.io/dh_view/dh_view.html |
Okay, so while
I omitted most of the numbers for the trace into |
I suspect we're adding some very nasty constant multipliers to what is necessarily a quadratic computation, making the quadratic case hit much faster and hurt a lot more, and that there might be a way to directly compute the quadratic part without first allocating enough space to fit an average Bethesda game. |
Any progress on this? |
I do not have a simple reproducible example since this bug goes over my head; perhaps someone else can find the bug, but it seems pretty bad:
This file takes over 30 minutes of time to compile in rustc 1.77.0 (rustc 1.77.0 (aedd173 2024-03-17)) and rustc nightly (rustc 1.78.0-nightly (2bf78d1 2024-02-18)): https://github.com/Anti-Raid/splashtail/blob/74a33ddebe045d2a9fb4acc2d65ef97f75314e84/botv2/src/modules/moderation/cmd.rs [this specific commit triggered the issue of 30 minutes though this was always exponentially slow from the very start]
Don't know why it is, but this seemingly innocent file takes up a full 25-30 minutes of compile time and hogs over 32GB of RAM, to the point that I needed to add 128GB of swap to keep Linux's Out Of Memory killer from SIGKILLing rust. In CI, this just leads to a SIGKILL within 20 minutes of compiling. Actual compilation seems to go through without much of an error. Removing moderation from the module tree reduces compile times from 30 minutes to 5-6 minutes, and RAM usage returns to normal.
I expected to see this happen: Compile times within 5-6 minutes (which is normal right now; I don't know whether the 5-minute time comes from the same root cause as this bug, but it could) and normal RAM usage (<32GB RAM)
Instead, this happened:
warning: struct `ConfigOption` is never constructed
   --> src/silverpelt/config_opt.rs:159:12
    |
159 | pub struct ConfigOption {
    |            ^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

warning: `botv2` (bin "botv2") generated 1 warning
    Finished release [optimized] target(s) in 30m 27s
[Ignore the warning, that's unrelated]
Linux kernel kills rust with a SIGKILL 9 randomly and Out Of Memory killer activates
Meta
Tested and reproduced across both Apple M1 and a Dedicated Server (Intel(R) Xeon(R) CPU E3-1240 v3 @ 3.40GHz) on both rust stable and nightly using both LLVM and Cranelift as codegen backends. Switching linkers to clang+mold/ld.lld from defaults did not help. Rust versions are given below:
rustc 1.77.0 (aedd173 2024-03-17) and rustc 1.78.0-nightly (2bf78d1 2024-02-18)
No backtrace is available, since the compiler does not crash. Attempts to compile the code using --timing for timing information don't seem to even complete in any reasonable timespan. The issue occurs on both debug and release builds, so this is not an issue of release build optimizations. Typical options to speed up compile times, such as opt-level = 1, reduce neither the compile times nor the memory usage. Splitting up the code into multiple crates does not work as long as this file is in the module tree and hence compiled.