Make the allocator shim participate in LTO again #146232
@bors try @rust-timer queue |
Make the allocator shim participate in LTO again
4ec9a9b
to
d0e65a9
Forgot to revert the changes in exported_symbols_for_lto. This shouldn't affect the perf run other than possibly showing less of a performance improvement than it should give in the end. |
r? @fee1-dead rustbot has assigned @fee1-dead. Use |
Some changes occurred in compiler/rustc_codegen_ssa |
This test reproduces some form of these other two issues (
//@ compile-flags: --crate-type cdylib -C lto
use std::alloc::{GlobalAlloc, Layout};

struct MyAllocator;

unsafe impl GlobalAlloc for MyAllocator {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        todo!()
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
    }
}

#[global_allocator]
static GLOBAL: MyAllocator = MyAllocator;
You've since done this, IIUC. |
Thanks! Had to modify it slightly to work with compiletest.
Correct |
Otherwise this looks good to me and fixes the regressions, so that's great, thanks! I'm not sure we care about the perf results, but they should be available in 3-4 hours. You can r=me at your preference. |
Co-Authored-By: Rémy Rakic <remy.rakic+github@gmail.com>
e10f5b6
to
e072d7d
Finished benchmarking commit (5ab6398): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
@bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary 1.8%, secondary -2.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary -13.2%, secondary -8.9%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary 53.0%, secondary 59.2%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 467.829s -> 466.151s (-0.36%) |
It improves things even more than it previously regressed.
$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +nightly-2025-09-01 - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 1772392 5 sep 20:53 rust_out
-rw-rw-r-- 1 bjorn bjorn 2609776 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6979364 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6979324 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn 4512 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 3211728 5 sep 20:53 rust_out.rust_out.49aad25d9732bf04-cgu.0.rcgu.o
$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +nightly - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 1766032 5 sep 20:54 rust_out
-rw-rw-r-- 1 bjorn bjorn 2612612 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6982788 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6982752 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn 4512 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 3188680 5 sep 20:54 rust_out.rust_out.f8d6093b640b0034-cgu.0.rcgu.o
-rw-rw-r-- 1 bjorn bjorn 3484 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-crate.allocator.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 3224 5 sep 20:53 rust_out.rust_out.f8d6093b640b0034-crate.allocator.rcgu.o
$ echo 'fn main() {}' | RUSTC_BOOTSTRAP=1 rustc +5ab63980021f7c1ae280eba3261d66240d594007 - -Zhuman-readable-cgu-names -O -Csave-temps -Clto=true && ls -l
-rwxrwxr-x 1 bjorn bjorn 2264800 5 sep 20:55 rust_out
-rw-rw-r-- 1 bjorn bjorn 4512 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-cgu.0.rcgu.no-opt.bc
-rw-rw-r-- 1 bjorn bjorn 2619640 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.bc
-rw-rw-r-- 1 bjorn bjorn 6982380 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.lto.after-restriction.bc
-rw-rw-r-- 1 bjorn bjorn 6982344 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.lto.input.bc
-rw-rw-r-- 1 bjorn bjorn 3701008 5 sep 20:55 rust_out.rust_out.dbca3ea46f37a61b-crate.allocator.rcgu.o
I suspect what happened is that I didn't restore the check in fat LTO that ensures the allocator module is not used as the base to merge all other modules into. The allocator module is not configured to be optimized, so we probably skipped all optimizations after doing the module merging pass of fat LTO. I've added a new commit to fix this.
@bors try @rust-timer queue |
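For readers unfamiliar with the check mentioned above, here is a minimal sketch of the idea: when choosing the module that all others get merged into, only regular modules are eligible, so the allocator shim can never become the base. The `ModuleKind`, `Module`, and `pick_base` names and the `cost` field are hypothetical simplifications for illustration, not the actual rustc_codegen_ssa types or the code in this PR.

```rust
/// Simplified stand-in for a codegen module kind (hypothetical, not rustc's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModuleKind {
    Regular,
    Allocator,
}

/// Simplified stand-in for an in-memory module awaiting fat LTO (hypothetical).
#[derive(Debug)]
struct Module {
    name: &'static str,
    kind: ModuleKind,
    /// Rough size proxy used to rank candidates, e.g. bitcode size in bytes.
    cost: u64,
}

/// Returns the index of the module to merge everything else into.
/// Only `Regular` modules are considered, so the allocator shim is never
/// chosen as the base, even when it happens to be the largest module.
fn pick_base(modules: &[Module]) -> Option<usize> {
    modules
        .iter()
        .enumerate()
        .filter(|(_, m)| m.kind == ModuleKind::Regular)
        .max_by_key(|(_, m)| m.cost)
        .map(|(i, _)| i)
}
```

With a module list holding a large allocator module and a small regular codegen unit, `pick_base` returns the index of the regular unit. If no regular module is present it returns `None`, and a caller would need some fallback; what rustc does in that case is not covered by this sketch.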
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
Make the allocator shim participate in LTO again
r? @lqd or codegen |
Queued 499d4b9 with parent 9cd272d, future comparison URL. |
This is likely the cause of the perf regression in #145955. It also caused some functional regressions.
Fixes #146235
Fixes #146239