Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no_std: binary size blowup using Result::unwrap since 1.49.0 #83925

Open
benma opened this issue Apr 6, 2021 · 12 comments
Open

no_std: binary size blowup using Result::unwrap since 1.49.0 #83925

benma opened this issue Apr 6, 2021 · 12 comments
Labels
I-heavy Issue: Problems and improvements with respect to binary size of generated code. ICEBreaker-Cleanup-Crew Helping to "clean up" bugs with minimal examples and bisections P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@benma
Copy link

benma commented Apr 6, 2021

Needed toolchains and targets to run the examples (nightly instead of 1.49.0 also works, 1.49.0 is the first release with the regression):

rustup toolchain add 1.48.0
rustup toolchain add 1.49.0
rustup target add thumbv7em-none-eabi --toolchain=1.48.0
rustup target add thumbv7em-none-eabi --toolchain=1.49.0

Code

[package]
name = "foo"
version = "0.1.0"
edition = "2018"

[profile.release]
opt-level = 'z'
codegen-units = 1
panic = 'abort'
lto = true
#![no_std]
#![no_main]

#[no_mangle]
pub extern "C" fn _start() -> ! {
    let e: Result<(), ()> = Err(());
    e.unwrap();

    loop {}
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

Compiled with:

cargo +1.49.0 build --release --target thumbv7em-none-eabi

I expected the all functions related to panics to be optimized away and the binary size to be relatively small:

Output with toolchain 1.48.0, everything is as expected:

$ nm target/thumbv7em-none-eabi/release/foo
000200e4 T _start
$ du -b  target/thumbv7em-none-eabi/release/foo
932	target/thumbv7em-none-eabi/release/foo

The same output when compiled with 1.49.0:

$ nm target/thumbv7em-none-eabi/release/foo
00020154 T _start
00020124 t _ZN45_$LT$$LP$$RP$$u20$as$u20$core..fmt..Debug$GT$3fmt17h83e1a6586aaeb498E
00020160 t _ZN4core3fmt9Formatter3pad17h81a3156a1ec5b453E
00020134 t _ZN4core3ptr13drop_in_place17h3eafc7c4aae35b40E
000204c4 t _ZN4core6option18expect_none_failed17h37d4a90aeed60e0cE
00020138 t _ZN4core6result19Result$LT$T$C$E$GT$6unwrap17hb9125050bd4dfd28E
0002015e t _ZN4core9panicking9panic_fmt17h49c6aff3292697f2E
$ du -b  target/thumbv7em-none-eabi/release/foo
11240	target/thumbv7em-none-eabi/release/foo

The binary size blew up by ~10kb. For my embedded program, this increase is too large and for my target and I cannot run the program anymore.

When I inline the code from unwrap() and call panic!() manually, the binary size reduces against drastically, though the symbols list still has a panic_fmt entry that is not optimized away since 1.49.0 (previous toolchains produce a minimal binary with no panic-related code in the binary, as expected):

#![no_std]
#![no_main]

#[no_mangle]
pub extern "C" fn _start() -> ! {
    let e: Result<(), ()> = Err(());
    match e {
        Ok(()) => {}
        Err(err) => panic!("error: {:?}", err),
    }
    loop {}
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
$ nm target/thumbv7em-none-eabi/release/foo
000200e4 T _start
000200ee t _ZN4core9panicking9panic_fmt17h49c6aff3292697f2E
$ du -b  target/thumbv7em-none-eabi/release/foo
1892	target/thumbv7em-none-eabi/release/foo

Version it worked on

Toolchains below 1.49.0

Version with regression

Toolchains since 1.49.0, including today's nightly.

@benma benma changed the title no_std: binary size blowup using Result.::unwrap since 1.49.0 no_std: binary size blowup using Result::unwrap since 1.49.0 Apr 6, 2021
@eggyal
Copy link
Contributor

eggyal commented Apr 6, 2021

Bisected this regression to nightly-2020-10-17.

@eggyal
Copy link
Contributor

eggyal commented Apr 6, 2021

@rustbot label: +I-heavy +regression-from-stable-to-beta

@rustbot rustbot added I-heavy Issue: Problems and improvements with respect to binary size of generated code. regression-from-stable-to-beta Performance or correctness regression from stable to beta. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 6, 2021
@eggyal
Copy link
Contributor

eggyal commented Apr 6, 2021

@rustbot label: -regression-from-stable-to-beta +regression-from-stable-to-stable

@rustbot rustbot added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Apr 6, 2021
benma added a commit to benma/bitbox02-firmware that referenced this issue Apr 6, 2021
rust-lang/rust#83925

For now we just remove the one instance of .unwrap() that is used in
the bootloader, removing most of the bootloader size increase.
benma added a commit to benma/bitbox02-firmware that referenced this issue Apr 6, 2021
rust-lang/rust#83925

For now we just remove the one instance of .unwrap() that is used in
the bootloader, removing most of the bootloader size increase.
benma added a commit to benma/bitbox02-firmware that referenced this issue Apr 13, 2021
rust-lang/rust#83925

For now we just remove the one instance of .unwrap() that is used in
the bootloader, removing most of the bootloader size increase.
@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 14, 2021
@apiraino
Copy link
Contributor

Issue has been discussed during team compiler meeting on Zulip: we acknowledge this is a pain point for anybody doing any embedded, but this issue needs a well thought way out, possibly through our RFC/MCP process.

@apiraino
Copy link
Contributor

Bisected this regression to nightly-2020-10-17.

We have the nighty where this regressed, can we try to go down a specific commit where this happened?

@rustbot ping icebreakers-cleanup-crew

@rustbot rustbot added the ICEBreaker-Cleanup-Crew Helping to "clean up" bugs with minimal examples and bisections label Apr 15, 2021
@rustbot
Copy link
Collaborator

rustbot commented Apr 15, 2021

@apiraino
Copy link
Contributor

Assigning priority as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 15, 2021
@eggyal
Copy link
Contributor

eggyal commented Apr 15, 2021

@apiraino:

We have the nighty where this regressed, can we try to go down a specific commit where this happened?

Unfortunately the CI artifacts from back then are no longer available (to me, at least); so this would entail manually compiling those specific commits... compiling rustc is very slow for me, but I'll see what I can do.

I should also add that I ran the bisection on the basis of treating compilations that took longer than 5 minutes as a regression; this appeared to correlate with memory usage exceeding 20GiB, although I did notice some builds completing within 5 minutes where memory usage exceeded 10GiB so this issue may be the combined effect of multiple commits and/or the bisection may not be entirely reliable. Sorry wasn't paying attention—this comment related to a different issue/bisection. Am pretty confident the identified nightly is the one in which the regression was introduced.

@tmiasko
Copy link
Contributor

tmiasko commented Apr 15, 2021

Command Size (bytes)
cargo +nightly-2020-10-16 932
cargo +nightly-2020-10-17 11248
env RUSTFLAGS=-Zinsert-sideeffect cargo +nightly-2020-10-16 11248
cargo +nightly-2020-10-17 + unreachable_unchecked 932
cargo +nightly-2020-10-17 -Zbuild-std=panic_abort,core 1092

@davidsvasak
Copy link

This broke due to the change merged at a78a62f which was intended to fix #28728. In particular, it adds a side effect call to empty loops, which would be triggered here. I'm not familiar enough with the behaviour of the compiler to say for certain, but I think the side effect inhibits the compiler from optimising away the associated loop / callers and as a result the symbols are left in the binary. I'm also not quite able to replicate the nightly build with my locally built compiler so there may be some other surprises here.

In case some more details would be of use:

Here's the LLVM IR I get from 7f58716 (i.e. the last good commit):

; ModuleID = 'example.bxq6xejs-cgu.0'
source_filename = "example.bxq6xejs-cgu.0"
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7em-none-unknown-eabi"

; Function Attrs: minsize norecurse noreturn nounwind optsize readnone
define void @_start() unnamed_addr #0 {
bb1:
  br label %bb2

bb2:                                              ; preds = %bb2, %bb1
  br label %bb2
}

attributes #0 = { minsize norecurse noreturn nounwind optsize readnone "frame-pointer"="all" "target-cpu"="generic" }

Whereas here's what I get from e2efec8 (i.e. the commit that broke this):

; ModuleID = 'example.bxq6xejs-cgu.0'
source_filename = "example.bxq6xejs-cgu.0"
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7em-none-unknown-eabi"

; Function Attrs: minsize noreturn nounwind optsize
define void @_start() unnamed_addr #0 {
start:
  %e.i = alloca {}, align 1
  %0 = bitcast {}* %e.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 0, i8* nonnull %0)
; call core::result::unwrap_failed
  call fastcc void @_ZN4core6result13unwrap_failed17h28e5bee20479b83aE({}* nonnull align 1 %e.i) #4
  unreachable
}

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1

; Function Attrs: inaccessiblememonly nounwind willreturn
declare void @llvm.sideeffect() #2

; core::result::unwrap_failed
; Function Attrs: cold noinline noreturn nounwind
define internal fastcc void @_ZN4core6result13unwrap_failed17h28e5bee20479b83aE({}* nocapture nonnull readnone align 1 %0) unnamed_addr #3 {
; call core::panicking::panic_fmt
  tail call fastcc void @_ZN4core9panicking9panic_fmt17hbd2067f8be3a4464E()
  unreachable
}

; core::panicking::panic_fmt
; Function Attrs: cold noinline noreturn nounwind
define internal fastcc void @_ZN4core9panicking9panic_fmt17hbd2067f8be3a4464E() unnamed_addr #3 {
  br label %bb1.i

bb1.i:                                            ; preds = %bb1.i, %0
  tail call void @llvm.sideeffect() #4
  br label %bb1.i
}

attributes #0 = { minsize noreturn nounwind optsize "frame-pointer"="all" "target-cpu"="generic" }
attributes #1 = { argmemonly nounwind willreturn }
attributes #2 = { inaccessiblememonly nounwind willreturn }
attributes #3 = { cold noinline noreturn nounwind "frame-pointer"="all" "target-cpu"="generic" }
attributes #4 = { nounwind }

Here we see bb2 has become bb1.i and calls a side effect function, stopping optimisation.

@davidsvasak
Copy link

It looks like this is the same theory discussed in the Zulip thread.

@Eugeny
Copy link

Eugeny commented Jan 16, 2024

For embedded developers finding this issue: if you're using a halt/abort panic handler anyway, building with -Zbuild-std=core -Zbuild-std-features=panic_immediate_abort will allow Rust to optimize away the formatting code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-heavy Issue: Problems and improvements with respect to binary size of generated code. ICEBreaker-Cleanup-Crew Helping to "clean up" bugs with minimal examples and bisections P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants