Serious binary size regression on ARMv7-M on nightly-2018-02-11 #49260

Closed
japaric opened this issue Mar 22, 2018 · 3 comments

@japaric
Member

japaric commented Mar 22, 2018

STR

$ git clone https://github.com/japaric/stable-embedded-rust

$ cd stable-embedded-rust

$ git checkout 7cc87cff95b2d90b5cab258d912cf4692130312f
  • nightly-2018-02-11 or newer
$ rustc -V
rustc 1.26.0-nightly (75af15ee6 2018-03-20)

$ # NOTE you need to have {gcc,libnewlib}-arm-none-eabi installed
$ xargo build --example minimal --target thumbv7m-none-eabi
$ xargo build --example minimal --target thumbv7m-none-eabi --release

$ arm-none-eabi-size target/thumbv7m-none-eabi/{debug,release}/examples/minimal
   text    data     bss     dec     hex filename
    530       0       0     530     212 target/thumbv7m-none-eabi/debug/examples/minimal
    662       0       0     662     296 target/thumbv7m-none-eabi/release/examples/minimal

$ xargo rustc --example minimal --target thumbv7m-none-eabi --release -- -C lto

$ arm-none-eabi-size target/thumbv7m-none-eabi/release/examples/minimal
   text    data     bss     dec     hex filename
   1114       0       0    1114     45a target/thumbv7m-none-eabi/release/examples/minimal
  • nightly-2018-02-10
$ rustc -V
rustc 1.25.0-nightly (3bcda48a3 2018-02-09)

$ # NOTE you need to have {gcc,libnewlib}-arm-none-eabi installed
$ xargo build --example minimal --target thumbv7m-none-eabi
$ xargo build --example minimal --target thumbv7m-none-eabi --release

$ arm-none-eabi-size target/thumbv7m-none-eabi/{debug,release}/examples/minimal
   text    data     bss     dec     hex filename
    530       0       0     530     212 target/thumbv7m-none-eabi/debug/examples/minimal
    160       0       0     160      a0 target/thumbv7m-none-eabi/release/examples/minimal

$ xargo rustc --example minimal --target thumbv7m-none-eabi --release -- -C lto

$ arm-none-eabi-size target/thumbv7m-none-eabi/release/examples/minimal
   text    data     bss     dec     hex filename
    130       0       0     130      82 target/thumbv7m-none-eabi/release/examples/minimal

Basically, as of nightly-2018-02-11 the dev profile produces a smaller binary than the release
profile, which in turn produces a smaller binary than compiling with LTO ...

Also, in this case, today's LTO produces a 1 KB (756%) bigger binary than using LTO on
nightly-2018-02-10. On a more real-world example I see a 2.4 KB (21%) increase in binary size.
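
For reference, the arithmetic behind that percentage can be reproduced with a small sketch. The helper name `size_increase` is hypothetical; the two inputs are the `text` sizes of the LTO builds reported above (130 bytes on nightly-2018-02-10, 1114 bytes on the newer nightly).

```rust
/// Returns (absolute change in bytes, relative change in percent).
/// `before` is assumed to be non-zero.
fn size_increase(before: u32, after: u32) -> (i64, f64) {
    let delta = i64::from(after) - i64::from(before);
    let percent = 100.0 * delta as f64 / f64::from(before);
    (delta, percent)
}

fn main() {
    // `text` column of the two LTO builds quoted in this issue.
    let (delta, percent) = size_increase(130, 1114);
    println!("+{} bytes ({:.1}%)", delta, percent);
}
```

The result is 984 bytes, i.e. roughly 1 KB, which matches the figure quoted above up to rounding.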

LLVM?

One notable difference between nightly-2018-02-10 and newer nightlies is that
nightly-2018-02-10 is using LLVM 4 and everything newer than that is using LLVM 6. However, I
don't think LLVM is fully to blame: today's rustc produces much more LLVM IR than it did on
2018-02-10.

$ # nightly-2018-02-10
$ xargo rustc --example minimal --target thumbv7m-none-eabi --release -- -C lto --emit=llvm-ir
$ wc $(find -name '*.ll')
  72  393 2748 ./target/thumbv7m-none-eabi/release/examples/minimal-dd6a4842433c97ae.ll


$ # nightly-2018-03-20
$ xargo rustc --example minimal --target thumbv7m-none-eabi --release -- -C lto --emit=llvm-ir
$ wc $(find -name '*.ll')
  707  5942 34041 ./target/thumbv7m-none-eabi/release/examples/minimal-52848323be89f9be.ll

IR files for reference.

thinLTO / parallel codegen?

The problem doesn't seem to be caused by thinLTO or parallel codegen either. I tried this with both
nightly-2018-02-11 and nightly-2018-03-20:

$ tail Cargo.toml
[profile.release]
codegen-units = 1
incremental = false

$ cat .cargo/config
[target.thumbv7m-none-eabi]
runner = "arm-none-eabi-gdb" # Not required; just used for testing
rustflags = [
  "-Z", "thinlto=no", # <- NEW!
  "-C", "link-arg=-Tlink.x",
  "-C", "link-arg=-nostartfiles",
  "-C", "link-arg=-march=armv7-m",
  "-C", "link-arg=-mthumb",
]

It got slightly better but it's still not on par with nightly-2018-02-10:

$ # nightly-2018-02-11
$ xargo rustc --example minimal --target thumbv7m-none-eabi --release -- -C lto

$ arm-none-eabi-size target/thumbv7m-none-eabi/release/examples/minimal
   text    data     bss     dec     hex filename
    598       0       0     598     256 target/thumbv7m-none-eabi/release/examples/minimal


$ # nightly-2018-03-20
$ xargo rustc --example minimal --target thumbv7m-none-eabi --release -- -C lto

$ arm-none-eabi-size target/thumbv7m-none-eabi/release/examples/minimal
   text    data     bss     dec     hex filename
    598       0       0     598     256 target/thumbv7m-none-eabi/release/examples/minimal

ARMv6-M

Also none of this seems to affect ARMv6-M.

$ # nightly-2018-03-20
$ xargo rustc --example minimal --target thumbv6m-none-eabi --release -- -C lto

$ arm-none-eabi-size target/thumbv6m-none-eabi/release/examples/minimal
   text    data     bss     dec     hex filename
    112       0       0     112      70 target/thumbv6m-none-eabi/release/examples/minimal

This is with the .cargo/config and Cargo.toml stuff undone.


cc @alexcrichton @nagisa any clue about what could be going wrong here?

@japaric
Member Author

japaric commented Mar 22, 2018

@nagisa pointed out to me on IRC that --emit=llvm-ir produces the IR after LLVM has optimized what rustc emits. They also suggested using -C no-prepopulate-passes to check the IR that rustc produces, and I can see that the IR that rustc produces is pretty much the same before and after the LLVM upgrade.

It seems that LLVM 6 is deciding to unroll some for loops "for performance" (though I doubt it makes any difference in the case of the minimal example) and that's what's increasing the binary size by a few KBs.
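
To illustrate why unrolling trades size for speed, here is a hand-written sketch of the transformation. It is not the code from the minimal example and not what LLVM literally emits; both function names are made up for illustration.

```rust
/// Rolled form: one copy of the loop body, one branch per element.
fn sum_rolled(data: &[u32]) -> u32 {
    let mut acc = 0u32;
    for &x in data {
        acc = acc.wrapping_add(x);
    }
    acc
}

/// Hand-unrolled by four: fewer branches per element, but four copies
/// of the body plus a remainder loop, so more machine code overall.
fn sum_unrolled_by_4(data: &[u32]) -> u32 {
    let mut acc = 0u32;
    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        acc = acc.wrapping_add(c[0]);
        acc = acc.wrapping_add(c[1]);
        acc = acc.wrapping_add(c[2]);
        acc = acc.wrapping_add(c[3]);
    }
    // Handle the 0..=3 elements that did not fill a whole chunk.
    for &x in chunks.remainder() {
        acc = acc.wrapping_add(x);
    }
    acc
}

fn main() {
    let data = [1u32, 2, 3, 4, 5, 6, 7];
    // Both forms compute the same result; only the code shape differs.
    assert_eq!(sum_rolled(&data), sum_unrolled_by_4(&data));
    println!("{}", sum_rolled(&data));
}
```

On a device with a few KB of flash the extra copies of the body can matter more than the saved branches, which is the trade-off being discussed here.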

I don't know if LLVM provides any option to prevent unrolling loops so some people may want to stick to LLVM 4 ...

@jonas-schievink
Contributor

This can be fixed by optimizing for size instead of performance:

[profile.release]
opt-level = "s"

Add that to the main Cargo.toml and we're back to 130 bytes:

   text	   data	    bss	    dec	    hex	filename
    530	      0	      0	    530	    212	target/thumbv7m-none-eabi/debug/examples/minimal
    130	      0	      0	    130	     82	target/thumbv7m-none-eabi/release/examples/minimal

@japaric
Member Author

japaric commented Mar 22, 2018

Optimizing for size "fixes" the problem in this particular case but it's not a general solution to prevent loop unrolling where it makes no sense; that would require fine grained control over loop unrolling like clang's loop unroll pragma.

My experience with opt-level={s,z}, at least when LLVM 4 was around, is that they produce bigger binaries than opt-level=3 -- now that opt-level=3 binaries are bloated due to loop unrolling, that may no longer be the case. iirc, opt-level={s,z} also reduces the inlining threshold, which prevents LLVM from optimizing away dead branches when using RTFM's claim mechanism.

I think there's nothing that can be done on the rustc side. We'll have to live with this and document all the options to improve binary size / performance, including switching back to LLVM 4.

@japaric japaric closed this as completed Mar 22, 2018
bors added a commit that referenced this issue May 21, 2018
stabilize opt-level={s,z}

closes #35784
closes #47651

### Rationale

Since the latest LLVM upgrade rustc / LLVM does more aggressive loop unrolling. This results in increased binary size of embedded / no_std programs: an increase of hundreds of bytes, or about 7x, in the case of the smallest Cortex-M binary, cf. #49260.

As we are shooting for embedded Rust on stable it would be great to also provide a way to optimize for size (which is pretty important for embedded applications that target resource constrained devices) on stable.

Also this has been baking in nightly for a long time.

r? @alexcrichton which team has to sign off on this?