
The binary size - performance tradeoff #69

Closed
japaric opened this issue Mar 22, 2018 · 13 comments

japaric (Member) commented Mar 22, 2018

As of the latest LLVM upgrade (4.0 -> 6.0 on 2018-02-11) LLVM seems to perform loop unrolling more aggressively; this increased the binary size of a minimal program that only zeroes .bss and initializes .data from 130 bytes of .text (nightly-2018-02-10) to 1114 bytes (nightly-2018-03-20) when using opt-level=3 + LTO -- FWIW, I highly doubt the loop unrolling actually improves performance at all here. Original report: rust-lang/rust#49260
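
For reference, the startup code in question looks roughly like this (a sketch, not the exact cortex-m-rt implementation; the symbol names are the ones the linker script provides):

    // Zero .bss and copy .data from flash to RAM. The two `while` loops
    // below are what LLVM now unrolls at opt-level=3.
    unsafe fn init_ram() { // hypothetical helper, for illustration only
        extern "C" {
            static mut _sbss: u32;  // start of .bss
            static mut _ebss: u32;  // end of .bss
            static mut _sdata: u32; // start of .data in RAM
            static mut _edata: u32; // end of .data in RAM
            static _sidata: u32;    // load address of .data in flash
        }

        let mut bss = &mut _sbss as *mut u32;
        while bss < &mut _ebss as *mut u32 {
            core::ptr::write_volatile(bss, 0);
            bss = bss.offset(1);
        }

        let mut src = &_sidata as *const u32;
        let mut dst = &mut _sdata as *mut u32;
        while dst < &mut _edata as *mut u32 {
            core::ptr::write_volatile(dst, core::ptr::read(src));
            src = src.offset(1);
            dst = dst.offset(1);
        }
    }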

This puts us in a bad spot because by default we'll end up with large optimized (--release) binaries -- I can already foresee future comparisons between C and Rust pointing out that the smallest embedded C program is only a hundred bytes in size whereas the smallest embedded Rust program is 1 KB.

So we should make sure we clearly document why Rust programs are so large by default and how to make them small. Using opt-level=s + LTO on the minimal program mentioned above brings the size back down to 130 bytes of .text.
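
For the book, the relevant settings go in the release profile; something like this (a sketch -- the exact savings will vary by target and toolchain):

    # Cargo.toml -- size-oriented release profile
    [profile.release]
    opt-level = "s" # optimize for size instead of speed
    lto = true      # enable LTO; together these bring the minimal program back to ~130 bytes of .text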

cc @jamesmunns ^ that should be included in the book

There are other possibilities to explore here, like something akin to C's / clang's #pragma nounroll to prevent LLVM from unrolling loops marked with that attribute, but I doubt we'll get any of that into the 2018 edition release -- it's too late, I think.

japaric added the docs label Mar 22, 2018
therealprof (Contributor):

FWIW: I think funky stuff like unrolling (which is really worthless on embedded architectures) is to be expected at higher optimisation levels, and I've seen all kinds of funky size regressions. I'm always using (and recommending) -s. Potentially -z could also be tried, but -O3 is a big no-no...

whitequark:

-O3 is pretty much defined as "-O2 with optimizations that cause code bloat", so I'm not sure why you'd go higher than -O2 on embedded devices.

therealprof (Contributor) commented Mar 23, 2018

the smallest embedded C program is only a hundred bytes in size whereas the smallest embedded Rust program is 1 KB.

NB: I highly doubt that. As soon as one uses the initialisation code from one of the typical SDKs, the code will be well into the kBs already. To even stay in Rust's range you'll have to manually bang the memory-mapped registers and write your own linker scripts.

Case in point: this is the smallest possible binary for a main { while(1) {} } loop on the STM32F051 I could achieve based on the STM32Cube initialisation:

# arm-none-eabi-size .pioenvs/disco_f051r8/firmware.elf
   text	   data	    bss	    dec	    hex	filename
    892	   1080	   1600	   3572	    df4	.pioenvs/disco_f051r8/firmware.elf

Emilgardis (Member):

Could we get an RFC for something like #[no_unroll]/#[unroll(disable)]?

whitequark:

@Emilgardis No need for an RFC, marking the function with the loop as #[cold] should suffice.
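
Something along these lines (a sketch; whether #[cold] actually suppresses the unrolling here hasn't been verified in this thread):

    // #[cold] tells LLVM the function is unlikely to be called, so it is
    // optimized for size rather than speed, which should also discourage
    // unrolling the loop inside it. Hypothetical helper, not cortex-m-rt code.
    #[cold]
    unsafe fn zero_bss(mut start: *mut u32, end: *mut u32) {
        while start < end {
            core::ptr::write_volatile(start, 0);
            start = start.offset(1);
        }
    }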

jonas-schievink (Contributor):

@japaric also wrote in rust-lang/rust#49260:

My experience with opt-level={s,z}, at least when LLVM 4 was around, is that they produce bigger binaries than opt-level=3

If this is still the case with LLVM 6, this definitely wants to be investigated and fixed on the LLVM side.

He also wrote:

iirc, opt-level={s,z} also reduces the inlining threshold, which prevents LLVM from optimizing dead branches when using RTFM's claim mechanism.

This might be the cause for some amount of bloat due to unnecessary branches, but shouldn't #[inline] be a strong enough hint to LLVM to still inline the function?
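
For context, the kind of code in question is roughly a small wrapper that takes a closure (a generic sketch, not RTFM's actual claim API):

    // Claim-like wrapper, hypothetical. The #[inline] hint is meant to keep
    // LLVM inlining the monomorphized instance even at opt-level=s, so that
    // branches which are statically dead inside `f` can still be folded away.
    #[inline]
    pub fn claim<T, R, F>(resource: &mut T, f: F) -> R
    where
        F: FnOnce(&mut T) -> R,
    {
        // the real mechanism would enter a critical section / raise a priority ceiling here
        f(resource)
    }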

therealprof (Contributor):

Why not get the default flags changed instead? It'd be very annoying to put annotations in every source file just in case someone might accidentally not change the compiler flags...

whitequark:

This might be the cause for some amount of bloat due to unnecessary branches, but shouldn't #[inline] be a strong enough hint to LLVM to still inline the function?

The inlining thresholds in LLVM are tailored for C, which produces functions with relatively compact IR, and likely aren't well suited for Rust. In our in-house language we had to raise them significantly to get decent reductions in code size.

Emilgardis (Member):

@whitequark I've never heard of that attribute; it seems like it should work, though.

therealprof (Contributor):

@jonas-schievink I cannot confirm that it produces larger files with opt-level=s, at least not in general. This all has quite a bit of a premature-optimisation smell to it, same as with the #[inline(always)] we had sprinkled all over the map...


durka commented Mar 23, 2018 via email


RandomInsano commented Mar 24, 2018 via email

japaric (Member, Author) commented Aug 10, 2018

This issue was moved to rust-embedded/book#11
