Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default handlers get mono-morphized needlessly #19

Closed
jpochyla opened this issue Mar 17, 2017 · 27 comments
Closed

Default handlers get mono-morphized needlessly #19

jpochyla opened this issue Mar 17, 2017 · 27 comments

Comments

@jpochyla
Copy link

Since #17, identical copies of exception::default_handler get included in the binary, thanks to the token generic parameter. It's not a lot of code, but it seems a bit unfortunate. Is there any way around this?

Thanks for all the work you do for embedded Rust, btw! <3

@japaric
Copy link
Member

japaric commented Mar 19, 2017

Is there any way around this?

Not that I know of. I was hoping LLVM would merge the identical functions but it doesn't. I don't know if it's capable of doing that at all.

cc @eddyb Do you know if LLVM should be able to optimize something like this and we are not giving it enough information or maybe it's not possible to do at all:

extern "C" fn default_handler<T>(_: T) {
    // code that doesn't use `T` at all
}

struct Foo;
struct Bar;
struct Baz;
struct Quux;

#[repr(C)]
struct Exceptions {
    foo: extern "C" fn(Foo),
    bar: extern "C" fn(Bar),
    baz: extern "C" fn(Baz),
    quux: extern "C" fn(Quux),
}

#[no_mangle]
pub static _EXCEPTIONS: Exceptions = Exceptions {
    foo: default_handler,
    bar: default_handler,
    baz: default_handler,
    quux: default_handler,
};

Today, we get something like this (release + LTO):

08000008 <_EXCEPTIONS>:
 8000008:       080003f3 0800033f 080002ef 08000439     ....?.......9...

080002ee <cortex_m::exception::default_handler>:
 80003f2:       f3ef 8008       mrs     r0, MSP
 80003f6:       6941            ldr     r1, [r0, #20]
 80003f8:       f000 b863       b.w     80004c2 <cortex_m::exception::default_handler::handler>

0800033e <cortex_m::exception::default_handler>:
 80003fc:       f3ef 8008       mrs     r0, MSP
 8000400:       6941            ldr     r1, [r0, #20]
 8000402:       f000 b85e       b.w     80004c2 <cortex_m::exception::default_handler::handler>

080003f2 <cortex_m::exception::default_handler>:
 8000406:       f3ef 8008       mrs     r0, MSP
 800040a:       6941            ldr     r1, [r0, #20]
 800040c:       f000 b859       b.w     80004c2 <cortex_m::exception::default_handler::handler>

08000438 <cortex_m::exception::default_handler>:
 8000410:       f3ef 8008       mrs     r0, MSP
 8000414:       6941            ldr     r1, [r0, #20]
 8000416:       f000 b854       b.w     80004c2 <cortex_m::exception::default_handler::handler>

Ideally, we'd like something like this:

08000008 <_EXCEPTIONS>:
 8000008:       080002ef 080002ef 080002ef 080002ef     ................

080002ee <cortex_m::exception::default_handler>:
 80003f2:       f3ef 8008       mrs     r0, MSP
 80003f6:       6941            ldr     r1, [r0, #20]
 80003f8:       f000 b863       b.w     80004c2 <cortex_m::exception::default_handler::handler>

@eddyb
Copy link

eddyb commented Mar 19, 2017

@japaric You need -C opt-level=3 IIRC, and even then it might be disabled.
You can try -C passes=mergefunc to force it to run, but I'm not sure it will be placed right to do its thing.

@japaric
Copy link
Member

japaric commented Mar 19, 2017

Thanks for the suggestion, @eddyb. Unfortunately, it didn't work. We were already using -C opt-level=3 because of Cargo's --release flag and recompiling the world with -C passes didn't help. -C passes actually made one test program 8 bytes larger (originally, 1270 bytes in size) 😆.

@whitequark
Copy link
Contributor

whitequark commented Apr 30, 2017

Running a debug opt with -debug -debug-only mergefunc reveals the cause:

_ZN8cortex_m9exception15default_handler17h022e0dbc834bf513E is to small to bother merging
_ZN8cortex_m9exception15default_handler17h07a6735a3f66597aE is to small to bother merging
_ZN8cortex_m9exception15default_handler17h0912171e374cbd1fE is to small to bother merging

etc.

Well, turns out it's a FIXME in LLVM:

  // Don't merge tiny functions, since it can just end up making the function
  // larger.
  // FIXME: Should still merge them if they are unnamed_addr and produce an
  // alias.
  if (NewFunction->size() == 1) {
    if (NewFunction->front().size() <= 2) {
      DEBUG(dbgs() << NewFunction->getName()
                   << " is to small to bother merging\n");
      return false;
    }
  }

I've looked it and it seems very straightforward to fix. Take the logic from writeThunkOrAlias, check that !F->isInterposable() && !G->isInterposable(), then writeAlias(F, G);. I'll review and merge a patch if someone writes it (with a test).

@whitequark
Copy link
Contributor

@japaric
Copy link
Member

japaric commented Jul 18, 2017

Another fix has been proposed in rust-embedded/cortex-m-rt#27 and rust-embedded/svd2rust#130. It uses global_asm! to define exception / interrupt handlers as weak aliases of the DEFAUT_HANDLER symbol. The fix saves 4*N + 32 bytes of Flash memory where N is the number of interrupt handlers the device has.

@pftbest
Copy link
Contributor

pftbest commented Jul 18, 2017

Hello, @whitequark! Any updates on your patches? I think they can also save some space on panic handlers as well.

For each panic in a program there is a separate function that just calls the panic_fmt

/// === Normal function that can produce 2 panics:
    ...
0000c084 <.LBB4_6>:
    c084:       3c 40 00 c1     mov     #49408, r12     ;#0xc100
    c088:       b0 12 a0 c0     call    #49312          ;#0xc0a0

0000c08c <.LBB4_7>:
    c08c:       3d 40 03 00     mov     #3,     r13     ;
    c090:       b0 12 94 c0     call    #49300          ;#0xc094

/// === Two extra functions that just call panic_fmt:

0000c094 <_ZN4core9panicking18panic_bounds_check17h5df2045dec8a376aE>:
    c094:       b0 12 98 c0     call    #49304          ;#0xc098

0000c0a0 <_ZN4core9panicking5panic17hb78b2a90052a7056E>:
    c0a0:       b0 12 98 c0     call    #49304          ;#0xc098

/// === The real panic_fmt handler:

0000c098 <_ZN4core9panicking9panic_fmt17hceb300dda72a8369E>:
    c098:       32 c2           dint
    ...

For example AT2XT project by @cr1901, has 5 of this functions, and that is too much.

@whitequark
Copy link
Contributor

@pftbest I will take another look at them soon. Half of the patch is already LGTM'd, there's very little left.

@japaric Let's not merge this hacky workaround. I'll try to be more responsive this time.

@pftbest
Copy link
Contributor

pftbest commented Jul 18, 2017

But the workaround hack will give 4 bytes shorter result :)

@whitequark
Copy link
Contributor

@pftbest The proper solution may well benefit other functions.

@pftbest
Copy link
Contributor

pftbest commented Jul 18, 2017

Agreed, but using a hack for this specific case will not cancel the benefits of a proper solution.
And since it's not a public interface, but a private implementation detail, we can always change it back if we need to.

@japaric
Copy link
Member

japaric commented Jul 18, 2017

I think weak aliases is the way to solve this. Semantically it is what we want to do: unless overriden a handler will be the DEFAULT_HANDLER. The need for global_asm! is a bit unfortunate but that's because of the lack of a proper aliasing mechanism in rustc (gcc and clang have such mechanism).

Relying on LLVM to merge exactly looking function works too but requires explicitly enabling the LLVM pass -- it's not enabled by default, right? -- and may not work (well) if not compiling with optimizations; I haven't actually tested that scenario. Whereas weak aliases work regardless of the optimization settings.

Of course, making mergefunc better is a welcome addition. It might help in other cases like in the panicking stuff that @pftbest mentioned.

@whitequark
Copy link
Contributor

and may not work (well) if not compiling with optimizations

Take a look at the code generated with svd2rust that's built without optimizations.

@whitequark
Copy link
Contributor

whitequark commented Jul 18, 2017

requires explicitly enabling the LLVM pass -- it's not enabled by default, right?

I suspect that it can actually produce better results if enabled in rustc in general, given how many functions may well monomorphize to the same thing. For example, every single time you have phantom types.

@therealprof
Copy link
Contributor

@japaric I'd love to see that patch landing. I was looking into this the other day as well but obviously you were quicker. ;)

@eddyb
Copy link

eddyb commented Jul 19, 2017

AFAIK, mergefunc is enabled at -C opt-level=3 which is what Cargo uses for release mode.

@japaric
Copy link
Member

japaric commented Jul 20, 2017

@eddyb Well, I certainly do see a difference between xargo build --release and RUSTFLAGS="-C passes=mergefunc" xargo build --release: a 4 byte increase in binary size. So either mergefunc is not enabled or specifying it in RUSTFLAGS adds a second mergefunc pass -- I don't know if that's possible.

@japaric
Copy link
Member

japaric commented Jul 20, 2017

@therealprof I can't land these patches until rust-lang/rust#43313 makes it way into the nightly channel. Otherwise we'll hit LLVM assertions.

@eddyb
Copy link

eddyb commented Jul 20, 2017

@japaric I'm pretty sure LLVM just has a list of passes and you can add as many copies of anything as you'd like (some might break because the IR isn't what they expect but that's it).

japaric added a commit to rust-embedded/cortex-m-rt that referenced this issue Jul 22, 2017
@japaric
Copy link
Member

japaric commented Jul 22, 2017

The global_asm! fix has landed and it's available in cortex-m-rt v0.3.5 and svd2rust v0.11.2. Note that you'll need to nightly-2017-07-21 or newer if targetting the Cortex-M architecture (or you'll hit an LLVM assertion). If these changes cause problems in the future we can always revert them as they are not part of the public / user-facing API.

@japaric japaric closed this as completed Jul 22, 2017
@pftbest
Copy link
Contributor

pftbest commented Aug 22, 2017

Hello, @whitequark . Any news on your patches?

@whitequark
Copy link
Contributor

whitequark commented Aug 22, 2017

@pftbest Yes, I got some of them merged upstream, but the others seem to fail tests in weird ways. I have this on my TODO list and will return to it shortly.

@pftbest
Copy link
Contributor

pftbest commented Aug 22, 2017

@whitequark that is great, thank you! global_asm hack didn't work that well, and we still have an extra function in binary, so it's not saving us any space compared to your solution.

@whitequark
Copy link
Contributor

@japaric in light of above: reopen this?

@pftbest
Copy link
Contributor

pftbest commented Aug 22, 2017

Well, the issue is fixed for now, the question is do we want to remove dependency on global_asm (and go back to naked functions, which is also not ideal, but already used for other things) when @whitequark 's patches will be merged.

@japaric
Copy link
Member

japaric commented Aug 24, 2017

I don't think we should reopen this issue as the problem is already 99% fixed and changing the underlying solution will at most reduce binary size by 4 or 8 bytes (I forget the exact number).

However, in general, I'm 👍 on reducing the number of unstable features this crate, and the whole cortex-m stack, uses so we can open an issue about that (reducing unstable features) and list the global_asm -> naked_function switch there.

@whitequark
Copy link
Contributor

@pftbest All done.

adamgreig pushed a commit that referenced this issue Jan 12, 2022
[RFC] default panic! to abort, drop panic-* features, add panic_fmt! macro
reitermarkus pushed a commit to reitermarkus/cortex-m that referenced this issue May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants