New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] `#[ramfunc]` #100

Open
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@japaric
Copy link
Member

japaric commented Aug 31, 2018

This is a 2 in 1 RFC and implementation PR. This builds on top of #90 so you can ignore the first commit. See #42 for some background information.

@rust-embedded/cortex-m Please review this RFC and if you are in favor approve this PR.


Summary

Add a #[ramfunc] attribute to this crate. This attribute can be applied to
functions to place them in RAM.

Motivation

Running functions from RAM may result in faster execution times. Thus placing
"hot code" in RAM may improve the performance of an application.

Design and rationale

Expansion

The #[ramfunc] attribute will expand into a #[link_section = ..]
attribute that will place the function in a .ramfunc.$hash section. Each
#[ramfunc] function will be placed in a different linker section (in a
similar fashion to -ffunction-sections) to let the linker GC (Garbage Collect)
the unused functions . To this end, #[ramfunc] won't accept generic functions
because if the generic function has several instantiations all of them would end
in the same linker section.

link.x will be tweaked to instruct the linker to collect all the input
sections of the form .ramfunc.* into a single .ramfunc output section that
will placed in RAM.

The Reset handler will be updated to initialize the .ramfunc section before
the user entry point is called.

Why an attribute?

This is implemented as an attribute to make it composable with the attributes to
be added in RFC #90. For example you can place an exception handler in RAM. The
compiler won't generate a veener in this case as the address to the RAM location
will be placed in the vector table.

#[exception(SysTick)]
#[ramfunc]
fn systick() {
    // do stuff
}

Why a separate linker section?

Using a separate section lets the compiler / linker assign the right ELF flags
to .ramfunc (i.e. "AX"). That way .ramfunc ends up showing in the output of
objdump -C.

$ readelf -S app
  [ 1] .vector_table     PROGBITS        08000000 010000 000400 00   A  0   0  4
  [ 2] .text             PROGBITS        08000400 010400 0003ea 00  AX  0   0  2
  [ 3] .rodata           PROGBITS        080007ec 020020 000000 00  WA  0   0  4
  [ 4] .data             PROGBITS        20000000 020000 000004 00  WA  0   0  4
  [ 5] .ramfunc          PROGBITS        20000004 020004 00001c 00  AX  0   0  4
  [ 6] .bss              NOBITS          20000020 000000 000000 00  WA  0   0  4
$ arm-none-eabi-objdump -Cd app
Disassembly of section .text:

08000400 <Reset>:
 8000400:       f000 f9ee       bl      80007e0 <DefaultPreInit>
 (..)

Disassembly of section .ramfunc:

20000004 <SysTick>:
20000004:       b580            push    {r7, lr}
(..)

This also means that .data would only contain ... data so the section can be
inspected (not disassembled) separately.

$ arm-none-eabi-objdump -s -j .data app
Contents of section .data:
 20000000 01000000                             ....

The downside (?) is that the default format of size will report .ramfunc
under text.

$ size app
   text    data     bss     dec     hex filename
   2066       4       4    2074     81a app

But in any case it's (usually) better to look at the output in System V format.

$ size -Ax app
app  :
section             size         addr
.vector_table      0x400    0x8000000
.text              0x3ea    0x8000400
.rodata              0x0    0x80007ec
.data                0x4   0x20000000
.ramfunc            0x28   0x20000004
.bss                 0x4   0x2000002c

Unresolved questions?

  • Is there a reliable way to initialize both .data and .ramfunc in a single
    for loop? (see init_data in the Reset function) Right now I initialize
    each one separately and this produces more code. I haven't tried initializing
    them in a single pass.

  • Should #[ramfunc] imply #[inline(never)]? Right now the compiler is free
    to inline functions marked as #[ramfunc] into other functions. That could
    cause the function to not end in RAM.

  • Bikeshed the attribute name and linker section name? IAR and TI use the name
    ramfunc for this functionality.

japaric added some commits Aug 18, 2018

add `#[ramfunc]` attribute
this attribute lets you place functions in RAM

closes #42

@japaric japaric requested a review from rust-embedded/cortex-m as a code owner Aug 31, 2018

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Aug 31, 2018

Oh and I have not tested this on hardware. I have only looked at the output of objdump so it would be good if someone tested it to see if it actually works ...

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Aug 31, 2018

@japaric This is fantastic! Especially the ability to have exception/interrupt handlers without veneer.

Should #[ramfunc] imply #[inline(never)]? Right now the compiler is free
to inline functions marked as #[ramfunc] into other functions. That could
cause the function to not end in RAM.

Yes, we should do that.

/// ```
/// # use cortex_m_rt_macros::ramfunc;
/// #[ramfunc]
/// unsafe fn computation_heavy_function() {

This comment has been minimized.

@therealprof

therealprof Aug 31, 2018

Contributor

Why unsafe? Also I think it would be better do a computation heavy function example with something that actually consumes and/or produces something via parameters and/or return values. ;) Showing how to apply this to an interrupt/exception handler would also be helpful.

This comment has been minimized.

@japaric

japaric Aug 31, 2018

Member

Why unsafe?

Oh, I literally copy pasted the #[pre_init] example and forgot to remove the unsafe keyword. Will fix it in a bit.

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Aug 31, 2018

Something that should be mentioned in the docs is that #[ramfunc] does not automatically propagate through the call stack. By this I mean that code like:

#[ramfunc]
fn foo() {
    // ..
    bar();
    // ..
}

fn bar() { .. }

Could result in foo being executed from RAM and bar being executed from Flash unless bar gets inlined into foo but that would be up to the compiler.

Sure you can add #[ramfunc] to bar if you wrote it but if you are calling third party code, like libm, you have pretty much no control over whether something like sinf will end in Flash or RAM.

// function is in so we can't use a mangled version of the path to the function -- we don't know
// its full path! Instead we'll use a random string for the section name of each function.
let mut rng = rand::thread_rng();
let hash = (0..16)

This comment has been minimized.

@japaric

japaric Aug 31, 2018

Member

Using a RNG in a procedural macro doesn't fell quite right to me but I don't see any other option to generate a unique section name from only the AST of function. Also, I have no idea if the RNG negatively impacts incremental compilation. I don't think it affects the reproducibility of the builds because the random section names won't make it to the final binary -- the linker merge all of them and puts them under .ramfunc.

*(.ramfunc.*);

. = ALIGN(4); /* 4-byte align the end (VMA) of this section */
} > RAM AT > FLASH

This comment has been minimized.

@korken89

korken89 Aug 31, 2018

Do we want it at RAM? It is common to put the ramfunc in CCM (core coupled memory) RAM to not contend with the data RAM or vice versa.
Can we have make this user specifiable?

This comment has been minimized.

@japaric

japaric Aug 31, 2018

Member

Can we have make this user specifiable?

In general, we don't have good support for multiple memory regions -- we hardcode most things to Flash and RAM (e.g. we don't have a way to say put this static mut in CCRAM and that one in RAM).

By user specifiable do you mean that (a) the user should be able to (a) say pull all #[ramfunc] functions into CCRAM or put all of them in RAM; or (b) that the user should be specify that this one function goes in CCRAM and that one goes in RAM?

This comment has been minimized.

@korken89

korken89 Aug 31, 2018

Preferable for each one, but if we can make all settable that would be a good improvement.

@korken89

This comment has been minimized.

Copy link

korken89 commented Aug 31, 2018

Also I agree with @therealprof, a ramfunc should not be inlined, that destroys the purpose. :)
Writing RAM functions was a bit of an art before to make sure the entire call-tree is in RAM, though I do not believe we can auto promote other functions to RAM, and more importantly if we should.

@thejpster

This comment has been minimized.

Copy link

thejpster commented Aug 31, 2018

Sounds good to me. I'm quite used to this sort of thing from working with big-name RTOS from a vendor that rhymes with "Nentor" and the little pitfalls tend to be fairly well known by people who have a need for this.

@adamgreig

This comment has been minimized.

Copy link
Member

adamgreig commented Sep 3, 2018

Should #[ramfunc] imply #[inline(never)]?

👍

Bikeshed the attribute name and linker section name?

I'm fine with ramfunc.

To this end, #[ramfunc] won't accept generic functions
because if the generic function has several instantiations all of them would end
in the same linker section.

Is that actually a problem? Presumably if the instantiation was generated it's because it's in use, so wouldn't be eliminated by the linker later anyway, but I might be misunderstanding how the generic instantiations get made. In any event it seems like it might be more useful to be able to have ramfunc generics than not. Since calling a function from a ramfunc doesn't make the called function live in RAM unless it gets inlined, you can't even make your own type-specific instantiations which call the generic and be confident it will still be in RAM.

Otherwise this looks good. As @korken89 mentioned it would be really nice if we could specify what section ramfuncs end up in, but I think we also want that for statics, and for things like #59, and even being able to move stacks to CCM. The same consideration about being able to initialise in a single loop would apply to non-.data statics too.

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Sep 3, 2018

@korken89

Continuing thread #100 (comment) here because review comments get hidden when new commits are pushed.

We could support placing functions in CCRAM by adding a #[ccramfunc] attribute but this AFAICT this would require all users to define a CCRAM.

An alternative is to extend the #[ramfunc] to take an integer argument instead of adding a new attribute. This could be generalized to support any number of RAM regions. That is, #[ramfunc(0)] would place a function in .ramfunc0, #[ramfunc(1)] would place a function in .ramfunc1 and so on. Then .ramfunc0 would go in region RAM0, .ramfunc1 in RAM1, and so on.

This can be further generalized to allow placing static mut variables in different RAM regions. At that point, it would make more sense to rename the attribute to just #[ram]. Then #[ram] fn foo() would put foo in .ramfunc0 (no argument is equivalent to passing 0 as the argument), and #[ram(1)] static mut FOO: u32 = 0 (*) would put FOO in .data1 (**). This opens the question of how many regions to support by default? Just 1? And the question of how to let users configure the number of regions? Cargo features (being booleans) seems like a bad fit for this kind of configuration.

(*) Using #[ram] with static variables is tricky and potentially error prone. A static variable could end in .rodata (e.g. a string literal) or in .data (if it contains something with interior mutability or an atomic). The attribute doesn't have enough information (just the item AST) to determine if a static should really go in RAM so a misuse could cause a string literal to be stored in RAM, which would be wasteful.

(**) Similarly the attribute doesn't have enough information to determine if something should go in .bss or .data so putting everything in .data is the safe and conservative choice although it can unnecessarily increase the initialization time.

On the one hand, support for multiple memory regions looks like a breaking change so it would be good to batch it with breaking addition of attributes. On the other hand, I don't have the bandwidth to sit down and design something reasonable at the moment so I would prefer to postpone it for v0.7.0.


Disclaimer: I never had the need to call stuff from RAM so take my comment with a grain of salt.

I'm wondering if having #[ramfunc] imply #[inline(never)] is more error prone or at least less flexible than leaving #[inline(never)] as a separate attribute. Imagine one of the simplest scenario where you need to call a function foo that calls another function bar. You want the call to foo to execute from RAM for $reasons. I might write this:

#[ramfunc]
fn foo() {
    // ..
    bar(args);
    // ..
}

#[ramfunc]
fn bar(args: Args) { .. }

If #[ramfunc] implies #[inline(never)] you forbid LLVM from inlining bar into foo which could hamper optimizations. In this case inlining is something that may be favorable so it's best to leave LLVM do its thing.

OTOH, if #[ramfunc] doesn't imply #[inline(never)] then you really should mark foo as #[inline(never)] otherwise it may get inlined into the caller function that's running from Flash. So if we go this route we should have guideline to mark "entry" RAM functions as #[inline(never)].

Or, instead of that guideline we could add a ramcall!(fun) macro that calls fun through a thunk:

// `ramcall!(fun)` expands into:
{
    #[ramfunc]
    #[inline(never)]
    fn thunk() { fun() } // LLVM will likely inline `fun` into `thunk`
    thunk();
}

Then the recommendation would be to use ramcall!(fun) when you want to execute fun from RAM and the caller is being executed from Flash. This may work correctly even if fun is not a #[ramfunc]. The problem here is how to implement ramcall! so that it supports inputs for fun ergonomically.

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Sep 3, 2018

If #[ramfunc] implies #[inline(never)] you forbid LLVM from inlining bar into foo which could hamper optimizations. In this case inlining is something that may be favorable so it's best to leave LLVM do its thing.

True, but why would you do that? If you wanted to make sure it's inlined, why not declare it just #[inline(always)] and if you don't care just don't declare it as #[ramfunc]; then it's up to the compiler to decide wether to inline or not.

@adamgreig

This comment has been minimized.

Copy link
Member

adamgreig commented Sep 3, 2018

On inline:
Can #[ramfunc] detect if the user has already applied an #[inline] attribute, so that by default ramfuncs would be #[inline(never)] but could be overridden with #[inline] on a per-function basis? What happens at the moment if you put #[inline] and #[ramfunc] on something?


On sections:
I quite like the idea of having ram0, ram1, ram2, etc etc, where the user can define where they live in their memory.x to map to CCM/ITCM/DTCM/backup SRAM. I'd support a lot of regions, like 8 or 16. Higher end chips commonly have multiple normal SRAMs, plus ITCM/DTCM (or just CCM in other cases), backup SRAM, and you might well have multiple external mapped memories too. I would think the user configuration of number of sections can be part of memory.x; either all sections need to be defined (and most just have an address and length of 0x0), or defaults are applied for any that are undefined. Here's an example from a C RTOS. Note ram0 is made up of ram1 and ram2 (which are contiguous but distinct SRAMs).

I think we'd really need to be able to differentiate bss sections from data sections too: one common use of CCM is process stacks and they're likely several kB in size but all zero initially; it would be a huge waste to put them in .data. Perhaps just use #[ramdata(0)] vs #[rambss(0)] etc.

There are related issues around setting up memory protection units with details of the sections and which ones should have which attributes too.

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Sep 4, 2018

@therealprof

True, but why would you do that? If you wanted to make sure it's inlined

Because you are not 100% sure whether you inlining it is better or not -- that's the job of LLVM. What you want to be sure of is that if it's not inlined then bar should still be placed in RAM and you would accomplish that with #[ramfunc] #[inline] fn bar(..) but #[ramfunc] must not imply #[inline(never)].

@adamgreig

On inline

That might work. It seems #[ramfunc] can see all the other attributes regardless of whether they come before or after it.

On sections:

But having more memory regions is not zero cost. The loop for initializing .{bss,data}{0,1,...N} will be kept in the binary even if the section is empty. You already see this today with .bss and .data: the compiler doesn't know if a section is empty because that information is known only after linking. Thus the number should be configurable and not fixed.

Manually placing things in .bss is unsafe (e.g. placing static mut FOO: Enum = Enum::NonZeroVariant is UB) so the attribute should indicate that the operation is unsafe and using the attribute should be rejected by #[deny(unsafe_code)].

re: generics

Is that actually a problem? Presumably if the instantiation was generated it's because it's in use, so wouldn't be eliminated by the linker later anyway

Not necessarily. The default is to use multiple codegen units -- basically the Rust source code is split in chunks and each one is codegen-ed independently of the other. A simple example of instances lowered to machine code but still GC-ed is: you have foo that on some condition calls bar which in turns calls baz::<i32>. In a multiple codegen scenario foo and bar are processed separately. bar causes baz to be instantiated. While translating foo is observed that the branch that calls bar is unreachable so it gets eliminated. Now it's the job of the linker to discard the unused baz::<i32> instance.

The scenario you are picturing is probably only achievable using fat LTO (codegen-units = 1), but I'm not 100% sure it's guaranteed to occur.

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Sep 4, 2018

Because you are not 100% sure whether you inlining it is better or not -- that's the job of LLVM. What you want to be sure of is that if it's not inlined then bar should still be placed in RAM and you would accomplish that with #[ramfunc] #[inline] fn bar(..) but #[ramfunc] must not imply #[inline(never)].

I don't buy it. People who fiddle with such optimisations should also be capable of figuring out whether inlining is worth it or not and not leave such a decision to LLVM... After all small changes in rustc, LLVM or the program may change the inlining decision, so in doubt an automatic decision shouldn't be trusted anyway.

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Sep 4, 2018

The main aspect here is respecting the declared wish of the developer: if a function is declared #[ramfunc] it has to go to RAM, the way to ensure that is by declaring #[inline(never)]. If there's a better way to ensure that -- no matter what -- a #[ramfunc] function will always end up in RAM (inlined in another #[ramfunc] or standalone), that's fine by me. Something that should not happen is that the compiler decides it would be better to inline a #[ramfunc] into code residing in flash...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment