[RFC] multi-core support #204

Closed · japaric opened this issue Jun 14, 2019 · 8 comments
Labels: RFC (This issue needs your input!), S-accepted (This RFC has been accepted but not yet implemented)
Milestone: v0.5.0

japaric (Collaborator) commented Jun 14, 2019

EDIT(2019-06-18): revised to cover both homogeneous and heterogeneous multi-core support and to better document the contract with the {device} crate.

Summary

Add experimental heterogeneous and homogeneous multi-core support behind
Cargo features.

Detailed design

Semantics

In multi-core applications each core runs its own independent RTFM
sub-application. Each core has its own tasks and resources (static mut
variables), and the cores can communicate using message passing: the
cross-core spawn & schedule API.

It's not possible to share resources (static mut variables) between cores.
However, one core can initialize another core's resources, and plain static
variables (e.g. atomics) can be shared between cores.

Feature gating

The heterogeneous multi-core support will live behind a heterogeneous Cargo
feature considered experimental. Likewise, homogeneous multi-core support will
be gated by a homogeneous Cargo feature.

Semver exception

We reserve the right to make breaking changes to both multi-core APIs in patch
releases and will recommend that end users pin to an exact version of
cortex-m-rtfm when using either API (i.e. =0.5.0 in Cargo.toml).

It's unlikely that these breaking changes will affect the end-user API or the
syntax of the #[rtfm::app] macro. The breaking changes we foresee are at the
level of the binary contract with the {device} crate. There's no multi-core
equivalent to the cortex-m-rt crate or svd2rust-generated crates; we will
eventually move to whatever the ecosystem settles on -- this is a breaking
change we expect to occur, but we want to avoid it requiring a minor version
bump for users of the single-core API.

Syntax

The #[app] attribute will gain a cores argument that indicates the number of
cores the application will use. The argument expects an integer value equal
to or greater than 2. Omitting this argument indicates that the application
is a single-core application.

// dual-core application
#[rtfm::app(cores = 2)]
const APP: () = {
   // ..
};

In multi-core mode all tasks and the #[init] and #[idle] functions need to
indicate which core they'll run on using the core argument.

// dual-core application
#[rtfm::app(cores = 2)]
const APP: () = {
    // core #0 initialization function
    #[init(core = 0)]
    fn init(c: init::Context) {
        // cross-core message passing
        c.spawn.ping(0).ok();
    }

    #[task(core = 0)]
    fn pong(c: pong::Context, x: u32) {
        c.spawn.ping(x + 1).ok();
    }

    // core #1 initialization function
    #[init(core = 1)]
    fn init(_: init::Context) {
        // ..
    }

    #[task(core = 1)]
    fn ping(c: ping::Context, x: u32) {
        c.spawn.pong(x + 1).ok();
    }
};

External interrupts will also need to indicate which core may use the
interrupt. This distinction is required because one core may have an interrupt
that the other cores don't have; it is also required to forward the interrupt
attributes to the right task dispatcher.

#[rtfm::app(cores = 2)]
const APP: () = {
    extern "C" {
         // used only by core #0; this interrupt doesn't exist on core #1
         #[core = 0]
         fn EXTI5();

         // both cores have an EXTI0 interrupt
         #[core = 0]
         fn EXTI0();

         // but core #1 will place this interrupt (task dispatcher) in RAM
         #[core = 1]
         #[link_section = ".data.EXTI0"]
         fn EXTI0();
    }
};

There's no syntax to indicate which core owns which resource. Resource ownership
is inferred from the resources argument used in tasks, #[init] and #[idle]
functions.

#[rtfm::app(cores = 2)]
const APP: () = {
    // owned by core #0
    static mut X: u32 = 0;

    // owned by core #1
    static mut Y: u32 = 0;

    // shared by both cores
    static Z: AtomicBool = AtomicBool::new(false);

    #[task(core = 0, resources = [X, Z])]
    fn foo(_: foo::Context) {
        // ..
    }

    #[task(core = 1, resources = [Y, Z])]
    fn bar(_: bar::Context) {
        // ..
    }
};

late

To support flexible cross-core initialization of late resources the init
attribute will gain a late argument. This argument takes a list of late
resources (identifiers) that the #[init] function will initialize. This
argument can only be used in multi-core mode.

The late argument is not required in all scenarios. The framework will infer
which core initializes which resources based on the presence of the
LateResources return type. Some examples below:

  • Core #0 initializes all late resources
#[rtfm::app(cores = 2)]
const APP: () = {
    static mut X: u32 = ();
    static mut Y: u32 = ();

    #[init(core = 0)]
    fn init(_: init::Context) -> init::LateResources {
        init::LateResources { X: 0, Y: 1 }
    }

    #[init(core = 1)]
    fn init(_: init::Context) {
        // ..
    }
};
  • Initialization is split between the two cores
#[rtfm::app(cores = 2)]
const APP: () = {
    static mut X: u32 = ();
    static mut Y: u32 = ();

    #[init(core = 0, late = [X])]
    fn init(_: init::Context) -> init::LateResources {
        init::LateResources { X: 0 }
    }

    #[init(core = 1)]
    fn init(_: init::Context) -> init::LateResources {
        init::LateResources { Y: 0 }
    }
};
  • This is an error: it's ambiguous how late resources should be split between
    the cores. Use the late argument to disambiguate.
#[rtfm::app(cores = 2)]
const APP: () = {
    static mut X: u32 = ();
    static mut Y: u32 = ();

    #[init(core = 0)]
    fn init(_: init::Context) -> init::LateResources {
        // ..
    }

    #[init(core = 1)]
    fn init(_: init::Context) -> init::LateResources {
        // ..
    }
};

Compile-time checks

The #[app] macro will reject applications where a resource is shared between
different cores.

#[rtfm::app(cores = 2)]
const APP: () = {
    static mut X: u32 = 0;
    static mut Y: u32 = 0;

    #[task(core = 0, resources = [X])]
    fn foo(_: foo::Context) {
        // ..
    }

    #[task(core = 0, resources = [X, Y])]
    fn bar(_: bar::Context) {
        // ..
    }

    // error: `Y` can't be shared between cores
    #[task(core = 1, resources = [Y])]
    fn baz(_: baz::Context) {
        // ..
    }
};

The #[app] macro will also reject applications where more than one software
task is given the same name.

#[rtfm::app(cores = 2)]
const APP: () = {
    #[task(core = 0)]
    fn foo(_: foo::Context) {
        // ..
    }

    // error: `foo` identifier has already been used
    #[task(core = 1, spawn = [foo])]
    fn foo(c: foo::Context) {
        // rationale: ambiguity -- which task is spawned by this operation?
        c.spawn.foo().ok();
    }
};

It is possible, however, to give hardware tasks the same name as long as the
name is not reused within a single core.

#[rtfm::app(cores = 2)]
const APP: () = {
    // ..

    #[interrupt(core = 0, binds = EXTI0, resources = [X])]
    fn foo(c: foo::Context) {
        // refers to core #0's `foo` -- this struct contains a `X` field
        let _: foo::Resources = c.resources;
        // ..
    }

    // this is OK
    #[interrupt(core = 1, binds = EXTI1, resources = [Y])]
    fn foo(c: foo::Context) {
        // refers to core #1's `foo` -- this struct contains a `Y` field
        let _: foo::Resources = c.resources;
        // ..
    }
};

It is also possible to have each core bind the "same" Cortex-M exception,
because each core has its own independent set of Cortex-M exceptions.

#[rtfm::app(cores = 2)]
const APP: () = {
    // ..

    #[exception(core = 0, binds = SysTick)]
    fn foo(_: foo::Context) {
        // .. fires when core #0's system timer times out ..
    }

    // this is OK
    #[exception(core = 1, binds = SysTick)]
    fn bar(c: bar::Context) {
        // .. fires when core #1's system timer times out ..
    }
};

Synchronization barriers

During the initialization phase synchronization barriers may be required for
correctness or memory safety. The #[app] macro will insert them where
required. Some examples of where they are needed (a sketch of such a barrier
follows the list):

  • Before core #0 can send a message to (spawn a task on) core #1, core #1
    must be out of the reset state. Thus core #0 must wait until core #1 has
    booted before it attempts to send a message. This synchronization barrier
    could be located before init is invoked, or after init returns but before
    interrupts are enabled.

  • If core #0 initializes a resource owned by core #1 then core #1 must wait
    until the resource has been initialized before it enables its interrupts.
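
A minimal sketch of such a barrier, assuming a flag in memory visible to both
cores (in heterogeneous mode that would be microamp's .shared section); the
names below are illustrative, not the actual generated code:

use core::sync::atomic::{AtomicBool, Ordering};

// hypothetical flag; it must be placed in memory that both cores can access
static RENDEZVOUS: AtomicBool = AtomicBool::new(false);

// emitted on core #1: signal that this core has booted and initialized the
// resources it owns
fn signal() {
    RENDEZVOUS.store(true, Ordering::Release);
}

// emitted on core #0: spin until core #1 has signaled; only then enable
// interrupts and start sending messages
fn wait() {
    while !RENDEZVOUS.load(Ordering::Acquire) {}
}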

Heterogeneous vs homogeneous

When writing a multi-core application the user needs to pick between the
homogeneous and heterogeneous features.

With the homogeneous feature the application is compiled for a single
(compilation) target and the output is a single ELF image. This variant is
meant for homogeneous multi-core devices (e.g. 2x Cortex-M33 cores), but it
can also be used with heterogeneous multi-core devices whose instruction sets
are compatible. For example, the thumbv6m-none-eabi compilation target can be
used to target a Cortex-M4F + Cortex-M0+ device; however, one would then not
be able to use the AtomicU32::fetch_add API or hardware-accelerated floating
point math on the M4F core.

With the heterogeneous feature the same application is compiled for multiple
(compilation) targets and the output is N (N > 1) ELF images. This variant is
meant for heterogeneous multi-core devices. For example, combining the
thumbv7em-none-eabihf and thumbv6m-none-eabi (compilation) targets lets the
programmer fully utilize the features of a Cortex-M4F + Cortex-M0+ device.

Examples of homogeneous RTFM applications can be found in the lpcxpresso55s69 repository.

Examples of heterogeneous RTFM applications can be found in the lpcxpresso54114 repository.

cargo-microamp

To support heterogeneous multi-core devices the cortex-m-rtfm crate will
leverage v0.1.0 of the microamp framework, meaning that heterogeneous RTFM
applications will need to be built using the cargo-microamp subcommand.

$ # build a dual-core application
$ # use the ARMv7-EM compilation target for the first core
$ # and the ARMv6-M compilation target for the second core

$ cargo microamp  \
    --example xspawn \
    --target thumbv7em-none-eabihf,thumbv6m-none-eabi \
    --release

$ # this produces two images
$ size target/*/release/examples/xspawn-{0,1}
   text    data     bss     dec     hex filename
   3132      26       4    3162     c5a target/thumbv7em-none-eabihf/release/examples/xspawn-0
    866      26       4     896     380 target/thumbv6m-none-eabi/release/examples/xspawn-1

This also means that the {device} crate or the application author will have to
specify the memory layout of each image using core*.x linker scripts. See the
documentation of the microamp framework for more details.

Device crate

In single-core mode the device argument takes a path to a crate generated
using svd2rust. As svd2rust doesn't support multi-core devices, this section
describes what RTFM expects of the {device} module / crate in multi-core mode.

Boot process

In multi-core mode the cortex-m-rtfm crate will not link to the
cortex-m-rt crate; that crate takes care of initializing static variables but
only supports single-core systems.

Multi-core systems have complex and non-standardized boot processes. RTFM
expects that the {device} crate takes care of initializing static variables
and booting all the cores in the system. It also expects the {device} crate to
call a function named main (in heterogeneous mode) or main_{i} (in
homogeneous mode) after the memory initialization is complete.

By the time control is transferred to main* RTFM expects that (see the sketch
after this list):

  • Each image's (core's) .bss and .data sections have been initialized

  • heterogeneous mode only: The .shared section, which all images share, has
    been initialized exactly once
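
For illustration, a sketch of what a homogeneous-mode reset handler in the
{device} crate could look like. The section symbols, the use of the r0 crate
(as in cortex-m-rt) and the core_id helper are assumptions, not a prescribed
implementation:

#[no_mangle]
pub unsafe extern "C" fn Reset() -> ! {
    extern "C" {
        // section boundaries, provided by the linker script
        static mut _sbss: u32;
        static mut _ebss: u32;
        static mut _sdata: u32;
        static mut _edata: u32;
        static _sidata: u32;

        // per-core entry points that RTFM generates in homogeneous mode
        fn main_0() -> !;
        fn main_1() -> !;
    }

    if core_id() == 0 {
        // zero .bss and copy .data from flash before any static is used
        r0::zero_bss(&mut _sbss, &mut _ebss);
        r0::init_data(&mut _sdata, &mut _edata, &_sidata);

        // device specific: release core #1 from reset here, only after
        // memory initialization is complete

        main_0()
    } else {
        main_1()
    }
}

// device specific (illustrative stub): identify the running core, e.g. by
// reading a CPU identification register
fn core_id() -> u32 {
    0
}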

Vector table

In single-core mode the {device} crate places the device-specific part of the
vector table in the right memory location. This vector table contains several
interrupt handler symbols that are weakly aliased to some default handler. The
{device} crate also provides an Interrupt enumeration of all the interrupts
the device has; the names of the variants of this enum must match the symbol
names of the interrupt handlers in the vector table. RTFM uses this fact to
check if the interrupts specified by the user exist on the device. This
enumeration also implements the bare_metal::Nr trait which maps each variant
to its position in the vector table.

In multi-core mode, RTFM expects the {device} crate to provide a full
vector table for each core. Because each core could dispatch different
interrupts, instead of a single Interrupt enum RTFM expects one enum per
core named Interrupt_0, Interrupt_1, etc.
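
For illustration, a sketch of what those enums could look like (the interrupt
names and numbers are made up):

// hypothetical interrupts available to core #0
#[derive(Clone, Copy)]
#[allow(non_camel_case_types)]
pub enum Interrupt_0 {
    EXTI0 = 0,
    EXTI1 = 1,
    EXTI5 = 5,
}

unsafe impl bare_metal::Nr for Interrupt_0 {
    fn nr(&self) -> u8 {
        // maps each variant to its position in core #0's vector table
        *self as u8
    }
}

// core #1 has its own, possibly different, set of interrupts
#[derive(Clone, Copy)]
#[allow(non_camel_case_types)]
pub enum Interrupt_1 {
    EXTI0 = 0,
    EXTI1 = 1,
}

unsafe impl bare_metal::Nr for Interrupt_1 {
    fn nr(&self) -> u8 {
        *self as u8
    }
}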

In homogeneous mode a single image is produced. Because more than one core may
service the same interrupt, a suffix is added to all interrupt handler symbols
to prevent symbol collisions. The suffix has the form _{i} where {i} is the
core number, which lies in the half-open range 0..{cores} (i.e. 0 to
cores - 1). Note that these suffixes do not appear in application code.

// `homogeneous` application
#[rtfm::app(device = .., cores = 2)]
const APP: () = {
    // actually binds to `EXTI1_0`
    #[interrupt(core = 0, binds = EXTI1)]
    fn a(_: a::Context) {
        // ..
    }

    // actually binds to `EXTI0_1`
    #[interrupt(core = 1, binds = EXTI0)]
    fn b(_: b::Context) {
        // ..
    }
};

xpend

The NVIC peripheral has no mechanism to "pend" an interrupt on a different
core. Also, the NVIC sits on a private bus, so one core cannot access the
NVIC peripheral of the other cores. Thus interrupt signaling between cores is
a device-specific feature.

To accommodate this fact the runtime expects the {device} crate / module to
contain an xpend function that implements cross-core interrupt signaling.
The xpend function must have the following signature:

use bare_metal::Nr; // "interrupt number" trait

/// Pends the `interrupt` on the `receiver` core
pub fn xpend(receiver: u8, interrupt: impl Nr) {
    // ..
}

The runtime will only use this function for cross-core interrupt signaling, so
the implementer can rely on the fact that RTFM will never invoke
xpend(0, some_interrupt) from core #0. The end user, however, might use the
API like that.

For an example implementation of this function refer to the lpc541xx
prototype, which uses the device-specific MAILBOX peripheral.
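
As a rough sketch only (the register addresses and semantics below are made
up for illustration, not the actual MAILBOX API), such an implementation could
look like this:

use bare_metal::Nr;

/// Pends `interrupt` on the `receiver` core (illustrative sketch)
pub fn xpend(receiver: u8, interrupt: impl Nr) {
    // hypothetical per-core "mailbox IRQ set" registers; setting a bit
    // raises the mailbox interrupt on the corresponding core, whose handler
    // then pends `interrupt` in its own NVIC
    const MAILBOX_IRQ_SET: [*mut u32; 2] = [
        0x4008_B000 as *mut u32, // raises core #0's mailbox IRQ (made-up address)
        0x4008_B004 as *mut u32, // raises core #1's mailbox IRQ (made-up address)
    ];

    // assumes fewer than 32 interrupts for the purpose of this sketch
    let nr = u32::from(interrupt.nr());
    unsafe {
        core::ptr::write_volatile(MAILBOX_IRQ_SET[usize::from(receiver)], 1u32 << nr);
    }
}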

Unresolved questions

Hardware tasks

Should we allow more than one core to bind a hardware task to the same
interrupt? This seems like it should be supported by the hardware: the
semantics would be that both cores start executing their respective hardware
tasks when a peripheral fires the corresponding interrupt signal. Note that if
one core executes rtfm::pend(SomeInterrupt) this has no effect on the other
cores.

#[rtfm::app(cores = 2)]
const APP: () = {
    // both foo and bar will start and run (in parallel) when the CTIMER0 times out

    #[interrupt(core = 0, binds = CTIMER0)]
    fn foo(_: foo::Context) {
        // ..
    }

    #[interrupt(core = 1, binds = CTIMER0)]
    fn bar(_: bar::Context) {
        // ..
    }
};

pend

As Interrupt_{i} and xpend are public and safe APIs, these incorrect
operations are possible:

#[rtfm::app(cores = 2, device = pac)]
const APP: () = {
    #[init(core = 0)]
    fn a(_: a::Context) {
        // interrupt `Foo` doesn't exist on this core
        // this could be a no-op or could invoke a different interrupt on this core
        rtfm::pend(Interrupt_1::Foo);

        // interrupt `Bar` doesn't exist on core #1
        pac::xpend(1, Interrupt_0::Bar);
    }
};

Note that none of these operations breaks memory safety. The question is
whether we want to go the extra mile and try to use the type system to prevent
them.

Do consider that we do not try to prevent any of these incorrect, but memory
safe, operations in single-core mode:

#[rtfm::app(device = pac)]
const APP: () = {
    #[init]
    fn init(_: init::Context) {
        // this invokes the `EXTI1` task dispatcher but `foo` is not executed
        // because no message was sent
        rtfm::pend(Interrupt::EXTI1);

        // no task has been bound to this interrupt
        // this invokes the default interrupt handler
        rtfm::pend(Interrupt::EXTI2);
    }

    #[interrupt]
    fn EXTI0(_: EXTI0::Context) {
        // ..
    }

    #[task]
    fn foo(_: foo::Context, input: u32) {
        // ..
    }

    extern "C" {
        // dispatches task `foo`
        fn EXTI1();
    }
};
japaric added the RFC label and added this issue to the v0.5.0 milestone on Jun 14, 2019
eddyp (Contributor) commented Jun 15, 2019

@japaric Regarding the unanswered question: it is possible that the same interrupt number has different meanings on the same SoC, especially if one core is M class and another is A class (and on armv8 there needs to be some duplication, depending on the EL level).

japaric (Collaborator, Author) commented Jun 18, 2019

I have revised the RFC to include homogeneous multi-core support and have better documented the contract with the {device} crate. Examples of the homogeneous multi-core support can be found in the lpcxpresso55S69 repository, though the examples are unexciting because the API is still the same.

I have also added a new unresolved question about compile-time checking of the application's uses of the xpend API.


@eddyp There's an Interrupt enumeration per core that maps each interrupt name to its interrupt number. The runtime will use the appropriate enumeration when signaling interrupts and when configuring the interrupts in the NVIC, so I don't think that would be an issue.

@japaric japaric changed the title [RFC] heterogeneous multi-core support [RFC] multi-core support Jun 19, 2019
korken89 (Collaborator) commented

Overall I am very positive about this, as I was not sure how SRP would handle multi-core support!
Are there (cheap) development boards to test this on? This is the kind of feature that needs a lot of evaluation IMO :)

japaric (Collaborator, Author) commented Jun 29, 2019

I think we can go ahead and FCP merge this. There isn't much choice in terms of syntax, and the biggest piece of design work before this can be called stable is the contract with the device crate, which will probably see quite a bit of iteration (see RFC #211 already) -- but the RFC lets us iterate on that part in patch releases, so I think we are good on that front.


Are there (cheap) development boards to test this on?

I have been using the lpcxpresso54114 (+ pyocd) in heterogeneous mode and the lpcxpresso55S69 (+ jlink) in homogeneous mode (though this device is asymmetric: the first core has an FPU, so one should use a different compilation target for it) with great success. The support for debugging (GDB) the second core is rather bad, IME, so you have to be a bit creative to figure out what the second core is doing :-).

I also tried the nucleo WB55, but that was a waste because the second core comes with locked-down firmware (read / write protection is enabled), which seems like it is going to be a pain in the neck to overwrite with any firmware other than ST's signed blobs. (Plus this device has a radio with almost zero documentation: just 2 pages in the reference manual, consisting of a block diagram of the radio and no register map.)

TeXitoi (Collaborator) commented Jun 29, 2019

OK for me

korken89 (Collaborator) commented Jul 2, 2019

Cool, I will be playing a bit with this!
Maybe the debug tool @therealprof and some others were working on is a good candidate here for looking into tooling?

japaric (Collaborator, Author) commented Jul 8, 2019

🎉 This RFC has been formally approved. Implementation is in PR #205 (I'm going to keep this open until the PR lands).

japaric added the S-accepted label and removed the disposition-merge label on Jul 8, 2019
bors bot added a commit that referenced this issue Sep 15, 2019
205: rtfm-syntax refactor + heterogeneous multi-core support r=japaric a=japaric

this PR implements RFCs #178, #198, #199, #200, #201, #203 (only the refactor
part), #204, #207, #211 and #212.

most cfail tests have been removed because the test suite of `rtfm-syntax`
already tests what was being tested here. The `rtfm-syntax` crate also has tests
for the analysis pass which we didn't have here -- that test suite contains a
regression test for #183.

the remaining cfail tests have been upgraded into UI tests so we can more
thoroughly check / test the error messages presented to the end user.

the cpass tests have been converted into plain examples

EDIT: I forgot, there are some examples of the multi-core API for the LPC541xx in [this repository](https://github.com/japaric/lpcxpresso54114)

people who would like to try out this API but have no hardware can try out the
x86_64 [Linux port], which also has multi-core support.

[Linux port]: https://github.com/japaric/linux-rtfm

closes #178 #198 #199 #200 #201 #203 #204 #207 #211 #212 
closes #163 
cc #209 (documents how to deal with errors)

Co-authored-by: Jorge Aparicio <jorge@japaric.io>
japaric (Collaborator, Author) commented Sep 15, 2019

Done in PR #205

@japaric japaric closed this as completed Sep 15, 2019
andrewgazelka pushed a commit to andrewgazelka/cortex-m-rtic that referenced this issue Nov 3, 2021
204: Slightly improve the bash scripts r=adamgreig a=jonas-schievink

cf. rust-embedded/cortex-m#165

Co-authored-by: Jonas Schievink <jonasschievink@gmail.com>