Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stable assembly operations #63

Closed
1 task
japaric opened this issue Mar 14, 2018 · 43 comments
Closed
1 task

Stable assembly operations #63

japaric opened this issue Mar 14, 2018 · 43 comments

Comments

@japaric
Copy link
Member

japaric commented Mar 14, 2018

Triage(2018-08-21)

A pre-RFC discussing what should go in core::arch::arm have been opened in #184

Update - 2018-07-27

Intrinsics like __NOP are now available in core::arch::arm. Path towards stabilization is being discussed in rust-lang/stdarch#518 (comment).

Help wanted

We are looking for someone / people to help us write an RFC on stabilization of the ARM Cortex intrinsics in core::arch::arm. Details in #63 (comment).


One of the features that ties embedded development to the nightly channel is the asm! macro, and
by extension the global_asm! macro.

In some cases it's possible to turn an unstable asm! call into a stable FFI call that invokes some
subroutine that comes from a pre-compiled external assembly file. This alternative comes at the cost
of a function call overhead per asm! invocation.

In other cases the function call overhead of the FFI call breaks the intended semantics of the
original asm! call; this can be seen when reading registers like the Program Counter (PC) or the
Link Register (LR).

The asm! feature is hard to stabilize because it's directly tied to LLVM; rustc literally passes
the contents of asm! invocations to LLVM's internal assembler. It's not possible to guarantee that
the syntax of LLVM assembly won't change across LLVM releases so stabilizing the asm! feature in
its current form is not possible.

This ticket is about exploring making operations that require assembly available on the stable
channel. This proposal is not about stabilizing the asm! macro itself.

The main idea is that everything that requires assembly and can be implemented as external assembly
files should use that approach. For everything that requires inlining the assembly operation the
proposal is that the core crate will provide such functionality.

This idea is not unlike what's currently being done in stdsimd land: core::arch provides
functions that are thin wrappers around unstable LLVM intrinsics that provide SIMD functionality.
Similarly core::asm (module up for bikeshedding) would provide functionality that requires asm!
but in a stable fashion. Example below:

mod asm {
    mod arm {
        #[inline(always)]
        #[stable]
        pub fn pc() -> u32 {
            let r;
            unsafe { asm!("mov $0,R15" : "=r"(r) ::: "volatile") }
            r
        }

        #[inline(always)]
        #[stable]
        pub fn lr() -> u32 {
            let r;
            unsafe { asm!("mov $0,R14" : "=r"(r) ::: "volatile") }
            r
        }
    }
}

This way the functionality can be preserved across LLVM upgrades; the maintainers of the core
crate would update the implementation to match the current LLVM assembly syntax.

TODO (outdated)

  • Come up with a list of assembly operations that are used in practice and that would need to be
    provided in core::asm.
    • I would appreciate some help making a Proof of Concept PR against cortex-m that moves
      most of the assembly operations to external assembly files by enabling some Cargo feature. What can't be moved to assembly files would be a candidate for inclusion in core::asm.
    • AVR
    • MSP430
    • RISCV
  • The team will bring up this idea to the lang / compiler team during the Rust All Hands event.

cc @gnzlbg @nagisa could we get your thoughts on this? does this sound feasible, or does this sound
like a bad idea?

cc @dvc94ch @dylanmckay @pftbest could you help me identify assembly operations that require inline assemly on AVR, MSP430 and RICSV?

@gnzlbg
Copy link

gnzlbg commented Mar 14, 2018

Is there any sequence of instructions for which a single call to an asm! macro containing the whole sequence is not equivalent to a sequence of asm! macro calls with one call per instruction?

I am thinking here about compiling in debug mode without optimizations were the code generated might differ, potentially in ways that break it.

@nagisa
Copy link

nagisa commented Mar 14, 2018

I’ve always been a strong proponent of the "intrinsic" functions that map down to a single instruction. For example, instead of writing asm!("cpsid" ...), you’d call the function that would do the same, look the same and behave mostly the same as the asm call. Not only it is harder to mess up using these intrinsics, but compiler also gets more information about the code.

Of course they have their downsides, but for most of the use cases in embedded, intrinsics are more than sufficient. In my experience using the IAR toolchain, which exposes such intrinsics for most of the important instructions, I’ve only had to maintain assembly to implement exactly two things: atomic byte increment and the context switch handler. Both of them implemented in an external assembly file.

I believe that we could work on adding intrinsics to rustc (LLVM doesn’t have a platform intrinsic for ARM cpsid, for example, but it has a llvm.returnaddress for lr, llvm.read_cycle_counter for cycle count and llvm.read_register to read arbitrary registers) and then the wrappers to stdsimd.

... So basically, what your proposal says.

@nagisa
Copy link

nagisa commented Mar 14, 2018

@gnzlbg yes, compiler is free to insert whatever code it wishes between the separate asm! statements to satisfy the constraints. For example it could decide to spill and reload the registers between two asm! calls.

@gnzlbg
Copy link

gnzlbg commented Mar 14, 2018

Thanks @nagisa that makes sense.

So my opinion is that it only makes sense to do this after we have stabilized inline assembly, which kind of defeats the point of doing this in the first place.

IIUC the idea behind this is that if we expose the hardware ISA via intrinsics then nothing will break if LLVM changes the syntax or semantics of inline assembly because we will be able to fix this in core.

But this is the problem: core will break.

That is, somebody will need to fix core, and the amount of work required to fix core is linearly proportional to the usage of inline assembly within core. This might be a lot of work already, but if we go this route that will turn into a titanic amount of work.

The same would happen if we decided to add another backed to rustc and had to reimplement all usages of inline assembly in "Cretonne-syntax".

So IMO the only way to make upgrading to a new syntax/backend realistic is to turn the porting effort from O(N) in the usage of inline assembly to O(1).

The only way I can think of to achieve this is to stabilize the inline assembly macro enough (e.g. like this https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443), so that upgrading can be done in Rust by "just" mapping Rust's inline assembly syntax to whatever syntax a new or upgraded backed uses. This will be a lot of work, but at least its independent from how often inline assembly is used.

Once we are there, we might just as well stabilize inline assembly instead of pursuing this. Libraries that implement this can be developed in the ecosystem, and "core" intrinsics can continue to be added to stdsimd which already exposes some intrinsics via inline assembly (some core stdsimd modules like run-time feature detection are built around inline assembly).

Then there is also the issue that two consecutive asm! calls allow LLVM to insert code in between. This might not matter often in practice, but sounds a bit brittle to me.

@nagisa
Copy link

nagisa commented Mar 14, 2018

The intrinsics would be implemented in the backend, rather than libcore. While it is true, that it would increase burden when upgrading a backend, it wouldn’t be any greater than the burden of adapting whatever we stabilise as our inline assembly implementation.

@nagisa
Copy link

nagisa commented Mar 14, 2018

Then there is also the issue that two consecutive asm! calls allow LLVM to insert code in between. This might not matter often in practice, but sounds a bit brittle to me.

With volatile assembly statements, the "code" would be limited to code that would be necessary to satisfy the constraints of the asm! statements. So, I think just moves from and to memory.

@dvc94ch
Copy link
Member

dvc94ch commented Mar 14, 2018

why can't global_asm! be stabilized? Isn't that just like an external assembly file?

for riscv it would be quite a few intrinsics I believe. this is what I'm doing to r/w csr regs:

#[cfg(target_arch = "riscv")]
macro_rules! csr_asm {
    ($op:ident, $csr:expr, $value:expr) => (
        {
            let res: usize;
            unsafe {
                asm!(concat!(stringify!($op), " $0, ", stringify!($csr), ", $1")
                     : "=r"(res)
                     : "r"($value)
                     :
                     : "volatile");
            }
            res
        }
    )
}

the $csr value isn't an operand that can be loaded from a register, so there would need to be an intrinsic for each csr.

@jcsoo
Copy link

jcsoo commented Mar 14, 2018

For reference, here is ARM's C Language Extensions 2.0 document:

ARM® C Language Extensions Release 2.0

I believe these were implemented by ARM's in-house compilers and by GCC according to this page: 6.59.7 ARM C Language Extensions (ACLE)

I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics. Having a standard to point to should make it easier to get them implemented.

For instance, section "8.4 Hints" describe wfi, wfe, sev, sevl, yield, and dbg. "8.7" describes nop, which I think already has a LLVM intrinsic defined since pretty much all instruction sets have some variant of that instruction. "10.1 Special Registers" cover a variety of special registers.

@dylanmckay
Copy link
Contributor

could you help me identify assembly operations that require inline assemly on AVR

I've taken a look over the full instruction set listing, here's what stands out

Assembly operations that are often required for nontrivial programs:

  • cli/sli instructions to globally enable or disable interrupts

Somewhat more obscure stuff:

  • If watchdog timers are enabled by the programmer, the wdr instruction to be executed regularly otherwise the chip will immediately reset. WDR is short for "watchdog timer reset"
  • The des instruction for DES encryption. Most existing software would do this in, well, software, so not major
  • The sleep instruction - a la std::thread::yield()

Unlike AVR-GCC, LLVM transparently handles accesses to and from program memory, meaning that that whole class of operations doesn't require asm! magic.

Although currently unimplemented, I believe LLVM's atomic intrinsics directly could be mapped to the RMW atomic instructions (FWIW these map to something like cli; <operation>; sei

@gnzlbg
Copy link

gnzlbg commented Mar 17, 2018

I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics.

stdsimd does offer some v7 and v8 intrinsics, at least rev and rbit.

@japaric
Copy link
Member Author

japaric commented Apr 3, 2018

I discussed this with @alexcrichton during Rust All Hands and he said he was fine with adding an stable API for assembly ops that have an unstable implementation (e.g. inline asm!). We'll make an RFC for the ARM Cortex-M ops, but first we have to figure out exactly what ops need to be inlined (will likely be less than 10 ops, maybe just 5 or so).

@japaric japaric added this to the 2018 edition milestone Apr 3, 2018
@gnzlbg
Copy link

gnzlbg commented Apr 3, 2018

@japaric would those go into std::arch ?

@japaric
Copy link
Member Author

japaric commented Apr 3, 2018

@gnzlbg That can be bikeshed in the RFC

@jcsoo
Copy link

jcsoo commented Apr 26, 2018

In order to understand which intrinsics might be needed for the various Cortex-M processors, I
extracted these two lists from the CMSIS-Core reference, excluding armv8-only instructions.

The first group of intrinsics is "Core Register Access". These are mainly wrappers around
the CPSID, CPSIE, MRS, and MSR instructions, except for get_FPSCR and set_FPSCR.

https://www.keil.com/pack/doc/CMSIS/Core/html/group__Core__Register__gr.html

// Implemented using CPSID and CPSIE

disable_fault_irq - CPSID f
disable_irq - CPSID i
enable_fault_irq - CPSIE f
enable_irq - CPSIE e

// Mostly implemented using MRS

get_ASPR
get_BASEPRI
get_CONTROL
get_FAULTMASK
get_FPSCR - M4, M7
get_IPSR
get_MSP
get_PRIMASK
get_PSP
get_xPSR

// Mostly implemented using MSR

set_ASPR
set_BASEPRI
set_CONTROL
set_FAULTMASK
set_FPSCR - M4, M7
set_IPSR
set_MSP
set_PRIMASK
set_PSP
set_xPSR

The second group of intrinsics provides access to CPU instructions. Each of these
is a wrapper around a specific instruction.

Some of these instructions have direct equivalents in core::intrinsics or
are used for implementing numeric primitives. There are also many instructions
that are likely used in the existing atomics support.

https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__CPU__gr.html

* - in core or core::intrinsics or stdsimd
** - atomic

_NOP
_WFI - Wait For Interrupt
_WFE - Wait For Event
_SEV - Send Event
_BKPT* - Set Breakpoint - core::intrinsics::breakpoint()
_ISB - Instruction Synchronization Barrier
_DSB - Data Synchronization Barrier
_REV* - Reverse Byte Order (32 bit) - u32.swap_bytes()
_REV16* - Reverse Byte Order (16 bit) - u16.swap_bytes()
_REVSH - Reverse Byte Order Signed (16 bit)
_RBIT* - Reverse Bit Order
_ROR* - Rotate Right - u32.rotate_right()
_LDREXB** - LDR Exclusive (8 bit) - Not M0, M0+
_LDREXH** - LDR Exclusive (16 bit) - Not M0, M0+
_LDREXW** - LDR Exclusive (32 bit) - Not M0, M0+
_STREXB** - STR Exclusive (8 bit) - Not M0, M0+
_STREXH** - STR Exclusive (16 bit) - Not M0, M0+
_STREXW** - STR Exclusive (32 bit) - Not M0, M0+
_CLREX** - Remove Exclusive Lock - Not M0, M0+
_SSAT - Signed Saturate - Not M0, M0+
_USAT - Unsigned Saturate - Not M0, M0+
_CLZ* - Count Leading Zeros - u32.leading_zeros()
_RRX - Rotate Right with Extend (32 bit)
_LDRBT - LDRT Unprivileged (8 bit)
_LDRHT - LDRT Unprivileged (16 bit)
_LDRT - LDRT Unprivileged (32 bit)
_STRBT - STRT Unprivileged (8 bit)
_STRHT - STRHT Unprivileged (16 bit)
_STRT - STRT Unprivileged (32 bit)

In the first group, disable_fault_irq, disable_irq, enable_fault_irq, enable_irq seem to be pretty critical. The rest of the get_ / set_ functions are more specialized.

In the second group, NOP, WFI, WFE, SEV, ISB, DSB are ones that I am familiar with and use often. REVSH, RBIT, and RRX seem like they would be primarily for optimization. SSAT and USAT provide more flexibility than the core saturated math primitives by allowing selection of a bit width. LDRxT and STRxT are mainly for unprivileged access checking. The rest should be covered by built-ins and atomics.

@gnzlbg
Copy link

gnzlbg commented Apr 26, 2018

Do gcc, clang, or msvc provide any of these as functions ?

@jcsoo
Copy link

jcsoo commented Apr 26, 2018

As far as I can tell, all three implement them via the ACLE (ARM C Language Extensions) specification.

gcc - 6.59.7 ARM C Language Extensions (ACLE)

clang - The arm_acle.h header is shown in their documentation

msvc - ARM Intrinsics

@gnzlbg
Copy link

gnzlbg commented Apr 26, 2018

All of it?

The arm_acle.h header is exposed via std::arch::{arm,aarch64}. If all you need is arm_acle.h this doesn't even need an RFC. The only thing required for these to land on nightly is for someone to implement the missing functions in coresimd. For this to land in stable, an automatic verification of the header against the ACLE specification must be enabled in stdsimd, and after that, a mini-FCP would probably be enough.

@jcsoo
Copy link

jcsoo commented Apr 27, 2018

Thanks for the pointer back to coresimd - I looked at this briefly a while back but didn't dig in to figure out the specifics of how to get the additional intrinsics implemented.

coresimd/simd_llvm.rs provides the clue:

extern "platform-intrinsic" {
    pub fn simd_eq<T, U>(x: T, y: T) -> U;
    pub fn simd_ne<T, U>(x: T, y: T) -> U;
    pub fn simd_lt<T, U>(x: T, y: T) -> U;
    pub fn simd_le<T, U>(x: T, y: T) -> U;
    pub fn simd_gt<T, U>(x: T, y: T) -> U;
    pub fn simd_ge<T, U>(x: T, y: T) -> U;
    ...
}

Unfortunately onlysimd intrinsics are listed there. I'd never seen the platform-intrinsic extern type before, but after a bit of digging I found where they are defined: rust/src/etc/platform-intrinsics/arm.json.

Looking further, stdsimd issue #112 mentions link_llvm_intrinsic which enables
using #[link_name="llvm.*"] to access LLVM intrinsics directly.

Implement all x86 vendor intrinsics shows how to create new intrinsics and also gave me enough information to find the list of LLVM arm intrinsics: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsARM.td

So, as @gnzlbg points out, we should be able to do this through pull requests to stdsimd.

@gnzlbg
Copy link

gnzlbg commented Apr 27, 2018

For ACLE using link_llvm_intrinsics is the way to go. If there is an intrinsic that cannot be implemented that way we'll handle those on a case-by-case basis.

@japaric
Copy link
Member Author

japaric commented May 10, 2018

Update: rust-lang/stdarch#437 is tracking adding these assembly operations (instructions) to stdsimd.

@japaric
Copy link
Member Author

japaric commented Jul 3, 2018

Discussed (triaged) in the last meeting:

This is more of a nice to have as it's not required for embedded Rust on stable. If we are to get this done by the edition release these are the final deadlines:

@japaric
Copy link
Member Author

japaric commented Jul 27, 2018

Triage: Several CMSIS intrinsics have been implemented and are now available in core::arch::arm. The linked API documentation may not correctly reflect what's currently available in the latest nightly but you can check the source code: cmsis.rs and dsp.rs.

Most of the functionality in cortex-m that requires external assembly is now provided in core::arch::arm as an unstable API, with the notable exception of the BKPT instruction which will be added in rust-lang/stdarch#540 once its API and implementation are decided.

The stabilization path for the non-SIMD subset of these intrinsics is being discussed in rust-lang/stdarch#518 (comment).

@japaric
Copy link
Member Author

japaric commented Aug 7, 2018

As this mainly affects the Cortex-M ecosystem we should have someone on the @rust-embedded/cortex-m team champion this work. (This doesn't mean that you have to implement this; mentoring / helping a collaborator to implement this is also valid).

Solving rust-lang/stdarch#437 (comment) is the first step towards implementing this.

@korken89
Copy link
Contributor

korken89 commented Aug 7, 2018

I have been doing work with asm instructions before, so I can certainly help. Reading the referenced comments and PRs I do not quite see the issue, as it is started the difference between HAL and compiler spec.

@japaric To clarify, do you want help simply pushing it forward (is seems you already did the implementation) or to make the decision on CMSIS / ACLE discussion?

@japaric
Copy link
Member Author

japaric commented Aug 7, 2018

@korken89 we want to stabilize the non-SIMD instructions (WFI, CPSID, etc.) as that would let us drop the build dependency on arm-none-eabi-gcc in a few crates (cortex-m and cortex-m-rt) w/o requiring nightly. The details about the contents of the RFC are in #63 (comment). One of the questions that needs to be answered to write the RFC is ACLE vs CMSIS. The above comment has link(s) to more discussions on that topic.

@korken89
Copy link
Contributor

korken89 commented Aug 7, 2018

@japaric Thanks for the clarification! I have never written an RFC before, but I'd very much like to learn.
So if no-one else with more experience wants to take it (or if my lack of experience is a problem), I can do this. Would be a good learning opportunity for me in the procedures of the Rust ecosystem.

@gnzlbg
Copy link

gnzlbg commented Aug 7, 2018

A summary of "What is ACLE" and "What is CMSIS" from the point-of-view of "What should a compiler implement" would probably be a good start. It isn't really necessary to put that into an RFC or anything right now. Just posting it there as a comment would be enough to keep the discussion moving and unblock future work.

The only thing we are waiting for right now is for somebody to make a good case for which intrinsics should be implemented and why. We tried that before, but we have learned new things, so it is time to briefly re-evaluate whether CMSIS was the right choice or not (or whether we should provide only ACLE, or also ACLE, etc.).

@andre-richter
Copy link
Member

andre-richter commented Aug 7, 2018

Hmm the last two comments there, including yours, kind of settle it from a logical point of view. It should be ACLE.

Any reason you reconsider that statement here?

@gnzlbg
Copy link

gnzlbg commented Aug 7, 2018

Well, I have no idea what ACLE and CMSIS are, never used them, etc. The only things I know about ARM are from reviewing stdsimd PRs and implementing some bit and pieces here and there, and I honestly have the feeling that I still haven't gorked how the whole ARM architecture ecosystem fits together. So from my point-of-view, my statements about ARM should have zero weight.

What I was hoping is that somebody would come along the thread with an authoritative answer or source of truth about how std::arch should be organized for the ARM architectures at least with respect to CMSIS and ACLE since those are the pieces that were currently being implemented and generating some confusion. Something of the form "this is the technical document with the APIs that ARM says that compilers must implement" / "here is a PDF of the spec" / "here are json / xml files describing the spec APIs in a programatic way", etc.

We have been making lists about how to organize std::arch for a year, and every time something new gets implemented, we discover that the current organization is wrong in some new way. I find this very unsettling and it makes me very wary of stabilizing anything until a big chunk of it is actually implemented and we can all see how it all fits together :/

Are the any resources that explain how the whole family of architectures / ISAs are related, which extensions are there, which can be changed dynamically, etc. ? Maybe that would help me. The whole v5/.../v8 architecture families +/- 32/64 +/- thumb +/- neon/... +- {r,m,..}class +- hard/soft floats +/-... is like a big spaghetti mess inside my head. It is hard to me to tell which code is allowed in which binary for which architecture. What are the ISAs, what are the ISA extensions, which code for which extensions can I include in my binary as long as I only execute it on hardware that implements the extension (that is, which extensions can actually be detected and used at run-time). For example, I can generate an arm64 binary that contains neon instructions as long as these are not executed at run-time, but IIUC the same cannot be done for arm32v7+neon.

@andre-richter
Copy link
Member

andre-richter commented Aug 7, 2018

I was mainly referring to this specific comment: rust-lang/stdarch#518 (comment)

His Github profile says he works for ARM. If true, I think we have our authority :)
Maybe we should just ping him to sign off on ACLE being the way to go?

Edit: One comment above states the same: rust-lang/stdarch#518 (comment)

@japaric
Copy link
Member Author

japaric commented Aug 21, 2018

Triage: A pre-RFC discussing what should go in core::arch::arm have been opened in #184

@japaric
Copy link
Member Author

japaric commented Sep 4, 2018

Update(last week): a PR that implement acle (see pre-RFC in #184) has been sent to the stdsimd repo: rust-lang/stdarch#557. Also see rust-lang/stdarch#558.

Update(today): as discussed in today's meeting I'm going to remove this issue from the edition milestone. This means that this is no longer a priority for us and that we will prioritize the issues in the edition milestone over this. The rationale is that the binary blob trick already lets us implement the most used subset of CMSIS without depending on an external assembler, e.g. arm-none-eabi-gcc (although with runtime and code size overhead).

@japaric japaric removed this from the RC1 milestone Sep 4, 2018
bors bot added a commit that referenced this issue Oct 8, 2018
207: [RFC] Initial Cortex-A team r=hannobraun a=andre-richter

This is a proposal for creating a Cortex-A team.

This is based on and will close issue #182. Please refrain from approving until I removed the [DRAFT] from the title and every proposed member has his points on the agenda.

@raw-bin, @wizofe and me will be the initial members.

The following work is planned for this team:

### Maintain a set of ecosystem crates under the embedded WG

- [ ] Write a tool that generates accessors for the ARM System Register XML.
- [ ] Move [cortex-a](https://github.com/andre-richter/cortex-a) to the WG and keep extending/completing it.
    - [ ] If feasible, feed the XML accessors into the crate.
    - [ ] Add ready-to-use idioms for bare-metal use cases.
        - [ ] MMU idioms
        - [ ] GIC idioms
        - [ ] ...? 
- [ ] Move [register-rs](https://github.com/rust-osdev/register-rs) to the WG. CC @phil-opp for sign-off.
- [ ] Discuss if something the likes of `cortex-m-rt` is feasible for cortex-a.

### Provide resources for learning

- [ ] Move [rust-raspi3-tutorial](https://github.com/andre-richter/rust-raspi3-tutorial) to the WG

### Make it possible to build AArch64 kernels / programs on stable.

- [ ] Add a `rust-std` component for `aarch64-unknown-none`
- [ ] Differentiate into further aarch64 targets, e.g. a non-fp version of `aarch64-unknown-none`, and add them as well.

### Stabilization

- [ ] Push for a stable core::arch::arm module (see #63) to make the `cortex-a` crate (see its asm module) work on stable w/o depending on GCC.

------------------

@raw-bin and @wizofe, kindly let me know your agenda points and I will edit them into this comment.
I understand your Redox work will most probably be a part of it, but I want to leave it to you guys to describe it.

Once we are set, I will remove the [DRAFT] and open this for members to approve.

@jamesmunns I tend to agree to your thought about having a microprocessor group. However, I think that cortex-a alone is such a complex architecture, it can easily keep the initial three of us busy for years to come.
I would argue that we would need separate teams for PPC, RISC-V and the likes.

Co-authored-by: Andre Richter <andre-richter@users.noreply.github.com>
@japaric japaric removed their assignment Jan 8, 2019
@japaric
Copy link
Member Author

japaric commented Feb 27, 2019

Update: a PR that exposes the ACLE API on ARM and AArc64 recently landed in
stdsimd.;though it has not yet made its way into core::arch::{arm,aarch64}.

The next steps here would be to use the core::arch::arm API in the cortex-m crate
(behind the "inline-asm" feature, which requires nightly), and then proceed to
stabilize (a subset of) the core::arm::arch API. But we need a champion to
drive this to completion! (@adamgreig might be interested?)

Finally, to me it's a bit unclear if a full RFC is needed in this case given the
precedent of stabilizing core::arch::wasm32 with just an FCP. @gnzlbg thinks
an RFC is needed but during the all hands T-libs said that an RFC may not be
necessary. It would be good to get a clarification on what's required for stabilization. cc @Amanieu @SimonSapin

@gnzlbg
Copy link

gnzlbg commented Feb 27, 2019

@gnzlbg thinks an RFC is needed

We have stabilized other APIs since the std::arch RFC without going through the RFC process, but that's because those APIs had no new design in them. Their design was covered 1:1 by std::arch (which basically states we map C APIs 1:1 to Rust) and the same guarantees agreed on there (automatic verification, etc.) were provided. This essentially means: there isn't really anything to discuss, there are no trade-offs, because given some particular rules to map C to Rust, often there is only one way to actually implement these APIs, and when there are two ways, an FCP is enough.

The merged PR is awesome, and definitely something worth using on nightly (it landed already) so that we can start ironing implementation bugs, and getting experience with the design.

However, the API exposed by the PR does not do 1:1 what C does, it is something different than what was agreed on in the std::arch RFC, has a certain level of design in it, and this design has trade-offs (there are things that can be done with the C API that cannot be done with this API). Whether this is the design we want to commit to was a question that was not resolved during the review of the PR, and it will need to be resolved before stabilization.

It will be hard to have a discussion about that without writing down somewhere "What does C do", "Why this API is designed differently", "What are the trade-offs involved in this design", "What are the alternatives", "How can we close the gap with C in the future", etc. and using that as a starting point to discuss these things.

@alexcrichton
Copy link

FWIW I personally still feel that it's ok to "only" FCP these APIs. Such an FCP would, however, want to be advertised to relevant parties (aka the compiler team and the embedded WG).

@gnzlbg while for x86/x86_64 we have C precedent, wasm was a platform where we didn't have any C precedent so we made our own. I think that's fine to do in limited bases within platforms. For example it sounds like ARM doesn't have a widely-agreed-upon set of C APIs like x86/x86_64 does for intrinsics, so having our own idioms which take this into account for ARM seems reasonable.

As for RFC vs FCP, I think an RFC is probably a little too heavyweight for this in that we've already had an RFC for std::arch in general which, while x86-specific, paved the way for stabilization of future architectures and platforms.

It will be hard to have a discussion about that without writing down somewhere "What does C do", "Why this API is designed differently", "What are the trade-offs involved in this design", "What are the alternatives", "How can we close the gap with C in the future", etc. and using that as a starting point to discuss these things.

I definitely agree with this, but this is also what I imagined an FCP proposal encompassing :)

@gnzlbg
Copy link

gnzlbg commented Feb 27, 2019

@gnzlbg while for x86/x86_64 we have C precedent, wasm was a platform where we didn't have any C precedent so we made our own. I think that's fine to do in limited bases within platforms.

For WASM we coordinated this a bit with the spec and the clang/LLVM devs to make sure that things "matched".

For PowerPC, even though that's not stable yet, @lu-zero contacted the spec developers and they agreed to include the Rust API into the spec.

For example it sounds like ARM doesn't have a widely-agreed-upon set of C APIs like x86/x86_64 does for intrinsics, so having our own idioms which take this into account for ARM seems reasonable.

AFAIK the ACLE headers are just normal C headers, and most (all?) C and C++ programs use those. Isn't this the case? (I can't recall the exact result of the ACLE vs CMSIS discussion).

I think an RFC is probably a little too heavyweight

Instead of mapping the C headers, the current implementation designed a new "object-oriented" API over the registers and intrinsics that has no prior art (and limited experience on nightly), even though the API looks really good.

If the FCP proposal discussed those issues "mini-RFC-style" then that's fine for me, but the other FCPs that we had for similar things were basically a one liner "This is what C / theSpec does, we did that." and that was it. I don't think that a one-liner FCP would be enough this time.

@alexcrichton
Copy link

Hm ok, I was unaware that it had switched far enough away to a more object-oriented style, sorry about that! If that's the case then I agree that is a big enough change it may want an RFC

@gnzlbg
Copy link

gnzlbg commented Feb 28, 2019

Maybe one could draft the FCP post first, and then we can discuss which process this should follow? With a draft of the FCP it should be pretty obvious how much new design there is in here.

@jamesmunns
Copy link
Member

Note, as of Rust 1.34, it may be possible to achieve this in a limited fashion using linker-plugin-lto functionality. Please see the related cortex-m issue, and this test repo demonstrating this works.

As a note, this isn't really inline ASM, as originally described, but instead a way to define ASM in an external C file, that can be linked in as a static archive (.a file), which can be inlined as C/Rust code.

@adamgreig
Copy link
Member

Revisiting as part of today's issue triage. Our ACLE intrinsics never made it, but core::arch::arm did gain many types and intrinsics for explicit SIMD, and there's also rust-lang/rust#117219 to track some additional barrier-related intrinsics. In the meantime, core::arch::asm!() did stabilise, so it's now possible to write any assembly on stable Rust, which means this issue is resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests