Stable assembly operations #63
japaric added the help wanted label on Mar 14, 2018
gnzlbg commented Mar 14, 2018
Is there any sequence of instructions for which a single call to an … I am thinking here about compiling in debug mode without optimizations, where the generated code might differ, potentially in ways that break it.
nagisa commented Mar 14, 2018
I’ve always been a strong proponent of "intrinsic" functions that map down to a single instruction. For example, instead of writing … Of course they have their downsides, but for most of the use cases in embedded, intrinsics are more than sufficient. In my experience using the IAR toolchain, which exposes such intrinsics for most of the important instructions, I’ve only had to maintain assembly to implement exactly two things: atomic byte increment and the context switch handler. Both were implemented in an external assembly file. I believe that we could work on adding intrinsics to rustc (LLVM doesn’t have a platform intrinsic for ARM cpsid, for example, but it has a … So basically, what your proposal says.
nagisa commented Mar 14, 2018
@gnzlbg yes, the compiler is free to insert whatever code it wishes between the separate …
gnzlbg commented Mar 14, 2018
Thanks @nagisa, that makes sense. So my opinion is that it only makes sense to do this after we have stabilized inline assembly, which kind of defeats the point of doing this in the first place.

IIUC the idea behind this is that if we expose the hardware ISA via intrinsics, then nothing will break if LLVM changes the syntax or semantics of inline assembly, because we will be able to fix this in core. But this is the problem: core will break. That is, somebody will need to fix core, and the amount of work required to fix core is linearly proportional to the usage of inline assembly within core. This might be a lot of work already, but if we go this route it will turn into a titanic amount of work. The same would happen if we decided to add another backend to rustc and had to reimplement all usages of inline assembly in "Cretonne-syntax".

So IMO the only way to make upgrading to a new syntax/backend realistic is to turn the porting effort from O(N) in the usage of inline assembly to O(1). The only way I can think of to achieve this is to stabilize the inline assembly macro enough (e.g. like this https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443) that upgrading can be done in Rust by "just" mapping Rust's inline assembly syntax to whatever syntax a new or upgraded backend uses. This will be a lot of work, but at least it's independent of how often inline assembly is used. Once we are there, we might just as well stabilize inline assembly instead of pursuing this. Libraries that implement this can be developed in the ecosystem, and "core" intrinsics can continue to be added to …

Then there is also the issue that two consecutive …
nagisa commented Mar 14, 2018
The intrinsics would be implemented in the backend, rather than in libcore. While it is true that this would increase the burden when upgrading a backend, it wouldn’t be any greater than the burden of adapting whatever we stabilise as our inline assembly implementation.
nagisa commented Mar 14, 2018
With volatile assembly statements, the "code" would be limited to code that would be necessary to satisfy the constraints of the |
Why can't global_asm! be stabilized? Isn't that just like an external assembly file? For RISC-V it would be quite a few intrinsics, I believe. This is what I'm doing to read/write CSR registers:

```rust
#[cfg(target_arch = "riscv")]
macro_rules! csr_asm {
    ($op:ident, $csr:expr, $value:expr) => (
        {
            let res: usize;
            unsafe {
                asm!(concat!(stringify!($op), " $0, ", stringify!($csr), ", $1")
                     : "=r"(res)
                     : "r"($value)
                     :
                     : "volatile");
            }
            res
        }
    )
}
```

The $csr value isn't an operand that can be loaded from a register, so there would need to be an intrinsic for each CSR.
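To see why the CSR number cannot be passed in a register: in the RISC-V Zicsr encoding, the 12-bit CSR address is an immediate field in bits 31:20 of the instruction word, so it has to be known when the instruction is assembled. The sketch below builds that word by hand; the `CsrOp` enum and the `encode_csr_op` helper are hypothetical, and only the field layout is taken from the RISC-V specification.

```rust
/// funct3 values of the register-operand Zicsr instructions.
#[derive(Clone, Copy)]
pub enum CsrOp {
    Rw = 0b001, // csrrw
    Rs = 0b010, // csrrs
    Rc = 0b011, // csrrc
}

/// Encodes `csrr{w,s,c} rd, csr, rs1` as a 32-bit instruction word.
/// The CSR address sits in bits 31:20 as an immediate, which is why it
/// cannot be supplied from a register at run time.
pub fn encode_csr_op(op: CsrOp, rd: u32, csr: u32, rs1: u32) -> u32 {
    assert!(csr < (1 << 12), "CSR address is a 12-bit immediate");
    assert!(rd < 32 && rs1 < 32, "register numbers are 5 bits");
    const OPCODE_SYSTEM: u32 = 0b111_0011;
    (csr << 20) | (rs1 << 15) | ((op as u32) << 12) | (rd << 7) | OPCODE_SYSTEM
}
```

For example, the pseudo-instruction `csrr a0, mstatus` is `csrrs x10, 0x300, x0`, and `encode_csr_op(CsrOp::Rs, 10, 0x300, 0)` yields `0x30002573`. A different CSR is a different instruction word, hence one intrinsic (or one const parameter) per CSR.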
For reference, here is ARM's C Language Extensions 2.0 document: ARM® C Language Extensions Release 2.0.

I believe these were implemented by ARM's in-house compilers and by GCC, according to this page: 6.59.7 ARM C Language Extensions (ACLE).

I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics. Having a standard to point to should make it easier to get them implemented. For instance, section "8.4 Hints" describes …
I've taken a look over the full instruction set listing; here's what stands out.

Assembly operations that are often required for nontrivial programs:

Somewhat more obscure stuff:

Unlike AVR-GCC, LLVM transparently handles accesses to and from program memory, meaning that that whole class of operations doesn't require … Although currently unimplemented, I believe LLVM's atomic intrinsics could be mapped directly to the RMW atomic instructions (FWIW these map to something like …
gnzlbg commented Mar 17, 2018
I discussed this with @alexcrichton during the Rust All Hands and he said he was fine with adding a stable API for assembly ops that have an unstable implementation (e.g. inline …
japaric added this to the 2018 edition milestone on Apr 3, 2018
gnzlbg commented Apr 3, 2018
@japaric would those go into …
@gnzlbg That can be bikeshedded in the RFC.
This was referenced Apr 3, 2018
In order to understand which intrinsics might be needed for the various Cortex-M processors, I … The first group of intrinsics is "Core Register Access". These are mainly wrappers around … https://www.keil.com/pack/doc/CMSIS/Core/html/group__Core__Register__gr.html

The second group of intrinsics provides access to CPU instructions. Each of these … Some of these instructions have direct equivalents in core::intrinsics or … https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__CPU__gr.html

In the first group, disable_fault_irq, disable_irq, enable_fault_irq, and enable_irq seem to be pretty critical. The rest of the get_ / set_ functions are more specialized. In the second group, NOP, WFI, WFE, SEV, ISB, and DSB are the ones that I am familiar with and use often. REVSH, RBIT, and RRX seem like they would be primarily for optimization. SSAT and USAT provide more flexibility than the core saturating math primitives by allowing selection of a bit width. LDRxT and STRxT are mainly for unprivileged access checking. The rest should be covered by built-ins and atomics.
gnzlbg commented Apr 26, 2018
Do gcc, clang, or msvc provide any of these as functions?
As far as I can tell, all three implement them via the ACLE (ARM C Language Extensions) specification:

- gcc: 6.59.7 ARM C Language Extensions (ACLE)
- clang: the arm_acle.h header is shown in their documentation
- msvc: ARM Intrinsics
gnzlbg commented Apr 26, 2018
All of it? The …
Thanks for the pointer back to coresimd. I looked at this briefly a while back but didn't dig in to figure out the specifics of how to get the additional intrinsics implemented. coresimd/simd_llvm.rs provides the clue: …

Unfortunately only simd intrinsics are listed there. I'd never seen the … Looking further, stdsimd issue #112 mentions link_llvm_intrinsic, which enables … "Implement all x86 vendor intrinsics" shows how to create new intrinsics and also gave me enough information to find the list of LLVM ARM intrinsics: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsARM.td

So, as @gnzlbg points out, we should be able to do this through pull requests to stdsimd.
gnzlbg commented Apr 27, 2018
For ACLE using …
Update: rust-lang-nursery/stdsimd#437 is tracking adding these assembly operations (instructions) to stdsimd.
Discussed (triaged) in the last meeting: this is more of a nice-to-have, as it's not required for embedded Rust on stable. If we are to get this done by the edition release, these are the final deadlines:
Triage: Several CMSIS intrinsics have been implemented and are now available in … Most of the functionality in … The stabilization path for the non-SIMD subset of these intrinsics is being discussed in rust-lang-nursery/stdsimd#518 (comment).
nagisa commented Jul 27, 2018
Re __BKPT: adding const generics is very likely to be backwards compatible, so we could expose … I personally haven’t had a need for …
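One way to read the const-generics suggestion: the breakpoint immediate becomes a const parameter, so it is checked at compile time and can be spliced directly into the instruction. This is a sketch only, assuming a current toolchain where `core::arch::asm!` and its `const` operands are stable (neither was in 2018); `bkpt` and `CheckImm8` are hypothetical names, and the 8-bit limit assumes the Thumb encoding used on Cortex-M.

```rust
/// Compile-time check that the breakpoint immediate fits in 8 bits
/// (the Thumb `BKPT #imm8` encoding).
struct CheckImm8<const IMM: u32>;

impl<const IMM: u32> CheckImm8<IMM> {
    const OK: () = assert!(IMM <= 0xFF, "BKPT immediate must fit in 8 bits");
}

/// Hypothetical const-generic wrapper around the `BKPT` instruction.
#[inline(always)]
pub fn bkpt<const IMM: u32>() {
    // Using the associated const forces the assertion to be evaluated for
    // each instantiation, so building `bkpt::<256>()` fails.
    let () = CheckImm8::<IMM>::OK;

    // Only emitted on ARM targets; elsewhere the call does nothing.
    #[cfg(target_arch = "arm")]
    unsafe {
        core::arch::asm!("bkpt #{imm}", imm = const IMM, options(nostack));
    }
}
```

Calls then look like `bkpt::<0>()`. If such a function were first exposed without the const parameter, adding one later is the backwards-compatibility question raised above.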
paoloteti commented Jul 30, 2018
CMSIS, which in the end is a HAL spec and not a compiler spec, contains intrinsics just as wrappers around ACLE (see …
japaric self-assigned this on Aug 7, 2018
As this mainly affects the Cortex-M ecosystem, we should have someone on the @rust-embedded/cortex-m team champion this work. (This doesn't mean that you have to implement this; mentoring / helping a collaborator to implement this is also valid.) Solving rust-lang-nursery/stdsimd#437 (comment) is the first step towards implementing this.
korken89 commented Aug 7, 2018
I have worked with asm instructions before, so I can certainly help. Reading the referenced comments and PRs, I do not quite see the issue, given the stated difference between a HAL spec and a compiler spec. @japaric To clarify, do you want help simply pushing it forward (it seems you already did the implementation) or making the decision in the CMSIS / ACLE discussion?
@korken89 we want to stabilize the non-SIMD instructions (WFI, CPSID, etc.) as that would let us drop the build dependency on …
korken89 commented Aug 7, 2018
@japaric Thanks for the clarification! I have never written an RFC before, but I'd very much like to learn.
gnzlbg commented Aug 7, 2018
A summary of "What is ACLE" and "What is CMSIS" from the point of view of "What should a compiler implement" would probably be a good start. It isn't really necessary to put that into an RFC or anything right now. Just posting it there as a comment would be enough to keep the discussion moving and unblock future work. The only thing we are waiting for right now is for somebody to make a good case for which intrinsics should be implemented and why. We tried that before, but we have learned new things, so it is time to briefly re-evaluate whether CMSIS was the right choice or not (or whether we should provide only ACLE, or also ACLE, etc.).
Hmm, the last two comments there, including yours, kind of settle it from a logical point of view: it should be ACLE. Any reason you are reconsidering that statement here?
gnzlbg commented Aug 7, 2018
Well, I have no idea what ACLE and CMSIS are; I've never used them, etc. The only things I know about ARM are from reviewing stdsimd PRs and implementing some bits and pieces here and there, and I honestly have the feeling that I still haven't grokked how the whole ARM architecture ecosystem fits together. So from my point of view, my statements about ARM should have zero weight. What I was hoping is that somebody would come along the thread with an authoritative answer or source of truth about how …

We have been making lists about how to organize …

Are there any resources that explain how the whole family of architectures / ISAs are related, which extensions there are, which can be changed dynamically, etc.? Maybe that would help me. The whole v5/.../v8 architecture families +/- 32/64 +/- thumb +/- neon/... +/- {r,m,..}class +/- hard/soft floats +/-... is like a big spaghetti mess inside my head. It is hard for me to tell which code is allowed in which binary for which architecture. What are the ISAs, what are the ISA extensions, and which code for which extensions can I include in my binary as long as I only execute it on hardware that implements the extension (that is, which extensions can actually be detected and used at run time)? For example, I can generate an arm64 binary that contains neon instructions as long as these are not executed at run time, but IIUC the same cannot be done for arm32v7+neon.
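For the run-time half of that question, the usual Rust pattern is to compile a feature-specific function with `#[target_feature]` and call it only after a run-time check. A sketch, assuming a toolchain where `std::arch::is_aarch64_feature_detected!` and `#[target_feature]` on AArch64 are stable; the function names are hypothetical, and on non-AArch64 targets only the portable path is compiled.

```rust
/// Wrapping sum of a slice: uses a NEON-enabled build of the loop when the
/// CPU reports NEON at run time, and the portable loop otherwise.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // SAFETY: the run-time check above guarantees NEON is available.
            return unsafe { sum_neon(xs) };
        }
    }
    sum_portable(xs)
}

/// Same loop, but compiled with NEON enabled, so the auto-vectorizer
/// is allowed to emit NEON instructions for it.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn sum_neon(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

fn sum_portable(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}
```

The NEON-enabled function is present in the binary either way; the `is_aarch64_feature_detected!` check is what keeps it from executing on hardware without the extension.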
I was mainly referring to this specific comment: rust-lang-nursery/stdsimd#518 (comment). His GitHub profile says he works for ARM. If true, I think we have our authority :)

Edit: One comment above states the same: rust-lang-nursery/stdsimd#518 (comment)
This was referenced Aug 20, 2018
Triage: A pre-RFC discussing what should go in …
japaric added the S-blocked label and removed the help wanted label on Aug 21, 2018
Update (last week): a PR that implements ACLE (see the pre-RFC in #184) has been sent to the stdsimd repo: rust-lang-nursery/stdsimd#557. Also see rust-lang-nursery/stdsimd#558.

Update (today): as discussed in today's meeting, I'm going to remove this issue from the edition milestone. This means that this is no longer a priority for us and that we will prioritize the issues in the edition milestone over this. The rationale is that the binary blob trick already lets us implement the most used subset of CMSIS without depending on an external assembler, e.g. arm-none-eabi-gcc (albeit with runtime and code size overhead).
japaric removed this from the RC1 milestone on Sep 4, 2018
bors bot added a commit that referenced this issue on Oct 8, 2018
japaric removed their assignment on Jan 8, 2019
Update: a PR that exposes the ACLE API on ARM and AArch64 recently landed in …

The next steps here would be to use the …

Finally, to me it's a bit unclear if a full RFC is needed in this case, given the …
gnzlbg commented Feb 27, 2019
We have stabilized other APIs since the …

The merged PR is awesome, and definitely something worth using on nightly (it has landed already) so that we can start ironing out implementation bugs and getting experience with the design. However, the API exposed by the PR does not do 1:1 what C does; it is something different from what was agreed on in the …

It will be hard to have a discussion about that without writing down somewhere "What does C do", "Why is this API designed differently", "What are the trade-offs involved in this design", "What are the alternatives", "How can we close the gap with C in the future", etc., and using that as a starting point to discuss these things.
alexcrichton commented Feb 27, 2019
FWIW I personally still feel that it's OK to "only" FCP these APIs. Such an FCP would, however, want to be advertised to relevant parties (aka the compiler team and the embedded WG).

@gnzlbg while for x86/x86_64 we have C precedent, wasm was a platform where we didn't have any C precedent, so we made our own. I think that's fine to do on a limited basis within platforms. For example, it sounds like ARM doesn't have a widely agreed-upon set of C APIs like x86/x86_64 does for intrinsics, so having our own idioms which take this into account for ARM seems reasonable.

As for RFC vs FCP, I think an RFC is probably a little too heavyweight for this, in that we've already had an RFC for …

I definitely agree with this, but this is also what I imagined an FCP proposal encompassing :)
gnzlbg commented Feb 27, 2019
For WASM we coordinated this a bit with the spec and the clang/LLVM devs to make sure that things "matched". For PowerPC, even though that's not stable yet, @lu-zero contacted the spec developers and they agreed to include the Rust API into the spec.
AFAIK the ACLE headers are just normal C headers, and most (all?) C and C++ programs use those. Isn't this the case? (I can't recall the exact result of the ACLE vs CMSIS discussion).
Instead of mapping the C headers, the current implementation designed a new "object-oriented" API over the registers and intrinsics that has no prior art (and limited experience on nightly), even though the API looks really good. If the FCP proposal discussed those issues "mini-RFC-style" then that's fine for me, but the other FCPs that we had for similar things were basically a one-liner: "This is what C / theSpec does, we did that." and that was it. I don't think that a one-liner FCP would be enough this time.
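To make the "C headers vs. new API" distinction concrete, here is a mock of the two styles for a barrier intrinsic. ACLE specifies `__dmb(unsigned int)`, whose constant argument is the 4-bit barrier option (for example 0xF for SY, 0xE for ST, 0xB for ISH); a typed alternative passes a marker value instead, so an invalid option becomes a type error. Both functions are hypothetical, are not claimed to be the API of the merged PR, and simply return the option field they would encode instead of executing `DMB`.

```rust
/// C style (ACLE-like): the barrier option is a raw 4-bit integer.
pub fn dmb_raw(option: u32) -> u32 {
    assert!(option <= 0xF, "barrier option is a 4-bit field");
    option
}

mod sealed {
    /// Prevents downstream crates from adding their own barrier options.
    pub trait Sealed {}
}

/// Typed style: each barrier domain is a zero-sized marker type.
pub trait BarrierOption: sealed::Sealed {
    const OPTION: u32;
}

pub struct SY; // full system
pub struct ST; // full system, stores only
pub struct ISH; // inner shareable

impl sealed::Sealed for SY {}
impl sealed::Sealed for ST {}
impl sealed::Sealed for ISH {}

impl BarrierOption for SY { const OPTION: u32 = 0xF; }
impl BarrierOption for ST { const OPTION: u32 = 0xE; }
impl BarrierOption for ISH { const OPTION: u32 = 0xB; }

/// Only the marker types above are accepted.
pub fn dmb<B: BarrierOption>(_domain: B) -> u32 {
    B::OPTION
}
```

The trade-off is the one discussed here: the typed form catches mistakes earlier, but code and documentation written against the C header (`__dmb(0xF)`) no longer translate one-to-one.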
alexcrichton commented Feb 27, 2019
Hm, OK, I was unaware that it had moved that far toward a more object-oriented style; sorry about that! If that's the case, then I agree that it is a big enough change that it may want an RFC.
gnzlbg commented Feb 28, 2019
Maybe one could draft the FCP post first, and then we can discuss which process this should follow? With a draft of the FCP it should be pretty obvious how much new design there is in here. |
japaric commented Mar 14, 2018 (edited)
Triage (2018-08-21)
A pre-RFC discussing what should go in core::arch::arm has been opened in #184.

Update - 2018-07-27
Intrinsics like __NOP are now available in core::arch::arm. The path towards stabilization is being discussed in rust-lang-nursery/stdsimd#518 (comment).

Help wanted
We are looking for someone / people to help us write an RFC on stabilization of the ARM Cortex intrinsics in core::arch::arm. Details in #63 (comment).
One of the features that ties embedded development to the nightly channel is the asm! macro, and by extension the global_asm! macro.

In some cases it's possible to turn an unstable asm! call into a stable FFI call that invokes some subroutine that comes from a pre-compiled external assembly file. This alternative comes at the cost of a function call overhead per asm! invocation.

In other cases the FFI call breaks the intended semantics of the original asm! call; this can be seen when reading registers like the Program Counter (PC) or the Link Register (LR): the call itself overwrites LR with its own return address, and PC would be read from inside the subroutine rather than at the call site.
The asm! feature is hard to stabilize because it's directly tied to LLVM; rustc literally passes the contents of asm! invocations to LLVM's internal assembler. It's not possible to guarantee that the syntax of LLVM assembly won't change across LLVM releases, so stabilizing the asm! feature in its current form is not possible.
This ticket is about exploring making operations that require assembly available on the stable channel. This proposal is not about stabilizing the asm! macro itself.

The main idea is that everything that requires assembly and can be implemented as external assembly files should use that approach. For everything that requires inlining the assembly operation, the proposal is that the core crate will provide such functionality.

This idea is not unlike what's currently being done in stdsimd land: core::arch provides functions that are thin wrappers around unstable LLVM intrinsics that provide SIMD functionality.
Similarly, core::asm (module up for bikeshedding) would provide functionality that requires asm! but in a stable fashion. Example below:
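A minimal sketch of what such a module could look like, written against today's stable `core::arch::asm!` (which did not exist when this was proposed). The module name `stable_asm` is a hypothetical stand-in for `core::asm`. Each function is a thin, always-inlined wrapper, so its body is the only thing that would need updating if the assembly syntax changed. On non-ARM targets the bodies compile to nothing (and `read_lr` returns a placeholder) so the sketch builds anywhere; a real module would presumably exist only on ARM targets.

```rust
/// Hypothetical stand-in for the proposed `core::asm` module (Cortex-M).
pub mod stable_asm {
    #[cfg(target_arch = "arm")]
    use core::arch::asm;

    /// No operation.
    #[inline(always)]
    pub fn nop() {
        #[cfg(target_arch = "arm")]
        unsafe {
            asm!("nop", options(nomem, nostack, preserves_flags));
        }
    }

    /// Wait for interrupt.
    #[inline(always)]
    pub fn wfi() {
        #[cfg(target_arch = "arm")]
        unsafe {
            asm!("wfi", options(nomem, nostack, preserves_flags));
        }
    }

    /// Disable IRQs (set PRIMASK). No `nomem`, so the compiler must not
    /// move memory accesses across this point.
    #[inline(always)]
    pub fn cpsid_i() {
        #[cfg(target_arch = "arm")]
        unsafe {
            asm!("cpsid i", options(nostack, preserves_flags));
        }
    }

    /// Read the Link Register of the (inlined) calling context. This is the
    /// kind of operation an out-of-line FFI call cannot provide.
    #[inline(always)]
    pub fn read_lr() -> u32 {
        #[cfg(target_arch = "arm")]
        {
            let lr: u32;
            unsafe {
                asm!("mov {0}, lr", out(reg) lr, options(nomem, nostack, preserves_flags));
            }
            lr
        }
        #[cfg(not(target_arch = "arm"))]
        {
            0 // placeholder so the sketch builds on other targets
        }
    }
}
```

Callers would write `stable_asm::wfi()` and never see the assembly string, which is what lets the implementation track LLVM syntax changes without breaking them.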
This way the functionality can be preserved across LLVM upgrades; the maintainers of the core crate would update the implementation to match the current LLVM assembly syntax.
TODO (outdated)
… provided in core::asm.
… cortex-m that moves most of the assembly operations to external assembly files by enabling some Cargo feature. What can't be moved to assembly files would be a candidate for inclusion in core::asm.

cc @gnzlbg @nagisa could we get your thoughts on this? Does this sound feasible, or does this sound like a bad idea?

cc @dvc94ch @dylanmckay @pftbest could you help me identify assembly operations that require inline assembly on AVR, MSP430 and RISC-V?