RFC: Add a new `#[instruction_set(...)]` attribute for supporting per-function instruction set changes #2867

Open
wants to merge 3 commits into
base: master
from

Conversation

@ketsuban

ketsuban commented Feb 16, 2020

This RFC proposes a new function attribute, #[instruction_set(...)]. The minimal initial implementation will provide #[instruction_set(a32)] and #[instruction_set(t32)] on ARM targets, corresponding respectively to disabling and enabling the LLVM feature thumb-mode for the annotated function.

@Lokathor


Lokathor commented Feb 16, 2020

@Diggsey

Contributor

Diggsey commented Feb 16, 2020

I think it would be helpful to give some examples of why someone might want to use this attribute.

@Lokathor


Lokathor commented Feb 16, 2020

It touches upon that a bit:

ARM targets have a denser but less feature-packed instruction set named T32 alongside the normal A32.

More specifically, T32 code is 16 bits per operation while A32 code is 32 bits per operation. This makes a fairly big difference in code size, and since many ARM devices might have only a 16-bit bus for some parts of the system this can even make a big difference in CPU cycles taken to run the program.

@repnop


repnop commented Feb 17, 2020

RISC-V is mentioned in the RFC as well since it does have the "C" extension for the compressed instruction set, which can be found here: https://riscv.org/specifications/isa-spec-pdf/ in Chapter 16

@kennytm

Member

kennytm commented Feb 17, 2020

Before reading the content I thought #[isa] means "is a" 😓.

@hanna-kruppe

Member

hanna-kruppe commented Feb 17, 2020

RISC-V is mentioned in the RFC as well since it does have the "C" extension for the compressed instruction set, which can be found here: https://riscv.org/specifications/isa-spec-pdf/ in Chapter 16

But unlike Thumb, RVC is an ISA extension that just adds more instructions rather than replacing the ISA entirely. So it can (and should) be handled through the existing target_feature mechanisms.

@Centril

Member

Centril commented Feb 17, 2020

Some notes from initial review:

  • I agree with @kennytm (#2867 (comment)) that #[isa] isn't suggestive of what this attribute does. I think something more verbose would be warranted in this case to illuminate what it means.

  • What guarantees does #[isa] actually give the user? From the sound of it, it's an optimization hint? If so, we could tuck this into #[optimize] (rust-lang/rust#54882).

  • I'm concerned that this seems to be adding quite a domain-specific / niche feature to the language that might only end up benefiting a single family of targets, and that this is a lot of complexity relative to the generality of the feature. Attempts to make this more generally useful, either now, or in the future, would be welcome I think.

  • What would the interactions with e.g. Cranelift be?

  • Functions are inlined across ISA boundaries as if the #[isa] attribute did not exist.

    This needs some elaboration, preferably with examples of what this means. This does seem to confirm the notion that this is a hint (see the point re. #[optimize]) as opposed to something with semantic guarantees.

@Lokathor


Lokathor commented Feb 17, 2020

  • Name: Sure, any ideas?
  • Guarantees: The generated function must use the assembly / machine code flavor of the designated isa. Failure to do so would effectively change the overall calling convention of the function. If an isa attribute is given then the compiler must respect it, the same as the compiler couldn't ignore extern "C" in a function signature.
  • Domain-specific: This is similar to many intrinsics and target feature settings. Mostly specific to a particular architecture because other architectures don't use this technique of having two forms of machine code. It is basically a requirement for practical ergonomic use of Rust in many embedded contexts. I believe this was on the embedded-wg wishlist for all of 2019.
  • Cranelift: This affects code generation, but only on a function level. In the "worst case", alternative isa functions can be placed in separate translation units and compiled into separate objects. The final interwork adjustments are made by the linker.
  • Inlining: This is effectively a calling convention adjustment, so if a function is inlined then the attribute would have no other effect.
@Centril

Member

Centril commented Feb 17, 2020

  • Name: Sure, any ideas?

If we're sticking with the overall semantic notion and just renaming, I'd maybe go for instruction_set_arch or some variation of that.

  • Guarantees: The generated function must use the assembly / machine code flavor of the designated isa. Failure to do so would effectively change the overall calling convention of the function. If an isa attribute is given then the compiler must respect it, the same as the compiler couldn't ignore extern "C" in a function signature.

That's not particularly clear from the RFC. I would like to see some elaboration on this aspect. From reading the text, I primarily understood this as an internal aspect of functions, not as something affecting the signature, particularly because the RFC talks about "shims" and whatnot. This is different from extern "C" which has an observable difference in the type system:

extern "C" fn foo() {}
const X: fn() = foo; //~ ERROR expected "Rust" fn, found "C" fn

Presumably Rust would need to insert shims when dealing with different #[isa]s and function pointers.

(Also, please point out some documentation in the LLVM LangRef in the RFC.)

  • Domain-specific: This is similar to many intrinsics and target feature settings. Mostly specific to a particular architecture because other architectures don't use this technique of having two forms of machine code. It is basically a requirement for practical ergonomic use of Rust in many embedded contexts. I believe this was on the embedded-wg wishlist for all of 2019.

The main difference is that intrinsics and target feature settings are already established categories of target-specific mechanisms. This makes this RFC different than e.g. introducing a new target feature or a new intrinsic. That is why I find it important to answer e.g. (as noted in the RFC):

Are there any presently-supported architectures with a mechanism like A32/T32 which #[isa] could support?

  • Cranelift: This affects code generation, but only on a function level. In the "worst case", alternative isa functions can be placed in separate translation units and compiled into separate objects. The final interwork adjustments are made by the linker.

Can you elaborate on how that interacts with the calling convention?

@Lokathor


Lokathor commented Feb 17, 2020

Sure. On chips that support both a32 and t32 code there's a specific bit within the CPU status register that determines if the program counter address should be used to read a single 32-bit value (a32) or one to two 16-bit value(s) (t32). Depending on the "thumb" bit the CPU will perform the appropriate read and take appropriate action. Of course, the bit patterns between a32 and t32 are totally non-compatible. If the CPU is reading code from one isa while having the bit set (or not) for the other you'll get either the wrong legal instructions or just illegal instructions (UB either way).

This is all sorted out by having specific forms of branch instruction that let you enable or disable the bit (if needed) when making calls. The linker performs the task of generating the "interwork" shims during the linking process so that branches go to the correct location and also perform the correct transition as necessary.

@Lokathor


Lokathor commented Feb 19, 2020

Hello I'm Lokathor and welcome to my TED talk. Everyone please be sure to thank @Centril for asking me to give this presentation.

The target audience for this post is T-Lang specifically. Others can read it too of course, and I hope you all enjoy it, but the text here will be largely conversational and probably not suitable for direct inclusion into this RFC or into any particular Rust documentation.

A Primer On A32 / T32 Code

Some of the CPUs in the ARM chip family support more than one form of machine code. This is not really like the x86 and x86_64 split, where a single binary is built using entirely one machine code format or the other, and an x86_64 CPU can just also run an old x86 binary. I'm talking about two different machine code formats within a single binary, where the CPU can jump back and forth between the two during the program's execution.

Before the ARMv4T series there was just one flavor of assembly / machine code for ARM chips, naturally called "ARM code". Starting with ARMv4T they added a "Thumb code" flavor as well. (There's also a "Thumb2" extension supported in even later chips.) The assembly text of Thumb code is intentionally as close as possible to the assembly text of ARM code, but the binary encoding of the instructions is totally different.

  • ARM code is align4, with an encoding format that always uses 32 bits per opcode.
  • Thumb code is align2, with an alternate encoding format that uses (mostly) 16 bits per opcode (some "Thumb2" instructions are encoded as two 16-bit values).

Thumb code can't use as many of the registers, and it can't even do all the operations the CPU is capable of performing, but since ARM chips are usually used for embedded stuff, the code space savings actually are a huge deal. Also, since the bus from the storage to the CPU might be only a 16-bit bus using smaller opcodes has a runtime speed effect as well. The CPU literally stalls while waiting for the "second half" of each 32-bit ARM opcode to transfer across the bus.

"Thumb code is typically 65% of the size of the ARM code, and provides 160% of the performance of ARM code when running on a 16-bit memory system." --ARM7TDMI Technical Reference Manual 1.2.2. The Thumb instruction set
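To put rough numbers on the bus argument, here is a small sketch. It assumes one bus transfer per bus-width chunk of an instruction and simply applies the "typically 65%" figure from the manual quote; both function names are made up for illustration, and real timings depend on the actual memory system.

```rust
/// Number of bus transfers needed to fetch one instruction, assuming
/// each transfer moves `bus_bits` bits (a deliberate simplification).
fn fetches_per_instruction(instr_bits: u32, bus_bits: u32) -> u32 {
    // Ceiling division: a partial chunk still costs a full transfer.
    (instr_bits + bus_bits - 1) / bus_bits
}

/// Rough Thumb code size estimate from an ARM code size, using the
/// "typically 65%" figure from the ARM7TDMI manual quoted above.
fn estimated_thumb_bytes(arm_bytes: u32) -> u32 {
    arm_bytes * 65 / 100
}
```

Under these assumptions a 32-bit ARM opcode needs two transfers over a 16-bit bus while a 16-bit Thumb opcode needs one, which is the stall described above.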

As embedded developers, we would naturally like to compile as much of the program as possible in Thumb code to get this advantage.

However, the CPU generally boots in ARM state, and some parts of the code might also be required to be jumped to in ARM state because the chip is just built that way, so we cannot write the whole program in Thumb code.

Also, as I mentioned above, ARM code has access to more registers at once and can perform more kinds of operation than Thumb can, so even select parts from the "normal" portion of the program might be better written using ARM code.

Reference-level explanation

Precisely the way that this works is both simple and clever:

  • There's a bit in the CPU status register, the "thumb bit", and if it's on then the CPU is in "thumb state", and if it's off then the CPU is in "ARM state".
  • Whenever the CPU performs a bx (branch-exchange) or blx (branch-link-exchange) instruction, the CPU copies the least significant bit of the target address into the status register's thumb bit.
  • The lowest bit of the program counter is always ignored for purposes of actually getting the opcode, so a target address of 0x0800_ABC0 and 0x0800_ABC1 will both lead to the same position in the code, and the low bit says if that code is to be used as ARM or Thumb.

That's it.
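As a sketch of the rule in the bullets above (a software model only, not how any toolchain or CPU implements it), the low bit of a branch-exchange target can be split into the state to enter and the address that is actually fetched from. `State` and `decode_bx_target` are names invented for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Arm,
    Thumb,
}

/// Model of what `bx` / `blx` do with a target address:
/// bit 0 selects the state, and is ignored when fetching the opcode.
fn decode_bx_target(target: u32) -> (State, u32) {
    let state = if target & 1 == 1 {
        State::Thumb
    } else {
        State::Arm
    };
    // Clear bit 0 to get the address the opcode is read from.
    (state, target & !1)
}
```

With the two addresses from the text, `0x0800_ABC0` decodes to ARM state and `0x0800_ABC1` to Thumb state, and both fetch from `0x0800_ABC0`.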

Code objects generated by LLVM store the address of a given label as either even or odd, so the object files continue to know if each part is ARM or Thumb (link to the ARM ELF spec, check 5.5.3). The linkers for ARM targets can adjust function calls so that calls from ARM to Thumb and back can use bx and blx as appropriate (eg: GNU ld calls the flag -mthumb-interwork), again based on the address of the jump target.

Motivation (yes it's out of order from the official template)

Currently, Rust supports two target groups for ARM devices (many of which are tier2!):

  • There are targets with names that start with arm, such as arm-unknown-linux-musleabihf or armv5te-unknown-linux-gnueabi or others
  • There are targets with names that start with thumb, such as thumbv6m-none-eabi, thumbv7neon-linux-androideabi, and so on.

For these targets, all code of the entire program is restricted to only a32 (ARM targets) or t32 (Thumb targets). LLVM allows you to specify, per function, that you would like a particular format (link to what looks to be the PR that added this, and that shows the C usage as LLVM tests). This RFC allows you to tell LLVM to code generate a function using a specific format.

FAQ

  • Is this an optimization hint?

    • No, to be useful it must be an absolute guarantee that tagged functions are generated using the correct code format. It affects the inline assembly that you write, and it even affects the code's ability to be used at all in select situations.
  • Is this an alternate ABI, like C-unwind and C-nounwind? / Does this need to show up in the type system?

    • Nope. LLVM, the linker, and the CPU all carry around the correct info about functions and function pointers based on the lowest bit of the address. The type of the function itself doesn't actually change. The inter-procedure call format also doesn't change (link if you wanna read about that one).
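A sketch of what that looks like from the Rust side, assuming a toolchain that accepts the `arm::a32` / `arm::t32` path spelling of the attribute (the spelling was still being discussed in this thread). The `cfg_attr` wrapper is only there so the example also builds on non-ARM hosts, and the function names are made up.

```rust
// On ARM targets that support both encodings, these two functions are
// emitted in different instruction sets. Elsewhere the attribute is
// compiled out by `cfg_attr`.
#[cfg_attr(target_arch = "arm", instruction_set(arm::a32))]
fn double(x: u32) -> u32 {
    x.wrapping_mul(2)
}

#[cfg_attr(target_arch = "arm", instruction_set(arm::t32))]
fn increment(x: u32) -> u32 {
    x.wrapping_add(1)
}

/// Both functions coerce to the same plain function pointer type, so
/// they can share a table and be called without naming their mode.
fn apply_all(x: u32) -> u32 {
    let table: [fn(u32) -> u32; 2] = [double, increment];
    table.iter().fold(x, |acc, f| f(acc))
}
```

Nothing in the signature of `apply_all` or the table's element type mentions the instruction set, which is the point of the answer above.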
@Amanieu

Contributor

Amanieu commented Feb 19, 2020

I think this is a useful attribute to have, it is useful for embedded systems and mirrors existing GCC/Clang functionality with __attribute__((target)). However it is entirely specific to ARM targets so it should at the very least have "arm" somewhere in the name.

@Diggsey

Contributor

Diggsey commented Feb 19, 2020

So the main reason you would use this attribute is:

  1. You're writing code for an embedded arm device.
  2. You want your code to mostly compile to the "thumb" instruction set.
  3. You have specific functions (interrupt handlers or some such) that will be called by the CPU in ARM mode.

And the way you would use it is:

  1. Compile your code for one of the "thumb" targets.
  2. Mark those specific functions as #[isa = "arm"].

?
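For concreteness, the workflow in the two lists above might look roughly like this. It is a sketch only: it uses the `instruction_set(arm::a32)` spelling rather than the `#[isa = "arm"]` form written here, wraps it in `cfg_attr` so the code still builds on a non-ARM host, and both functions are hypothetical stand-ins for real interrupt code.

```rust
/// Hypothetical routine that the hardware enters in ARM state.
/// When built for a Thumb-default ARM target, only this function
/// is requested as A32. `irq` must be below 32.
#[cfg_attr(target_arch = "arm", instruction_set(arm::a32))]
fn clear_pending(flags: u32, irq: u32) -> u32 {
    flags & !(1 << irq)
}

/// Ordinary code with no attribute: it follows the target default,
/// which is Thumb on a `thumb*` target. `irq` must be below 32.
fn is_pending(flags: u32, irq: u32) -> bool {
    flags & (1 << irq) != 0
}
```

The rest of the crate stays unannotated, so the per-function attribute is the only place the instruction set choice is visible.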

@Ixrec

Contributor

Ixrec commented Feb 19, 2020

Are there use cases where you'd want to do the inverse, i.e. compile for one of the "arm" targets and mark some functions as #[isa = "thumb"]?

How does this work with libraries? Would some ARM libraries always want to be compiled with one or the other isa in any ARM application? Would it ever make sense for a cross-platform library (e.g. for a hash function) to say things like "if I'm being used on ARM, this part should use thumb isa"? Or do the only embedded programs where this matters use very few libraries or vendor/fork when it matters so it's not a concern?

@Amanieu

Contributor

Amanieu commented Feb 19, 2020

I think the most common use case would be to have t32 be the default and explicitly annotating certain hot functions as a32 for performance (or functions that use instructions not available to thumb via intrinsics/inline asm). I can't think of a good use case for opting into t32 when a32 is the default.

I do not expect any of these attributes to be used by cross-platform libraries, and if an embedded project really needs a function in an external library to be compiled for a32 then they will most likely fork that library.

@Lokathor


Lokathor commented Feb 19, 2020

  • t32 default and select a32 functions is also my assumption for how this would be used in practice.
  • I also agree that if you need to specify this you'll probably pull in your own copy of a function, library authors in general should not bother thinking about this when writing a general purpose library.
  • No one specifically asked this but I do not think it should be a compilation error to use this in code being compiled for a non-ARM target because that makes cargo test needlessly tricky to use. I'd compare it to how the #![windows_subsystem = "..."] attribute just "does nothing" on linux and mac.

@Diggsey: yes.

@petrochenkov

Contributor

petrochenkov commented Feb 19, 2020

#[isa] -> #[instruction_set] perhaps?
It's both more precise if you know what this is about, and less ambiguous if you don't.

Are there any other architectures using this trick with multiple instruction decoding modes in a single application-level execution? What terminology do they use?

@programmerjake


programmerjake commented Feb 20, 2020

Inlining an A32 function with inline assembly that uses an A32-only instruction into a T32 function can break code, so it does affect inlining.

@clarfon

Contributor

clarfon commented Feb 22, 2020

I feel like this needs to comment on the ABI for the generated function as well, and whether blocks can be annotated with #[isa].

What is the future scope for this kind of attribute? Will an x86 #[isa = "mov"] ever be supported?

Would there ever be a case where additional instructions are required to switch the processor to the new ISA before running the code? If this is allowed, should there be #[isa = "protected"], #[isa = "real"], #[isa = "long"], etc. provided for x86?

The current use case seems extremely tailored to thumb, and definitely needs more elaboration on how this not only makes sense on ARM, but also on x86 and other architectures. It also needs to elaborate why this should be an attribute instead of a separate target.

@Lokathor


Lokathor commented Feb 22, 2020

Well I think I explained the "why an attribute and not a target?" question fairly well already. In short, we have both forms as targets already. What's needed is the ability to intermix things.

@Ixrec

Contributor

Ixrec commented Feb 22, 2020

Indeed, this comment thread has very thoroughly explained why the proposed feature is the way it is, but the RFC text is still missing most of that reasoning, which is what I think @clarfon meant by "it also needs to elaborate..."

@clarfon

Contributor

clarfon commented Feb 22, 2020

Yes, I was basing my response on the RFC text, and had only skimmed the comments. The RFC as is is quite bare.

@programmerjake


programmerjake commented Feb 23, 2020

One place #[isa] might also help is with inter-ISA calls when rustc and LLVM eventually gain support for that. We (Libre-SOC -- formerly Libre-RISCV) are currently building a hybrid PowerPC/RISC-V processor that supports calling between ISAs. Admittedly, rustc and/or LLVM will need some rearchitecting before it can be done without some custom linking steps.

@Lokathor


Lokathor commented Feb 23, 2020

  1. Ketsuban doesn't have the time to continue the PR and so has added me as a maintainer on the PR, so I'll try to get edits done soon.

  2. @programmerjake So I suppose that'd be a "future possibilities"?

@programmerjake


programmerjake commented Feb 23, 2020

  1. @programmerjake So I suppose that'd be a "future possibilities"?

Definitely. It will probably take a huge amount of work to implement cross-ISA calls in LLVM, so we may want to use a wasm-bindgen kind of approach meanwhile.

@ketsuban ketsuban changed the title RFC: Add a new attribute, `#[isa]` RFC: Add a new `#[instruction_set(...)]` attribute for supporting per-function instruction set changes Feb 24, 2020
@Amanieu

Contributor

Amanieu commented Feb 24, 2020

I feel that for an MVP, we should just focus on making an ARM-specific attribute (#[arm_isa = "a32|t32"]) rather than trying to make this generic. There simply isn't a good use case for this outside of ARM and we should avoid making the attribute too generic.

@chorman0773


chorman0773 commented Feb 24, 2020

I'd like to voice my support for this RFC, but against an arm-specific attribute. I am designing an llvm target for the 65816, which could also benefit from this RFC. It is possible on the 65816 (through the use of the XCE instruction) to switch between 65816 native mode, and 6502 emulation mode. This attribute could indicate functions to be compiled in emulation mode, and apply the necessary entry/exit-point transformation.

@Amanieu

Contributor

Amanieu commented Feb 24, 2020

@chorman0773 This seems quite different from the feature that is being proposed here. Keep in mind that on ARM, the target ISA is encoded in the low bit of a function pointer value. This means that when you invoke a function pointer, you don't need to care whether the target function is compiled as A32 or T32 since the indirect call instruction will handle this automatically.

In the case of 65816, the compiler would need to either know the target ISA of a function pointer in advance, or carry that information in the function pointer and decode it every time a function pointer is called.
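To illustrate the second option (carrying the mode alongside the function pointer and checking it on every call), here is a host-side model. Nothing in it is real 65816 code generation: `Mode`, `TaggedFn`, `Cpu`, and `add_one` are invented for illustration, and the mode switch is just a field update standing in for whatever entry/exit sequence the target would need.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Native,
    Emulation,
}

/// Toy CPU state: tracks the current mode and counts mode switches.
struct Cpu {
    mode: Mode,
    switches: u32,
}

/// A function pointer that carries its required mode with it.
#[derive(Clone, Copy)]
struct TaggedFn {
    mode: Mode,
    entry: fn(u8) -> u8,
}

impl Cpu {
    /// Every indirect call inspects the tag, switches mode if needed,
    /// and restores the caller's mode afterwards.
    fn call(&mut self, f: TaggedFn, arg: u8) -> u8 {
        let caller_mode = self.mode;
        if f.mode != caller_mode {
            self.mode = f.mode;
            self.switches += 1;
        }
        let result = (f.entry)(arg);
        if self.mode != caller_mode {
            self.mode = caller_mode;
            self.switches += 1;
        }
        result
    }
}

/// Stand-in for a function compiled for one of the two modes.
fn add_one(x: u8) -> u8 {
    x.wrapping_add(1)
}
```

The extra field and the check inside `call` are the overhead that the ARM scheme avoids by encoding the mode in the low bit of the address itself.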

@chorman0773


chorman0773 commented Feb 24, 2020

Indeed, and that can easily be dealt with. Emulation mode would also be more useful for interrupt handlers, as there are both native mode handlers and emulation mode handlers (you can’t call an interrupt handler, period (hoping that the ABI definition enforces that))

@Lokathor


Lokathor commented Feb 24, 2020

I think that instruction_set(a32) is a totally fine attribute name even if we never again in the future history of Rust add another alternative instruction set. Closing it off to future expansion doesn't really improve the design, so we might as well use a slightly more general name.

That said, @chorman0773, the question that really helped me sort out the design here was this: "Would a pointer to an arbitrary function need to carry the alternative mode info in its type? Would it need to be a separate type from a standard function pointer?" It sounds like you're saying that it would need to be a new ABI type? In that case, that would probably be a mechanic best served by an extern "6502emu" ABI, or whatever name.

I would like this to be open to future expansion, but I agree with Amanieu that a key part of this particular proposal is that there is no new function pointer type involved with a32/t32, so it doesn't complicate the type system at all.

@chorman0773


chorman0773 commented Feb 24, 2020

@ketsuban

Author

ketsuban commented Feb 25, 2020

I feel that for an MVP, we should just focus on making an ARM-specific attribute (#[arm_isa = "a32|t32"]) rather than trying to make this generic. There simply isn't a good use case for this outside of ARM and we should avoid making the attribute too generic.

I've found another instruction set with a setup like ARM's - MIPS has an optional subset called MIPS16. As the name implies, its instructions are sixteen bits wide. Like T32, its function pointers are transparent to the rest of the instruction set, but have bit 0 set.

Rust has some MIPS support, and LLVM supports MIPS16 via a target feature (mips16), so drawing a circle around this attribute and making it ARM-specific would be premature.

}
```

To ease the amount of `cfg_attr` required with this attribute, if you specify an instruction set that isn't available on the target used the attribute is simply ignored. For example, if you specify `t32` and then build the code for `x86_64` or `wasm32`, the attribute is ignored.

@Amanieu

Amanieu Feb 28, 2020

Contributor

If you intend for instruction_set to not be solely restricted to ARM then you can't do this. Implicitly ignoring instruction sets that aren't available on a target means that you can't check for typos (t31 oops) since that might be a valid instruction set name on a different target.

@Lokathor

Lokathor Feb 28, 2020

Since the most probable names to be used with this are t32 and a32, possibly also m16, and m32 later, that feels unlikely to happen. I think it would be worse to have to put cfg_attr all over the place all the time. I'd be willing to change it if we had to though.

@programmerjake


programmerjake commented Feb 28, 2020

It would probably be good to use arm_a32 and arm_t32 (instead of a32 and t32) and later mips_m32 and mips_m16 since that way the instruction sets are more clearly disambiguated, allowing people to more readily look them up as well as being more resistant to typos due to not being confused with different architectures.

@ketsuban

Author

ketsuban commented Mar 1, 2020

I can't say I'm a fan of changing the names like that. ARM refers to the instruction sets as A32 and T32, and MIPS documentation refers to MIPS16, so that's the terminology Rust should use.

@Lokathor


Lokathor commented Mar 1, 2020

That definitely helps the "google factor" to stick to the real name used everywhere else as much as possible.

@programmerjake


programmerjake commented Mar 1, 2020

I can't say I'm a fan of changing the names like that. ARM refers to the instruction sets as A32 and T32, and MIPS documentation refers to MIPS16, so that's the terminology Rust should use.

I had meant that the standard name would be prefixed with the parent ISA name -- so arm_a32, arm_t32, and mips_mips16 (mips_m16 was a typo). Most search engines can split words on underlines, so that shouldn't reduce searchability.

Alternatively, instruction_set can take two arguments:

#[instruction_set(arm, a32)]

The intention is that having another disambiguating name will make it harder to make typos and accidentally use some other name that then gets silently ignored because it is defined for some other ISA.

Additionally, not everyone knows that t32 is for ARM, but it's easy to see that
#[instruction_set(arm, t32)] is for ARM and not MIPS or some other ISA.

@Lokathor


Lokathor commented Mar 1, 2020

I like the two argument approach. I'll try to sit down and adjust the RFC with this idea tomorrow if I can.

@Ixrec

Contributor

Ixrec commented Mar 1, 2020

When the "real name" is as short as "a32" or "t32", sticking to it actively hurts the google factor, not helps. For me, Google thinks "a32" is a road and "t32" is a tank.

The prefixes idea and the two arguments idea both seem like good solutions to me. If we go with two arguments, can we also specify that values for the 2nd argument are meant to be mutually exclusive, so that providing more than one (for the same arch/1st arg value) is always a compile-time error?

@Lokathor


Lokathor commented Mar 1, 2020

Yes, max of one arch selection per function

@programmerjake


programmerjake commented Mar 1, 2020

Yes, max of one arch selection per function

Just make sure to not accidentally prevent something like the following, since the arm and mips specifications are independent:

#[instruction_set(arm, t32)]
#[instruction_set(mips, mips16)]
fn f() {}
@nikomatsakis

Contributor

nikomatsakis commented Mar 6, 2020

Nominating for mention at lang team meeting: @Lokathor mentioned to me that this RFC had reached a point where more feedback would be useful, and they were curious whether there is a potential @rust-lang/lang sponsor.
