Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Intrinsics: Define common availability and naming rules #25

Closed
wants to merge 1 commit into from

Conversation

cmuellner
Copy link
Collaborator

To avoid various naming schemes for intrinsics, let's create
a few rules to streamline them across the RISC-V ISA.

@mjosaarinen
Copy link

mjosaarinen commented Feb 6, 2022

This is not in line with how Intel and ARM intrinsics work. I'd prefer the _rv prefixed format that Bitmanip and Scalar Crypto intrinsics use, which is less likely to cause namespace collisions. Furthermore, no rationale is offered for the change from what we've had for several years now.

@cmuellner
Copy link
Collaborator Author

I probably should have referenced the origin of this naming convention: https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#naming-rules

Note, that I have reached out regarding the two naming conventions on tech-toolchain: https://lists.riscv.org/g/tech-toolchain-runtime/message/303

@mjosaarinen
Copy link

mjosaarinen commented Feb 6, 2022

I removed the pull request for crypto intrinsics. If you define new ones, please present them to CETG for approval.

To avoid various naming schemes for intrinsics, let's create
a few rules to streamline them across the RISC-V ISA.

Signed-off-by: Christoph Muellner <cmuellner@gcc.gnu.org>
@cmuellner
Copy link
Collaborator Author

I've reworked this RFC to make it compatible with all intrinsics formats that I could find (GCC, LLVM, and scalar crypto).
This requires no modifications to existing specifications but at the same time prevents the definition of new formats.

int32_t __builtin_riscv_orc_b_32(int32_t rs);

vadd.vv vd, vs2, vs1:
vint8m1_t __rvv_vadd_vv_i8m1(vint8m1_t vs2, vint8m1_t vs1, size_t vl);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we define vector intrinsic without prefix now (e.g. vadd_vv_i8m1), and it's already implemented on upstream LLVM for a while and used in several open source projects like OpenCV.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, my team has written and shipped thousands of lines using the current API.

But a few lines up it says,

The PREFIX has the intention to avoid naming collisions.
RISC-V header files can provide aliases to the intrinsics without the prefix.

So maybe this won't affect me? (And can this aliasing be done without further bloat?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we cannot pull back published APIs.
The intent is that we standardize the naming so that we have a common pattern across all the intrinsics for ISA extensions.

So yes, the example is wrong. Should be vint8m1_t vadd_vv_i8m1(...) after including riscv_vector.h, right?

What about the type prefix __rvv_ (e.g. __rvv_int8m1_t) as defined in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/RISCVVTypes.def and the function prefix __builtin_rvv_ as defined in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/riscv_vector.td#L114, i.e. would this be a reasonable prefix pattern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The riscv_vector.h file in LLVM is currently somewhere around 8-10MB containing 10s of thousands of intrinsics and is quite slow to parse. I worry that adding aliases will make this even larger and increase the already slow compile time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could define the common rule for intrinsic header first (#17), and those new naming rule API are only supported in new common intrinsic headers.
For now, toolchain still need to support old intrinsic header with old ABI, and they will be deprecated in the future.
In this way, I think we could get rid of the current API designed and make things moving forward.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could define the common rule for intrinsic header first (#17), and those new naming rule API are only supported in new common intrinsic headers. For now, toolchain still need to support old intrinsic header with old ABI, and they will be deprecated in the future. In this way, I think we could get rid of the current API designed and make things moving forward.

On file naming, how about this ( already in the prototype/algorithm test repo https://github.com/rvkrypto/rvkrypto-fips )

  • To unify things (with vector), the main name for the crypto extension is riscv_crypto.h. With version 1.0 this just pulls in risc_crypto_scalar.h as the first ratified version didn't specify vector crypto exentsoins.
  • Once vector cryptography is defined, this riscv_crypto.h can pull in both riscv_crypto_scalar.h and riscv_crypto_vector.h. Only the latter latter has a dependency on the vector extension, and there are reasons why one might not want to include it in all source modules that just need a few bits from the scalar crypto.
  • Once bitmanip intrinsics are defined again, riscv_crypto_scalar.h can pull in the relevant riscv_bitmanip_.._.h files, as the scalar cryptography extension actually includes a set of bitmanip instructions, which we currently define "for them". One can't define those twice.
  • I guess a unified API rule is what this PR is about, I basically just need the rule to be fixed for this to go forward, So currently I'm just waiting for @cmuellner to signal that names are fixed, and then I'll pull the trigger and rename all of the intrinsics accordingly.
  • At least for scalar instructions, It seems best to consider builtins to be a compiler-specific implementation mechanism for intrinsics. Intrinsics are enabled by including the relevant header file. The underscore prefix is reserved for the compiler/system namespace, and I don't want the scalar header to cause namespace collisions. However for vector intrinsics, for convenience purposes, the prefix may be dropped I guess.

Some historical comments:

  • Name of the file aside, it's a shame that the Bitmanip file rvintrin.h got blocked due to FSF copyright assignment paperwork (I have the FSF paperwork, if that matters). Its prefix naming convention _rv/_rv32/_rv64 based on the variable/return type is adequate for scalar ISA extensions. Various quirks related to RISC-V sign extension conventions and some other things are handled by it nicely; starting from scratch would require those things to be re-learned. It's easier just to change names.
  • Anyway, rather than including the bitmanip header file, I had to rewrite the selected parts of the bitmanip header. This wasn't too hard since I didn't rewrite the emulation parts, just refactored them out. Emulation should not be done by a header like this (specifically in crypto as it destroys the constant-time properties asserted.)
  • The P extension is not frozen, nor is it in the imminent ratification roadmap. I guess this comes down to it in some ways "competing" with the vector extension. So it can have relatively low priority from RISC-V perspective. However, there is a vendor or two that support it, so perhaps the compiler people can just ignore RISC-V and consider the P extension coming from the vendor instead. But it shouldn't be in this spec before it moves forward in RVI.
  • Scalar crypto has been ratified, as have parts of bitmanip. These are in silicon tape-outs from multiple vendors. Since intrinsics are the main C programming method specifically for scalar crypto, we'd like to get this fixed soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yes, the example is wrong. Should be vint8m1_t vadd_vv_i8m1(...) after including riscv_vector.h, right?

What about the type prefix _rvv (e.g. __rvv_int8m1_t) as defined in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/RISCVVTypes.def and the function prefix _builtin_rvv as defined in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/riscv_vector.td#L114, i.e. would this be a reasonable prefix pattern?

I suppose user should never use those internal stuffs like __rvv_int8m1_t, that is not portable across different compiler implementation.

And user should not expect those intrinsic functions are available without including header, always import that might increase lots of compilation time, especially something like vector intrinsic functions.

Copy link

@zhongjuzhe zhongjuzhe Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer vadd_vv_i8m1 to __rvv_vadd_vv_i8m1. The RVV GCC implementation doesn't support __rvv_vadd_vv_i8m1 but support vadd_vv_i8m1. Because it implement RVV intrinsics using simulate_decl like ARM SVE. The RVV intrinsisc header
riscv_vector.h in https://github.com/riscv-collab/riscv-gcc/blob/riscv-gcc-rvv-next/gcc/config/riscv/riscv_vector.h does not add any aliases and it can reduce compilation time. I personally don't like the api that has prefix _rvv and need alias added in the header for RVV intrinsics.

@topperc
Copy link
Contributor

topperc commented Feb 11, 2022

Should this proposal include the P extension? Looks like the naming in this https://github.com/riscv/riscv-p-spec/blob/master/P-ext-proposal.adoc has __rv_v_ for intrinsics that return packed vectors.

EXTENSION ::= ISA extension letter or string.
MNEMONIC ::= Instruction name in the ISA extensions specification. Replace '.' with '_' (if any).
RET_TYPE ::= Return type indication.
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In vector intrinsic, there is some naming exception cases due to having the same return type under different input types.
I was thinking maybe naming rule need allow the exception or only require PREFIX and MNEMONIC part should be common?

int32_t __builtin_riscv_orc_b_32(int32_t rs);

vadd.vv vd, vs2, vs1:
vint8m1_t __rvv_vadd_vv_i8m1(vint8m1_t vs2, vint8m1_t vs1, size_t vl);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could define the common rule for intrinsic header first (#17), and those new naming rule API are only supported in new common intrinsic headers.
For now, toolchain still need to support old intrinsic header with old ABI, and they will be deprecated in the future.
In this way, I think we could get rid of the current API designed and make things moving forward.

@pz9115
Copy link

pz9115 commented Feb 17, 2022

The P extension spec is still not forzen yet, so it could follow the define. @chuanhua


```
INTRINSIC ::= PREFIX '_' MNEMONIC [ '_' RET_TYPE ]
PREFIX ::= "__builtin_riscv" | "__rv" EXTENSION | "_rv" | "_rv32" | "_rv64"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • __builtin_riscv: I would suggest drop "__builtin_riscv", that might give people some impression is that could be used without include header file.

  • "__rv" EXTENSION: I would suggest __rv and no EXTENSION name encoded there (or just remove that to use _rv?), there are some instruction are appeared in different extension like rol and ror? Although they defined with _rv32 and _rv64 in crypto intrinsic only, but I would like to prevent we have something like __rvX_zz and __rvY_zz with same semantic but different intrinsic name in future.

int32_t __builtin_riscv_orc_b_32(int32_t rs);

vadd.vv vd, vs2, vs1:
vint8m1_t __rvv_vadd_vv_i8m1(vint8m1_t vs2, vint8m1_t vs1, size_t vl);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yes, the example is wrong. Should be vint8m1_t vadd_vv_i8m1(...) after including riscv_vector.h, right?

What about the type prefix _rvv (e.g. __rvv_int8m1_t) as defined in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/RISCVVTypes.def and the function prefix _builtin_rvv as defined in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/riscv_vector.td#L114, i.e. would this be a reasonable prefix pattern?

I suppose user should never use those internal stuffs like __rvv_int8m1_t, that is not portable across different compiler implementation.

And user should not expect those intrinsic functions are available without including header, always import that might increase lots of compilation time, especially something like vector intrinsic functions.


The return type is typically encoded into the intrinsic name as part of the prefix or a postfix.
In case the return type is encoded into the prefix, then the following prefixes should be used:
* `_rv`: for intrinsics that operate on the long data type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this naming scheme is very confusing, as it's easy to confuse "_rv32_xxx" (i.e. an intrinsic that operate on 32 bit data) with RV32 devices.

There are RV32 instructions that operate on 64 bit data, so _rv64 intrinsics doesn't have to be RV64-specific. One such instruction is add64 from the P extension.

There are many potential instructions that would not fit into the 32/64 bit input model. One such example is smar64 from the P extension which take one 64 and one 32 bit argument.

My guess is that the prefixes have been used by some compilers to distinguish between the RV32 and the RV64 version of an intrinsic, such as _rv32_clz and _rv64_clz. However, technical limitations are not a good foundation for user-visible names. (If a compiler needs to use different names internally, this can easily he handles by aliases in a header file.)

The C standard has different formulations when it comes to reserved identifiers. _<lowercase letters> is less protected than __xxx and _<Uppercase letters>. See paragraph 7.1.3 of the C standard. Just to be safe, I suggest that we always use double underscores.

I suggest that we drop this and use __rv_xxx or __rv<extension>_xxx for all instructions that map to a single instruction. Things like the P extension vector variants can be used __rv_v_xxx or __rv_xxx_v, where the latter is a better match for the suggested naming scheme, where the return type is included in the name.

/ Anders Lindgren, Lead RISC-V compiler developer at IAR Systems

Copy link

@mjosaarinen mjosaarinen Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the prefix is the return length and is indeed independent of the architecture. I don't know about the P extension, but the ratified extensions include plenty of _rv32 prefix intrinsics that work on RV64; "w" instruction variants are expressed this way. For example _rv32_ror (rotate) translates to rorw[i] on RV64 and ror[i] on RV32. This is not because of a technical limitation, but because 32-bit rotate is a different thing from a 64-bit rotation. My understanding is that these intrinsics work in LLVM; Which compiler is interpreting the prefix as an architecture? I don't recall that the prefix has indicated the architcture anywhere; the bitmanip intrinsics have had output length encoded there for many years.

Copy link

@anderslindgren-iar anderslindgren-iar Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't mean that a compiler would interpret it as an architecture -- I'm concerned that users might find it confusing. (Using the trivial sample set "me" -- 100% of the members of the group was confused by this...)

I wasn't aware that _rv32_ror maps to ror or rorw for RV32 and RV64, respectively. (The IAR compiler has support for this, but I wasn't personally involved in that part.) This is taking things one step further away from the idea that there is a one-to-one mapping between intrinsics and instructions. Well, it might be good, but the question is where to draw the line? It would make sense to have a small set of well-defined helper functions, like __builtin_clz, and then provide trivial intrinsics that correspond to individual instructions.

The crux of the matter is that it's time-consuming to add support for intrinsics in a number of compilers. Today, there are at least three that is in active development (gcc, clang, and IAR). By keeping it as simple as possible on the compiler side, it's easy for somebody to write a layer on top of this that can use the compiler-provided low-level intrinsics functions. This is also true if we plan to provide "emulated" versions of, say, crypto intrinsics when using a device without that extension. It's more work (and probably extremely error prone) if each compiler should provide their own variant.

I'm aware that the RISC-V architecture has been around for some time. However -- if it is successful -- much, much more code will be written for it in the future that what has been done so far. Hence, I think it's a win for everybody if we could agree upon guidelines for intrinsics to be used in the future, so that the look and feel of intrinsics provided by various extensions will be the same. (And, if backward compatibility is important, it's probably trivial to provide headers that could define the old names in terms of the new intrinsics.)

By the way, were are the intrinsics for the ratified extensions formally defined? The B specification doesn't include them, as far as I can see. I've found header files for (presumably) gcc but that doesn't feel like a sound foundation to build a future echo system on.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(.. snip ..)

This is also true if we plan to provide "emulated" versions of, say, crypto intrinsics when using a device without that extension. It's more work (and probably extremely error prone) if each compiler should provide their own variant.

Important. As a cryptographer and co-author of the cryptography extension I would recommend against such emulation of crypto intrinsics specifically; it can easily break the constant-time security guarantees explicitly stated by these extensions (especially if they include table lookups or conditional branches). With some other extensions it may be ok.

The elimination of timing leakage is one of the main reasons for using crypto intrinsics as AES is very difficult/slow to implement in constant time (without table lookups) without such ISA extensions. Emulated code would be basically both insecure and very slow; so not very useful.

The repo at https://github.com/rvkrypto/rvkrypto-fips/ provides such emulation (in addition to inline assembler and some built-in mappings) only for development purposes; compilers should not do that. We just did this to do quantitative analysis before the instruction encodings were fixed (that was actually one of the very last steps in the ratification process). See "Chapter 5. Data Independent Execution Latency Subset: Zkt" of the ratified Scalar Crypto spec for further information: https://github.com/riscv/riscv-crypto/releases

By the way, were are the intrinsics for the ratified extensions formally defined? The B specification doesn't include them, as far as I can see. I've found header files for (presumably) gcc but that doesn't feel like a sound foundation to build a future echo system on.

I'm afraid that the intrinsics have grown mostly organically, even though RISC-V folks have worked to help to upstream them into both gcc and clang (along with other architectural support). So this repo is the closest thing to an official specification but unfortunately has had little progress as there is no little centralized management over the intrinsics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, see the other PR about the scalar crypto intrinsics specifically. There has been quite a lot of review on that one. #23

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifications.

As this is still early days in the RISC-V world, I still think that it would be a good idea to standardize the format used by intrinsics.

A quick question: Why does intrinsics, that work with classical bit operations like ror and brev8, have signed parameters and return type? From a user point of view, it's more natural to use unsigned types.

Copy link

@mjosaarinen mjosaarinen Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifications.

As this is still early days in the RISC-V world, I still think that it would be a good idea to standardize the format used by intrinsic.

Yep. Unfortunately, defining instrinsics is not a work item for the ISA extension task groups, and the grand harmonization would need to be directed from the "highest level" anyway. This has not occurred, and hence it has been left mainly to the compiler people. The vector extension is different of course and in crypto we knew that people needed those so we've proposed them. But intrinsics are unfortunately not going into the ISA specs themselves as things currently are.

A quick question: Why does intrinsics, that work with classical bit operations like ror and brev8, have signed parameters and return type? From a user point of view, it's more natural to use unsigned types.

Yes, I agree that it's a bit weird. However, sign extension is a "universal" convention with the RISC-V ISA. For example, the load instructions lw always sign-extend. Hence other "W" instructions such as rorw (the 32-bit rotation on RV64) also sign-extend the highest bit. If the return type prototype for the corresponding intrinsic was unsigned, there might be more of a risk of the compiler creating additional instructions to clear the upper half upon variable assignment or type cast.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the P extension, the intrinsics are part of the ISA specification. However, this extension has a different background as it comes from Andes rather than from the academic world.

Types in the intrinsics are an interesting topic, especially if the compiler can't provide additional information than the signature (which is the case when they are defined as inline function that use inline assembly). In the P extension, some instructions only use the lower 16 bits of an input register. The intrinsics are defined using int16_t. Unfortunately, a naive compiler would add code to ensure that the registers are sign-extended from bit 16, even though it's not needed. On the other hand, if they would use int32_t (or intXLEN_t), the compiler would add code in other situations. Hence, without additional keywords or built-in knowledge it's impossible for the compiler to do the right thing.

I'm currently lobbying for using int16_t as the return type for intrinsics that return a sign-extended 16 bit value, like the instruction kaddh do.

Using uint32_t should be safe. On RV32, there is no problem since registers are 32 bits. On RV64, the standard calling convention says that both signed and unsigned 32 bit values should be sign-extended to 64 bits, this interpretation is (or should be) applied to intrinsics as well.

Anyway, I think that we should define the intrinsics to be as natural as possible from the point of view of the user. Hence, instructions that make sense on the bit-level (like rotate), should be defined using unsigned types. When working in C, a user shouldn't need to be concerned how bits outside of a type are represented.

To conclude, in addition to a generic intrinsic specification, I strongly believe we need official documents on the C-level intrinsics for each extension. Personally, I wouldn't mind to have them as part of the ISA specification, but I'm satisfied as long as they are formally defined somewhere.

By they way, what is the status of the Crypto extension? I would like to add it to the IAR tools some day, but I prefer to do that once it has reached a stable state.

Copy link

@mjosaarinen mjosaarinen Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(..)

To conclude, in addition to a generic intrinsic specification, I strongly believe we need official documents on the C-level intrinsics for each extension. Personally, I wouldn't mind to have them as part of the ISA specification, but I'm satisfied as long as they are formally defined somewhere.

RVI ratified a set of new extensions at the end of last year and those are currently being incorporated with the main privileged and unprivileged specifications. The intrinsics are very unlikely to go into this edition, and the issue really needs to be raised at the chairs / technical steering group level for that to happen. RISC-V has paid staff that edits the main specs.

https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions

By they way, what is the status of the Crypto extension? I would like to add it to the IAR tools some day, but I prefer to do that once it has reached a stable state.

Scalar Crypto is well past stable as it is fully ratified (officially v1.01). Vector crypto will come later. That's the main reason why gnu & llvm toolchain support for scalar crypto is already upstreamed and there are scalar crypto C intrinsics in the "main" clang too.

I'm not sure when/if ratification will happen to the P extension as it kind of competes with the V extension, which is the main RISC-V Architects' preferred way of doing parallelism. V was finally ratified in the same batch as scalar crypto last november.

@cmuellner
Copy link
Collaborator Author

Closing because of lack of consensus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants