- Front matter
- Executive Summary
- Design principles
- Listing of hardware loop builtins (`xcvhwlp`)
- Listing of multiply-accumulate builtins (`xcvmac`)
- Listing of immediate branch builtins (`xcvbi`)
- Listing of post-indexed and register-indexed memory access builtins (`xcvmem`)
- Listing of miscellaneous ALU builtins (`xcvalu`)
- Listing of PULP 8/16-bit SIMD builtins (`xcvsimd`)
  - SIMD ALU operations (32-bit)
  - SIMD ALU operations (64-bit)
  - SIMD bit manipulation operations (32-bit)
  - SIMD bit manipulation operations (64-bit)
  - SIMD dot product operations (32-bit)
  - SIMD dot product operations (64-bit)
  - SIMD shuffle and pack operations (32-bit)
  - SIMD shuffle and pack operations (64-bit)
  - SIMD comparison operations (32-bit)
  - SIMD comparison operations (64-bit)
  - SIMD complex number operations (32-bit)
  - SIMD complex number operations (64-bit)
- Listing of PULP bit manipulation builtins (`xcvbitmanip`)
- Listing of event load word builtins (`xcvelw`)
- C API Headers
Note. Since version 1.0 was ratified, a number of discrepancies have been identified. Version 1.1 of this specification is in preparation to correct these discrepancies. The tool chains being developed will track this version 1.1, not version 1.0.
Date | Version | Notes |
---|---|---|
19 Jul 2023 | 1.2 | Corrected Unsigned to Signed for SIMD ALU Operation |
28 Jun 2023 | 1.1 | Corrected operand order for Bit Manipulation |
20 Apr 2023 | 1.0 | Ratified release |
27 Mar 2023 | 0.9 | Final draft with small corrections to builtins and official OpenHW template. |
23 Feb 2023 | 0.5 | Clean up tables of links, correct bitmanip extract. |
16 Feb 2023 | 0.4 | Fourth draft incorporating Pascal Gouedo comments. |
13 Feb 2023 | 0.3 | Third draft after feedback from the GCC implementation team. |
22 Jan 2023 | 0.2 | Second draft after review and Software TG discussion: naming to match upstream convention; naming to include the ISA extension name; pass by reference only when loading a value; pseudo-overloading to allow 32- and 64-bit support. |
15 Dec 2022 | 0.1 | First draft for review. |
This document is licensed under a Creative Commons Attribution Share Alike 4.0 International license. See the full legal code.
Copyright (C) 2023 OpenHW Group. You may use, copy, modify, and distribute this work under the terms of the License, subject to the conditions specified in the License.
We need to define our builtin names for the 200+ instructions in the various CORE-V ISA extensions. For each builtin, we give its prototype and relate the arguments to the semantics of the instruction as described in the CV32E40P architecture manual.
Standard RISC-V builtin functions use the following naming convention.
__builtin_riscv_<name>
The Clang/LLVM project has also established the naming convention for vendor specific builtin functions.
The general naming follows the RISC-V convention and is:
__builtin_riscv_<vendor>_<name>
OpenHW CORE-V processors will therefore use the general prefix
__builtin_riscv_cv_<name>
At the time of writing, OpenHW defines eight instruction set extensions, some with large numbers of instructions. To avoid congestion and confusion, CORE-V will also include the ISA extension name in the builtin name.
__builtin_riscv_cv_<isaext>_<name>
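As an illustration only, the naming scheme can be expressed with preprocessor token pasting. The macros `CV_BUILTIN` and `CV_STR` and the function `cv_mac_mac_name` are hypothetical helpers, not part of this specification. The sketch only builds the name as a string and does not call any builtin.

```c
/* Hypothetical helpers (not defined by this specification): compose a
   CORE-V builtin name from the ISA extension and the operation name. */
#define CV_STR_(x) #x
#define CV_STR(x) CV_STR_(x)
#define CV_BUILTIN(isaext, name) __builtin_riscv_cv_##isaext##_##name

/* Return, as a string, the builtin name for the "mac" operation of the
   "mac" ISA extension. The name is only stringified, never called. */
static const char *cv_mac_mac_name(void)
{
    return CV_STR(CV_BUILTIN(mac, mac));
}
```

Here `cv_mac_mac_name()` yields the string `__builtin_riscv_cv_mac_mac`, matching the `__builtin_riscv_cv_<isaext>_<name>` pattern.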
Where instructions completely rewrite the destination register, that value is returned as a result. Where instructions only partially modify the destination register, the value to be modified is passed as an argument and the modified value returned as a result.
In their basic form, the builtin names need to be suitable for C and C++, which means we cannot use overloading of function names. For some instructions there is a simple one-to-one mapping. So the SIMD instruction to add two vectors of half words:
cv.add.h rd, rs1, rs2
maps to the builtin function
uint32_t __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t)
However, we can rely on GCC to choose between the short and long forms of an instruction, which reduces the number of builtin functions we need. For example, to add a scalar to a vector of half words, the SIMD ISA extension has the general-purpose instruction:
cv.add.sc.h rd, rs1, rs2
but where rs2 is a small (6 bit) constant, we also have
cv.add.sci.h rd, rs1, imm6
We do not need two builtins for these instructions; instead we can have:
uint32_t __builtin_riscv_cv_simd_add_sc_h (uint32_t, int16_t)
GCC can work out whether the second argument is a constant which fits in 6 bits, and if so generate the `sci` version of the opcode, otherwise it can generate the `sc` version.
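The choice between the two opcodes can be pictured with the following sketch. The helper names `fits_simm6` and `select_add_sc_h` are hypothetical, and the code only models the decision for a signed 6-bit immediate; it is not a description of how GCC is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the value fits a 6-bit signed immediate, i.e. -32..31. */
static bool fits_simm6(int32_t v)
{
    return v >= -32 && v <= 31;
}

/* Hypothetical model: name the opcode a compiler could select for
   __builtin_riscv_cv_simd_add_sc_h, given what it knows about the
   second argument at compile time. */
static const char *select_add_sc_h(bool is_constant, int32_t value)
{
    return (is_constant && fits_simm6(value)) ? "cv.add.sci.h"
                                              : "cv.add.sc.h";
}
```

A constant such as 5 would select the immediate form, while a constant such as 100, or any value not known at compile time, would fall back to the register form.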
We use this general approach to create the basic set of SIMD builtin functions.
For the SIMD extension, explicit vector types could be an alternative basis for the basic set of builtins. GCC and Clang extensions to C and C++ allow you to define explicit vector types, and this can make for cleaner builtins. We can define
typedef int16_t v2hi __attribute__ ((vector_size (4)));
typedef int8_t v4qi __attribute__ ((vector_size (4)));
for vectors of half words and vectors of bytes respectively. We can then make the types of arguments and results explicit, so our two functions above become:
v2hi __builtin_riscv_cv_simd_add_h (v2hi, v2hi)
v2hi __builtin_riscv_cv_simd_add_sc_h (v2hi, int16_t)
While this is cleaner semantically, it makes no difference to the number of functions needed, or to the code generated. A second issue is that if future SIMD instructions were to support vectors of 4-bit or 2-bit values, this scheme would not work, since it cannot go smaller than the minimal addressable unit, which for RISC-V in general is 8 bits.
Our recommendation is therefore that we do not adopt this.
There is no problem reusing the same names for 64-bit variants, for example
int32_t __builtin_riscv_cv_mac_mac (int32_t x, int32_t y, int32_t z)
int64_t __builtin_riscv_cv_mac_mac (int64_t x, int64_t y, int64_t z)
This is not overloading in the C++ space, since for any particular compilation, only one will be defined according to whether the compilation target is 32- or 64-bit.
This document gives a detailed description of the 32-bit variants, and then a simple list of the 64-bit variants.
Where an existing generic GCC builtin function exists, this is reused for CORE-V. This is the standard arrangement for GCC, which we must follow if we are to be upstreamed in the future.
The ordering of arguments to builtin functions draws on equivalent standard C library functions. Thus we have
int32_t __builtin_riscv_cv_mac_mac (int32_t x, int32_t y, int32_t z)
to compute `x * y + z`, mirroring the argument/result format of the standard C `fma` function.
There are no builtin functions for hardware loops.
Applicability. 32-bit cores.
The ordering of arguments mirrors the standard C function `fma`.
For 32-bit architectures, we have the following instructions providing 32-bit x 32-bit multiply-accumulate.
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
Generated assembler:
cv.mac rD,rs1,rs2
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
Generated assembler:
cv.msu rD,rs1,rs2
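The arithmetic of the two 32-bit builtins can be modelled in portable C. The sketch below is a reference model only, with hypothetical names `ref_mac` and `ref_msu`. It assumes results wrap modulo 2^32 and that `cv.msu` subtracts the product from the accumulator, as the instruction name suggests.

```c
#include <stdint.h>

/* Reference model (assumption): x * y + z with 32-bit wraparound.
   Unsigned arithmetic avoids signed-overflow undefined behaviour; the
   cast back to int32_t assumes the usual two's complement conversion. */
static int32_t ref_mac(int32_t x, int32_t y, int32_t z)
{
    return (int32_t)((uint32_t)x * (uint32_t)y + (uint32_t)z);
}

/* Reference model (assumption): z - x * y with 32-bit wraparound. */
static int32_t ref_msu(int32_t x, int32_t y, int32_t z)
{
    return (int32_t)((uint32_t)z - (uint32_t)x * (uint32_t)y);
}
```

As with `fma`, the accumulator is the last argument: `ref_mac (3, 4, 5)` gives 17 and `ref_msu (3, 4, 20)` gives 8.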
Applicability. 64-bit cores.
It is anticipated that 64-bit cores may wish to define 64-bit x 64-bit operations by analogy to the 32-bit x 32-bit operations on 32-bit cores. No builtins are defined yet for such operations.
Applicability. 32-bit cores
__builtin_riscv_cv_mac_muluN
__builtin_riscv_cv_mac_mulhhuN
__builtin_riscv_cv_mac_mulsN
__builtin_riscv_cv_mac_mulhhsN
__builtin_riscv_cv_mac_muluRN
__builtin_riscv_cv_mac_mulhhuRN
__builtin_riscv_cv_mac_mulsRN
__builtin_riscv_cv_mac_mulhhsRN
Even though these are 16-bit operands, we pass them as 32-bit values, because the different instructions select different 16-bit half words within the 32-bit word. The 32-bit words are always passed as unsigned, even when the operations are signed, since we are extracting half words from a 32-bit entity. However, the results from signed operations are 32-bit signed values.
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.muluN rD,rs1,rs2,Is3
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.mulhhuN rD,rs1,rs2,Is3
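The difference between the low half word and high half word forms can be sketched as a reference model. The function names `ref_muluN` and `ref_mulhhuN` are hypothetical, and the semantics shown (zero-extended 16-bit operands, product shifted right logically by `shft`) are an assumption based on the instruction names.

```c
#include <stdint.h>

/* Assumed model: multiply the low half words of x and y as unsigned
   values, then shift the 32-bit product right by shft (0..31). */
static uint32_t ref_muluN(uint32_t x, uint32_t y, uint8_t shft)
{
    return ((x & 0xFFFFu) * (y & 0xFFFFu)) >> (shft & 31u);
}

/* Assumed model: as above, but using the high half words. */
static uint32_t ref_mulhhuN(uint32_t x, uint32_t y, uint8_t shft)
{
    return ((x >> 16) * (y >> 16)) >> (shft & 31u);
}
```

This shows why the operands are passed as full 32-bit words: the same arguments yield different results depending on which half words the instruction selects.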
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.mulsN rD,rs1,rs2,Is3
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.mulhhsN rD,rs1,rs2,Is3
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.muluRN rD,rs1,rs2,Is3
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.mulhhuRN rD,rs1,rs2,Is3
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.mulsRN rD,rs1,rs2,Is3
Argument/result mapping:
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.mulhhsRN rD,rs1,rs2,Is3
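The signed, rounding variants can be sketched in the same way. The names `ref_mulsRN` and `ref_mulhhsRN` and their helpers are hypothetical. The model assumes the half words are sign-extended, that rounding adds `2^(shft-1)` before an arithmetic right shift, and that no rounding term is added when `shft` is zero.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    int32_t h = (int32_t)(v & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Portable arithmetic shift right (rounds towards minus infinity). */
static int64_t asr64(int64_t v, unsigned n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Assumed model: signed product, plus rounding term, shifted right. */
static int32_t mul_round_shift(int32_t a, int32_t b, uint8_t shft)
{
    unsigned n = shft & 31u;
    int64_t round = n ? ((int64_t)1 << (n - 1)) : 0;
    return (int32_t)asr64((int64_t)a * b + round, n);
}

/* Low half words, signed, with rounding. */
static int32_t ref_mulsRN(uint32_t x, uint32_t y, uint8_t shft)
{
    return mul_round_shift(sext16(x), sext16(y), shft);
}

/* High half words, signed, with rounding. */
static int32_t ref_mulhhsRN(uint32_t x, uint32_t y, uint8_t shft)
{
    return mul_round_shift(sext16(x >> 16), sext16(y >> 16), shft);
}
```

Note that the operands remain `uint32_t` while the result is `int32_t`, following the convention described for these builtins.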
Applicability. 64-bit cores.
It is anticipated that 64-bit cores may wish to define 32-bit x 32-bit operations by analogy to the 16-bit x 16-bit operations on 32-bit cores. No builtins are defined yet for such operations.
Applicability. 32-bit cores.
__builtin_riscv_cv_mac_macuN
__builtin_riscv_cv_mac_machhuN
__builtin_riscv_cv_mac_macsN
__builtin_riscv_cv_mac_machhsN
__builtin_riscv_cv_mac_macuRN
__builtin_riscv_cv_mac_machhuRN
__builtin_riscv_cv_mac_macsRN
__builtin_riscv_cv_mac_machhsRN
Even though these are 16-bit operands, we pass them as 32-bit values, because the different instructions select different 16-bit half words within the 32-bit word. The 32-bit words are always passed as unsigned, even when the operations are signed, since we are extracting half words from a 32-bit entity. However, the existing value passed in and the results from signed operations are 32-bit signed values.
As with the 32-bit x 32-bit multiply-accumulate, the ordering of arguments mirrors the standard C function `fma`.
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.macuN rD,rs1,rs2,Is3
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.machhuN rD,rs1,rs2,Is3
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.macsN rD,rs1,rs2,Is3
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.machhsN rD,rs1,rs2,Is3
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.macuRN rD,rs1,rs2,Is3
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.machhuRN rD,rs1,rs2,Is3
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.macsRN rD,rs1,rs2,Is3
Argument/result mapping:
- result, z:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
Generated assembler:
cv.machhsRN rD,rs1,rs2,Is3
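A reference model for two of the accumulating forms is sketched below, using the hypothetical names `ref_macuN` and `ref_macsRN`. It assumes the accumulator `z` is added to the 16-bit x 16-bit product before the shift, and it keeps the intermediate sum wider than 32 bits; whether the hardware does so is not stated here, so results that overflow 32 bits should not be relied on.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    int32_t h = (int32_t)(v & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Assumed model: (low(x) * low(y) + z) >> shft, all unsigned. */
static uint32_t ref_macuN(uint32_t x, uint32_t y, uint32_t z, uint8_t shft)
{
    uint64_t sum = (uint64_t)(x & 0xFFFFu) * (y & 0xFFFFu) + z;
    return (uint32_t)(sum >> (shft & 31u));
}

/* Assumed model: signed product plus z plus rounding term, then an
   arithmetic shift right. The accumulator z is signed. */
static int32_t ref_macsRN(uint32_t x, uint32_t y, int32_t z, uint8_t shft)
{
    unsigned n = shft & 31u;
    int64_t round = n ? ((int64_t)1 << (n - 1)) : 0;
    int64_t sum = (int64_t)sext16(x) * sext16(y) + z + round;
    int64_t shifted = sum < 0 ? ~(~sum >> n) : sum >> n;
    return (int32_t)shifted;
}
```

The argument order again follows `fma`: the two multiplicands come first, then the accumulator, then the shift amount.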
Applicability. 64-bit cores.
It is anticipated that 64-bit cores may wish to define 32-bit x 32-bit operations by analogy to the 16-bit x 16-bit operations on 32-bit cores. No builtins are defined yet for such operations.
There are no builtin functions for immediate branching.
There are no builtins for post-indexed and register-indexed memory access.
Applicability. 32-bit cores.
__builtin_abs
__builtin_riscv_cv_alu_slet
__builtin_riscv_cv_alu_sletu
__builtin_riscv_cv_alu_min
__builtin_riscv_cv_alu_minu
__builtin_riscv_cv_alu_max
__builtin_riscv_cv_alu_maxu
__builtin_riscv_cv_alu_exths
__builtin_riscv_cv_alu_exthz
__builtin_riscv_cv_alu_extbs
__builtin_riscv_cv_alu_extbz
__builtin_riscv_cv_alu_clip
__builtin_riscv_cv_alu_clipu
__builtin_riscv_cv_alu_addN
__builtin_riscv_cv_alu_adduN
__builtin_riscv_cv_alu_addRN
__builtin_riscv_cv_alu_adduRN
__builtin_riscv_cv_alu_subN
__builtin_riscv_cv_alu_subuN
__builtin_riscv_cv_alu_subRN
__builtin_riscv_cv_alu_subuRN
Note: A number of functions return boolean values. This specification follows the C convention, where boolean values use the `int` type.
Argument/result mapping:
- result
rD
- j:
rs1
Generated assembler:
cv.abs rD,rs1
Note: This is a standard GCC builtin.
Argument/result mapping:
- result
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.slet rD,rs1,rs2
Argument/result mapping:
- result
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sletu rD,rs1,rs2
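The two comparison builtins differ only in the signedness of the comparison. The sketch below models them with the hypothetical names `ref_slet` and `ref_sletu`, assuming "set if less than or equal" semantics and an `int` result as noted above.

```c
#include <stdint.h>

/* Assumed model: 1 if i <= j as signed values, otherwise 0. */
static int ref_slet(int32_t i, int32_t j)
{
    return i <= j;
}

/* Assumed model: 1 if i <= j as unsigned values, otherwise 0. */
static int ref_sletu(uint32_t i, uint32_t j)
{
    return i <= j;
}
```

The same bit pattern can therefore compare differently: all ones is -1 for the signed form, but the largest value for the unsigned form.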
Argument/result mapping:
- result
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.min rD,rs1,rs2
Argument/result mapping:
- result
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.minu rD,rs1,rs2
Argument/result mapping:
- result
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.max rD,rs1,rs2
Argument/result mapping:
- result
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.maxu rD,rs1,rs2
Argument/result mapping:
- result
rD
- i:
rs1
Generated assembler:
cv.exths rD,rs1
Argument/result mapping:
- result
rD
- i:
rs1
Generated assembler:
cv.exthz rD,rs1
Argument/result mapping:
- result
rD
- i:
rs1
Generated assembler:
cv.extbs rD,rs1
Argument/result mapping:
- result
rD
- i:
rs1
Generated assembler:
cv.extbz rD,rs1
Argument/result mapping:
Case a) where j is constant and `j + 1` is an exact power of 2 up to 2^30
- result:
rD
- i:
rs1
- j:
Is2
(5-bit unsigned value)
or case b) where `j + 1` is not a power of 2
- result:
rD
- i:
rs1
- j:
rs2
Note: In case a), `Is2 = log2 (j + 1) + 1`.
Generated assembler:
Case a)
cv.clip rD,rs1,Is2
or case b)
cv.clipr rD,rs1,rs2
Examples:
__builtin_riscv_cv_alu_clip (i, 15)
would result in
cv.clip rD,rs1,5
__builtin_riscv_cv_alu_clip (i, 10)
would result in
c.li rs2,10
cv.clipr rD,rs1,rs2
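The relationship between `j` and `Is2`, and the effect of the clip, can be sketched as follows. The names `clip_is2` and `ref_clip` are hypothetical. The clipping range of `-(j + 1)` to `j` is an assumption drawn from the instruction description in the CV32E40P architecture manual, not something defined in this section.

```c
#include <stdint.h>

/* Return Is2 = log2 (j + 1) + 1 when j + 1 is an exact power of 2 up
   to 2^30 (case a), or 0 when the register form is needed (case b). */
static unsigned clip_is2(uint32_t j)
{
    uint32_t p = j + 1u;
    unsigned log2p = 0;

    if (p == 0u || (p & (p - 1u)) != 0u || p > (UINT32_C(1) << 30))
        return 0;
    while ((p >> log2p) != 1u)
        log2p++;
    return log2p + 1u;
}

/* Assumed model: clamp i to the range -(j + 1) .. j. */
static int32_t ref_clip(int32_t i, uint32_t j)
{
    int64_t hi = (int64_t)j;
    int64_t lo = -hi - 1;

    if (i < lo)
        return (int32_t)lo;
    if (i > hi)
        return (int32_t)hi;
    return i;
}
```

For `j = 15` the helper returns 5, matching the `cv.clip rD,rs1,5` example, while `j = 10` returns 0, indicating that `cv.clipr` is used instead.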
Argument/result mapping:
Case a) where j is constant and `j + 1` is an exact power of 2 up to 2^30
- result:
rD
- i:
rs1
- j:
Is2
(5-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Note: In case a), `Is2 = log2 (j + 1) + 1`.
Generated assembler:
Case a)
cv.clipu rD,rs1,Is2
or case b)
cv.clipur rD,rs1,rs2
Examples:
__builtin_riscv_cv_alu_clipu (i, 255)
would result in
cv.clipu rD,rs1,9
__builtin_riscv_cv_alu_clipu (i, 200)
would result in
li rs2,200
cv.clipur rD,rs1,rs2
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.addN rD,rs1,rs2,Is3
or case b)
cv.addNr rD,rs1,rs2
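Whichever instruction form is generated, the builtin is expected to compute the same value. A reference model is sketched below with the hypothetical names `asr32` and `ref_addN`; it assumes the sum wraps to 32 bits and is then shifted right arithmetically by `shft`.

```c
#include <stdint.h>

/* Portable arithmetic shift right for 32-bit signed values. */
static int32_t asr32(int32_t v, unsigned n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Assumed model: (x + y) shifted right arithmetically by shft (0..31).
   The addition is done unsigned so that overflow wraps. */
static int32_t ref_addN(int32_t x, int32_t y, uint8_t shft)
{
    int32_t sum = (int32_t)((uint32_t)x + (uint32_t)y);
    return asr32(sum, shft & 31u);
}
```

With a negative sum the shift rounds towards minus infinity, so `ref_addN (-10, 3, 1)` gives -4 rather than -3.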
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.adduN rD,rs1,rs2,Is3
or case b)
cv.adduNr rD,rs1,rs2
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.addRN rD,rs1,rs2,Is3
or case b)
cv.addRNr rD,rs1,rs2
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.adduRN rD,rs1,rs2,Is3
or case b)
cv.adduRNr rD,rs1,rs2
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.subN rD,rs1,rs2,Is3
or case b)
cv.subNr rD,rs1,rs2
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.subuN rD,rs1,rs2,Is3
or case b)
cv.subuNr rD,rs1,rs2
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.subRN rD,rs1,rs2,Is3
or case b)
cv.subRNr rD,rs1,rs2
Argument/result mapping:
Case a) shft is a constant in the range 0 <= shft <= 31
- result:
rD
- x:
rs1
- y:
rs2
- shft:
Is3
(5-bit unsigned value)
or case b)
- result, x:
rD
- y:
rs1
- shft:
rs2
Generated assembler:
Case a)
cv.subuRN rD,rs1,rs2,Is3
or case b)
cv.subuRNr rD,rs1,rs2
Applicability. 64-bit cores.
It is anticipated that 64-bit cores may wish to define ALU operations by analogy to the ALU operations on 32-bit cores.
Applicability. 32-bit cores.
__builtin_riscv_cv_simd_add_h
__builtin_riscv_cv_simd_add_b
__builtin_riscv_cv_simd_add_sc_h
__builtin_riscv_cv_simd_add_sc_b
__builtin_riscv_cv_simd_sub_h
__builtin_riscv_cv_simd_sub_b
__builtin_riscv_cv_simd_sub_sc_h
__builtin_riscv_cv_simd_sub_sc_b
__builtin_riscv_cv_simd_avg_h
__builtin_riscv_cv_simd_avg_b
__builtin_riscv_cv_simd_avg_sc_h
__builtin_riscv_cv_simd_avg_sc_b
__builtin_riscv_cv_simd_avgu_h
__builtin_riscv_cv_simd_avgu_b
__builtin_riscv_cv_simd_avgu_sc_h
__builtin_riscv_cv_simd_avgu_sc_b
__builtin_riscv_cv_simd_min_h
__builtin_riscv_cv_simd_min_b
__builtin_riscv_cv_simd_min_sc_h
__builtin_riscv_cv_simd_min_sc_b
__builtin_riscv_cv_simd_minu_h
__builtin_riscv_cv_simd_minu_b
__builtin_riscv_cv_simd_minu_sc_h
__builtin_riscv_cv_simd_minu_sc_b
__builtin_riscv_cv_simd_max_h
__builtin_riscv_cv_simd_max_b
__builtin_riscv_cv_simd_max_sc_h
__builtin_riscv_cv_simd_max_sc_b
__builtin_riscv_cv_simd_maxu_h
__builtin_riscv_cv_simd_maxu_b
__builtin_riscv_cv_simd_maxu_sc_h
__builtin_riscv_cv_simd_maxu_sc_b
__builtin_riscv_cv_simd_srl_h
__builtin_riscv_cv_simd_srl_b
__builtin_riscv_cv_simd_srl_sc_h
__builtin_riscv_cv_simd_srl_sc_b
__builtin_riscv_cv_simd_sra_h
__builtin_riscv_cv_simd_sra_b
__builtin_riscv_cv_simd_sra_sc_h
__builtin_riscv_cv_simd_sra_sc_b
__builtin_riscv_cv_simd_sll_h
__builtin_riscv_cv_simd_sll_b
__builtin_riscv_cv_simd_sll_sc_h
__builtin_riscv_cv_simd_sll_sc_b
__builtin_riscv_cv_simd_or_h
__builtin_riscv_cv_simd_or_b
__builtin_riscv_cv_simd_or_sc_h
__builtin_riscv_cv_simd_or_sc_b
__builtin_riscv_cv_simd_xor_h
__builtin_riscv_cv_simd_xor_b
__builtin_riscv_cv_simd_xor_sc_h
__builtin_riscv_cv_simd_xor_sc_b
__builtin_riscv_cv_simd_and_h
__builtin_riscv_cv_simd_and_b
__builtin_riscv_cv_simd_and_sc_h
__builtin_riscv_cv_simd_and_sc_b
__builtin_riscv_cv_simd_abs_h
__builtin_riscv_cv_simd_abs_b
__builtin_riscv_cv_simd_neg_h
__builtin_riscv_cv_simd_neg_b
Note. The documentation of these instructions uses `op2` to refer to `rs2` for vector operations and `rs2` or `Is2` for scalar replication instructions.
Note. SIMD registers are always specified as uint32_t, even though the components of the vector may be treated as signed.
The half word vector `add` instruction comes in variants which shift by 0, 1, 2 or 3, in order to divide by 1, 2, 4 or 8. The first of these maps to the standard SIMD half word vector addition, the rest to the various forms of the SIMD complex-number addition.
Argument/result mapping:
Case a) shft is zero
- result:
rD
- i:
rs1
- j:
rs2
- shft: unused
or case b) shft is non-zero
- result:
rD
- i:
rs1
- j:
rs2
- shft: unused as argument, indicates which instruction to select.
Generated assembler:
Case a)
cv.add.h rD,rs1,rs2
or case b)
cv.add.div2 rD,rs1,rs2 ;; shft = 1
cv.add.div4 rD,rs1,rs2 ;; shft = 2
cv.add.div8 rD,rs1,rs2 ;; shft = 3
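The effect of the `shft` argument can be sketched with a lane-wise reference model. The names `lane_h`, `asr32` and `ref_simd_add_h` are hypothetical. The model assumes signed half word lanes, with each lane sum shifted right arithmetically before being truncated back to 16 bits; behaviour when a lane sum overflows 16 bits is not covered by this sketch.

```c
#include <stdint.h>

/* Sign-extended half word lane idx (0 = low, 1 = high) of v. */
static int32_t lane_h(uint32_t v, unsigned idx)
{
    int32_t h = (int32_t)((v >> (16u * idx)) & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Portable arithmetic shift right for 32-bit signed values. */
static int32_t asr32(int32_t v, unsigned n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Assumed model: add corresponding lanes, divide by 2^shft (0..3). */
static uint32_t ref_simd_add_h(uint32_t i, uint32_t j, uint8_t shft)
{
    uint32_t out = 0;

    for (unsigned idx = 0; idx < 2; idx++) {
        int32_t sum = asr32(lane_h(i, idx) + lane_h(j, idx), shft & 3u);
        out |= ((uint32_t)sum & 0xFFFFu) << (16u * idx);
    }
    return out;
}
```

With `shft` equal to zero this is the plain vector addition; with 1, 2 or 3 each lane is additionally halved, quartered or divided by eight.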
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.add.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.add.sci.h rD,rs1,Is2
or case b)
cv.add.sc.h rD,rs1,rs2
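Scalar replication means the same scalar is applied to every lane. The sketch below models this for half words under the hypothetical name `ref_simd_add_sc_h`, assuming each lane wraps independently at 16 bits with no carry between lanes.

```c
#include <stdint.h>

/* Assumed model: add the scalar j to both half word lanes of i. Each
   lane is masked to 16 bits, so a carry never crosses into the
   neighbouring lane. */
static uint32_t ref_simd_add_sc_h(uint32_t i, int16_t j)
{
    uint32_t s = (uint32_t)(uint16_t)j;
    uint32_t lo = ((i & 0xFFFFu) + s) & 0xFFFFu;
    uint32_t hi = (((i >> 16) & 0xFFFFu) + s) & 0xFFFFu;

    return (hi << 16) | lo;
}
```

The result is the same whether the compiler selects the `sci` or the `sc` form; only the encoding of the scalar differs.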
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.add.sci.b rD,rs1,Is2
or case b)
cv.add.sc.b rD,rs1,rs2
The half word vector `sub` instruction comes in variants which shift by 0, 1, 2 or 3, in order to divide by 1, 2, 4 or 8. The first of these maps to the standard SIMD half word vector subtraction, the rest to the various forms of the SIMD complex-number subtraction.
Argument/result mapping:
Case a) shft is zero
- result:
rD
- i:
rs1
- j:
rs2
- shft: unused
or case b) shft is non-zero
- result:
rD
- i:
rs1
- j:
rs2
- shft: unused as argument, indicates which instruction to select.
Generated assembler:
Case a)
cv.sub.h rD,rs1,rs2
or case b)
cv.sub.div2 rD,rs1,rs2 ;; shft = 1
cv.sub.div4 rD,rs1,rs2 ;; shft = 2
cv.sub.div8 rD,rs1,rs2 ;; shft = 3
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sub.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sub.sci.h rD,rs1,Is2
or case b)
cv.sub.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sub.sci.b rD,rs1,Is2
or case b)
cv.sub.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.avg.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.avg.b rD,rs1,rs2
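A lane-wise model of the signed byte average is sketched below. The names `lane_b` and `ref_simd_avg_b` are hypothetical, and the model assumes each average is the lane sum shifted right arithmetically by one, computed without truncating the sum first.

```c
#include <stdint.h>

/* Sign-extended byte lane idx (0 = least significant) of v. */
static int32_t lane_b(uint32_t v, unsigned idx)
{
    int32_t b = (int32_t)((v >> (8u * idx)) & 0xFFu);
    return (b & 0x80) ? b - 0x100 : b;
}

/* Assumed model: signed average of corresponding byte lanes. */
static uint32_t ref_simd_avg_b(uint32_t i, uint32_t j)
{
    uint32_t out = 0;

    for (unsigned idx = 0; idx < 4; idx++) {
        int32_t sum = lane_b(i, idx) + lane_b(j, idx);
        /* Arithmetic shift right by one, written portably. */
        int32_t avg = sum < 0 ? ~(~sum >> 1) : sum >> 1;
        out |= ((uint32_t)avg & 0xFFu) << (8u * idx);
    }
    return out;
}
```

Although the arguments and result are `uint32_t`, the lanes are interpreted as signed, so a lane holding `0xFE` is treated as -2.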
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.avg.sci.h rD,rs1,Is2
or case b)
cv.avg.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.avg.sci.b rD,rs1,Is2
or case b)
cv.avg.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.avgu.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.avgu.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.avgu.sci.h rD,rs1,Is2
or case b)
cv.avgu.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.avgu.sci.b rD,rs1,Is2
or case b)
cv.avgu.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.min.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.min.b rD,rs1,rs2
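The signed and unsigned minimum builtins take the same `uint32_t` arguments but interpret the lanes differently. The following sketch, using the hypothetical names `ref_simd_min_h` and `ref_simd_minu_h`, illustrates the contrast for half word lanes.

```c
#include <stdint.h>

/* Assumed model: lane-wise minimum, lanes treated as signed. */
static uint32_t ref_simd_min_h(uint32_t i, uint32_t j)
{
    uint32_t out = 0;

    for (unsigned idx = 0; idx < 2; idx++) {
        uint32_t a = (i >> (16u * idx)) & 0xFFFFu;
        uint32_t b = (j >> (16u * idx)) & 0xFFFFu;
        /* Flipping the sign bit maps signed order onto unsigned order. */
        uint32_t m = ((a ^ 0x8000u) < (b ^ 0x8000u)) ? a : b;
        out |= m << (16u * idx);
    }
    return out;
}

/* Assumed model: lane-wise minimum, lanes treated as unsigned. */
static uint32_t ref_simd_minu_h(uint32_t i, uint32_t j)
{
    uint32_t out = 0;

    for (unsigned idx = 0; idx < 2; idx++) {
        uint32_t a = (i >> (16u * idx)) & 0xFFFFu;
        uint32_t b = (j >> (16u * idx)) & 0xFFFFu;
        out |= ((a < b) ? a : b) << (16u * idx);
    }
    return out;
}
```

For a lane holding `0xFFFF` against a lane holding `0x0001`, the signed form picks `0xFFFF` (that is, -1) and the unsigned form picks `0x0001`.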
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.min.sci.h rD,rs1,Is2
or case b)
cv.min.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.min.sci.b rD,rs1,Is2
or case b)
cv.min.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.minu.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.minu.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.minu.sci.h rD,rs1,Is2
or case b)
cv.minu.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.minu.sci.b rD,rs1,Is2
or case b)
cv.minu.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.max.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.max.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.max.sci.h rD,rs1,Is2
or case b)
cv.max.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.max.sci.b rD,rs1,Is2
or case b)
cv.max.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.maxu.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.maxu.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.maxu.sci.h rD,rs1,Is2
or case b)
cv.maxu.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.maxu.sci.b rD,rs1,Is2
or case b)
cv.maxu.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.srl.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.srl.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.srl.sci.h rD,rs1,Is2
or case b)
cv.srl.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.srl.sci.b rD,rs1,Is2
or case b)
cv.srl.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sra.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sra.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sra.sci.h rD,rs1,Is2
or case b)
cv.sra.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sra.sci.b rD,rs1,Is2
or case b)
cv.sra.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sll.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sll.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sll.sci.h rD,rs1,Is2
or case b)
cv.sll.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sll.sci.b rD,rs1,Is2
or case b)
cv.sll.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.or.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.or.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.or.sci.h rD,rs1,Is2
or case b)
cv.or.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.or.sci.b rD,rs1,Is2
or case b)
cv.or.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.xor.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.xor.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.xor.sci.h rD,rs1,Is2
or case b)
cv.xor.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.xor.sci.b rD,rs1,Is2
or case b)
cv.xor.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.and.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.and.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.and.sci.h rD,rs1,Is2
or case b)
cv.and.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.and.sci.b rD,rs1,Is2
or case b)
cv.and.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
Generated assembler:
cv.abs.h rD,rs1
Argument/result mapping:
- result:
rD
- i:
rs1
Generated assembler:
cv.abs.b rD,rs1
There is no `cv.neg.h` instruction, but as a convenience, we provide this builtin using `cv.sub.h`.
Argument/result mapping:
- result:
rD
- i:
rs2
Generated assembler:
cv.sub.h rD,zero,rs2
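Since the builtin is a subtraction from zero, its effect can be modelled lane by lane. The name `ref_simd_neg_h` is hypothetical, and the sketch assumes each half word lane wraps at 16 bits.

```c
#include <stdint.h>

/* Assumed model: negate each half word lane as 0 - lane, wrapping at
   16 bits, mirroring cv.sub.h with the zero register as first operand. */
static uint32_t ref_simd_neg_h(uint32_t i)
{
    uint32_t lo = (0u - (i & 0xFFFFu)) & 0xFFFFu;
    uint32_t hi = (0u - (i >> 16)) & 0xFFFFu;

    return (hi << 16) | lo;
}
```

Under this model the most negative lane value, `0x8000`, negates to itself.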
There is no `cv.neg.b` instruction, but as a convenience, we provide this builtin using `cv.sub.b`.
Argument/result mapping:
- result:
rD
- i:
rs2
Generated assembler:
cv.sub.b rD,zero,rs2
Applicability. 64-bit cores.
At the time of writing the SIMD architecture for 64-bit is not defined, so no builtin functions are specified.
Applicability. 32-bit cores.
__builtin_riscv_cv_simd_extract_h
__builtin_riscv_cv_simd_extract_b
__builtin_riscv_cv_simd_extractu_h
__builtin_riscv_cv_simd_extractu_b
__builtin_riscv_cv_simd_insert_h
__builtin_riscv_cv_simd_insert_b
Argument/result mapping:
- result:
rD
- i:
rs1
- sel:
Is2
(1-bit)
Generated assembler:
cv.extract.h rD,rs1,Is2
Argument/result mapping:
- result:
rD
- i:
rs1
- sel:
Is2
(2-bit)
Generated assembler:
cv.extract.b rD,rs1,Is2
Argument/result mapping:
- result:
rD
- i:
rs1
- sel:
Is2
(1-bit)
Generated assembler:
cv.extractu.h rD,rs1,Is2
Argument/result mapping:
- result:
rD
- i:
rs1
- sel:
Is2
(2-bit)
Generated assembler:
cv.extractu.b rD,rs1,Is2
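The extract builtins select one lane and extend it to 32 bits. The sketch below uses the hypothetical names `ref_simd_extract_h` and `ref_simd_extractu_b`, and assumes that lane 0 is the least significant lane, that the signed form sign-extends and that the unsigned form zero-extends.

```c
#include <stdint.h>

/* Assumed model: sign-extended half word lane sel (0 or 1) of i. */
static int32_t ref_simd_extract_h(uint32_t i, uint8_t sel)
{
    int32_t h = (int32_t)((i >> (16u * (sel & 1u))) & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Assumed model: zero-extended byte lane sel (0..3) of i. */
static uint32_t ref_simd_extractu_b(uint32_t i, uint8_t sel)
{
    return (i >> (8u * (sel & 3u))) & 0xFFu;
}
```

The width of `sel` follows the lane count: one bit for two half words, two bits for four bytes.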
Argument/result mapping:
- result, j:
rD
- i:
rs1
- sel:
Is2
(1-bit)
Generated assembler:
cv.insert.h rD,rs1,Is2
Argument/result mapping:
- result, j:
rD
- i:
rs1
- sel:
Is2
(2-bit)
Generated assembler:
cv.insert.b rD,rs1,Is2
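Insert is an example of an instruction that only partially modifies its destination, so the existing value `j` is passed in and the modified value returned. The sketch below models the byte form under the hypothetical name `ref_simd_insert_b`; the parameter order and the use of the low byte of `i` as the inserted value are assumptions.

```c
#include <stdint.h>

/* Assumed model: return j with byte lane sel (0..3) replaced by the
   low byte of i. All other lanes of j are preserved. */
static uint32_t ref_simd_insert_b(uint32_t i, uint32_t j, uint8_t sel)
{
    unsigned shift = 8u * (sel & 3u);
    uint32_t mask = UINT32_C(0xFF) << shift;

    return (j & ~mask) | ((i & 0xFFu) << shift);
}
```

Because `j` maps to `rD` on both input and output, a compiler can update the register in place.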
Applicability. 64-bit cores.
At the time of writing the SIMD architecture for 64-bit is not defined, so no builtin functions are specified.
Applicability. 32-bit cores.
__builtin_riscv_cv_simd_dotup_h
__builtin_riscv_cv_simd_dotup_b
__builtin_riscv_cv_simd_dotup_sc_h
__builtin_riscv_cv_simd_dotup_sc_b
__builtin_riscv_cv_simd_dotusp_h
__builtin_riscv_cv_simd_dotusp_b
__builtin_riscv_cv_simd_dotusp_sc_h
__builtin_riscv_cv_simd_dotusp_sc_b
__builtin_riscv_cv_simd_dotsp_h
__builtin_riscv_cv_simd_dotsp_b
__builtin_riscv_cv_simd_dotsp_sc_h
__builtin_riscv_cv_simd_dotsp_sc_b
__builtin_riscv_cv_simd_sdotup_h
__builtin_riscv_cv_simd_sdotup_b
__builtin_riscv_cv_simd_sdotup_sc_h
__builtin_riscv_cv_simd_sdotup_sc_b
__builtin_riscv_cv_simd_sdotusp_h
__builtin_riscv_cv_simd_sdotusp_b
__builtin_riscv_cv_simd_sdotusp_sc_h
__builtin_riscv_cv_simd_sdotusp_sc_b
__builtin_riscv_cv_simd_sdotsp_h
__builtin_riscv_cv_simd_sdotsp_b
__builtin_riscv_cv_simd_sdotsp_sc_h
__builtin_riscv_cv_simd_sdotsp_sc_b
Note. The documentation of these instructions uses op2 to refer to rs2 for vector operations and rs2 or Is2 for scalar replication instructions.
Note. SIMD registers are always specified as uint32_t, even though the components of the vector may be treated as signed.
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.dotup.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.dotup.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.dotup.sci.h rD,rs1,Is2
or case b)
cv.dotup.sc.h rD,rs1,rs2
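The case a) / case b) split used by the scalar replication builtins can be sketched as follows. This is only an illustration of the selection rule, not a description of how any particular compiler implements it; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the case a) / case b) split. */

/* True if j fits the 6-bit unsigned immediate used by the unsigned .sci forms. */
static bool fits_uimm6(long j)
{
    return j >= 0 && j <= 63;
}

/* True if j fits the 6-bit signed immediate used by the signed .sci forms. */
static bool fits_simm6(long j)
{
    return j >= -32 && j <= 31;
}

/* Mnemonic that matches the mapping for dotup_sc_h: the immediate form is
   only available when j is a compile-time constant in range. */
static const char *select_dotup_sc_h(bool j_is_constant, long j)
{
    return (j_is_constant && fits_uimm6(j)) ? "cv.dotup.sci.h"
                                            : "cv.dotup.sc.h";
}
```

For example, a constant j of 63 would select the immediate form, while 64 (or any non-constant j) would fall back to the register form.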
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.dotup.sci.b rD,rs1,Is2
or case b)
cv.dotup.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.dotusp.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.dotusp.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.dotusp.sci.h rD,rs1,Is2
or case b)
cv.dotusp.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.dotusp.sci.b rD,rs1,Is2
or case b)
cv.dotusp.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.dotsp.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.dotsp.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.dotsp.sci.h rD,rs1,Is2
or case b)
cv.dotsp.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.dotsp.sci.b rD,rs1,Is2
or case b)
cv.dotsp.sc.b rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sdotup.h rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sdotup.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result, k:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sdotup.sci.h rD,rs1,Is2
or case b)
cv.sdotup.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result, k:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sdotup.sci.b rD,rs1,Is2
or case b)
cv.sdotup.sc.b rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sdotusp.h rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sdotusp.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result, k:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sdotusp.sci.h rD,rs1,Is2
or case b)
cv.sdotusp.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result, k:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sdotusp.sci.b rD,rs1,Is2
or case b)
cv.sdotusp.sc.b rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sdotsp.h rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.sdotsp.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result, k:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sdotsp.sci.h rD,rs1,Is2
or case b)
cv.sdotsp.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result, k:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.sdotsp.sci.b rD,rs1,Is2
or case b)
cv.sdotsp.sc.b rD,rs1,rs2
Applicability. 64-bit cores.
At the time of writing the SIMD architecture for 64-bit is not defined, so no builtin functions are specified.
Applicability. 32-bit cores.
__builtin_riscv_cv_simd_shuffle_h
__builtin_riscv_cv_simd_shuffle_sci_h
__builtin_riscv_cv_simd_shuffle_b
__builtin_riscv_cv_simd_shuffle_sci_b
__builtin_riscv_cv_simd_shuffle2_h
__builtin_riscv_cv_simd_shuffle2_b
__builtin_riscv_cv_simd_packhi_h
__builtin_riscv_cv_simd_packlo_h
__builtin_riscv_cv_simd_packhi_b
__builtin_riscv_cv_simd_packlo_b
Argument/result mapping:
- result:
rD
- i:
rs1
- flgs:
rs2
Generated assembler:
cv.shuffle.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- flgs:
Is2
(2-bit unsigned value)
Generated assembler:
cv.shuffle.sci.h rD,rs1,Is2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.shuffle.b rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- flgs[5:0]:
Is2
(6-bit unsigned value)
Generated assembler:
cv.shuffleI0.sci.b rD,rs1,Is2 ;; flgs[7:6] = 0
cv.shuffleI1.sci.b rD,rs1,Is2 ;; flgs[7:6] = 1
cv.shuffleI2.sci.b rD,rs1,Is2 ;; flgs[7:6] = 2
cv.shuffleI3.sci.b rD,rs1,Is2 ;; flgs[7:6] = 3
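The split of the 8-bit flgs argument into an instruction choice (flgs[7:6]) and a 6-bit immediate (flgs[5:0]) can be sketched as below. The helper names are hypothetical, and the lane semantics in model_shuffle_sci_b (two selector bits per result byte) are an assumption based on the CORE-V ISA documentation.

```c
#include <stdint.h>

/* Hypothetical host-side sketch, not part of the specification. */

/* Instruction index n (0..3) selected by flgs[7:6], i.e. cv.shuffleI<n>.sci.b. */
static unsigned shuffle_sci_b_variant(uint8_t flgs)
{
    return (flgs >> 6) & 3u;
}

/* 6-bit immediate Is2 taken from flgs[5:0]. */
static unsigned shuffle_sci_b_imm(uint8_t flgs)
{
    return flgs & 0x3Fu;
}

/* Assumed semantics: result byte lane n is byte number flgs[2n+1:2n] of i. */
static uint32_t model_shuffle_sci_b(uint32_t i, uint8_t flgs)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        unsigned sel = (flgs >> (2 * lane)) & 3u;
        r |= ((i >> (8 * sel)) & 0xFFu) << (8 * lane);
    }
    return r;
}
```

Under this model a flgs value of 0xE4 is the identity shuffle and 0x1B reverses the byte order.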
Argument/result mapping:
- result, k:
rD
- i:
rs1
- flgs:
rs2
Generated assembler:
cv.shuffle2.h rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- flgs:
rs2
Generated assembler:
cv.shuffle2.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.pack.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.pack rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.packhi.b rD,rs1,rs2
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.packlo.b rD,rs1,rs2
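A host-side sketch of the four pack builtins follows. The lane placement shown here is an assumption taken from the CORE-V ISA documentation (cv.pack combines the low half words, cv.pack.h the high half words, and the byte variants update only one half of rD); the function names and the use of k for the previous value of rD are hypothetical.

```c
#include <stdint.h>

/* Hypothetical host-side models; lane placement is an assumption. */

/* packlo_h (cv.pack): low half words of i and j. */
static uint32_t model_packlo_h(uint32_t i, uint32_t j)
{
    return ((i & 0xFFFFu) << 16) | (j & 0xFFFFu);
}

/* packhi_h (cv.pack.h): high half words of i and j. */
static uint32_t model_packhi_h(uint32_t i, uint32_t j)
{
    return (i & 0xFFFF0000u) | (j >> 16);
}

/* packhi_b: low bytes of i and j go to the upper half of k. */
static uint32_t model_packhi_b(uint32_t i, uint32_t j, uint32_t k)
{
    return ((i & 0xFFu) << 24) | ((j & 0xFFu) << 16) | (k & 0xFFFFu);
}

/* packlo_b: low bytes of i and j go to the lower half of k. */
static uint32_t model_packlo_b(uint32_t i, uint32_t j, uint32_t k)
{
    return (k & 0xFFFF0000u) | ((i & 0xFFu) << 8) | (j & 0xFFu);
}
```

Chaining model_packhi_b and model_packlo_b on the same k assembles a full word from four separate bytes, which is the usual reason both variants keep the other half of rD.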
Applicability. 64-bit cores.
At the time of writing the SIMD architecture for 64-bit is not defined, so no builtin functions are specified.
Applicability. 32-bit cores.
__builtin_riscv_cv_simd_cmpeq_h
__builtin_riscv_cv_simd_cmpeq_b
__builtin_riscv_cv_simd_cmpeq_sc_h
__builtin_riscv_cv_simd_cmpeq_sc_b
__builtin_riscv_cv_simd_cmpne_h
__builtin_riscv_cv_simd_cmpne_b
__builtin_riscv_cv_simd_cmpne_sc_h
__builtin_riscv_cv_simd_cmpne_sc_b
__builtin_riscv_cv_simd_cmpgt_h
__builtin_riscv_cv_simd_cmpgt_b
__builtin_riscv_cv_simd_cmpgt_sc_h
__builtin_riscv_cv_simd_cmpgt_sc_b
__builtin_riscv_cv_simd_cmpge_h
__builtin_riscv_cv_simd_cmpge_b
__builtin_riscv_cv_simd_cmpge_sc_h
__builtin_riscv_cv_simd_cmpge_sc_b
__builtin_riscv_cv_simd_cmplt_h
__builtin_riscv_cv_simd_cmplt_b
__builtin_riscv_cv_simd_cmplt_sc_h
__builtin_riscv_cv_simd_cmplt_sc_b
__builtin_riscv_cv_simd_cmple_h
__builtin_riscv_cv_simd_cmple_b
__builtin_riscv_cv_simd_cmple_sc_h
__builtin_riscv_cv_simd_cmple_sc_b
__builtin_riscv_cv_simd_cmpgtu_h
__builtin_riscv_cv_simd_cmpgtu_b
__builtin_riscv_cv_simd_cmpgtu_sc_h
__builtin_riscv_cv_simd_cmpgtu_sc_b
__builtin_riscv_cv_simd_cmpgeu_h
__builtin_riscv_cv_simd_cmpgeu_b
__builtin_riscv_cv_simd_cmpgeu_sc_h
__builtin_riscv_cv_simd_cmpgeu_sc_b
__builtin_riscv_cv_simd_cmpltu_h
__builtin_riscv_cv_simd_cmpltu_b
__builtin_riscv_cv_simd_cmpltu_sc_h
__builtin_riscv_cv_simd_cmpltu_sc_b
__builtin_riscv_cv_simd_cmpleu_h
__builtin_riscv_cv_simd_cmpleu_b
__builtin_riscv_cv_simd_cmpleu_sc_h
__builtin_riscv_cv_simd_cmpleu_sc_b
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpeq.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpeq.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpeq.sci.h rD,rs1,Is2
or case b)
cv.cmpeq.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpeq.sci.b rD,rs1,Is2
or case b)
cv.cmpeq.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpne.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpne.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpne.sci.h rD,rs1,Is2
or case b)
cv.cmpne.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpne.sci.b rD,rs1,Is2
or case b)
cv.cmpne.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpgt.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpgt.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpgt.sci.h rD,rs1,Is2
or case b)
cv.cmpgt.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpgt.sci.b rD,rs1,Is2
or case b)
cv.cmpgt.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpge.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpge.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpge.sci.h rD,rs1,Is2
or case b)
cv.cmpge.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpge.sci.b rD,rs1,Is2
or case b)
cv.cmpge.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmplt.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmplt.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmplt.sci.h rD,rs1,Is2
or case b)
cv.cmplt.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmplt.sci.b rD,rs1,Is2
or case b)
cv.cmplt.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmple.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmple.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmple.sci.h rD,rs1,Is2
or case b)
cv.cmple.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range -32 <= j <= 31
- result:
rD
- i:
rs1
- j:
Is2
(6-bit signed value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmple.sci.b rD,rs1,Is2
or case b)
cv.cmple.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpgtu.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpgtu.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpgtu.sci.h rD,rs1,Is2
or case b)
cv.cmpgtu.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpgtu.sci.b rD,rs1,Is2
or case b)
cv.cmpgtu.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpgeu.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpgeu.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpgeu.sci.h rD,rs1,Is2
or case b)
cv.cmpgeu.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpgeu.sci.b rD,rs1,Is2
or case b)
cv.cmpgeu.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpltu.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpltu.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpltu.sci.h rD,rs1,Is2
or case b)
cv.cmpltu.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpltu.sci.b rD,rs1,Is2
or case b)
cv.cmpltu.sc.b rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpleu.h rD,rs1,rs2
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.cmpleu.b rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpleu.sci.h rD,rs1,Is2
or case b)
cv.cmpleu.sc.h rD,rs1,rs2
Argument/result mapping:
Case a) j is a constant in the range 0 <= j <= 63
- result:
rD
- i:
rs1
- j:
Is2
(6-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
Case a)
cv.cmpleu.sci.b rD,rs1,Is2
or case b)
cv.cmpleu.sc.b rD,rs1,rs2
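The comparison builtins all follow one pattern, which the sketch below models for three representative cases. It assumes, from the CORE-V ISA documentation, that a result lane is set to all ones when the comparison holds and to all zeros otherwise, and that the scalar replication forms compare every lane against the same value. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical host-side models; the all-ones result convention is an assumption. */

/* Sign-extend the low 8 bits of v. */
static int32_t sext8(uint32_t v)
{
    return (int32_t)((v & 0xFFu) ^ 0x80u) - 0x80;
}

/* cmpeq_h: lane-wise equality on the two half words. */
static uint32_t model_cmpeq_h(uint32_t i, uint32_t j)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 2; lane++) {
        uint32_t a = (i >> (16 * lane)) & 0xFFFFu;
        uint32_t b = (j >> (16 * lane)) & 0xFFFFu;
        if (a == b)
            r |= 0xFFFFu << (16 * lane);
    }
    return r;
}

/* cmpgt_sc_b: signed compare of each byte of i against the scalar j. */
static uint32_t model_cmpgt_sc_b(uint32_t i, uint32_t j)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; lane++)
        if (sext8(i >> (8 * lane)) > sext8(j))
            r |= 0xFFu << (8 * lane);
    return r;
}

/* cmpgtu_sc_b: unsigned compare of each byte of i against the scalar j. */
static uint32_t model_cmpgtu_sc_b(uint32_t i, uint32_t j)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; lane++)
        if (((i >> (8 * lane)) & 0xFFu) > (j & 0xFFu))
            r |= 0xFFu << (8 * lane);
    return r;
}
```

The same input gives different masks under the signed and unsigned models, which is why the u variants take a 6-bit unsigned immediate and the others a 6-bit signed one.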
Applicability. 64-bit cores.
At the time of writing the SIMD architecture for 64-bit is not defined, so no builtin functions are specified.
Applicability. 32-bit cores.
__builtin_riscv_cv_simd_cplxmul_r
__builtin_riscv_cv_simd_cplxmul_i
__builtin_riscv_cv_simd_cplxconj
__builtin_riscv_cv_simd_subrotmj
Note. The complex addition and subtraction operations are specialized versions of the SIMD half word addition and subtraction described in SIMD ALU operations (32-bit) above.
This instruction comes in variants which shift by 15, 16, 17 or 18, in order to divide the lower half word of rD by 1, 2, 4 or 8. For consistency with __builtin_riscv_cv_simd_add_h and __builtin_riscv_cv_simd_subrotmj, the shift argument takes the values 0 through 3 to indicate which division is needed.
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
- shft: unused as argument, indicates which instruction to select.
Generated assembler:
cv.cplxmul.r rD,rs1,rs2 ;; shft = 0
cv.cplxmul.r.div2 rD,rs1,rs2 ;; shft = 1
cv.cplxmul.r.div4 rD,rs1,rs2 ;; shft = 2
cv.cplxmul.r.div8 rD,rs1,rs2 ;; shft = 3
This instruction comes in variants which shift by 15, 16, 17 or 18, in order to divide the upper half word of rD by 1, 2, 4 or 8. For consistency with __builtin_riscv_cv_simd_add_h and __builtin_riscv_cv_simd_subrotmj, the shift argument takes the values 0 through 3 to indicate which division is needed.
Argument/result mapping:
- result, k:
rD
- i:
rs1
- j:
rs2
- shft: unused as argument, indicates which instruction to select.
Generated assembler:
cv.cplxmul.i rD,rs1,rs2 ;; shft = 0
cv.cplxmul.i.div2 rD,rs1,rs2 ;; shft = 1
cv.cplxmul.i.div4 rD,rs1,rs2 ;; shft = 2
cv.cplxmul.i.div8 rD,rs1,rs2 ;; shft = 3
Argument/result mapping:
- result:
rD
- i:
rs1
Generated assembler:
cv.cplxconj rD,rs1
This instruction comes in variants which shift by 0, 1, 2 or 3, in order to divide the upper half word of rD by 1, 2, 4 or 8.
Argument/result mapping:
- result:
rD
- i:
rs1
- j:
rs2
- shft: unused as argument, indicates which instruction to select.
Generated assembler:
cv.subrotmj rD,rs1,rs2 ;; shft = 0
cv.subrotmj.div2 rD,rs1,rs2 ;; shft = 1
cv.subrotmj.div4 rD,rs1,rs2 ;; shft = 2
cv.subrotmj.div8 rD,rs1,rs2 ;; shft = 3
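The relation between the shft argument and the shift amounts 15 to 18 can be sketched for the real-part multiply as follows. This is a host-side model under stated assumptions: Q15-style signed half words with the real part in the low half word, and the upper half word of rD left unchanged, as suggested by the CORE-V ISA documentation. All function names, and passing the previous rD as k, are hypothetical.

```c
#include <stdint.h>

/* Hypothetical host-side model; operand layout is an assumption. */

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    return (int32_t)((v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Arithmetic right shift (floor division by 2^n), written portably. */
static int64_t asr64(int64_t x, unsigned n)
{
    return x >= 0 ? x >> n : -((-x + ((int64_t)1 << n) - 1) >> n);
}

/* Shift amount for shft = 0..3: 15, 16, 17 or 18. */
static unsigned cplxmul_shift(unsigned shft)
{
    return 15u + (shft & 3u);
}

/* Real part: re(i)*re(j) - im(i)*im(j), shifted, written to the low half of k. */
static uint32_t model_cplxmul_r(uint32_t i, uint32_t j, uint32_t k, unsigned shft)
{
    int64_t re = (int64_t)sext16(i) * sext16(j)
               - (int64_t)sext16(i >> 16) * sext16(j >> 16);
    uint32_t lo = (uint32_t)asr64(re, cplxmul_shift(shft)) & 0xFFFFu;
    return (k & 0xFFFF0000u) | lo;
}
```

With shft = 0 the product of two Q15 values is returned in Q15; each increment of shft halves the result, corresponding to the .div2, .div4 and .div8 mnemonics.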
Applicability. 64-bit cores.
At the time of writing the SIMD architecture for 64-bit is not defined, so no builtin functions are specified.
Applicability. 32-bit cores.
__builtin_riscv_cv_bitmanip_extract
__builtin_riscv_cv_bitmanip_extractu
__builtin_riscv_cv_bitmanip_insert
__builtin_riscv_cv_bitmanip_bclr
__builtin_riscv_cv_bitmanip_bset
__builtin_riscv_cv_bitmanip_ff1
__builtin_riscv_cv_bitmanip_fl1
__builtin_riscv_cv_bitmanip_clb
__builtin_riscv_cv_bitmanip_cnt
__builtin_riscv_cv_bitmanip_ror
__builtin_riscv_cv_bitmanip_bitrev
Note: Some of these functions modify the destination register, so they need the value passed by reference, although for convenience the modified value is returned as the result.
Case a) range is a constant
- result:
rD
- i:
rs1
- range[4:0]:
Is2
(5-bit unsigned value)
- range[9:5]:
Is3
(5-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- range:
rs2
Generated assembler:
Case a)
cv.extract rD,rs1,Is3,Is2
or case b)
cv.extractr rD,rs1,rs2
Case a) range is a constant
- result:
rD
- i:
rs1
- range[4:0]:
Is2
(5-bit unsigned value)
- range[9:5]:
Is3
(5-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- range:
rs2
Generated assembler:
Case a)
cv.extractu rD,rs1,Is3,Is2
or case b)
cv.extractur rD,rs1,rs2
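The packing of Is3 and Is2 into the single range argument can be illustrated with a small host-side model. The encoding (range[9:5] = Is3, range[4:0] = Is2) is taken from the mappings above; the interpretation of Is3 as the field width minus one and Is2 as the bit offset, with the field clipped at bit 31, is an assumption from the CORE-V ISA documentation. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers; field interpretation is an assumption. */

/* Pack Is3 and Is2 into the range argument: range[9:5] = Is3, range[4:0] = Is2. */
static uint16_t make_range(unsigned is3, unsigned is2)
{
    return (uint16_t)(((is3 & 31u) << 5) | (is2 & 31u));
}

/* Number of bits in the field, clipped so the field ends at bit 31. */
static unsigned field_bits(uint16_t range)
{
    unsigned is2 = range & 31u, is3 = (range >> 5) & 31u;
    unsigned hi = (is2 + is3 > 31u) ? 31u : is2 + is3;
    return hi - is2 + 1u;
}

/* extractu: zero-extended bit field. */
static uint32_t model_extractu(uint32_t i, uint16_t range)
{
    unsigned nbits = field_bits(range);
    uint32_t mask = (nbits == 32u) ? 0xFFFFFFFFu : ((1u << nbits) - 1u);
    return (i >> (range & 31u)) & mask;
}

/* extract: sign-extended bit field. */
static int32_t model_extract(uint32_t i, uint16_t range)
{
    uint32_t v = model_extractu(i, range);
    uint32_t sign = 1u << (field_bits(range) - 1u);
    return (int32_t)((int64_t)(v ^ sign) - (int64_t)sign);
}
```

For example, make_range(7, 8) describes the 8-bit field occupying bits 15 down to 8.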
Case a) range is a constant and (range[9:5] + range[4:0]) <= 32
- result, k:
rD
- i:
rs1
- range[4:0]:
Is2
(5-bit unsigned value)
- range[9:5]:
Is3
(5-bit unsigned value)
or case b)
- result, k:
rD
- i:
rs1
- range:
rs2
Generated assembler:
Case a)
cv.insert rD,rs1,Is3,Is2
or case b)
cv.insertr rD,rs1,rs2
Case a) range is a constant
- result:
rD
- i:
rs1
- range[4:0]:
Is2
(5-bit unsigned value)
- range[9:5]:
Is3
(5-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- range:
rs2
Generated assembler:
Case a)
cv.bclr rD,rs1,Is3,Is2
or case b)
cv.bclrr rD,rs1,rs2
Case a) range is a constant
- result:
rD
- i:
rs1
- range[4:0]:
Is2
(5-bit unsigned value)
- range[9:5]:
Is3
(5-bit unsigned value)
or case b)
- result:
rD
- i:
rs1
- range:
rs2
Generated assembler:
Case a)
cv.bset rD,rs1,Is3,Is2
or case b)
cv.bsetr rD,rs1,rs2
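The bit clear and bit set builtins use the same range argument. The sketch below models both on the host, assuming (from the CORE-V ISA documentation) that the affected bits run from Is2 up to Is2 + Is3, clipped at bit 31. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical host-side models; the field interpretation is an assumption. */

/* Mask covering bits Is2 .. min(Is2 + Is3, 31), with Is3 = range[9:5], Is2 = range[4:0]. */
static uint32_t range_mask(uint16_t range)
{
    unsigned is2 = range & 31u, is3 = (range >> 5) & 31u;
    unsigned hi = (is2 + is3 > 31u) ? 31u : is2 + is3;
    unsigned nbits = hi - is2 + 1u;
    uint32_t ones = (nbits == 32u) ? 0xFFFFFFFFu : ((1u << nbits) - 1u);
    return ones << is2;
}

/* bclr: clear the bits selected by range. */
static uint32_t model_bclr(uint32_t i, uint16_t range)
{
    return i & ~range_mask(range);
}

/* bset: set the bits selected by range. */
static uint32_t model_bset(uint32_t i, uint16_t range)
{
    return i | range_mask(range);
}
```

A range of (3 << 5) | 4 therefore selects the four bits 7 down to 4.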
- result:
rD
- i:
rs1
Generated assembler:
cv.ff1 rD,rs1
- result:
rD
- i:
rs1
Generated assembler:
cv.fl1 rD,rs1
- result:
rD
- i:
rs1
Generated assembler:
cv.clb rD,rs1
- result:
rD
- i:
rs1
Generated assembler:
cv.cnt rD,rs1
- result:
rD
- i:
rs1
- j:
rs2
Generated assembler:
cv.ror rD,rs1,rs2
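For host-side testing, the single-operand builtins above can be modelled in a few lines each. The sketch assumes, from the CORE-V ISA documentation, that ff1 and fl1 return 32 when the input is zero and that ror uses only the low five bits of the rotate amount; cv.clb is omitted because its exact result convention is not stated in this text. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical host-side models; zero-input results are assumptions. */

/* ff1: position of the first set bit, searching from bit 0; 32 if none. */
static uint32_t model_ff1(uint32_t i)
{
    for (uint32_t n = 0; n < 32; n++)
        if ((i >> n) & 1u)
            return n;
    return 32;
}

/* fl1: position of the last set bit, searching from bit 31; 32 if none. */
static uint32_t model_fl1(uint32_t i)
{
    for (int n = 31; n >= 0; n--)
        if ((i >> n) & 1u)
            return (uint32_t)n;
    return 32;
}

/* cnt: number of set bits. */
static uint32_t model_cnt(uint32_t i)
{
    uint32_t c = 0;
    for (; i != 0; i >>= 1)
        c += i & 1u;
    return c;
}

/* ror: rotate i right by j modulo 32. */
static uint32_t model_ror(uint32_t i, uint32_t j)
{
    unsigned n = j & 31u;
    return n ? (i >> n) | (i << (32u - n)) : i;
}
```

The explicit check for n == 0 in model_ror avoids a shift by 32, which is undefined in C.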
- result:
rD
- i:
rs1
- pts:
Is2
(5-bit unsigned integer)
- radix:
Is3[1:0]
(2-bit unsigned integer)
Generated assembler: (TBC)
cv.bitrev rD,rs1,Is3,Is2
Applicability. 64-bit cores.
At the time of writing the bit manipulation architecture for 64-bit is not defined, so no builtin functions are specified.
Applicability. 32-bit cores.
- result:
rD
- loc:
Imm(rs1)
(Imm and rs1 determined by the compiler).
Generated assembler:
cv.elw rD,Imm(rs1)
Applicability. 64-bit cores.
At the time of writing event load word for 64-bit is not defined, so no builtin functions are specified.
The C API header files we need for the CORE-V ISA extensions should contain the additions brought by the ISA extensions to the C language: the compiler intrinsics and their default attributes.
As described in the RISC-V Non-ISA Specifications about the C API headers:
- RISC-V header files that enable intrinsics require the prefix riscv_ (e.g. riscv_vector.h or riscv_crypto.h).
- RISC-V specific intrinsics use the common prefix __riscv_ to avoid namespace collisions.
- The intrinsic name describes the functional behaviour of the function. In case the functionality can be expressed with a single instruction, the instruction's name (any '.' replaced by '_') is the preferred choice.
  - Note that intrinsics that are restricted to RISC-V vendor extensions need to include the vendor prefix (as documented in the RISC-V toolchain conventions).
- If intrinsics are available for multiple data types, then function overloading is preferred over multiple type-specific functions.
- If an intrinsic function has parameters or return values that reference registers with XLEN bits, then the data type long should be used.
- In case a function is only available for one data type and this type cannot be derived from the function's name, then the type should be appended to the function name, delimited by a _ character. Typical type postfixes are 32 (32-bit), i32 (signed 32-bit), i8m4 (vector register group consisting of 4 signed 8-bit vector registers).
Any compiler-specific implementation of the intrinsic must be wrapped into an interface as described above. Where these intrinsic functions are simple wrappers to the compiler-specific implementations, it is encouraged to use compiler attributes where available to inline and remove these wrappers from the call tree.
E.g.:
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__))
The first attribute makes sure that the wrapper functions are inlined, hence leaving only the enclosed implementation in the compiled code. The second attribute removes the possibility to have debug information that still refers to the wrapper. This could create issues when using a debugger.
RISC-V intrinsics examples:
long __DEFAULT_FN_ATTRS __riscv_clz_32 (long rs) { // Count leading zeroes: clz rd, rs
  return __builtin_riscv_clz_32(rs);
}
long __DEFAULT_FN_ATTRS __riscv_clmul (long a, long b) { // Carry-less multiply: clmul rd, rs1, rs2
  long __res;
  __asm__("clmul %0,%1,%2\n" : "=r"(__res) : "r"(a), "r"(b) :);
  return (__res);
}
CORE-V intrinsics examples:
/* As CORE-V intrinsics are vendor specific they must include the vendor prefix in the name */
long __DEFAULT_FN_ATTRS __riscv_cv_alu_addN (long x, long y, uint8_t shft) { // Add and shift right (arithmetical): cv.addN rd, rs1, rs2, Shift
  return __builtin_riscv_cv_alu_addN (x, y, shft);
}
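When the same source must also build on a host without the CORE-V builtins (for example for unit tests), a wrapper can fall back to portable C. The following is a minimal sketch of that idea, not part of the proposed headers: the name example_cv_alu_addN is hypothetical, the use of __has_builtin assumes a recent GCC or Clang, and the fallback assumes a 32-bit wrapping add followed by an arithmetic shift right.

```c
#include <stdint.h>

#ifndef __has_builtin
#define __has_builtin(x) 0 /* Compilers without __has_builtin take the fallback path. */
#endif

/* Hypothetical wrapper name; a real header would use the __riscv_cv_ prefix. */
static inline __attribute__((__always_inline__))
long example_cv_alu_addN(long x, long y, uint8_t shft)
{
#if __has_builtin(__builtin_riscv_cv_alu_addN)
    return __builtin_riscv_cv_alu_addN(x, y, shft);
#else
    /* Assumed fallback semantics: 32-bit wrapping add, then arithmetic shift right. */
    uint32_t sum = (uint32_t)x + (uint32_t)y;
    int64_t s = (int64_t)sum - ((sum & 0x80000000u) ? ((int64_t)1 << 32) : 0);
    unsigned n = shft & 31u;
    int64_t r = s >= 0 ? s >> n : -((-s + ((int64_t)1 << n) - 1) >> n);
    return (long)r;
#endif
}
```

The shift is written as a floor division so that the fallback does not depend on how the host compiler shifts negative values.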