Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zvk vector crypto support (v5) #1303

Merged
merged 10 commits into from
Jun 20, 2023

Conversation

egouriou-rivos
Copy link
Contributor

@egouriou-rivos egouriou-rivos commented Apr 3, 2023

[Updated 2023-06-02]

This patch series implements Zvk at version Version v0.9.7, 31 May, 2023, from https://github.com/riscv/riscv-crypto/tree/master/doc/vector:

  • Zvbb
  • Zvbc
  • Zvkg
  • Zvkned
  • Zvknha / Zvknhb
  • Zvksed
  • Zvksh
    no-instruction extension (only accepted as part of the ISA string, no-op otherwise):
  • Zvkt
    and combo extensions (expanded to the underlying extensions):
  • Zvkn
  • Zvknc
  • Zvkng
  • Zvks
  • Zvksc
  • Zvksg

Some Zvk instructions conflict with instructions part of Zvpn, part of the proposed packed SIMD extensions. The conflicting instructions are listed in the overlapped file and only enabled when their extension is declared. Extension in conflicts are detected in isa_parser.cc, triggering errors.

The implementation of element groups does not support WORDS_BIGENDIAN correctly. An error is emitted when WORDS_BIGENDIAN is enabled together with extensions using element group accesses.

@egouriou-rivos
Copy link
Contributor Author

@aswaterman Andrew, this is my first PR in this repository. As such I am quite unfamiliar with the review process. Don't hesitate to point me in the right direction. Thanks.

Copy link
Collaborator

@scottj97 scottj97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the details of the instructions, but generally I'd say this uses too many macros when compilable C++ code would suffice. (The existing instructions also overuse macros, IMO; let's try to improve that, not make it worse.)

@aswaterman is there a good example of another extension that uses custom data types and/or helper functions instead of macros?

riscv/encoding.h Outdated Show resolved Hide resolved
riscv/encoding.h Outdated Show resolved Hide resolved
riscv/riscv.mk.in Show resolved Hide resolved
Comment on lines +193 to +22
do { \
require_vector(true); \
require_extension(EXT_ZVBB); \
} while (0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Zvbb requires V, that should be enforced in isa_parser.cc. It does not need to be enforced here in every instruction. (Same comment for all of this family of macros; it looks like none of these macros are needed.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point here is to raise an exception when an instruction is executed without proper support. I am not that familiar with spike, but I would imagine require_extension(EXT_ZVBB); is required to raise the exception if the instruction is not supported. The dependency on V (which may be V or Zve* for Zvbb I think) should indeed be handled when the ISA support is initialized.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The require_vector(true) checks sstatus.VS and vtype.vill. So, even though the ISA parser should enforce the extension dependency, this code (or something similar) is still necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The require_vector(true) checks sstatus.VS and vtype.vill. So, even though the ISA parser should enforce the extension dependency, this code (or something similar) is still necessary.

Good point. Should those checks be part of require_extension(EXT_ZVBB), do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the current formulation is OK.

The only problem with it, I think, is that it assumes Zvbb requires V. This isn't strictly true; Zvbb only requires a subset of V: Zve64x, IIRC. But since Spike doesn't support any of the V subsets currently, that is not a problem currently.

Furthermore, if Spike is augmented to support Zve64x etc., then essentially all of the vector instructions will need to be retooled in the same manner as this one. In other words, this approach doesn't create a new type of technical debt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion. I updated the documentation of those new require_xyz macros to state

// Ensures that the ... extension is present,
// and the vector unit is enabled and in a valid state.```

This captures the semantics of require_vector(true) more clearly than the previous "and that vector is present".

Comment on lines 256 to 260
// Ensures that the vector instruction is not using a mask.
#define require_no_vmask require(insn.v_vm() == 1)

// Ensures that the current VSEW is the given N number of bits.
#define require_vsew(N) require(P.VU.vsew == (N))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are so trivial and self-expanatory that I'd say they should just be inlined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree for require_vsew() and got rid of them.

I have a stronger attachment to require_no_vmask as I find it a lot clearer than require(insn.v_vm() == 1), especially since VM is a highly overloaded acronym and "VM == 1" meaning "vector mask is not used" is very counter-intuitive.
It is a case where someone intimate with the vector spec may find the "insn.v_vm() == 1" very clear but newcomers or more casual contributors will either misinterpret or need to go back to the spec to understand. When I wrote that code I very much needed the more explicit semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you still want me to remove the require_no_vmask.

// containing element groups of 8 uint32_t values (EGW=256, EEW=32, EGS=8),
// *ignoring* the current SEW that applies to the vectors.
//
// IMPORTANT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this info important? Isn't it obvious from reading the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this documentation is for most readers to not have to read the code. I also want to express what semantics are intended, vs. what the code is actually encoding.

The V macros are inscrutable for multiple reasons. For me the main one is the lack of documentation. One has to expand the macros level-by-level to figure what the uses of those macros is achieving (and even then...). I want readers of the Zvk code to only have to read the macro documentation to get an 80% understanding of what is happening.

riscv/zvk_ext_macros.h Outdated Show resolved Hide resolved
// into the (mutable) variables named by the W arguments, provided in
// "Little Endian" (LE) order, i.e., from the least significant (W0)
// to the most significant (W3).
#define EXTRACT_EGU32x4_WORDS_LE(X, W0, W1, W2, W3) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the macros seem like they would be better off as methods on a class EGU32x4_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 100% with you on the desire to reduce macro usage. I wish I had known that when I started writing that code, instead of following the vector idioms.

EGU32x4_t ends up being std:array<uint32_t, 4>, and using instance methods is not going to work as-is. We can use freestanding methods. We could hide the std::array<> as an implementation detail of a class but it will require forwarding quite a few operations.

@scottj97
Copy link
Collaborator

scottj97 commented Apr 3, 2023

Disabling P extension doesn't sound good. I'm pretty sure this is not the first time we've had conflicting opcodes, though I don't remember how we dealt with it last time.

@aswaterman
Copy link
Collaborator

@scottj97 As a very small example, adding the various xperm instructions was accomplished by adding a new function to arith.h: https://github.com/riscv-software-src/riscv-isa-sim/pull/629/files

I fully agree that, in cases where we can use functions, possibly with templates, rather than macros, we should. Although Spike is full of examples to the contrary, we are trying to move away from that style.

@aswaterman
Copy link
Collaborator

As for handling overlapping opcodes, the problematic ones need to be added to the overlap list; here's an example: 087626c#diff-19e2a333b37ddb8e4a5ec7706ce201da3561f4ff223756fa8b9e692ea4679a8fR10

riscv/insns/vaesef_vv.h Outdated Show resolved Hide resolved
riscv/insns/vaesef_vs.h Outdated Show resolved Hide resolved
riscv/insns/vaesdm_vv.h Outdated Show resolved Hide resolved
riscv/insns/vaesdm_vs.h Outdated Show resolved Hide resolved
riscv/insns/vwsll_vi.h Outdated Show resolved Hide resolved
riscv/insns/vwsll_vv.h Outdated Show resolved Hide resolved
riscv/insns/vwsll_vx.h Outdated Show resolved Hide resolved
@scottj97
Copy link
Collaborator

scottj97 commented Apr 5, 2023

Also please fix the CI failure:

/home/runner/work/riscv-isa-sim/riscv-isa-sim/ci-tests/../riscv/insns/sm4_common.h:29:23: error: unused variable ‘ck’ [-Werror=unused-variable]
   29 | static const uint32_t ck[32] = {
      |                       ^~

@scottj97
Copy link
Collaborator

scottj97 commented Apr 5, 2023

@egouriou-rivos can you tell us how you've tested this?

@charmitro
Copy link

Just a heads-up for anyone following this, as this is needed for ACT testing.

The SAIL implementation is written for Vector Crypto Draft v20230303 which leads to incorrect ACT testing.

In order to have some(not all Extensions are possible) valid results between DUT(Spike) & Sail Signatures you'd have to apply a fair amount of workarounds so the ACT tooling can match the correct Extensions between Spike <-> SAIL.

Successful Test reports could be generated only for the following extensions:

  • Zvkg
  • Zvksed
  • Zvksh
  • Zvkned

Failing Test reports, with the reason, are:

  • Zvkb - Naming conventions and instructions included in extension changed in newer version
  • Zvknhab - Not able to run at all with the ACT tooling. Even executing spike $(which pk) --varch=vlen:256,elen:32 --isa=rv64imfdcvzvknha_zvknhb a.out doesn't seem to work. LMK if this is a user error on my side.

@egouriou-rivos
Copy link
Contributor Author

@scottj97 As a very small example, adding the various xperm instructions was accomplished by adding a new function to arith.h: https://github.com/riscv-software-src/riscv-isa-sim/pull/629/files

I fully agree that, in cases where we can use functions, possibly with templates, rather than macros, we should. Although Spike is full of examples to the contrary, we are trying to move away from that style.

I am overjoyed to read this, as getting into the vector code for this extension was quite frustrating due to those layers of (unhygienic) macros and the utter lack of documentation of their intent. Unfortunately the Zvk implementation strove to fit within the existing idioms we found, with the exception of (over-)documenting.

I will not have the time to perform the suggested cleanup until later this year. I feel that such a cleanup needs to involve the V instructions and will represent a significant effort. As such, I would like to see it decoupled from landing the Zvk support.

@egouriou-rivos
Copy link
Contributor Author

Also please fix the CI failure:

/home/runner/work/riscv-isa-sim/riscv-isa-sim/ci-tests/../riscv/insns/sm4_common.h:29:23: error: unused variable ‘ck’ [-Werror=unused-variable]
   29 | static const uint32_t ck[32] = {
      |                       ^~

This should be fixed now.

@egouriou-rivos
Copy link
Contributor Author

I pushed updated commits with the following changes:

  • Fixed vwsll's shift amount mask to use 2*SEW bits (vs SEW previously), matching a spec pseudo-code fix.
  • Fixed repeated typos mentioned above
  • Deleted the require_vsew macro, per Scott's suggestion
  • Moved the SM4 ck array from sm4_common.h to the only use location, vsm4k.vi, resolving a compiler warning in CI
  • Manually fixed the URL in encoding.h to point to the github.com/rivosinc/riscv-opcode source
  • Removed useless "do { } while (0)" wrapping the BODY blocks, per Scott's review

With this, the PR is up to date with the current specification, which should allow for the spec to move into freeze for this deliverable.

Issues I intend to address in upcoming versions:

I would like to defer the cleanup from C-style macros to more idiomatic C++ to subsequent PRs. As discussed above, this PR was built over many months following the example of the V extension which was considered idiomatic. I wish I had checked when that work started.

@egouriou-rivos
Copy link
Contributor Author

One issue I am aware of and would like recommendation on: the element-group accessor vectorUnit_t::elt_grp<>(...) is currently broken when WORDS_BIGENDIAN is defined. I have a hard time mapping big-endian to the element groups and the vector register layout.

vectorUnit_t::elt<>(...) contains the following logic to account for BIGENDIAN:

#ifdef WORDS_BIGENDIAN
  // "V" spec 0.7.1 requires lower indices to map to lower significant
  // bits when changing SEW, thus we need to index from the end on BE.
  n ^= elts_per_reg - 1;
#endif

I worry that no such trick will allow us to fix elt_grp to work with BIGENDIAN. I may need to access the elements separately to reproduce a similar effect instead of performing a contiguous mapping to a std::array.

@egouriou-rivos
Copy link
Contributor Author

@egouriou-rivos can you tell us how you've tested this?

Most of the validation for these changes have been done by running the tests / proof-of-concept code in riscv/riscv-crypto#310. Thanks to the NIST and SM test vectors I am pretty confident in the soundness of the crypto logic itself (mistakes show up quickly). In the absence of a SAIL model and ACT support for vector I did not automate validation that the Spike logic matches the spec pseudo-code, but Ken and I spent quite a bit of time looking for discrepancies, especially related to endianness issues. I am more wary of issues related to details of the RISC-V vector specification, or my misunderstandings of Spike internals.

@egouriou-rivos egouriou-rivos changed the title Zvk vector crypto support (v1, RFC) Zvk vector crypto support (v2, RFC) May 2, 2023
@egouriou-rivos egouriou-rivos changed the title Zvk vector crypto support (v2, RFC) Zvk vector crypto support (v3) May 4, 2023
@egouriou-rivos
Copy link
Contributor Author

I uploaded a new version of this series. Differences from v2:

  • Properly (?) dealt with conflicts between P (Zpn) and Zvk instructions, removing the need to disable P in riscv.mk.in,
  • Imported encodings from the official riscv-opcodes repository where those landed in master/ (riscv/riscv-opcodes@199e646f2a8e36),
  • Added handling of combo extensions Zvkng/Zvksg and no-op Zvkt,
  • Fixed one pre-existing typo in isa_parser.cc

As mentioned above, the one known issue is the handling of WORDS_BIGENDIAN. I'd appreciate if people familiar with big-endian and the vector unit register layout could give some thought as to whether elt_group<>(...) is implementable for big-endian or if we need to change that interface.

@egouriou-rivos
Copy link
Contributor Author

I uploaded a new version of this series (v4). The only difference from the previous version is in 'isa_parsing.cc' in the commit titled 'Zvk extension parsing':

  • Removed Zvbc from the combo extensions Zvkn, Zvks
  • By transitive closure, removed Zvbc from the combo extensions Zvkng, Zvksg

This matches the changes that went into the specification in version 0.9.5, 2023-05-09
(commit riscv/riscv-crypto@49a2d3e, release https://github.com/riscv/riscv-crypto/releases/tag/v20230509)

@egouriou-rivos egouriou-rivos changed the title Zvk vector crypto support (v3) Zvk vector crypto support (v4) May 19, 2023
@egouriou-rivos
Copy link
Contributor Author

No worries about the delay, and thanks again for the reviews and suggestions.

I rebased on top of current /master. No changes to the commits, only resolving merge issues.

I regenerated riscv/encoding.h from riscv-opcodes /master (3ca60c5). Note that the previous encoding file was generated from a commit (8d70e77) that I could not find in the riscv-opcodes upstream repository. It is not clear to me what else we are picking up between those two versions.

@aswaterman
Copy link
Collaborator

Thanks! There's a new batch of CI failures, but it looks like just a missing #include.

@egouriou-rivos egouriou-rivos force-pushed the zvk-vector-crypto branch 3 times, most recently from 4fa1262 to 57ece8d Compare June 19, 2023 00:43
@egouriou-rivos
Copy link
Contributor Author

I uploaded a new patch series against the new /master (7e50839) that addresses a couple of issues:

  • missing include in arith.h, linked to the use of std::is_unsigned<> from <type_traits>.
  • warning about signed/unsigned comparison in vcpop_v.h .

I missed both issues as I spend most of my time compiling macOS with Clang. The issues showed up when using GCC on Linux. Hopefully this will please the CI gods.

@aswaterman aswaterman dismissed scottj97’s stale review June 19, 2023 18:30

I believe the issues @scottj97 raised have been addressed or are now moot.

@egouriou-rivos
Copy link
Contributor Author

Checking those CI failures. I probably need to annotate some unused variables from the V macros, or derive new loop macros that do not declare those variables in the first place. I'll also rebase.

@aswaterman
Copy link
Collaborator

Feel free to deploy the UNUSED macro liberally.

@aswaterman
Copy link
Collaborator

Also, note I just merged #1364, which bumped encoding.h to the same version you've just bumped it to.

egouriou-rivos and others added 10 commits June 19, 2023 14:28
The previous order lacks any obvious logic. Alphabetical order,
while making it difficult to create interesting groupings,
makes it easy to find which extensions are compiled in.

Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Zvk is the short name for the Vector Cryptography Instruction
Set Extension Specification being defined at
<https://github.com/riscv/riscv-crypto/tree/master/doc/vector>.

This commit adds support for parsing/enabling the Zvk extensions
(Zvbb, Zvbc, Zvkg, Zvkned, Zvknha, Zvknhb, Zvksed, Zvksh, Zvkt)
and the combo extensions (Zvkn, Zvknc, Zvkng, Zvks, Zvksc, Zvksg).

This is an early commit in a series implementing Zvk. No instructions
are actually defined here, only infastructure that will
support the coming extensions.

The encodings for Zvk instructions have some conflicts with Zpn
encodings. This commit marks those Zpn instructions as overlapping,
and adds checks to error out if conflicting extensions are enabled.

Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Introduce types and macros useful across multiple Zvk sub-extensions,
including Zvbb and Zvbc. Those will be used by upcoming
per-sub-extension commits.

In particular we introduce "Element Group" types and loop macros handling
those element groups. The concept of element group is described in
<https://github.com/riscv/riscv-crypto/blob/master/doc/vector/riscv-crypto-vector-element-groups.adoc>.

Note that the element group access method is not implemented
for WORDS_BIGENDIAN setup. As such, isa_parser.cc is modified to emit
an error when WORDS_BIGENDIAN is defined and extensions using element
groups are enabled.

Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Implement the proposed instructions in Zvbb:
 - vandn.{vv,vx}, vector bitwise and-not
 - vbrev.v, vector bit reverse in element
 - vbrev8.v, vector bit reverse in bytes
 - vrev8.v, vector byte reverse
 - vctz.v, vector count trailing zeros
 - vclz.v, vector count leading zeros
 - vcpop.v, vector population count
 - vrol.{vv,vx}, vector rotate left
 - vror.{vi,vv,vx}, vector rotate right
 - vwsll.{vi,vv,vx} vector widening shift left logical

A new instruction field, 'zimm6', is introduced, encoded
in bits [15, 19] and [26].. It is used by "vror.vi" to encode
a shift immediate in [0, 63].

Co-authored-by: Raghav Gupta <rgupta@rivosinc.com>
Co-authored-by: Stanislaw Kardach <kda@semihalf.com>
Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Implement the Zvbc instructions
- vclmul.{vv,vx}, vector carryless multiply low
- vclmulh.{vv,vx}, vector carryless multiply high

Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Implement the proposed instruction in Zvkg, vghmac.vv,
Vector Carryless Multiply Accumulate over GHASH Galois-Field.

The instruction performs one step of GHASH routine as described
in "NIST Special Publication 800-38D" a.k.a the AES-GCM specification.
The logic was written to closely track the pseudo-code
in the Zvk specification.

Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Co-authored-by: Kornel Duleba <mindal@semihalf.com>
Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Implement the instructions part of the Zvknha and Zvknhb
sub-extensions:
 - vsha2ms.vv, message schedule
 - vsha2ch.vv / vsha2cl.vv, compression rounds

A header files for common macros is added.

Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Implement the Zvkned extension, "NIST Suite: Vector AES Encryption
& Decryption (Single Round)".
 - vaeskf1.vi: AES forward key scheduling, AES-128.
 - vaeskf2.vi: AES forward key scheduling, AES-256.
 - vaesz.vs: AES encryption/decryption, 0-th round.
 - vaesdm.{vs,vv}: AES decryption, middle rounds.
 - vaesdf.{vs,vv}: AES decryption, final round.
 - vaesem.{vs,vv}: AES encryption, middle rounds.
 - vaesef.{vs,vv}: AES encryption, final round.

An extension specific header containing common logic is added.

Co-authored-by: Stanislaw Kardach <kda@semihalf.com>
Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Implement the Zvksed sub-extension, "ShangMi Suite: SM4 Block Cipher":
 - vsm4k.vi, vector SM4 key expansion,
 - vsm4r.{vs,vv}, vector SM4 rounds.

This also introduces a header for common vector SM4 logic.

Co-authored-by: Raghav Gupta <rgupta@rivosinc.com>
Co-authored-by: Albert Jakieła <aja@semihalf.com>
Signed-off-by: Eric Gouriou <ego@rivosinc.com>
Implement the Zvksh sub-extension, "ShangMi Suite: SM3 Hash
Function Instructions":
 - vsm3me.vv, message expansion,
 - vsm3c.vi, compression rounds.

This also introduces a SM3 specific header for common logic.

Co-authored-by: Raghav Gupta <rgupta@rivosinc.com>
Co-authored-by: Albert Jakieła <aja@semihalf.com>
Co-authored-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Eric Gouriou <ego@rivosinc.com>
@egouriou-rivos
Copy link
Contributor Author

I uploaded a new version. It took me a little while to reproduce the failures.

Changes since last version:

  • rebased on /master at 8b10de6, removing the need for commit updating the encodings;
  • Added a new VI_V_ULOOP macro in v_ext_macros.h for instructions of the form <op> vd, vs2. Adopted it for all Zvbb ".v" instructions that used to use VI_VX_ULOOP without using the rs1 argument;
  • Fixed vaesem_vv.h to use the VI_ZVK_VD_VS2_... macro instead of VD_VS1_VS2 which triggered the unused variable warning/error. All other Zvkned instructions already used that macro;
  • Removed the declaration of the vsew variable in the Zvk macros. None of those were used;
  • Added UNUSED in a few places in vsm3c_vi.h (Zvksh) and the three vsha2... instructions. The unused variables are declared when invoking a macro extracting elements from an element group. I would rather mark them UNUSED than have to define partial-extraction macros.

It would have been easier to sprinkle more UNUSED, but I appreciated the warnings emitted in the CI build. They did not reveal significant issues but some sloppiness.

With that I was able to build cleanly on macOS+Clang and Linux+GCC when adding the same warnings/errors that are used in the CI.

@aswaterman
Copy link
Collaborator

Thanks for working through the remaining issues, @egouriou-rivos. Time to pull the trigger.

@aswaterman aswaterman merged commit 5731a47 into riscv-software-src:master Jun 20, 2023
3 checks passed
@egouriou-rivos
Copy link
Contributor Author

Woohoo! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants