New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RISC-V: GCM: Implement .ghash() #20078
Conversation
251807a
to
91543f9
Compare
Updated PR to address CI errors. |
91543f9
to
54a96e3
Compare
Updated PR for yet another CI error (before we had warnings because of dynamic stack arrays, now zero-initializers were not allowed). I've already sent the ICLA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Just a few small items.
I've not reviewed the RISC-V assembly, I'm not well enough acquainted with the architecture to do this.
54a96e3
to
d668395
Compare
Updated the PR:
|
d668395
to
8f2ec58
Compare
8f2ec58
to
eb65344
Compare
The build was failing because of the The error message was:
I've updated the PR to fail gracefully if the package |
It would be nice to get some independent review from someone who knows RISC-V assembly. |
Ok, understood. I will try to find someone in the RISC-V community to do this review. |
eb65344
to
e385f3b
Compare
Updated the PR:
|
Note that a specification change for GCM is in progress: riscv/riscv-crypto@19d044f We will update this PR (over the course of the next days) as soon as the specification has settled. |
At what point should we expect the standardisation to be completed? OpenSSL generally doesn't accept algorithms before they are published as a national or international standard. |
Once the standard goes to the 45-day public review (called the "FREEZE"-milestone in RISC-V parlance), no semantic changes are expected. This is the point when the Linux kernel, GCC, etc. accept changes as well. However, to make it to FREEZE, prototype patches against upstream projects need to exist (not merged, but publicly available) and have been reviewed to ensure that the proposed extension can be supported in software (i.e., as an end-to-end proof-of-concept requirement). Just a side-note: the current changes to the ghash instruction (creating a second instruction that is more gmult-like) have been triggered by the experience with exactly this pull-request. |
This pull request is ready to merge |
Merged, thanks for the contribution. |
In RISC-V we have multiple extensions, that can be used to accelerate processing. The known extensions are defined in riscv_arch.def. From that file test functions of the following form are generated: RISCV_HAS_$ext(). In recent commits new ways to define the availability of these test macros have been defined. E.g.: #define RV32I_ZKND_ZKNE_CAPABLE \ (RISCV_HAS_ZKND() && RISCV_HAS_ZKNE()) [...] #define RV64I_ZKND_ZKNE_CAPABLE \ (RISCV_HAS_ZKND() && RISCV_HAS_ZKNE()) This leaves us with two different APIs to test capabilities. Further, creating the same macros for RV32 and RV64 results in duplicated code (see example above). This inconsistent situation makes it hard to integrate further code. So let's clean this up with the following steps: * Replace RV32I_* and RV64I_* macros by RICSV_HAS_* macros * Move all test macros into riscv_arch.h * Use "AND" and "OR" to combine tests with more than one extension * Rename include files for accelerated processing (remove extension postfix). We end up with compile time tests for RV32/RV64 and run-time tests for available extensions. Adding new routines (e.g. for vector crypto instructions) should be straightforward. Testing showed no regressions. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20078)
Move helper functions and instruction encoding functions into a riscv.pm Perl module to avoid pointless code duplication. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20078)
On systems where Devel::StackTrace is available, we can use this module to create more usable error messages. Further, don't print error messages in case of official register aliases, but simply accept them. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20078)
A recent commit introduced a Perl module for common code. This patch changes the GCM code to use this module, removes duplicated code, and moves the instruction encoding functions into the module. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20078)
The existing GCM calculation provides some potential for further optimizations. Let's use the demo code from the RISC-V cryptography extension groups (https://github.com/riscv/riscv-crypto), which represents the extension architect's intended use of the clmul instruction. The GCM calculation depends on bit and byte reversal. Therefore, we use the corresponding instructions to do that (if available at run-time). The resulting computation becomes quite compact and passes all tests. Note, that a side-effect of this change is a reduced register usage in .gmult(), which opens the door for an efficient .ghash() implementation. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20078)
RISC-V currently only offers a GMULT() callback for accelerated processing. Let's implement the missing piece to have GHASH() available as well. Like GMULT(), we provide a variant for systems with the Zbkb extension (including brev8). The integration follows the existing pattern for GMULT() in RISC-V. We keep the C implementation as we need to decide if we can call an optimized routine at run-time. The C implementation is the fall-back in case we don't have any extensions available that can be used to accelerate the calculation. Tested with all combinations of possible extensions on QEMU (limiting the available instructions accordingly). No regressions observed. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20078)
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on openssl/openssl#20078 Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
The current RISC-V GCM implementation does not include a
.ghash()
callback. This is an issue for future improvements (i.e. adding new GCM implementations for the upcoming vector crypto ISA extensions). Therefore, this PR adds this missing piece.Unfortunately, this was a bit more work than expected because the current implementation needed to be replaced by an improved
.gmult()
code sequence to free up registers so the loop in.ghash()
could be realized without stack utilization.During testing, it was observed that no test calls
.ghash()
with more than 1 block (i.e. 16 bytes) of data. Therefore, I wrote a new one. Plus, yet another one which also triggers another code path ingcm128.c
(running into theGHASH_CHUNK
limitation).During the work it was observed that the RISC-V extension test macros deserve some clean-ups to make them easier to read (e.g. extension tests are only relevant for run-time decisions) and extend. The related changes are included in this series.
The
.ghash()
implementation itself is straightforward. It just includes the XOR-loop into thegmult()
sequence.All code has been tested using upstream QEMU.
Test instructions: