UPDATE: 2020-04-10 Markku-Juhani O. Saarinen firstname.lastname@example.org
Switches to use NIST SP 800-185 TupleHash and TupleHashXOF for better domain separation.
UPDATE: 2020-01-27 Markku-Juhani O. Saarinen email@example.com
This release updates the parameters, removes some of the "public key encryption" wrappers (this is KEM only) and alternative symmetric algorithms, and makes CCA variants directly available as KEMs.
2019-03-04 Markku-Juhani O. Saarinen firstname.lastname@example.org
A self-contained version of the Round5 post-quantum algorithms for embedded platforms. This heavily modified fork is NOT OFFICIAL -- but is testvector-compatible with the NIST submission. Round5 (in a slightly different form) is currently a 2nd round NIST PQC standardization candidate.
For a summary of performance and code size of all supported variants on Cortex M4, see the benchmarks page. The raw benchmark data is at stm32f4/bench.txt. A very brief report rep.pdf is also available, which additionally offers RISC-V hardware-software numbers (running essentially the same code, but with hardware drivers for SHA3 and ring arithmetic).
Otherwise the variants supported are the same as in the "submission":
|Sec||NIST 1||NIST 3||NIST 5||Type||FEC|
- Sec: CPA or CCA security. The NIST 1/3/5 security classes indicate the equivalent of 128/192/256-bit classical security (quantum security is required to be at least half of that).
- Type: whether ring-structured or general lattice.
- FEC: Forward error correction. If used, the number indicates how many bit errors are guaranteed to be corrected.
Two special cases are supported:
which operate in the ring setting and have 2/4-bit error correction.
This version does not support
R5N1_3CCA_0smallCT which has very large
(165kB) public keys and requires some special implementation techniques.
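The variant names used in this README (for example R5ND_3CPA_5d and R5N1_3CCA_0smallCT) appear to encode the columns described above. The following is a minimal sketch of a decoder, not part of r5embed: the function name `decode_variant` is hypothetical, and the mapping of the R5ND prefix to the ring setting and R5N1 to the general lattice setting is an assumption based on the naming, not something stated here.

```sh
#!/bin/sh
# Hypothetical helper, not part of r5embed.
# Assumed name layout: <type>_<NIST level><CPA|CCA>_<FEC bits><suffix>
decode_variant() {
    name=$1
    type=${name%%_*}        # text before the first "_", e.g. R5ND
    rest=${name#*_}         # text after the first "_", e.g. 3CPA_5d
    mid=${rest%%_*}         # e.g. 3CPA
    tail=${rest#*_}         # e.g. 5d
    level=${mid%%[A-Z]*}    # leading digits: NIST security class
    sec=${mid#"$level"}     # remaining letters: CPA or CCA
    fec=${tail%%[!0-9]*}    # leading digits: FEC bit count
    case $type in
        R5ND) kind=ring ;;      # assumption: ND = ring setting
        R5N1) kind=general ;;   # assumption: N1 = general lattice
        *)    kind=unknown ;;
    esac
    echo "type=$kind nist=$level sec=$sec fec=$fec"
}

decode_variant R5ND_3CPA_5d
decode_variant R5N1_3CCA_0smallCT
```

The function uses only POSIX parameter expansion, so it should run under dash as well as bash.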
Constant Time Option
EXPERIMENTAL 2019-09-23 mjos
I've added a "truly constant time" option to this implementation; this is
enabled with the
ROUND5_CT compile-time flag and currently applies only for the
These instructions were created and tested on an Ubuntu 18.04 LTS system.
./test/testkat.sh will compile all targets on the local system
and verify the (sha256 hashes of) KAT outputs against known good ones
contained in file
test/good.kat. The script launches all kat generation
threads at once (fast testing if you have a lot of cores in your system).
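To illustrate the idea of checking KAT outputs by hash, here is a small sketch. It is not the contents of test/testkat.sh; the function name `check_kats`, the file name demo.rsp, and the use of `sha256sum` piped into `diff` are assumptions chosen for the example.

```sh
#!/bin/sh
# Sketch only: NOT the actual test/testkat.sh.
# Hash the given files and compare against a stored known-good list.
check_kats() {
    good=$1; shift                      # file with "hash  filename" lines
    sha256sum "$@" | diff - "$good" >/dev/null
}

tmp=$(mktemp -d)
printf 'count = 0\n' > "$tmp/demo.rsp"          # stand-in for a KAT output file
sha256sum "$tmp/demo.rsp" > "$tmp/good.txt"     # record the known-good hash

check_kats "$tmp/good.txt" "$tmp/demo.rsp" && echo "KAT OK"

printf 'tampered\n' > "$tmp/demo.rsp"           # simulate a wrong output
check_kats "$tmp/good.txt" "$tmp/demo.rsp" || echo "KAT MISMATCH"
```

Storing only hashes keeps the known-good file small even when the KAT outputs themselves are large.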
For performance testing on your native system, you can use the
test/speed.sh script. It also computes the simple "tv" test vector
checksums that our embedded test programs display.
Emulation: ARM, MIPS, POWERPC, etc
QEMU on Linux allows transparent execution of foreign binaries of many
targets, including 32 and 64 bit MIPS, POWERPC, SPARC, and ARM. This works by the
binfmt_misc module recognising the executable architecture
from the binary ELF headers and launching a QEMU interpreter to execute it.
The interpreter translates all of the system calls made by the application
to host machine system calls; the emulated binary can therefore access files
and services on the host machine transparently. The executables must be
You need to install at least
to enable the feature. You may use
update-binfmts --display to display all
of the available interpreters.
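As a rough illustration of what "recognising the executable architecture from the ELF headers" involves, the sketch below reads the byte-order flag and the `e_machine` field of an ELF header. It is not how binfmt_misc is implemented (the kernel matches registered magic/mask byte patterns), and the function name `elf_arch` is hypothetical; in practice the `file` utility reports the same information.

```sh
#!/bin/sh
# Illustrative only: name the target architecture from an ELF header.
elf_arch() {
    f=$1
    magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not-elf"; return 1; }
    data=$(od -An -tu1 -j5 -N1 "$f" | tr -d ' \n')  # EI_DATA: 1 = little, 2 = big endian
    set -- $(od -An -tu1 -j18 -N2 "$f")             # the two e_machine bytes
    [ $# -eq 2 ] || { echo "truncated"; return 1; }
    if [ "$data" = 2 ]; then m=$(( $1 * 256 + $2 )); else m=$(( $2 * 256 + $1 )); fi
    case $m in
        2)   echo sparc ;;
        8)   echo mips ;;
        20)  echo powerpc ;;
        40)  echo arm ;;
        62)  echo x86_64 ;;
        183) echo aarch64 ;;
        *)   echo "unknown($m)" ;;
    esac
}

# Demo on two hand-made 20-byte headers (not real executables).
tmp=$(mktemp -d)
pad='\000\000\000\000\000\000\000\000\000'          # bytes 7..15 of e_ident
printf "\177ELF\001\001\001${pad}\002\000\050\000" > "$tmp/arm.bin"   # little endian, e_machine 40
printf "\177ELF\001\002\001${pad}\000\002\000\010" > "$tmp/mips.bin"  # big endian, e_machine 8
elf_arch "$tmp/arm.bin"
elf_arch "$tmp/mips.bin"
```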
You can comment relevant cross-compiler lines in
to check the portability of the code. The cross compilers must be separately
installed, but many are available as standard Ubuntu/Debian packages;
apt install gcc-arm-linux-gnueabihf suffices for the 32-bit emulated ARMv7
target, for example.
We have used this feature to verify that
r5embed works correctly on
big-endian systems (MIPS). It is also noteworthy that since the ARMv7-a
instruction set is basically a superset of the ARMv7-m instruction set used in
(Cortex M3/M4) embedded devices, we are able to directly compile and test
our ARMv7 optimizations by using the
arm-linux-gnueabihf-gcc cross compiler
(apt install gcc-multilib-arm-linux-gnueabihf) and compiling with flags
However, this type of emulation is not cycle accurate and therefore not directly useful for performance testing.
Cortex-M4: Compiling and running Round5 on the STM32F407 Discovery board
We use a very similar, but somewhat simpler set-up than the
PQM4 project. The system requirements are
basically the same (with the exception that we do not use or require Python).
If you don't care about public key encryption (just KEM),
you should be able to compile
PQM4 fairly easily; just
remember to define the
ARMV7_ASM macro to enable ARMv7-m assembler
optimizations (up to 50% faster).
For this interface the wires are defined as: Red +5V, Black GND, Green TXD, White RXD. Connect the green TXD to PA3 and white RXD to PA2 on the board. The black GND and red +5V wires can be left unconnected. Like this:
In our system the ST-LINK programming interface appears as
the USB serial interface is
/dev/USB0. You will have to manually change
the scripts if you have other names for these hardware interfaces.
You'll need at least
git to fetch the source code,
make and some other
standard software development tools (
build-essential), and the
arm-none-eabi cross compiler toolchain to compile the code.
For the cross compiler you can either install the standard Ubuntu/Debian
gcc-arm-none-eabi package, or use the distribution provided by
We use the ARM toolchain version ourselves.
To actually program the ST Discovery board we are using stlink, which we compiled from source.
Building the OpenCM3 firmware
You must first initialize and build the OpenCM3 firmware, which is included as a submodule in this distribution. This is something that needs to be done only once. From the beginning:
$ git clone email@example.com:round5/r5embed.git
$ cd r5embed
$ git submodule init
$ git submodule update
$ cd libopencm3
$ make
[..]
$ cd ..
If you don't have the submodule set up you can just clone
https://github.com/libopencm3/libopencm3.git under the
r5embed directory and build it.
Compiling and executing Round5 variants
Now we can proceed to build the test harness and
themselves. For example:
$ cd stm32f4
$ make ROUND5=R5ND_3CPA_5d run
[.. after compiling, flashing, few seconds for measurement:]
..
#R5ND_3CPA_5d
R5ND_3CPA_5d kilo cycles KG #784
R5ND_3CPA_5d kilo cycles Enc #1081
R5ND_3CPA_5d kilo cycles Dec #396
R5ND_3CPA_5d kilo cycles KEX #2262
R5ND_3CPA_5d stack bytes KG #5550
..
The Makefile expects the ROUND5 variable to be defined as one of the variants
given above. The
run argument will also directly flash the implementation
and dump the output.
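If the dumped output is saved to a file, the cycle counts can be post-processed with standard tools. The sketch below sums the KG, Enc, and Dec figures. It assumes whitespace-separated lines of the form `<variant> kilo cycles <op> #<value>`, which is a guess at the exact layout; the function name `sum_kcycles` is hypothetical and not part of the repository scripts.

```sh
#!/bin/sh
# Sketch: add up the KG + Enc + Dec kilo-cycle counts from a saved dump.
# Assumed line layout: <variant> kilo cycles <op> #<value>
sum_kcycles() {
    awk '$2 == "kilo" && $3 == "cycles" &&
         ($4 == "KG" || $4 == "Enc" || $4 == "Dec") {
             sub(/^#/, "", $5)      # strip the leading "#" from the value
             total += $5
         }
         END { print total + 0 }' "$@"
}

# Demo using the figures shown above for R5ND_3CPA_5d.
sum_kcycles <<'EOF'
R5ND_3CPA_5d kilo cycles KG #784
R5ND_3CPA_5d kilo cycles Enc #1081
R5ND_3CPA_5d kilo cycles Dec #396
R5ND_3CPA_5d kilo cycles KEX #2262
R5ND_3CPA_5d stack bytes KG #5550
EOF
```

For these figures the sum is 2261 kilo cycles, close to the separately reported KEX figure of 2262.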
If you want to just run it again, you can invoke the
dump.sh shell script
and press the black reset button on the board. This will restart the program
on the board and its output will be displayed on console.
Our STTY setup defines ASCII 4 (ctrl-D) as EOF -- this causes the dump to exit after output from the board is finished. Otherwise the serial protocol is standard 115200N81 (115200 baud, no parity, 8 data bits, 1 stop bit).
run_bench.sh benchmarks all variants and writes the results to
run_bench.log. It uses
codesize.sh, a ridiculously hacky script that
computes the relevant code size by summing up the sizes of all functions
xe in their name.