Stack optimizations and refactoring of NTT-based Saber #181

Merged
merged 14 commits into from Feb 18, 2021

mvanbeirendonck
Contributor

This pull request adds stack optimizations for the NTT-based Saber and thoroughly refactors the Saber Cortex-M4 code. Detailed info is in the first commit message (f390aea). Stack optimizations include some help from @vincentvbh.

Stack usage should drop by roughly 5x (a little less for LightSaber), and performance improves slightly. The benchmarks still need to be regenerated on your setups; I have not added them yet.

Let me know if you have any questions or need help merging!

Cheers,
Michiel

mvanbeirendonck and others added 13 commits December 8, 2020 16:01
…TT-based Saber.

Firstly, this commit merges improvements between different Saber implementations.

1) For round 3, the Saber reference code was thoroughly refactored and the codebase reduced [https://github.com/KULeuven-COSIC/SABER]. These changes are now integrated into the m4 code.

2) All unnecessary modular reductions have been removed. The only modular reductions are now in the packing functions.

3) Packing/unpacking functions are simplified [PQClean, commit f8503cb].

4) The secret key is stored in compressed format [ia.cr/2020/268, Section 4.1]. This reduces the secret-key size and makes the packing/unpacking functions faster. (This requires a fix in pqm4’s testvectors.c, because the secret key is checked against the one produced by PQClean.)

5) During re-encryption, the verification of the ciphertext is performed in place [ia.cr/2020/268, Section 4.2].

6) Use symlinks for Light/FireSaber to make the (minimal) differences from Saber clearer.
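
Item 5 above can be illustrated with a minimal sketch. Instead of writing the re-encrypted ciphertext into a second full-size buffer and comparing afterwards, each freshly recomputed chunk is compared against the received ciphertext as soon as it is produced, and only a one-byte failure flag is carried along. The function names `verify_chunk` and `verify_result` and the chunking are assumptions made for illustration; they are not taken from this pull request or from the cited paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold the comparison of one recomputed chunk into a
 * running failure flag. No branch depends on the compared data. */
uint8_t verify_chunk(uint8_t fail, const uint8_t *received,
                     const uint8_t *recomputed, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(received[i] ^ recomputed[i]);
    }
    return (uint8_t)(fail | diff);
}

/* Collapse the accumulated flag to 0 (all chunks matched) or 1 (mismatch). */
uint8_t verify_result(uint8_t fail)
{
    return (uint8_t)((0u - (uint32_t)fail) >> 31);
}
```

A caller would start with `fail = 0`, call `verify_chunk` once per recomputed chunk while reusing a single small chunk buffer, and evaluate `verify_result` at the end.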

Secondly, this commit implements some optimizations and reduces the memory footprint of the NTT-based multiplication.

1) Saber does not require any modular reduction apart from bitstream packing. Elements can be kept in int16_t (central-reduced) format.

1.a) The secret-key is sign-extended from 4-bit to 16-bit when unpacked.
1.b) The vectors b and b' are sign-extended from 10-bit to 16-bit when unpacked.
1.c) Together, 1.a and 1.b make it possible to remove NTT_pk (with central reduction) and use NTT (without central reduction) uniformly.
1.d) NTT_inv and NTT_inv_inner include a final step that converts from int16_t back to mod_p or mod_q. This step is unnecessary and has been removed.

2) During encryption, the NTT of s' is only computed once and reused between A*s' and b*s'.

3) Some just-in-time memory optimizations of [ia.cr/2018/682, Section 2.2] are implemented for the NTT-based multiplication. Polynomial vectors are generated from their seed just-in-time, converted to NTT domain, and pointwise multiplied. The next polynomial vectors can reuse all the buffers.
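
The sign extension in items 1.a and 1.b, and the fact that reduction is only needed when packing, can be sketched as follows. The helper names `sign_extend` and `to_mod` are hypothetical, and the sketch does not reproduce the actual packed byte layout or the assembly used in the m4 code; it only shows the arithmetic on a single coefficient.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to a full int16_t (1 <= bits <= 16).
 * Hypothetical helper for illustration. */
int16_t sign_extend(uint16_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t sign = 1u << (bits - 1);      /* position of the sign bit */
    uint32_t mask = (sign << 1) - 1u;      /* keeps the low `bits` bits */
    uint32_t v = (uint32_t)x & mask;
    /* Flipping the sign bit and subtracting it maps [0, 2^bits) to
     * [-2^(bits-1), 2^(bits-1)). */
    return (int16_t)((int32_t)(v ^ sign) - (int32_t)sign);
}

/* Reduce a centered value back to [0, 2^bits): a mask applied at packing. */
uint16_t to_mod(int16_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t mask = ((uint32_t)1 << bits) - 1u;
    return (uint16_t)((uint32_t)(uint16_t)v & mask);
}
```

Because the moduli are powers of two, masking a 16-bit value at packing time gives the same result as reducing after every operation, which is why the intermediate values can stay in int16_t.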

The idea is to extend this from polynomial vectors to individual polynomials. This still requires a new my_mul function.
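
The buffer-reuse pattern behind the just-in-time approach can be sketched with a toy inner product. Everything here is a stand-in: `toy_expand` replaces the seed expansion, `toy_transform` replaces the forward NTT (it is the identity in this sketch), and the sizes are tiny. The sketch only shows that a single scratch polynomial suffices when each polynomial is generated, transformed, and accumulated before the next one is produced; it is not the my_mul function mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_N 4   /* coefficients per polynomial (toy size) */
#define TOY_L 3   /* polynomials per vector (toy size) */

/* Hypothetical stand-in for seed expansion: fills out[] deterministically. */
void toy_expand(uint32_t seed, uint32_t index, int32_t out[TOY_N])
{
    for (uint32_t j = 0; j < TOY_N; j++) {
        out[j] = (int32_t)((seed * 31u + index * 7u + j) % 13u);
    }
}

/* Hypothetical stand-in for the forward transform; identity in this sketch. */
void toy_transform(int32_t poly[TOY_N])
{
    (void)poly;
}

/* acc = sum_i transform(expand(seed, i)) * s_hat[i], coefficient-wise.
 * s_hat is a flat array of TOY_L * TOY_N already-transformed coefficients. */
void jit_inner_product(uint32_t seed, const int32_t *s_hat, int32_t acc[TOY_N])
{
    int32_t scratch[TOY_N];   /* the only polynomial-sized temporary */

    for (size_t j = 0; j < TOY_N; j++) {
        acc[j] = 0;
    }
    for (uint32_t i = 0; i < TOY_L; i++) {
        toy_expand(seed, i, scratch);   /* generate just in time */
        toy_transform(scratch);         /* convert in place */
        for (size_t j = 0; j < TOY_N; j++) {
            acc[j] += scratch[j] * s_hat[i * TOY_N + j];
        }
        /* scratch is dead here and is overwritten in the next iteration */
    }
}
```

The same already-transformed `s_hat` can be passed to several such products, which mirrors the reuse of the NTT of s' described in item 2.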

For {Fire,Light}Saber (keygen/encaps/decaps) the resulting implementation is approximately (2.3-2.6%/4.7-5.5%/7.4-9.5%) faster and uses (27-36%/47-61%/49-62%) less dynamic memory than the current version in pqm4.
and comment out non-stack-optimized (very slightly faster) functions
shake_out was sized SABER_POLYVECBYTES where SABER_POLYBYTES is sufficient.

Introduced a few unions to overlap memory.
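
As a rough illustration of overlapping memory with a union, the sketch below lets a byte buffer and a coefficient array share the same stack space because their lifetimes do not overlap. The type name `scratch_t`, the member names, and the byte count are assumptions chosen for the example and are not the unions introduced in this pull request.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_N     256   /* coefficients per polynomial */
#define EXAMPLE_BYTES 416   /* illustrative size of a raw byte buffer */

/* Both members start at the same address; the union is as large as the
 * bigger member instead of the sum of both. */
typedef union {
    uint8_t  bytes[EXAMPLE_BYTES];
    uint16_t poly[EXAMPLE_N];
} scratch_t;

uint32_t two_phase_demo(void)
{
    scratch_t s;
    uint32_t sum = 0;

    /* Phase 1: the storage holds raw bytes. */
    memset(s.bytes, 0x01, sizeof s.bytes);
    for (size_t i = 0; i < sizeof s.bytes; i++) {
        sum += s.bytes[i];
    }

    /* Phase 2: the bytes are no longer needed, so the same storage is
     * reused for coefficients. */
    for (size_t i = 0; i < EXAMPLE_N; i++) {
        s.poly[i] = (uint16_t)i;
    }
    return sum + s.poly[EXAMPLE_N - 1];
}
```

With separate locals the two buffers would need 416 + 512 bytes of stack; the union needs 512.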
@mkannwischer
Contributor

LGTM! Thank you very much @mvanbeirendonck and also @vincentvbh!
I've added the benchmarks.
I heard rumours that @vincentvbh has something even smaller by now, but it will take a little longer until that is ready, so I'll merge this already.

@mkannwischer mkannwischer merged commit 992f0f2 into mupq:master Feb 18, 2021

3 participants