New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack optimizations and refactoring of NTT-based Saber #181
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…TT-based Saber. Firstly, this commit merges improvements between different Saber implementations. 1) For round 3, the Saber reference code was thoroughly refactored and the codebase reduced [https://github.com/KULeuven-COSIC/SABER]. These changes are now integrated into the m4 code. 2) All unnecessary modular reductions have been removed. The only modular reductions are now in the packing functions. 3) Packing/unpacking functions are simplified [PQClean, commit f8503cb]. 4) The secret-key is stored in compressed format [ia.cr/2020/268, Section 4.1]. This reduces the secret-key size, and the packing/unpacking functions are faster. (This requires a fix in pqm4’s testvectors.c, as the secret-key is checked against the one produced by PQclean). 5) During re-encryption, the verification of the ciphertext is performed in place [ia.cr/2020/268, Section 4.2]. 6) Use symlinks for Light/FireSaber to make (minimal) differences with Saber more clear. Secondly, this commit implements some optimizations and reduces the memory footprint of the NTT-based multiplication. 1) Saber does not require any modular reduction apart from bitstream packing. Elements can be kept in int16_t (central-reduced) format. 1.a) The secret-key is sign-extended from 4-bit to 16-bit when unpacked. 1.b) The vectors b and b' are sign-extended from 10-bit to 16-bit when unpacked. 1.c) 1.a and 1.b allow to remove NTT_pk (with central reduction) and use NTT (without central reduction) uniformly. 1.d) NTT_inv and NTT_inv_inner include a final step that converts from int16_t back to mod_p or mod_q. This is not necessary and removed. 2) During encryption, the NTT of s' is only computed once and reused between A*s' and b*s'. 3) Some just-in-time memory optimizations of [ia.cr/2018/682, Section 2.2] are implemented for the NTT-based multiplication. Polynomial vectors are generated from their seed just-in-time, converted to NTT domain, and pointwise multiplied. The next polynomial vectors can reuse all the buffers. The idea is to extend this from polynomial vectors to individual polynomials. This still requires a new my_mul function. For {Fire,Light}Saber (keygen/encaps/decaps) the resulting implementation is approximately (2.3-2.6%/4.7-5.5%/7.4-9.5%) faster and uses (27-36%/47-61%/49-62%) less dynamic memory than the current version in pqm4.
and comment out non-stack-optimized (very slightly faster) functions
shake_out was SABER_POLYVECBYTES instead of only SABER_POLYBYTES. Introduced a few unions to overlap memory.
shake_out was SABER_POLYVECBYTES instead of only SABER_POLYBYTES. Introduced a few unions to overlap memory.
LGTM! Thank you very much @mvanbeirendonck and also @vincentvbh! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request adds stack optimizations for the NTT-based Saber and thoroughly refactors the Saber Cortex-M4 code. Detailed info is in the first commit message (f390aea). Stack optimizations include some help from @vincentvbh.
Stack usage should reduce by ~5x (a little less for LightSaber), performance improves a tiny bit. Benchmarks should still be recreated on your setups, I didn't add them yet.
Let me know if you have any questions or need help merging!
Cheers,
Michiel