[WIP] Add stack optimizations. #3

Draft: wants to merge 1 commit into main


mvanbeirendonck

Hi all,

I've opened a pull request on pqm4 (mupq/pqm4/pull/181) with the stack optimizations for the Saber NTT multiplication. Would you like to fully include them here as well, or keep compatibility with the original paper?

Right now, this draft pull request contains the stack-optimized Saber NTT, but its function naming differs slightly from that of the Toom-Cook reference implementation that is also included, which breaks the testbenches. I could update the Toom-Cook implementation as well, so that both use the same naming.

A second problem is that the stack-optimized implementation has no native MatrixVectorMul and InnerProd. Instead, the new functions unpack their inputs just-in-time, with separate variants for keygen, encryption, and decryption. To keep all the benchmarks in speed.c, these two functions would have to be recreated.
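To illustrate what "unpacking just-in-time" means for stack usage, here is a minimal C sketch. It is not the code from this pull request: it swaps the NTT for plain schoolbook multiplication in Z_{2^13}[x]/(x^N + 1), uses a toy length N = 8 instead of Saber's 256, and all names (`pack_poly`, `unpack_poly`, `poly_mul_acc`, `inner_prod_full`, `inner_prod_jit`) are hypothetical. `inner_prod_full` unpacks the whole vector up front, like a native InnerProd, while `inner_prod_jit` keeps only one unpacked polynomial live at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy parameters (hypothetical): Saber itself uses N = 256 and q = 2^13. */
#define N 8                          /* polynomial length */
#define L 3                          /* vector length (Saber's l = 3) */
#define QBITS 13                     /* coefficients live in Z_{2^13} */
#define QMASK ((1u << QBITS) - 1)
#define PACKED_BYTES (N * QBITS / 8) /* 13 bytes per packed polynomial */

/* Pack N 13-bit coefficients into a little-endian bitstream. */
static void pack_poly(uint8_t *out, const uint16_t *in)
{
    uint32_t acc = 0;
    int bits = 0;
    size_t pos = 0;
    for (int i = 0; i < N; i++) {
        acc |= (uint32_t)(in[i] & QMASK) << bits;
        bits += QBITS;
        while (bits >= 8) {          /* flush every complete byte */
            out[pos++] = (uint8_t)(acc & 0xff);
            acc >>= 8;
            bits -= 8;
        }
    }
    if (bits > 0)                    /* unused when N is a multiple of 8 */
        out[pos] = (uint8_t)(acc & 0xff);
}

/* Unpack N 13-bit coefficients from a little-endian bitstream. */
static void unpack_poly(uint16_t *out, const uint8_t *in)
{
    uint32_t acc = 0;
    int bits = 0;
    size_t pos = 0;
    for (int i = 0; i < N; i++) {
        while (bits < QBITS) {       /* refill until one coefficient is ready */
            acc |= (uint32_t)in[pos++] << bits;
            bits += 8;
        }
        out[i] = (uint16_t)(acc & QMASK);
        acc >>= QBITS;
        bits -= QBITS;
    }
}

/* r += a * b in Z_{2^13}[x]/(x^N + 1), schoolbook (stand-in for the NTT). */
static void poly_mul_acc(uint16_t *r, const uint16_t *a, const uint16_t *b)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            uint32_t p = ((uint32_t)a[i] * b[j]) & QMASK;
            int k = i + j;
            if (k < N)
                r[k] = (uint16_t)((r[k] + p) & QMASK);         /* x^k */
            else
                r[k - N] = (uint16_t)((r[k - N] - p) & QMASK); /* x^N = -1 */
        }
    }
}

/* Native-style InnerProd: unpack the whole vector first (L buffers live). */
void inner_prod_full(uint16_t *r, const uint8_t *packed_a, const uint16_t *s)
{
    uint16_t a[L][N];
    for (int i = 0; i < L; i++)
        unpack_poly(a[i], packed_a + i * PACKED_BYTES);
    memset(r, 0, N * sizeof r[0]);
    for (int i = 0; i < L; i++)
        poly_mul_acc(r, a[i], s + i * N);
}

/* Just-in-time variant: unpack a_i right before use (one buffer live). */
void inner_prod_jit(uint16_t *r, const uint8_t *packed_a, const uint16_t *s)
{
    uint16_t a_tmp[N];               /* single scratch buffer, reused per i */
    memset(r, 0, N * sizeof r[0]);
    for (int i = 0; i < L; i++) {
        unpack_poly(a_tmp, packed_a + i * PACKED_BYTES);
        poly_mul_acc(r, a_tmp, s + i * N);
    }
}
```

Both functions compute the same result; the just-in-time version replaces the L-polynomial scratch array with a single buffer. The flip side, as noted above, is that there is no longer a standalone InnerProd whose multiplication cost can be benchmarked in isolation.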

Let me know what you prefer!

Cheers,
Michiel

New function naming and parameters are not compatible with testbenches yet.
@vincentvbh (Contributor)

To benchmark MatrixVectorMul and InnerProd, I think we can benchmark MatrixVectorMulKeyPairNTT() and InnerProdDecNTT() with only the NTT code, by creating versions of these two functions in speed.c with all non-NTT operations removed. I tested them locally, and their cycle counts differ from those in our paper by only -2% to 6%.

For Toom-Cook, I think some adjustment is fine as long as the cycle counts stay consistent with the paper.

@mvanbeirendonck (Author)

mvanbeirendonck commented Feb 12, 2021

  • Add stack-optimized NTT
  • Add stack-optimized Toom-Cook
  • Recreate MatrixVectorMul and InnerProd without decompressing functions
