Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How are the pre-computed zetas in the AVX2 implementation calculated? #15

Closed
caesaretos opened this issue Jun 11, 2020 · 2 comments
Closed

Comments

@caesaretos
Copy link

Could you explain how _ZETAS_EXP and _ZETAS_INV_EXP are calculated in const.c?

@gregorseiler
Copy link
Contributor

Hi Caesar,

The arrays _ZETAS_EXP and _ZETAS_INV_EXP hold the precomputed roots of unity (multiplied with the Montgomery factor) for the AVX2-vectorized NTT implementation. They are similar to the corresponding arrays zetas and zetas_inv in the reference implementation, but differ in several ways. Firstly, their order is different. This is to match the recursion in the AVX2 NTT and the fact that the polynomial coefficients are reordered in later levels so that one always has full vector registers of coefficients that need to be multiplied. Secondly, some of the the roots are repeated several times because it is faster to load full vector registers instead of populating them on the fly with broadcast and shuffle instructions. And lastly, for every root there is a root multiplied by q^-1 mod 2^16 since this is needed for the fast Montgomery reduction described in https://ia.cr/2019/040.

For the GP-script that I've used to generated these arrays, see precomp2.txt.

Best,
Gregor

@caesaretos
Copy link
Author

Hi Gregor,

Thank you for clarifying this.
Now I understand how the constants are generated.

I would like to note that precomp2.txt shows only how to generate _ZETAS_EXP, the inverses _ZETAS_INV_EXP are not included.

Thanks and regards,
Caesar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants