Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fastest NTT #1

Closed
Sword-Smith opened this issue May 5, 2021 · 6 comments
Closed

Fastest NTT #1

Sword-Smith opened this issue May 5, 2021 · 6 comments

Comments

@Sword-Smith
Copy link
Member

Sword-Smith commented May 5, 2021

This paper from Microsoft Research seems to contain a pseudo-algorithm for what is probably the optimal NTT implementation.

https://eprint.iacr.org/2016/504.pdf

Another article about a fast NTT:
https://crypto.ethz.ch/publications/files/Seiler18.pdf

@Sword-Smith
Copy link
Member Author

Sword-Smith commented May 5, 2021

With @Munksgaard's help and by using --release the runtime over the field with prime p=167772161 = 5 * 2^25 + 1 has been brought down from 17 seconds on my Dell XPS to 2 seconds on a data set with 2^20 i64 entries.

I have not yet managed to implement the algorithm from Microsoft Research linked above though.

@Sword-Smith
Copy link
Member Author

There is something called "bit-reversal" ordering in the Microsoft Research article that I don't know what means.

@Munksgaard
Copy link
Contributor

Perhaps this is what they mean? https://en.wikipedia.org/wiki/Bit-reversal_permutation

@Sword-Smith
Copy link
Member Author

Sword-Smith commented May 9, 2021

Perhaps this is what they mean? https://en.wikipedia.org/wiki/Bit-reversal_permutation

That makes sense as the input always has length 2^n so such a permutation should be possible. So we need to map the value at index 000->000 and 011->110?

Sword-Smith added a commit that referenced this issue Jun 4, 2021
This allows us to compare the previous N^2 implementation of Lagrange
interpolation with the new NTT-based N*log(N) implementation. As
expected, the N*log(N) runtime scales nearly linearly which is awesome.

Our NTT implementation can still be made much more efficient though, as
noted in issue #1.
Sword-Smith added a commit that referenced this issue Jul 18, 2021
This allows us to compare the previous N^2 implementation of Lagrange
interpolation with the new NTT-based N*log(N) implementation. As
expected, the N*log(N) runtime scales nearly linearly which is awesome.

Our NTT implementation can still be made much more efficient though, as
noted in issue #1.
@Sword-Smith
Copy link
Member Author

@Sword-Smith
Copy link
Member Author

Closed by 524f54e and associated commits. We should have a pretty fast implementation now and NTT is nowhere near being a bottle-neck for any of our STARKs (Rescue Prime, Brainfuck, or Triton VM), so there's no reason to dwell more on this I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants