-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Native Poseidon implementation(s) #207
Conversation
What do you think about having Also I should mention, but we've been using rustfmt lately, so please run |
Yep, that could work. The constants definitely depend on the choice field, so that makes sense. My only hesitation would be from the fact that the ETA: Yep, agree that #209 is an improvement.
Ah, great to know about |
* Have `Field`s implement `PoseidonInterface` Rather than having a sort of "dummy struct" implement `PoseidonInterface` with the field as a generic param. I think this seems more natural and type-safe. The type safety does come at a price -- it would be harder to do dynamic things such as taking `WIDTH` as a command line option -- but I think that's alright. * Fix missed conflicts. * cargo fmt fixes. * Fix to accommodate changes in latest nightly. Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com> Co-authored-by: Hamish Ivey-Law <hamish@ivey-law.name>
#[unroll_for_loops] | ||
fn sbox_layer(state: &mut [Self; WIDTH]) { | ||
for i in 0..WIDTH { | ||
state[i] = Self::sbox_monomial(state[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if sbox_layer
could be even faster if it was written in a more "row oriented" manner? See e.g. winterfell's Rescue implementation.
I suppose if this is really inlined and unrolled, the compiler should be able to interleave things optimally regardless of how we structure the code. I forget if unroll_for_loops
works with const generics though? I wish we could put it on a specific loop and have it fail if it it couldn't unroll it.
(Definitely not a blocker in any case, just thinking out loud.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure the link to winterfell's Rescue implementation is what you meant? It's actually just a 'big' exponentiation: powers of the base a computed and combined using an addition chain for the exponent. I don't think that helps us, does it?
I wrote up a version of sbox_layer
that unrolled and interleaved the computation of the state[i]^7 for i = 0..7 in such a way as to give maximum opportunity for multiplication throughput. Unfortunately the performance was the same.
I have had some issues with unroll_for_loops
and const generics in the past, but they generally work okay together. In this case sbox_layer
is not unrolled, but that is (presumably) because the code for calculating x^7 is too long at 4 modular multiplications. Other functions are definitely unrolled, like mds_layer
for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Bizarrely, somehow the performance has changed a bit somewhere between f118464 and 45613c9. The numbers I'm getting now for width=8 are (name/microseconds/slowdown):
So, compared to the numbers in #4, GMiMC is 20% faster, and 'fast' Poseidon is now 10% slower than 'naive' Poseidon (which itself is slightly slower). I think we can just push ahead though, since they're still very good, and we don't want to be optimising for my old laptop anyway! ;) |
Two implementations of Poseidon, for widths 8 and 12: First implementation is 'naive' taken directly from the definition, and the other uses some trickery to do faster partial rounds (described in the paper). Both versions use a specially crafted MDS matrix.
Performance data is tracked in #4. In summary, the 'fast' implementation is faster on width 12, and the two implementations are similar on width 8. Performance is within about 10-15% of GMiMC with width 8.
Scripts required to generate the constants in the 'fast' implementation and the MDS matrix will be stored in a separate repository.