perf: speeding up poseidon-permutation
#21
Merged
This PR contains some performance fixes in the implementation of poseidon-permutation. Below are some numbers based on benchmarking and profiling. Related to penumbra-zone/penumbra#1123.
From the paper, the optimized version of Poseidon reduces the number of multiplications in the partial rounds.
The approximate "5x" number cited in the paper came from a Poseidon instance with t=24, R_F = 8, R_P = 42, so the number of multiplications per permutation was:
t^2 (R_F + R_P) = 24^2 * (8 + 42) = 28,800 multiplications
After the optimization, the number of multiplications in each partial round goes from t^2 to 2*t, so the total becomes:
t^2 * R_F + 2 * t * R_P = 24^2 * 8 + 2 * 24 * 42 = 6,624 multiplications
That is about a 4.3x reduction in multiplications in the permutation. The gains are large here because the instance has a very high t and a very high R_P (number of partial rounds).
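To illustrate where the t^2 versus roughly 2*t difference comes from, here is a minimal sketch. It assumes the optimized partial rounds use a matrix that is the identity except for its first row and first column, which is how I understand the paper's sparse form; it is not code from this repo. The modulus `P` and the helper names are made up for illustration.

```python
# Toy modulus for illustration only; a real instance works over a large prime field.
P = 101

def dense_mul(matrix, state):
    """Dense t x t matrix-vector product: t * t field multiplications."""
    t = len(state)
    out, mults = [], 0
    for i in range(t):
        acc = 0
        for j in range(t):
            acc = (acc + matrix[i][j] * state[j]) % P
            mults += 1
        out.append(acc)
    return out, mults

def sparse_mul(first_row, first_col, state):
    """Product with a matrix that is the identity except for row 0 and column 0.

    first_row has t entries; first_col has t - 1 entries (rows 1..t-1 of column 0).
    """
    t = len(state)
    mults = 0
    acc = 0
    for j in range(t):  # row 0: a full dot product, t multiplications
        acc = (acc + first_row[j] * state[j]) % P
        mults += 1
    out = [acc]
    for i in range(1, t):  # rows 1..t-1: one multiplication plus one addition each
        out.append((first_col[i - 1] * state[0] + state[i]) % P)
        mults += 1
    return out, mults

def as_dense(first_row, first_col):
    """Expand the sparse description into a full t x t matrix, for comparison."""
    t = len(first_row)
    m = [[1 if i == j else 0 for j in range(t)] for i in range(t)]
    m[0] = list(first_row)
    for i in range(1, t):
        m[i][0] = first_col[i - 1]
    return m
```

Under this assumed structure the sparse product costs 2*t - 1 multiplications, which the estimate above rounds to 2*t.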
For us, testing with the highest-width hash we use (4:1), we have t=5, R_F = 8, R_P = 31, so the expected reduction in the number of multiplications is:
Before: t^2 (R_F + R_P) = 5^2 * (8 + 31) = 975 multiplications
After: t^2 * R_F + 2 * t * R_P = 5^2 * 8 + 2 * 5 * 31 = 510 multiplications
That is about a 1.9x reduction in multiplications.
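The two counting formulas above can be checked with a short script. This is only a sketch of the arithmetic; the function names are made up here and are not part of poseidon-permutation.

```python
def mults_unoptimized(t, r_f, r_p):
    """Every round (full or partial) uses a dense t x t matrix: t^2 multiplications."""
    return t * t * (r_f + r_p)

def mults_optimized(t, r_f, r_p):
    """Full rounds stay at t^2; each partial round drops to about 2*t."""
    return t * t * r_f + 2 * t * r_p

def reduction(t, r_f, r_p):
    """Ratio of unoptimized to optimized multiplication counts."""
    return mults_unoptimized(t, r_f, r_p) / mults_optimized(t, r_f, r_p)

if __name__ == "__main__":
    # Instance from the paper, then the 4:1 instance used here.
    for name, (t, r_f, r_p) in {"paper": (24, 8, 42), "4:1": (5, 8, 31)}.items():
        print(name, mults_unoptimized(t, r_f, r_p), mults_optimized(t, r_f, r_p),
              round(reduction(t, r_f, r_p), 1))
```

The count covers only the matrix multiplications, so it does not predict wall-clock time on its own.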
Comparing the empirical performance for the 4:1 parameter set derived using poseidon-paramgen:
ark-sponge: ~39.5us/hash
unoptimized poseidon-permutation (this repo): ~36.0us/hash
optimized poseidon-permutation (this repo): ~26.8us/hash
At this point, profiling the optimized permutation shows that we're dominated by applying the S-boxes, which take about 60% of the permutation's computation. The partial round multiplication is down to 26%, so further improvements to that area of the code would yield some additional speedup.
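As a rough sanity check on these figures, the sketch below computes the observed speedups from the timings above. It also computes an Amdahl's-law style upper bound on what further work on the partial round multiplication could give, assuming the 26% profile share is accurate and everything else stays fixed. The constant and function names are illustrative only.

```python
# Timings quoted above, in microseconds per hash.
ARK_SPONGE_US = 39.5
UNOPTIMIZED_US = 36.0
OPTIMIZED_US = 26.8

# Share of the optimized permutation spent in partial round multiplication.
PARTIAL_ROUND_FRACTION = 0.26

def speedup(before_us, after_us):
    """How many times faster `after_us` is than `before_us`."""
    return before_us / after_us

def max_speedup_if_removed(fraction):
    """Amdahl's-law bound: speedup if a component taking `fraction` of the time cost nothing."""
    return 1.0 / (1.0 - fraction)

if __name__ == "__main__":
    print("vs unoptimized:", round(speedup(UNOPTIMIZED_US, OPTIMIZED_US), 2))
    print("vs ark-sponge:", round(speedup(ARK_SPONGE_US, OPTIMIZED_US), 2))
    print("upper bound from partial rounds:",
          round(max_speedup_if_removed(PARTIAL_ROUND_FRACTION), 2))
```

The observed speedup is lower than the 1.9x reduction in multiplications, which is consistent with the S-boxes dominating the profile.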