
Pasta / Halo2 MSM bench #243

Merged
mratsim merged 7 commits into master from pasta-bench on Jun 4, 2023
Conversation

@mratsim (Owner) commented May 31, 2023

Benchmarks and optimization of MSM for the Halo2 Pasta curves

Current results on an 8-core i9-11980HK with Clang:

(screenshot: Constantine MSM benchmark results)

TODO

  • Bench vs the Zcash / Privacy & Scaling Explorations implementation
  • Make "no ASM" an environment variable to halve the number of nimble tasks
  • Generate coefficient-point pairs in parallel (see the sketch after this list)
  • Bench on a large core count
  • Tune if needed
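
On the "generate coefficient-point pairs in parallel" item: Constantine's own bench generator is Nim, but the idea is sketched below in Rust with rayon and the pasta_curves crate, purely as an illustration of the approach.

```rust
// Illustrative sketch only: parallel generation of (scalar, point) MSM inputs
// with rayon and the pasta_curves crate. Not Constantine's actual (Nim) generator.
use ff::Field;
use group::{Curve, Group};
use pasta_curves::pallas;
use rand_core::OsRng;
use rayon::prelude::*;

fn gen_msm_inputs(n: usize) -> (Vec<pallas::Scalar>, Vec<pallas::Affine>) {
    // Scalars and points are independent, so both loops parallelize trivially.
    let coeffs: Vec<pallas::Scalar> = (0..n)
        .into_par_iter()
        .map(|_| pallas::Scalar::random(OsRng))
        .collect();
    let bases: Vec<pallas::Affine> = (0..n)
        .into_par_iter()
        .map(|_| pallas::Point::random(OsRng).to_affine())
        .collect();
    (coeffs, bases)
}
```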

@mratsim (Owner, Author) commented May 31, 2023

I might have missed something, but there seems to be no MSM benchmark in the Halo2 or pasta_curves repos?

Used this PR: https://github.com/zcash/halo2/pull/619/files
file: https://github.com/zcash/halo2/blob/b131df023c9a860244b5b0a24d03a1c249f4c82c/halo2_proofs/benches/multiexp.rs
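
For context, the rough shape of that Criterion benchmark, re-sketched below (assumed halo2_proofs API of that era, not a verbatim copy of the linked file); the key question raised in the next comment is whether input/parameter generation ends up inside the timed closure.

```rust
// Re-sketch of a halo2 multiexp Criterion benchmark (assumed API, not the linked file verbatim).
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use ff::Field;
use group::{Curve, Group};
use halo2_proofs::arithmetic::best_multiexp;
use halo2_proofs::pasta::{Eq, EqAffine, Fp};
use rand_core::OsRng;

fn bench_multiexp(c: &mut Criterion) {
    let mut group = c.benchmark_group("multiexp");
    for k in 8..=16u32 {
        let n = 1usize << k;
        // Generate inputs once, outside the timed closure; if point/parameter generation
        // lands inside the measurement, the numbers no longer reflect the MSM itself.
        let coeffs: Vec<Fp> = (0..n).map(|_| Fp::random(OsRng)).collect();
        let bases: Vec<EqAffine> = (0..n).map(|_| Eq::random(OsRng).to_affine()).collect();
        group.bench_with_input(BenchmarkId::from_parameter(k), &k, |b, _| {
            b.iter(|| best_multiexp(&coeffs, &bases))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_multiexp);
criterion_main!(benches);
```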

(screenshot: halo2 multiexp benchmark results)

@mratsim (Owner, Author) commented May 31, 2023

Rebenched at the same scale as Halo2:

(screenshots: Constantine MSM benchmark results at Halo2 scale)

Something is really strange.

For 2⁸ = 256 inputs it takes 13 ms, which is even slower than my naive scalar mul at 8.7 ms. Multithreaded, Constantine is at 313 µs, so 41x faster.

The code in the PR is indeed multithreaded: https://github.com/zcash/halo2/blob/b131df0/halo2_proofs/src/arithmetic.rs#L28-L30

Is the benchmark flawed, measuring the trusted setup as well? https://github.com/zcash/halo2/blob/b131df0/halo2_proofs/benches/multiexp.rs#L25-L28

Thing is, time roughly doubles each time we double the input size, but MSM should scale as O(n/log n), while the Rust bench seems to grow linearly.

With 2¹⁵ inputs (32768), they take 1.7 s while Constantine takes 12 ms, a 141.7x ratio, which sounds crazy. Even the naive implementation in Constantine takes only 1.11 s.
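
As a sanity check on that scaling argument, here is a back-of-the-envelope Pippenger cost model (a generic sketch of the bucket method, not necessarily Constantine's exact windowing):

```rust
// Back-of-the-envelope Pippenger (bucket method) cost model.
// With window size c and b-bit scalars, the bucket method needs roughly
// (b / c) * (n + 2^c) group additions; picking c ≈ log2(n) gives ~ b*n / log2(n),
// i.e. sublinear growth in n.
fn pippenger_cost(n: u64, bits: u64) -> f64 {
    let c = (n as f64).log2().max(1.0);
    (bits as f64 / c) * (n as f64 + 2f64.powf(c))
}

fn main() {
    let bits = 255; // Pasta scalar fields are ~255-bit
    let ratio = pippenger_cost(1 << 15, bits) / pippenger_cost(1 << 8, bits);
    // Expected: ~68x more work for 128x more points, i.e. clearly sublinear,
    // whereas the halo2 bench numbers above grow roughly linearly with n.
    println!("expected cost ratio, 2^15 vs 2^8 inputs: {:.1}x", ratio);
}
```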

@mratsim (Owner, Author) commented May 31, 2023

Benching vs Supranational at https://github.com/supranational/pasta-msm

(screenshots: Constantine vs Supranational pasta-msm benchmark results)

Constantine is roughly 2x faster.

@mratsim (Owner, Author) commented Jun 4, 2023

On a watercooled overclocked 18-core i9-9980XE
(screenshots: benchmark results on the i9-9980XE)

@mratsim (Owner, Author) commented Jun 4, 2023

From Zcash repo: https://github.com/zcash/halo2/pull/619/files#diff-a07879d4aa4c95cfbfb03f5de33deee89d548aba465ff7bbdc5965d24463b0cb

(screenshot: Zcash halo2 multiexp benchmark results)

Similar perf issues on the Zcash side:

  • 26 ms for 256 inputs, while Constantine on my machine is 0.412 ms, a 63x ratio
  • 3.3932 s for 32768 inputs, while Constantine is 0.011 s, a 308x ratio

@mratsim (Owner, Author) commented Jun 4, 2023

Supranational: https://github.com/supranational/pasta-msm

(screenshot: Constantine vs Supranational pasta-msm benchmark results on the i9-9980XE)

There is a 2.22x ratio in favor of Constantine.

@mratsim mratsim marked this pull request as ready for review June 4, 2023 15:40
@mratsim mratsim merged commit 0eba593 into master Jun 4, 2023
12 checks passed
@mratsim mratsim deleted the pasta-bench branch June 4, 2023 15:42