Contains a list of optimizations that have been made to the scroll/halo2 codebase. This is a living document and will be updated as new optimizations are made.
We use a faster prime field which is generated by iden3/ffiasm.
Relevant commit: here.
Some parts of the Halo2 code hold polynomials in both coefficient and evaluation form. Considering that polynomials in coefficient form are only needed after squeezing y, we can delay IFFT until y is squeezed. This helps reduce memory pressure.
Relevant commit: here
Relevant code optimization: here.
Relevant halo2 code: permutation product poly.
Relevant code optimization: here.
Relevant halo2 code: permuted input poly, permuted table poly and lookup product poly.
Originally, when committing to multiple polynomials sequentially, the conversion from xyzz point type to affine point is done individually for each commitment, which leads to a large number of expensive inverse operations. We optimize this inefficiency with Batch Normalize.
To set up Batch Normalize, conversions from xyzz point type to affine point are delayed until all commitments are calculated. Once all calculations are finished, xyzz point types are converted to affine points at once with BatchNormalize(). This reduces the number of required inverse operations to once per sequence.
Relevant commit: here
Relevant code optimization: here.
Relevant halo2 code: permuted input poly and permuted table poly.
Relevant code optimization: here.
Relevant halo2 code: permutation product poly and lookup product poly.
Inverse operations are costly for elliptic curves. We can reduce the number of inverse operations on multiple points by using BatchInverse(), which only requires one inverse operation for all the points included.
We fully utilize Batch Inverse to avoid multiple inverse operations. For example, the original batch_normalize() inefficiently computed an inverse operation for every point, so we optimized our BatchNormalize() to use the batch inverse operation to normalize all the points with a single inverse operation.
We merge multiple parallelized parts into a single parallelized part for better performance by reducing the number of thread spawns and joins. Here are the list of examples:
Description | Tachyon | Halo2 |
---|---|---|
Merge parallelization | merging 1-t | merging 1-h |
Use collapse clause | merging 2-t | merging 2-h |
Several optimizations have been made to the Multi-Scalar Multiplication (MSM) operation. For scalar multiplication, we have implemented windowed non-adjacent form (WNAF), which is a more efficient way of representing the scalar. For group operations, we have changed the bucket type to the PointXYZZ type, which is more efficient than the Jacobian type. View the benchmark result regarding the change to PointXYZZ type here.
Considering the memory used by polynomials is typically high, reducing the number of memory allocations and reusing the existing allocated memory is very important.
Description | Tachyon | Halo2 |
---|---|---|
Reuse compressed evals buffer | saving 1-t | saving 1-h |
Reuse z buffer | saving 2-t | saving 2-h |
Compute l polys | saving 3-t | saving 3-h |
Avoid intermediate memory allocation when transposing | saving 4-t | saving 4-h |
We provide benchmarks of Tachyon and Halo2 for three main circuits by Privacy-scaling-explorations and Scroll.
The following are the results of the real_prover
test for the Transfer 0
block in the integration tests of zkevm-circuits
. The times, in seconds, were recorded on an AWS EC2 instance (name: r6i.8xlarge, vCPU: 32, RAM: 256GB).
Steps | Tx:Tachyon | Tx:Halo2 | EVM:Tachyon | EVM:Halo2 | Super:Tachyon | Super:Halo2 |
---|---|---|---|---|---|---|
degree | 20 | 20 | 18 | 18 | 20 | 20 |
before theta | 189.44 | 177.91 | 33.54 | 27.02 | 400.52 | 405.87 |
theta | 12.77 | 20.65 | 6.44 | 13.50 | 103.24 | 171.76 |
- compress | 1.12 | 2.73 | 0.23 | 0.46 | 6.59 | 14.35 |
- permute | 9.67 | 11.04 | 4.93 | 8.37 | 72.87 | 88.26 |
- commit | 1.98 | 6.89 | 1.28 | 4.66 | 23.78 | 69.15 |
beta-gamma | 15.75 | 32.18 | 7.79 | 14.62 | 91.04 | 174.76 |
y | 130.50 | 261.97 | 161.96 | 218.37 | 2130.99 | 3426.80 |
- transform poly | 4.27 | 6.57 | 1.86 | 2.79 | 23.83 | 34.37 |
- build h poly | 126.23 | 255.40 | 160.10 | 215.58 | 2107.16 | 3392.43 |
x | 1.49 | 1.58 | 2.18 | 2.47 | 15.50 | 14.42 |
shplonk | 8.05 | 5.87 | 1.35 | 3.31 | 37.52 | 25.30 |
backend total | 168.56 | 322.25 | 179.72 | 252.27 | 2378.29 | 3813.04 |
end-to-end total | 358.58 | 500.16 | 213.26 | 279.29 | 2778.81 | 4218.91 |
- Greek characters such as
theta
andbeta
are adopted from the halo2-book before theta
refers to the total time taken by the Halo2 including the circuit frontend.backend total
refers to the total time taken fromtheta
toshplonk
.end-to-end total
refers to the total time taken by the program (before theta
+backend total
).