Skip to content

Latest commit

 

History

History
129 lines (85 loc) · 10.9 KB

README.md

File metadata and controls

129 lines (85 loc) · 10.9 KB

Optimizations

Contains a list of optimizations that have been made to the scroll/halo2 codebase. This is a living document and will be updated as new optimizations are made.

1. PrimeField in Assembly

We use a faster prime field which is generated by iden3/ffiasm.

Relevant commit: here.

2. Lazy IFFT

Some parts of the Halo2 code hold polynomials in both coefficient and evaluation form. Considering that polynomials in coefficient form are only needed after squeezing y, we can delay IFFT until y is squeezed. This helps reduce memory pressure.

Relevant commit: here

2.1. Lazy IFFT for Permutation Argument

Relevant code optimization: here.

Relevant halo2 code: permutation product poly.

2.2. Lazy IFFT for Lookup Argument

Relevant code optimization: here.

Relevant halo2 code: permuted input poly, permuted table poly and lookup product poly.

3. Batch Normalize

Originally, when committing to multiple polynomials sequentially, the conversion from xyzz point type to affine point is done individually for each commitment, which leads to a large number of expensive inverse operations. We optimize this inefficiency with Batch Normalize.

To set up Batch Normalize, conversions from xyzz point type to affine point are delayed until all commitments are calculated. Once all calculations are finished, xyzz point types are converted to affine points at once with BatchNormalize(). This reduces the number of required inverse operations to once per sequence.

Relevant commit: here

3.1. Batch Normalize for Permuted Polys

Relevant code optimization: here.

Relevant halo2 code: permuted input poly and permuted table poly.

3.2. Batch Normalize for Grand Product Polys

Relevant code optimization: here.

Relevant halo2 code: permutation product poly and lookup product poly.

3.3. Batch Inverse in Batch Normalize

Inverse operations are costly for elliptic curves. We can reduce the number of inverse operations on multiple points by using BatchInverse(), which only requires one inverse operation for all the points included.

We fully utilize Batch Inverse to avoid multiple inverse operations. For example, the original batch_normalize() inefficiently computed an inverse operation for every point, so we optimized our BatchNormalize() to use the batch inverse operation to normalize all the points with a single inverse operation.

4. Optimizing parallelization code

We merge multiple parallelized parts into a single parallelized part for better performance by reducing the number of thread spawns and joins. Here are the list of examples:

Description Tachyon Halo2
Merge parallelization merging 1-t merging 1-h
Use collapse clause merging 2-t merging 2-h

5. Optimizing MSM

Several optimizations have been made to the Multi-Scalar Multiplication (MSM) operation. For scalar multiplication, we have implemented windowed non-adjacent form (WNAF), which is a more efficient way of representing the scalar. For group operations, we have changed the bucket type to the PointXYZZ type, which is more efficient than the Jacobian type. View the benchmark result regarding the change to PointXYZZ type here.

6. Saving heap allocation

Considering the memory used by polynomials is typically high, reducing the number of memory allocations and reusing the existing allocated memory is very important.

Description Tachyon Halo2
Reuse compressed evals buffer saving 1-t saving 1-h
Reuse z buffer saving 2-t saving 2-h
Compute l polys saving 3-t saving 3-h
Avoid intermediate memory allocation when transposing saving 4-t saving 4-h

7. Benchmarks for each circuit

We provide benchmarks of Tachyon and Halo2 for three main circuits by Privacy-scaling-explorations and Scroll.

The following are the results of the real_prover test for the Transfer 0 block in the integration tests of zkevm-circuits. The times, in seconds, were recorded on an AWS EC2 instance (name: r6i.8xlarge, vCPU: 32, RAM: 256GB).

Steps Tx:Tachyon Tx:Halo2 EVM:Tachyon EVM:Halo2 Super:Tachyon Super:Halo2
degree 20 20 18 18 20 20
before theta 189.44 177.91 33.54 27.02 400.52 405.87
theta 12.77 20.65 6.44 13.50 103.24 171.76
- compress 1.12 2.73 0.23 0.46 6.59 14.35
- permute 9.67 11.04 4.93 8.37 72.87 88.26
- commit 1.98 6.89 1.28 4.66 23.78 69.15
beta-gamma 15.75 32.18 7.79 14.62 91.04 174.76
y 130.50 261.97 161.96 218.37 2130.99 3426.80
- transform poly 4.27 6.57 1.86 2.79 23.83 34.37
- build h poly 126.23 255.40 160.10 215.58 2107.16 3392.43
x 1.49 1.58 2.18 2.47 15.50 14.42
shplonk 8.05 5.87 1.35 3.31 37.52 25.30
backend total 168.56 322.25 179.72 252.27 2378.29 3813.04
end-to-end total 358.58 500.16 213.26 279.29 2778.81 4218.91
  • Greek characters such as theta and beta are adopted from the halo2-book
  • before theta refers to the total time taken by the Halo2 including the circuit frontend.
  • backend total refers to the total time taken from theta to shplonk.
  • end-to-end total refers to the total time taken by the program (before theta + backend total).