You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently there are two approaches to accelerating the ECDSA signature: 1. accelerate ec_add and ec_double. 2. accelerate modmul and modadd only, which are the most time consuming component of ec_add and ec_double. 3. Do both *1 and *2.
*1 reduces execution trace rows by 60%, *2 only 5%, and *3 by 80%.
*2, despite poor performance right now, has the most potential for improvement, because modmul syscall, which is called extensively by *2, spends most rows in register copying from and to memory. *3 has limited potential for improvement because not many non-ec operations calls modmul, so any improvement would be marginal
It's still inconclusive whether *2 or *3 would achieve the best performance eventually.
The issue
Regardless of whether we do *2 or *3, modmul currently has two syscalls on the affine submachine, which is why we propose to reduce modmul and modadd to one syscall in this issue. This change would mostly benefit *2, if we can create a submachine just for modmul and modadd and remove all ec operations in the current arith machine.
Currently we do two syscalls, the first to obtain D and E from A * B + C = D * 2^256 + E, given A, B, C; and the second to obtain R from M * Q + R = D * 2^256 + E, given D, E, and M (SECP modulo). Q and R are computed as hints.
Basic proposal
User provides A, B, C, compute D and E as before. Also compute Q and R as hint columns. Have 32 limbs of M as 32 constants (instead of columns). In the same 32 rows where we enforce A * B + C = D * 2^256 + E, also enforce that M * Q + R = D * 2^256 + E. In addition, would also need a carry column for M * Q + R.
Additional proposal to reduce column C
Instead of having 16 columns for C, have two operations, one for addition and one for multiplication. Enforce that A (* or +) B = D * 2^256 + E depending on the operation. This would be quite easy to do given that the current eq0 constraint is just
let eq0 = |nr|
product(x1f, y1f)(nr) // this should work up to nr = 15, but after that, we would need a(nr - 15) * b(15) + ... + a(15) * b(nr - 15)?
+ x2f(nr)
- shift_right(y2f, 16)(nr) // 0 for nr in 0..16
- y3f(nr);
so we can just remove the product term for the add operation and remove the x2f term for the product operation. This has the advantage of shaving off the 16 C columns.
One concern for this approach is that the add operation would just take 16 rows for the 16 limbs instead of 32 as in the case of multiplication. I think this should be fine if we just keep all columns constant after 16 rows for the add operation?
Additional proposal to reduce columns Q and R
Not sure if possible but we can also put Q and R in the place of B and C and M in the place of A and just add 32 additional rows. So there would be 64 rows for each latch. We would need another fixed columns like [0, 0, 0, ..., 0, 1, 1, 1, ..., 1] with 32 zeroes and 32 ones, and A, B, C will be query columns that is none for when this fixed column is 0 and hints for when this fixed column is 1.
One concern for this approach is that submachine rows might overflow main machine rows, because currently our ECDSA test calls modmul 13k times, for 64 rows per call, this would be 832k rows. Currently the main machine has 3.1 million rows, but after further optimization this might go below 832k rows.
Other potential optimizations
Currently and in this proposal, it's not enforced that modmul and modadd will return remainder that's smaller than the modulus. This is enforced as an assertion on the Rust level of the k256 crate field mul function. We could consider enforcing this in less than constraints in the submachine, but I'm not sure how to efficiently do it for numbers with 16 limbs.
The text was updated successfully, but these errors were encountered:
I think this makes sense, in particular once we reduce the "fixed" cost of doing ecalls, which is quite high right now.
I don't know about the proving cost for each approach (accelerate ec VS modular operations) but we can measure and compare once we have everything in place.
Background
ec_add
andec_double
. 2. acceleratemodmul
andmodadd
only, which are the most time consuming component ofec_add
andec_double
. 3. Do both *1 and *2.modmul
syscall, which is called extensively by *2, spends most rows in register copying from and to memory. *3 has limited potential for improvement because not many non-ec operations callsmodmul
, so any improvement would be marginalThe issue
modmul
currently has two syscalls on the affine submachine, which is why we propose to reducemodmul
andmodadd
to one syscall in this issue. This change would mostly benefit *2, if we can create a submachine just formodmul
andmodadd
and remove all ec operations in the current arith machine.Basic proposal
Additional proposal to reduce column C
eq0
constraint is justso we can just remove the
product
term for the add operation and remove thex2f
term for the product operation. This has the advantage of shaving off the 16 C columns.Additional proposal to reduce columns Q and R
modmul
13k times, for 64 rows per call, this would be 832k rows. Currently the main machine has 3.1 million rows, but after further optimization this might go below 832k rows.Other potential optimizations
modmul
andmodadd
will return remainder that's smaller than the modulus. This is enforced as an assertion on the Rust level of thek256
crate fieldmul
function. We could consider enforcing this in less than constraints in the submachine, but I'm not sure how to efficiently do it for numbers with 16 limbs.The text was updated successfully, but these errors were encountered: