Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single syscall modmul and modadd submachine #1436

Open
qwang98 opened this issue Jun 7, 2024 · 2 comments
Open

Single syscall modmul and modadd submachine #1436

qwang98 opened this issue Jun 7, 2024 · 2 comments

Comments

@qwang98
Copy link
Collaborator

qwang98 commented Jun 7, 2024

Background

  • Currently there are two approaches to accelerating the ECDSA signature: 1. accelerate ec_add and ec_double. 2. accelerate modmul and modadd only, which are the most time consuming component of ec_add and ec_double. 3. Do both *1 and *2.
  • *1 reduces execution trace rows by 60%, *2 only 5%, and *3 by 80%.
  • *2, despite poor performance right now, has the most potential for improvement, because modmul syscall, which is called extensively by *2, spends most rows in register copying from and to memory. *3 has limited potential for improvement because not many non-ec operations calls modmul, so any improvement would be marginal
  • It's still inconclusive whether *2 or *3 would achieve the best performance eventually.

The issue

  • Regardless of whether we do *2 or *3, modmul currently has two syscalls on the affine submachine, which is why we propose to reduce modmul and modadd to one syscall in this issue. This change would mostly benefit *2, if we can create a submachine just for modmul and modadd and remove all ec operations in the current arith machine.
  • Currently we do two syscalls, the first to obtain D and E from A * B + C = D * 2^256 + E, given A, B, C; and the second to obtain R from M * Q + R = D * 2^256 + E, given D, E, and M (SECP modulo). Q and R are computed as hints.

Basic proposal

  • User provides A, B, C, compute D and E as before. Also compute Q and R as hint columns. Have 32 limbs of M as 32 constants (instead of columns). In the same 32 rows where we enforce A * B + C = D * 2^256 + E, also enforce that M * Q + R = D * 2^256 + E. In addition, would also need a carry column for M * Q + R.

Additional proposal to reduce column C

  • Instead of having 16 columns for C, have two operations, one for addition and one for multiplication. Enforce that A (* or +) B = D * 2^256 + E depending on the operation. This would be quite easy to do given that the current eq0 constraint is just
let eq0 = |nr|
        product(x1f, y1f)(nr) // this should work up to nr = 15, but after that, we would need a(nr - 15) * b(15) + ... + a(15) * b(nr - 15)?
        + x2f(nr)
        - shift_right(y2f, 16)(nr) // 0 for nr in 0..16
        - y3f(nr);

so we can just remove the product term for the add operation and remove the x2f term for the product operation. This has the advantage of shaving off the 16 C columns.

  • One concern for this approach is that the add operation would just take 16 rows for the 16 limbs instead of 32 as in the case of multiplication. I think this should be fine if we just keep all columns constant after 16 rows for the add operation?

Additional proposal to reduce columns Q and R

  • Not sure if possible but we can also put Q and R in the place of B and C and M in the place of A and just add 32 additional rows. So there would be 64 rows for each latch. We would need another fixed columns like [0, 0, 0, ..., 0, 1, 1, 1, ..., 1] with 32 zeroes and 32 ones, and A, B, C will be query columns that is none for when this fixed column is 0 and hints for when this fixed column is 1.
  • One concern for this approach is that submachine rows might overflow main machine rows, because currently our ECDSA test calls modmul 13k times, for 64 rows per call, this would be 832k rows. Currently the main machine has 3.1 million rows, but after further optimization this might go below 832k rows.

Other potential optimizations

  • Currently and in this proposal, it's not enforced that modmul and modadd will return remainder that's smaller than the modulus. This is enforced as an assertion on the Rust level of the k256 crate field mul function. We could consider enforcing this in less than constraints in the submachine, but I'm not sure how to efficiently do it for numbers with 16 limbs.
@qwang98
Copy link
Collaborator Author

qwang98 commented Jun 7, 2024

@georgwiese @pacheco Thoughts? :)

@pacheco
Copy link
Collaborator

pacheco commented Jun 10, 2024

I think this makes sense, in particular once we reduce the "fixed" cost of doing ecalls, which is quite high right now.
I don't know about the proving cost for each approach (accelerate ec VS modular operations) but we can measure and compare once we have everything in place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants