-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Registers in memory #1443
base: main
Are you sure you want to change the base?
Registers in memory #1443
Conversation
2284f46
to
2aedf98
Compare
48195ed
to
8b67cba
Compare
2a1b8f5
to
db4fcbb
Compare
db4fcbb
to
5532422
Compare
Implements #1055 for the Poseidon machines. Pulled out of #1508. Specifically, this PR adds a new `PoseidonGLMemory` machine which receives 2 memory points and then: - Reads 24 32-Bit words and packs them into 12 field elements - Computes the Poseidon permutation (just like `PoseidonGL`) - For each of the 4 output field elements, it: - Invokes the `SplitGL` machine to get the canonical `u64` representation - Writes the 8 32-Bit words to memory at the provided memory pointer The read and write memory regions can even overlap! 🎉 This should simplify our RISC-V machine, as the syscall already expects two memory pointers. We can simply pass it to the machine directly. I started doing that in #1533, but I think it makes sense to wait until #1443 is merged. To test: ``` cargo run -r pil test_data/std/poseidon_gl_test.asm -o output -f --export-csv --prove-with estark-starky ``` I recommend reviewing the diff between `std/machines/hash/poseidon_gl.asm` and `std/machines/hash/poseidon_gl_memory.asm` ### Discussion The overhead of the memory read / write is quite high (18 extra witness columns, see [this comment](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L13-L23), mostly because we now need to have the input available in all rows (which previously was only the case for the outputs). If we had offsets other than 0 and 1, this could be avoided. Doing 24 parallel memory reads in the first row would *not* help, because we'd have to add 24 witness columns (instead of 2 now) to store the result of the memory operation. A few more notes: - With Vadcop, 18 extra witness columns in a secondary machine is *a lot* better than introducing more registers (either "regular" registers or assignment registers) in the main machine - As mentioned [here](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L111-L113), we could get rid of two permutations if either: - We were able to express explicitly that we want to call at most one operation in the current row, or - We had an optimizer that would be smart enough to batch the memory reads and writes. - We could also have just 1 read or write at a time (instead of 2), but we'd have to increase the block size from 31 to 32 and the implementation would be more complicated. - We could also store the full final state of the Poseidon permutation, instead of just the first 4 elements. This would need 8 more witness columns to make the entire output available in all rows. Then, one could use the machine to implement a Poseidon sponge, instead of. - Looking at the bootloader, maybe it makes sense to pass 3 input pointers instead of 1: One for the first 4 elements, one for the next 4, and one for the capacity (often just a constant). For example, when computing a Merkle root, you'd pass pointers for the two children hashes and a pointer to the capacity constant.
col witness XX, XX_inv, XXIsZero; | ||
std::utils::force_bool(XXIsZero); | ||
XXIsZero * XX = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use a std tool function here.
col witness val3_col; | ||
col witness val4_col; | ||
|
||
col witness XX, XX_inv, XXIsZero; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could have a comment what these are used for
XXIsZero * XX = 0; | ||
|
||
// HACK: This constraint cannot be active globally, because when | ||
// XX is not constrained, witgen will try to set XX, XX_inv and XXIsZero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also use a hint
pc' = (1 - XXIsZero) * l + XXIsZero * (pc + 1) | ||
} | ||
|
||
instr branch_if_zero X, Y, Z, l: label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can Z be a number?
|
||
// Skips Y instructions if X is zero | ||
instr skip_if_zero X, Y { pc' = pc + 1 + (XIsZero * Y) } | ||
//instr skip_if_zero X, Y { pc' = pc + 1 + (XIsZero * Y) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//instr skip_if_zero X, Y { pc' = pc + 1 + (XIsZero * Y) } |
link ~> val1_col = regs.mload(X, STEP) | ||
link ~> val2_col = regs.mload(Y, STEP + 1) | ||
{ | ||
(val1_col - val2_col + Z) + 2**32 - 1 = X_b1 + X_b2 * 0x100 + X_b3 * 0x10000 + X_b4 * 0x1000000 + wrap_bit * 2**32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is too late, but why not use e.g. a
for val1_col
? I think the name is a bit unwieldy
instr is_positive X, Y, Z, W | ||
link ~> val1_col = regs.mload(X, STEP) | ||
link ~> val2_col = regs.mload(Y, STEP + 1) | ||
link ~> regs.mstore(W, STEP + 2, val3_col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link ~> regs.mstore(W, STEP + 2, val3_col) | |
link ~> regs.mstore(W, STEP + 2, wrap_bit) |
?
link ~> val2_col = regs.mload(Y, STEP + 1) | ||
link ~> regs.mstore(W, STEP + 2, val3_col) | ||
{ | ||
(val1_col - val2_col + Z) + 2**32 - 1 = X_b1 + X_b2 * 0x100 + X_b3 * 0x10000 + X_b4 * 0x1000000 + wrap_bit * 2**32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use a function for X_b1 + X_b2 * 0x100 + X_b3 * 0x10000 + X_b4 * 0x1000000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or even the whole thing with wrap_bit (or pass wrap_bit in)
link ~> val1_col = regs.mload(X, STEP) | ||
link ~> regs.mstore(W, STEP + 2, val3_col) | ||
{ | ||
XXIsZero = 1 - XX * XX_inv, | ||
XX = val1_col, | ||
val3_col = XXIsZero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link ~> val1_col = regs.mload(X, STEP) | |
link ~> regs.mstore(W, STEP + 2, val3_col) | |
{ | |
XXIsZero = 1 - XX * XX_inv, | |
XX = val1_col, | |
val3_col = XXIsZero | |
link ~> XX = regs.mload(X, STEP) | |
link ~> regs.mstore(W, STEP + 2, XXIsZero) | |
{ | |
XXIsZero = 1 - XX * XX_inv, |
?
Implements #1055 for the Poseidon machines. Pulled out of #1508. Specifically, this PR adds a new `PoseidonGLMemory` machine which receives 2 memory points and then: - Reads 24 32-Bit words and packs them into 12 field elements - Computes the Poseidon permutation (just like `PoseidonGL`) - For each of the 4 output field elements, it: - Invokes the `SplitGL` machine to get the canonical `u64` representation - Writes the 8 32-Bit words to memory at the provided memory pointer The read and write memory regions can even overlap! 🎉 This should simplify our RISC-V machine, as the syscall already expects two memory pointers. We can simply pass it to the machine directly. I started doing that in #1533, but I think it makes sense to wait until #1443 is merged. To test: ``` cargo run -r pil test_data/std/poseidon_gl_test.asm -o output -f --export-csv --prove-with estark-starky ``` I recommend reviewing the diff between `std/machines/hash/poseidon_gl.asm` and `std/machines/hash/poseidon_gl_memory.asm` ### Discussion The overhead of the memory read / write is quite high (18 extra witness columns, see [this comment](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L13-L23), mostly because we now need to have the input available in all rows (which previously was only the case for the outputs). If we had offsets other than 0 and 1, this could be avoided. Doing 24 parallel memory reads in the first row would *not* help, because we'd have to add 24 witness columns (instead of 2 now) to store the result of the memory operation. A few more notes: - With Vadcop, 18 extra witness columns in a secondary machine is *a lot* better than introducing more registers (either "regular" registers or assignment registers) in the main machine - As mentioned [here](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L111-L113), we could get rid of two permutations if either: - We were able to express explicitly that we want to call at most one operation in the current row, or - We had an optimizer that would be smart enough to batch the memory reads and writes. - We could also have just 1 read or write at a time (instead of 2), but we'd have to increase the block size from 31 to 32 and the implementation would be more complicated. - We could also store the full final state of the Poseidon permutation, instead of just the first 4 elements. This would need 8 more witness columns to make the entire output available in all rows. Then, one could use the machine to implement a Poseidon sponge, instead of. - Looking at the bootloader, maybe it makes sense to pass 3 input pointers instead of 1: One for the first 4 elements, one for the next 4, and one for the capacity (often just a constant). For example, when computing a Merkle root, you'd pass pointers for the two children hashes and a pointer to the capacity constant.
Fixes #1492