Registers in memory #1443

leonardoalt · 2024-06-11T09:36:00Z

Implements #1055 for the Poseidon machines. Pulled out of #1508.

Specifically, this PR adds a new `PoseidonGLMemory` machine which receives 2 memory pointers and then:
- Reads 24 32-bit words and packs them into 12 field elements
- Computes the Poseidon permutation (just like `PoseidonGL`)
- For each of the 4 output field elements, it:
  - Invokes the `SplitGL` machine to get the canonical `u64` representation
  - Writes the 8 32-bit words to memory at the provided memory pointer

The read and write memory regions can even overlap! 🎉

This should simplify our RISC-V machine, as the syscall already expects two memory pointers. We can simply pass them to the machine directly. I started doing that in #1533, but I think it makes sense to wait until #1443 is merged.

To test:
```
cargo run -r pil test_data/std/poseidon_gl_test.asm -o output -f --export-csv --prove-with estark-starky
```

I recommend reviewing the diff between `std/machines/hash/poseidon_gl.asm` and `std/machines/hash/poseidon_gl_memory.asm`.

### Discussion

The overhead of the memory read / write is quite high (18 extra witness columns, see [this comment](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L13-L23)), mostly because we now need to have the input available in all rows (which previously was only the case for the outputs). If we had offsets other than 0 and 1, this could be avoided. Doing 24 parallel memory reads in the first row would *not* help, because we'd have to add 24 witness columns (instead of 2 now) to store the result of the memory operation.

A few more notes:
- With Vadcop, 18 extra witness columns in a secondary machine is *a lot* better than introducing more registers (either "regular" registers or assignment registers) in the main machine.
- As mentioned [here](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L111-L113), we could get rid of two permutations if either:
  - We were able to express explicitly that we want to call at most one operation in the current row, or
  - We had an optimizer that would be smart enough to batch the memory reads and writes.
- We could also have just 1 read or write at a time (instead of 2), but we'd have to increase the block size from 31 to 32 and the implementation would be more complicated.
- We could also store the full final state of the Poseidon permutation, instead of just the first 4 elements. This would need 8 more witness columns to make the entire output available in all rows. Then, one could use the machine to implement a Poseidon sponge, instead of.
- Looking at the bootloader, maybe it makes sense to pass 3 input pointers instead of 1: one for the first 4 elements, one for the next 4, and one for the capacity (often just a constant). For example, when computing a Merkle root, you'd pass pointers for the two children hashes and a pointer to the capacity constant.
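To make the word packing in the description concrete, here is a minimal Rust sketch of the read side (24 words into 12 field elements) and the write side (4 field elements into 8 words). It is only an illustration, not the machine's implementation: the function names are hypothetical, the low-word-first ordering is an assumption (the description does not state it), and reducing the packed value modulo the Goldilocks prime is likewise assumed.

```rust
/// Goldilocks prime: 2^64 - 2^32 + 1.
const GOLDILOCKS_P: u64 = 0xFFFF_FFFF_0000_0001;

/// Packs two 32-bit words into one field element.
/// ASSUMPTION: the first word is the low half and the result is reduced mod p.
fn pack_words(lo: u32, hi: u32) -> u64 {
    let raw = (lo as u64) | ((hi as u64) << 32);
    raw % GOLDILOCKS_P
}

/// Splits a canonical field element (x < p) into its (low, high) 32-bit words.
fn split_canonical(x: u64) -> (u32, u32) {
    assert!(x < GOLDILOCKS_P, "value is not in canonical form");
    (x as u32, (x >> 32) as u32)
}

/// Read side: 24 words become the 12-element Poseidon input state.
fn pack_input(words: &[u32; 24]) -> [u64; 12] {
    let mut state = [0u64; 12];
    for (i, pair) in words.chunks_exact(2).enumerate() {
        state[i] = pack_words(pair[0], pair[1]);
    }
    state
}

/// Write side: the 4 output elements become 8 words.
fn split_output(output: &[u64; 4]) -> [u32; 8] {
    let mut words = [0u32; 8];
    for (i, &x) in output.iter().enumerate() {
        let (lo, hi) = split_canonical(x);
        words[2 * i] = lo;
        words[2 * i + 1] = hi;
    }
    words
}
```

The canonical-form check in `split_canonical` stands in for what the description attributes to `SplitGL`: a field element has exactly one representation below the prime, and only that one is split into words.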

chriseth · 2024-07-09T18:31:41Z

riscv/src/code_gen.rs

+    col witness XX, XX_inv, XXIsZero;
+    std::utils::force_bool(XXIsZero);
+    XXIsZero * XX = 0;


Could use a std utility function here.
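For readers unfamiliar with the pattern, the hunk above, together with the relation `XXIsZero = 1 - XX * XX_inv` that appears in other hunks of this diff, forms the usual is-zero gadget. The Rust sketch below checks those three relations over the Goldilocks field. The helper names are hypothetical, and it assumes `force_bool(c)` enforces `c * (1 - c) = 0`.

```rust
/// Goldilocks prime: 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Field subtraction; expects a, b < P.
fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

/// Inverse via Fermat's little theorem (x^(P-2)); returns 0 for x = 0.
fn inverse_or_zero(x: u64) -> u64 {
    if x == 0 {
        return 0;
    }
    let (mut base, mut exp, mut acc) = (x, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Values for the three witness columns XX, XX_inv, XXIsZero.
struct IsZeroWitness {
    xx: u64,
    xx_inv: u64,
    xx_is_zero: u64,
}

/// Honest witness generation for a given XX.
fn is_zero_witness(xx: u64) -> IsZeroWitness {
    let xx_inv = inverse_or_zero(xx);
    let xx_is_zero = sub(1, mul(xx, xx_inv));
    IsZeroWitness { xx, xx_inv, xx_is_zero }
}

/// Checks the three relations from the diff.
fn constraints_hold(w: &IsZeroWitness) -> bool {
    let b = w.xx_is_zero;
    mul(b, sub(1, b)) == 0                      // assumed meaning of force_bool
        && mul(b, w.xx) == 0                    // XXIsZero * XX = 0
        && b == sub(1, mul(w.xx, w.xx_inv))     // XXIsZero = 1 - XX * XX_inv
}
```

A prover cannot claim a nonzero `XX` is zero: setting `xx_is_zero = 1` with `xx = 5` violates `XXIsZero * XX = 0`, whatever value is chosen for `xx_inv`.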

chriseth · 2024-07-09T18:32:33Z

riscv/src/code_gen.rs

+    col witness val3_col;
+    col witness val4_col;
+
+    col witness XX, XX_inv, XXIsZero;


Could have a comment explaining what these are used for.

chriseth · 2024-07-09T18:33:38Z

riscv/src/code_gen.rs

+    XXIsZero * XX = 0;
+
+    // HACK: This constraint cannot be active globally, because when
+    // XX is not constrained, witgen will try to set XX, XX_inv and XXIsZero


Could also use a hint.

chriseth · 2024-07-09T18:37:03Z

riscv/src/code_gen.rs

+        pc' = (1 - XXIsZero) * l + XXIsZero * (pc + 1)
+    }
+
+    instr branch_if_zero X, Y, Z, l: label


Can Z be a number?
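As a reading aid for the program-counter update in the hunk above, here is a small Rust sketch of the selection it encodes, using plain integers in place of field elements. The function name is hypothetical, and the flag is assumed to already be constrained to 0 or 1.

```rust
/// Mirrors `pc' = (1 - XXIsZero) * l + XXIsZero * (pc + 1)`.
/// ASSUMPTION: `xx_is_zero` is boolean (0 or 1), as enforced elsewhere in the diff.
fn next_pc(pc: u64, label: u64, xx_is_zero: u64) -> u64 {
    assert!(xx_is_zero <= 1, "flag must be 0 or 1");
    (1 - xx_is_zero) * label + xx_is_zero * (pc + 1)
}
```

With the flag at 0 the result is the label address, and with the flag at 1 execution falls through to `pc + 1`.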

chriseth · 2024-07-09T18:38:21Z

riscv/src/code_gen.rs


    // Skips Y instructions if X is zero
-    instr skip_if_zero X, Y { pc' = pc + 1 + (XIsZero * Y) }
+    //instr skip_if_zero X, Y { pc' = pc + 1 + (XIsZero * Y) }


Suggested change

-    //instr skip_if_zero X, Y { pc' = pc + 1 + (XIsZero * Y) }

chriseth · 2024-07-09T18:39:57Z

riscv/src/code_gen.rs

+        link ~> val1_col = regs.mload(X, STEP)
+        link ~> val2_col = regs.mload(Y, STEP + 1)
+    {
+        (val1_col - val2_col + Z) + 2**32 - 1 = X_b1 + X_b2 * 0x100 + X_b3 * 0x10000 + X_b4 * 0x1000000 + wrap_bit * 2**32,


Maybe this is too late, but why not use e.g. `a` for `val1_col`? I think the name is a bit unwieldy.

chriseth · 2024-07-09T18:40:46Z

riscv/src/code_gen.rs

+    instr is_positive X, Y, Z, W
+        link ~> val1_col = regs.mload(X, STEP)
+        link ~> val2_col = regs.mload(Y, STEP + 1)
+        link ~> regs.mstore(W, STEP + 2, val3_col)


Suggested change

-        link ~> regs.mstore(W, STEP + 2, val3_col)
+        link ~> regs.mstore(W, STEP + 2, wrap_bit)

?

chriseth · 2024-07-09T18:41:50Z

riscv/src/code_gen.rs

+        link ~> val2_col = regs.mload(Y, STEP + 1)
+        link ~> regs.mstore(W, STEP + 2, val3_col)
+    {
+        (val1_col - val2_col + Z) + 2**32 - 1 = X_b1 + X_b2 * 0x100 + X_b3 * 0x10000 + X_b4 * 0x1000000 + wrap_bit * 2**32,


Could use a function for `X_b1 + X_b2 * 0x100 + X_b3 * 0x10000 + X_b4 * 0x1000000`,

or even the whole thing with `wrap_bit` (or pass `wrap_bit` in).
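To illustrate what such a helper would compute, here is a Rust sketch of the byte composition and of the witness values that satisfy the constraint in the hunk above. It uses plain integers and ignores field wrap-around, which is an assumption, and both function names are hypothetical rather than existing std functions.

```rust
/// Little-endian composition: b1 + b2 * 0x100 + b3 * 0x10000 + b4 * 0x1000000.
fn compose_bytes(b: [u8; 4]) -> u64 {
    b[0] as u64 + b[1] as u64 * 0x100 + b[2] as u64 * 0x1_0000 + b[3] as u64 * 0x100_0000
}

/// Finds bytes and wrap_bit such that
/// (val1 - val2 + z) + 2^32 - 1 = compose_bytes(bytes) + wrap_bit * 2^32.
/// Returns None if the left-hand side does not fit into 33 bits.
/// ASSUMPTION: integer arithmetic stands in for field arithmetic here.
fn is_positive_witness(val1: u32, val2: u32, z: i64) -> Option<([u8; 4], u64)> {
    let lhs = val1 as i128 - val2 as i128 + z as i128 + (1i128 << 32) - 1;
    if lhs < 0 || lhs >= (1i128 << 33) {
        return None;
    }
    let lhs = lhs as u64;
    let wrap_bit = lhs >> 32;
    let low = (lhs & 0xFFFF_FFFF) as u32;
    Some((low.to_le_bytes(), wrap_bit))
}
```

Because of the `2**32 - 1` offset, `wrap_bit` is 1 exactly when `val1 - val2 + z` is at least 1, which is how the constraint yields the instruction's result bit.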

chriseth · 2024-07-09T18:43:40Z

riscv/src/code_gen.rs

+        link ~> val1_col = regs.mload(X, STEP)
+        link ~> regs.mstore(W, STEP + 2, val3_col)
+    {
+        XXIsZero = 1 - XX * XX_inv,
+        XX = val1_col,
+        val3_col = XXIsZero


Suggested change

-        link ~> val1_col = regs.mload(X, STEP)
-        link ~> regs.mstore(W, STEP + 2, val3_col)
-    {
-        XXIsZero = 1 - XX * XX_inv,
-        XX = val1_col,
-        val3_col = XXIsZero
+        link ~> XX = regs.mload(X, STEP)
+        link ~> regs.mstore(W, STEP + 2, XXIsZero)
+    {
+        XXIsZero = 1 - XX * XX_inv,

?

leonardoalt mentioned this pull request Jun 13, 2024

[WIP] Registers in memory #1420

Closed

leonardoalt changed the title ~~Registers in memory rebase attempt~~ Registers in memory Jun 13, 2024

leonardoalt force-pushed the registers-in-memory-dev branch 2 times, most recently from 2284f46 to 2aedf98 Compare June 14, 2024 11:35

leonardoalt changed the base branch from main to instr_link June 14, 2024 11:36

leonardoalt force-pushed the registers-in-memory-dev branch 5 times, most recently from 48195ed to 8b67cba Compare June 18, 2024 16:29

Base automatically changed from instr_link to main June 18, 2024 18:00

leonardoalt force-pushed the registers-in-memory-dev branch 5 times, most recently from 2a1b8f5 to db4fcbb Compare June 26, 2024 18:24

registers in memory

5532422

leonardoalt force-pushed the registers-in-memory-dev branch from db4fcbb to 5532422 Compare June 27, 2024 09:44

This was referenced Jul 4, 2024

Start implementing memory poseidon instruction via memory #1533

Draft

Add PoseidonGLMemory machine #1525

Merged

lvella self-assigned this Jul 8, 2024

leonardoalt added 2 commits July 9, 2024 10:25

Merge remote-tracking branch 'origin/main' into registers-in-memory-dev

e57edad

conflict fixes

b0eeb75

leonardoalt marked this pull request as ready for review July 9, 2024 08:47

chriseth reviewed Jul 9, 2024
