Add `PoseidonGLMemory` machine #1525

georgwiese · 2024-07-03T10:41:59Z

Implements #1055 for the Poseidon machines. Pulled out of #1508.

Specifically, this PR adds a new `PoseidonGLMemory` machine which receives 2 memory pointers and then:

- Reads 24 32-bit words and packs them into 12 field elements
- Computes the Poseidon permutation (just like `PoseidonGL`)
- For each of the 4 output field elements, it:
  - Invokes the `SplitGL` machine to get the canonical `u64` representation
  - Writes the resulting 32-bit words (8 in total across the 4 elements) to memory at the provided memory pointer
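The packing and splitting steps above can be sketched in plain Rust. This is only an illustration of the data flow, not the machine's constraints: the function names are hypothetical, and the low-word-first order is an assumption suggested by the `mstore_le` instruction in the test, not something this description states.

```rust
/// Goldilocks prime: 2^64 - 2^32 + 1.
const GOLDILOCKS_P: u64 = 0xFFFF_FFFF_0000_0001;

/// Packs 24 32-bit words into 12 field elements (hypothetical helper).
/// Assumption: each element is built from two consecutive words, low word first.
fn pack_words(words: &[u32; 24]) -> [u64; 12] {
    let mut state = [0u64; 12];
    for (i, pair) in words.chunks_exact(2).enumerate() {
        let raw = (pair[0] as u64) | ((pair[1] as u64) << 32);
        // The raw 64-bit value may be >= p, so reduce it to a canonical field element.
        state[i] = raw % GOLDILOCKS_P;
    }
    state
}

/// Splits 4 canonical field elements into 8 32-bit words (hypothetical helper).
/// This mirrors what the description attributes to `SplitGL`:
/// each element contributes a low and a high word.
fn split_outputs(outputs: &[u64; 4]) -> [u32; 8] {
    let mut words = [0u32; 8];
    for (i, &x) in outputs.iter().enumerate() {
        debug_assert!(x < GOLDILOCKS_P, "expected a canonical representation");
        words[2 * i] = x as u32; // low 32 bits
        words[2 * i + 1] = (x >> 32) as u32; // high 32 bits
    }
    words
}
```

The Poseidon permutation itself would sit between the two helpers and is left out here.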

The read and write memory regions can even overlap! 🎉
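One way to picture why overlapping regions are unproblematic is a model in which all inputs are read before any output is written. The sketch below assumes exactly that, plus byte addressing with 4-byte words and low-word-first packing; the `Memory` type, the function name, and the permutation passed as a closure are all hypothetical stand-ins, not the machine's actual interface.

```rust
use std::collections::HashMap;

/// Goldilocks prime: 2^64 - 2^32 + 1.
const GOLDILOCKS_P: u64 = 0xFFFF_FFFF_0000_0001;

/// Hypothetical memory model: byte address -> 32-bit word (unset words read as 0).
type Memory = HashMap<u32, u32>;

/// Models one call: read 24 words at `input_ptr`, permute, write 8 words at `output_ptr`.
/// `permutation` stands in for Poseidon and is assumed to return canonical values.
fn poseidon_gl_memory_call(
    mem: &mut Memory,
    input_ptr: u32,
    output_ptr: u32,
    permutation: impl Fn([u64; 12]) -> [u64; 12],
) {
    // Phase 1: read every input word before touching the output region.
    let mut state = [0u64; 12];
    for i in 0..12u32 {
        let lo = mem.get(&(input_ptr + 8 * i)).copied().unwrap_or(0) as u64;
        let hi = mem.get(&(input_ptr + 8 * i + 4)).copied().unwrap_or(0) as u64;
        state[i as usize] = (lo | (hi << 32)) % GOLDILOCKS_P;
    }

    // Phase 2: compute the permutation on the values captured above.
    let out = permutation(state);

    // Phase 3: write the first 4 output elements as 8 words (low word first).
    for i in 0..4u32 {
        let x = out[i as usize];
        mem.insert(output_ptr + 8 * i, x as u32);
        mem.insert(output_ptr + 8 * i + 4, (x >> 32) as u32);
    }
}
```

Calling it with `input_ptr == output_ptr` overwrites the first 8 input words with the result, while the computation still sees the original inputs.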

This should simplify our RISC-V machine, as the syscall already expects two memory pointers. We can simply pass them to the machine directly.

I started doing that in #1533, but I think it makes sense to wait until #1443 is merged.

To test:

cargo run -r pil test_data/std/poseidon_gl_test.asm -o output -f --export-csv --prove-with estark-starky

I recommend reviewing the diff between `std/machines/hash/poseidon_gl.asm` and `std/machines/hash/poseidon_gl_memory.asm`.

Discussion

The overhead of the memory read / write is quite high (18 extra witness columns, see [this comment](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L13-L23)), mostly because we now need to have the input available in all rows (which previously was only the case for the outputs). If we had offsets other than 0 and 1, this could be avoided. Doing 24 parallel memory reads in the first row would *not* help, because we'd have to add 24 witness columns (instead of 2 now) to store the result of the memory operation.

A few more notes:

- With Vadcop, 18 extra witness columns in a secondary machine is *a lot* better than introducing more registers (either "regular" registers or assignment registers) in the main machine.
- As mentioned [here](https://github.com/powdr-labs/powdr/blob/40bdca4368c3accccb753aa35ac1027ccb8def0e/std/machines/hash/poseidon_gl_memory.asm#L111-L113), we could get rid of two permutations if either:
  - We were able to express explicitly that we want to call at most one operation in the current row, or
  - We had an optimizer that would be smart enough to batch the memory reads and writes.
- We could also have just 1 read or write at a time (instead of 2), but we'd have to increase the block size from 31 to 32 and the implementation would be more complicated.
- We could also store the full final state of the Poseidon permutation, instead of just the first 4 elements. This would need 8 more witness columns to make the entire output available in all rows. Then, one could use the machine to implement a Poseidon sponge, instead of.
- Looking at the bootloader, maybe it makes sense to pass 3 input pointers instead of 1: One for the first 4 elements, one for the next 4, and one for the capacity (often just a constant). For example, when computing a Merkle root, you'd pass pointers for the two children hashes and a pointer to the capacity constant.

leonardoalt

Awesome!

leonardoalt · 2024-07-08T13:52:06Z

test_data/std/poseidon_gl_memory_test.asm

+    // can read in the given time step and write in the next time step.
+    col fixed STEP(i) { 2 * i };
+    Memory memory;
+    instr mstore_le ADDR1, X1, X2 ->


what's le, little endian?

Schaeff · Jul 8, 2024

no it's French


Poseidon Machines: Receive memory pointers instead of values

a1ec465

georgwiese force-pushed the machines-via-memory-poseidon branch from a0e45c5 to a1ec465 on July 4, 2024 08:24

georgwiese added 3 commits July 4, 2024 10:42

Fix test for estark-starky-backend

9843794

Polish

d6b71d1

Allow overlapping memory regions, port rest of the test

40bdca4

georgwiese marked this pull request as ready for review July 4, 2024 10:28

georgwiese changed the title ~~[WIP] Poseidon Machines: Receive memory pointers instead of values~~ Poseidon Machines: Receive memory pointers instead of values Jul 4, 2024

georgwiese changed the title ~~Poseidon Machines: Receive memory pointers instead of values~~ Add PoseidonGLMemory machine Jul 4, 2024

leonardoalt approved these changes Jul 8, 2024

georgwiese added this pull request to the merge queue Jul 8, 2024

Merged via the queue into main with commit b6f41e2 Jul 8, 2024
6 checks passed

georgwiese deleted the machines-via-memory-poseidon branch July 8, 2024 16:01
