Introduce performance benchmarks for Stage 1 ZK WASM implementation #90

aborg-dev · 2023-11-17T12:15:31Z

As a part of Stage 1 plan we aimed for at most 1000x slowdown compared to native execution.

We need to introduce a few representative benchmarks that allow to:

Measure slowdown compared to native execution
Simple enough to work with Stage 1 instruction set
[Bonus] Comparable to other ZK proving systems

A few ideas:

https://github.com/delendum-xyz/zk-benchmarking/tree/main
https://dev.risczero.com/api/zkvm/benchmarks
Merkle proof compression

aborg-dev · 2023-11-20T15:59:53Z

I took a simple program computing SHA256 of a constant string: https://github.com/akashin/bench/blob/sha256/src/main.rs

And generated WASM for it https://gist.github.com/akashin/b038a9e9fcc9bb4ccf0dc5b990f5d9df with following commands:

cargo build --target wasm32-unknown-unknown --release
# Before strip: 1.8M
# After strip: 37K
wasm-strip cranelift-bench.wasm

The final program has ~16k WASM ops.
It uses:

Single memory region and 1 memory.grow instruction
700 i32.xor ops, 400 i32.and ops
300 i32.shl and i32.shr_u instructions; 600 i32.rot
600 i32.load and 600 i32.store ops
434 call ops
1250 i32.add ops
37 call_indirect ops

aborg-dev · 2023-11-21T16:29:34Z

Another very simple program computing fibonacci: https://github.com/akashin/bench/blob/fibo/src/main.rs

Here is generated WAT: https://gist.github.com/akashin/326d3273ec98436e80813ad83f24dfda

It contains ~10k WASM ops, which is much more than I expected.
I will post the breakdown of instructions below.

UPD: The actual fibonacci code is relatively small, so we need to find a way to trim the rest functions that we don't need:

  (func (;4;) (type 0)
    (local i32 i64 i64 i32)
    global.get 0
    i32.const 32
    i32.sub
    local.tee 0
    global.set 0
    i64.const 0
    local.set 1
    i64.const 1
    local.set 2
    i32.const 10000
    local.set 3
    loop  ;; label = @1
      local.get 2
      local.get 1
      local.get 2
      i64.add
      local.tee 1
      i64.add
      local.tee 2
      local.get 1
      local.get 2
      i64.add
      local.tee 2
      i64.add
      local.tee 1
      local.get 2
      local.get 1
      i64.add
      local.tee 2
      i64.add
      local.tee 1
      local.get 2
      local.get 1
      i64.add
      local.tee 2
      i64.add
      local.tee 1
      local.get 2
      local.get 1
      i64.add
      local.tee 1
      i64.add
      local.set 2
      local.get 3
      i32.const -10
      i32.add
      local.tee 3
      br_if 0 (;@1;)
    end
    local.get 0
    local.get 1
    i64.store
    block  ;; label = @1
      local.get 1
      i64.const -2872092127636481573
      i64.ne
      br_if 0 (;@1;)
      local.get 0
      i32.const 32
      i32.add
      global.set 0
      return
    end
    local.get 0
    i32.const 0
    i32.store offset=8
    i32.const 0
    local.get 0
    i32.const 1048592
    local.get 0
    i32.const 8
    i32.add
    i32.const 1048612
    call 3
    unreachable)

aborg-dev · 2023-11-22T16:53:07Z

I've managed to convince Rust to generate the small program for fibonacci by using no_std incantations:

#![no_main]
#![no_std]

use panic_halt as _;

extern "C" {
    fn assert_eq(actual: i64, expected: i64);
}

#[no_mangle]
pub fn main() {
    let mut a = 0i64;
    let mut b = 1i64;

    for _ in 0..10000 {
        let c = a.wrapping_add(b);
        a = b;
        b = c;
    }
    unsafe {
        assert_eq(a, -2872092127636481573);
    }
}

The generated WAT file is 61 lines long and only contains the relevant Fibonacci implementation:

(module
  (type (;0;) (func (param i64 i64)))
  (type (;1;) (func))
  (import "env" "assert_eq" (func (;0;) (type 0)))
  (func (;1;) (type 1)
    (local i64 i64 i32)
    i64.const 1
    local.set 0
    i64.const 0
    local.set 1
    i32.const 10000
    local.set 2
    loop  ;; label = @1
      local.get 0
      local.get 1
      i64.add
      local.tee 1
      local.get 0
      i64.add
      local.tee 0
      local.get 1
      i64.add
      local.tee 1
      local.get 0
      i64.add
      local.tee 0
      local.get 1
      i64.add
      local.tee 1
      local.get 0
      i64.add
      local.tee 0
      local.get 1
      i64.add
      local.tee 1
      local.get 0
      i64.add
      local.tee 0
      local.get 1
      i64.add
      local.tee 1
      local.get 0
      i64.add
      local.set 0
      local.get 2
      i32.const -10
      i32.add
      local.tee 2
      br_if 0 (;@1;)
    end
    local.get 1
    i64.const -2872092127636481573
    call 0)
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (global (;1;) i32 (i32.const 1048576))
  (global (;2;) i32 (i32.const 1048576))
  (export "memory" (memory 0))
  (export "main" (func 1))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2)))

I think this would be good enough for our first benchmark and I believe we already support all necessary opcodes.

As a next step, I will see if I can reuse the same techniques to build SHA2 benchmark and see which instructions it uses.

Related to Fibonacci benchmark there is one outstanding questions: How to generate a start function for the WASM module?

I've seen wasm_bindgen(start) annotation but that requires STD in my understanding.
@nagisa , do you have any ideas how to address this?

nagisa · 2023-11-22T18:49:59Z

There’s a (start) declaration in wasm modules, but it is a little different in concept to the _start entry in native code. WASM’s start is more like the life-before-main code that is used to initialize various data at runtime.

What you will need to do in this case is to look at the (export "main") declarations and decide which function becomes zkASM’s start from there.

aborg-dev · 2023-11-23T12:34:35Z

There’s a (start) declaration in wasm modules, but it is a little different in concept to the _start entry in native code. WASM’s start is more like the life-before-main code that is used to initialize various data at runtime.

What you will need to do in this case is to look at the (export "main") declarations and decide which function becomes zkASM’s start from there.

Makes sense, we can always do this as a part of .zkasm file generation as long as we keep some convention that is easy to follow programmatically.

MCJOHN974 · 2023-11-23T14:52:32Z

I think for first benchmarks it will be easier to use only 64-bit operations, it will be much easier to implement. i32 now have a problem with jumps, and maybe will contain more. Maybe it is even worth it to write benchmark in .wat, since we can run it natively as well?

MCJOHN974 · 2023-11-23T15:08:59Z

Also, on which opcodes should we focus now? From list above shifts and rotations are still not implemented, should we start work on them or chose another benchmark? Should we better ship 32-bit computations ASAP?

aborg-dev · 2023-11-23T18:13:44Z

Also, on which opcodes should we focus now? From list above shifts and rotations are still not implemented, should we start work on them or chose another benchmark? Should we better ship 32-bit computations ASAP?

I think we should start with this Fibonacci benchmark in #90 (comment). It only uses basic opcodes with i32 and i64 and would be a good stepping stone for a more complex ECDSA benchmark. There is no arguing that it would be easier to use i64 for counter in this specific case, but we will very soon need basic i32 operations anyway for other benchmarks, so I think we should include this in Stage 1.

To sum up, for the first Fibonacci benchmark generated from Rust we need:

i64.const, i64.add
i32.const, i32.add
loop, br_if

aborg-dev · 2023-11-24T11:48:50Z

Here is SHA256 benchmark with no_std approach + using existing crate sha2: https://github.com/akashin/bench/blob/9cf8fc4c25e685f8b842db5ad2e57f9df2b0b125/src/main.rs

It results in 5.5k lines of WAT, with a few new instructions on top of the set for Fibonacci:

i32.sub, i32.lt_u, i32.xor, i32.or, i32.and, i32.rotl
i64.shl, i64.or, i64.and, i64.shr_u, i64.extend_i32_u
i32.load offset=1048615 align=1, i32.store8
i64.store align=1, i64.store offset=8, i64.load offset=1048600

This seems like a sizable enough target to me to declare as a cutoff for the instruction set for Stage 1 as it touches most of the integer operations and also involves memory loads and stores.

I'll repeat the same experiment with hand-written SHA256 to see if this helps trim the program size and look at a similar Keccac benchmark for which we already have dedicated instruction in ZK ASM.

aborg-dev · 2023-11-24T12:05:43Z

Handwritten SHA256 benchmark: https://github.com/akashin/bench/blob/21264dfaad1e0b4788e19bb25b5a0314bc879eb9/src/main.rs and associated WAT: https://gist.github.com/akashin/9804635dc299f7fdc727eaefeb94f1df

Very similar results to using sha2, which is a very good sign, meaning that using no_std crates does not impose that much of an overhead.
It results in ~3k WAT and all the same WASM instructions.

It would actually be really interesting how the actual number of ZK ASM counters differs between these two programs, given that SHA2 is supposed to be more "optimized", even though it has more lines in WAT.

This PR changes how run-tests.zkasm.js communicates results to the zkasm-result.py. Before, we used to parse log lines, now we pass a json object with a test result for each of the tests. This will allow storing more detailed information like performance counters for #90 and error messages.

aborg-dev · 2023-12-01T12:13:08Z

Here is the final version of sha256 that I'm importing:

#![no_main]
#![no_std]

use panic_halt as _;

use sha2::{Sha256, Digest};

extern "C" {
    fn assert_eq(actual: u64, expected: u64);
}
// fn assert_eq(actual: u64, expected: u64) {
//     assert_eq!(actual, expected);
// }

fn subslice_to_u64(slice: &[u8], begin: usize, end: usize) -> u64 {
    let mut result = 0u64;
    for i in begin..end {
        result <<= 8;
        result += slice[i] as u64;
    }
    result
}

#[no_mangle]
pub fn main() {
    let mut hasher = Sha256::new();
    hasher.update(b"hello world");
    let result = hasher.finalize();


    unsafe {
        assert_eq(subslice_to_u64(&result, 0, 8), 13352372148217134600);
        assert_eq(subslice_to_u64(&result, 8, 16), 11902541952223915002);
        assert_eq(subslice_to_u64(&result, 16, 24), 14160706888648589550);
        assert_eq(subslice_to_u64(&result, 24, 32), 10414846460208074217);
    }
}

The hashes were generated with

import hashlib

def subslice_to_u64(s, begin, end):
    result = 0
    for i in range(begin, end):
        result <<= 8;
        result += s[i]
    return result

s = b"hello world"
h = hashlib.sha256(s)
d = h.digest()

print(subslice_to_u64(d, 0, 8))
print(subslice_to_u64(d, 8, 16))
print(subslice_to_u64(d, 16, 24))
print(subslice_to_u64(d, 24, 32))

aborg-dev · 2023-12-05T10:30:19Z

Now that SHA256 is submitted (#136) I think we can declare this as done.
The future work for making SHA256 run correctly will be tracked in #143

mooori · 2024-03-18T16:41:51Z

@Akashin Is the Cargo.toml that was used when compiling the sha256 snippet above publicly available? I'm trying to reproduce sha256/from_rust.wat based on this repo and the commands contained in its Makefile. However I end up with different WAT code and that WAT compiled to zkASM triggers an error in zkevm-proverjs.

aborg-dev · 2024-03-19T10:07:57Z

@Akashin Is the Cargo.toml that was used when compiling the sha256 snippet above publicly available? I'm trying to reproduce sha256/from_rust.wat based on this repo and the commands contained in its Makefile. However I end up with different WAT code and that WAT compiled to zkASM triggers an error in zkevm-proverjs.

Yes, you can find it here: https://github.com/akashin/bench/tree/sha256_final
The WAT can be built by running prepare.sh and adding a start section in the resulting WAT file that points to main function.

aborg-dev added this to the ZK WASM: Stage 1 milestone Nov 17, 2023

aborg-dev self-assigned this Nov 17, 2023

aborg-dev mentioned this issue Nov 17, 2023

Support WASM instruction set for Stage 1 #22

Closed

nagisa mentioned this issue Nov 27, 2023

zkasm: implement memory access operations and memory sections #110

Open

4 tasks

aborg-dev mentioned this issue Nov 27, 2023

Unify ZK ASM test infrastructure #112

Merged

aborg-dev mentioned this issue Nov 27, 2023

Pipe test results using JSON #113

Merged

This was referenced Nov 30, 2023

[ProjectTracking]: ZK WASM Stage 1, WASM Spec Test near/near-one-project-tracking#2

Closed

Add Fibonacci benchmark #128

Merged

aborg-dev closed this as completed Dec 5, 2023

aborg-dev mentioned this issue Dec 11, 2023

[Tracking issue] Infrastructure for Stage 2 #159

Closed

mooori mentioned this issue Dec 15, 2023

Efficiency of WASM from Rust->WASM may depend on num loop iterations #162

Open

aborg-dev mentioned this issue Jan 2, 2024

[Tracking issue] Benchmarking infrastructure for Stage 2 #173

Open

aborg-dev mentioned this issue Feb 22, 2024

Submit benchmarks written in Rust to the codebase #228

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce performance benchmarks for Stage 1 ZK WASM implementation #90

Introduce performance benchmarks for Stage 1 ZK WASM implementation #90

aborg-dev commented Nov 17, 2023

aborg-dev commented Nov 20, 2023

aborg-dev commented Nov 21, 2023 •

edited

Loading

aborg-dev commented Nov 22, 2023 •

edited

Loading

nagisa commented Nov 22, 2023

aborg-dev commented Nov 23, 2023

MCJOHN974 commented Nov 23, 2023 •

edited

Loading

MCJOHN974 commented Nov 23, 2023

aborg-dev commented Nov 23, 2023

aborg-dev commented Nov 24, 2023 •

edited

Loading

aborg-dev commented Nov 24, 2023

aborg-dev commented Dec 1, 2023

aborg-dev commented Dec 5, 2023

mooori commented Mar 18, 2024

aborg-dev commented Mar 19, 2024

Introduce performance benchmarks for Stage 1 ZK WASM implementation #90

Introduce performance benchmarks for Stage 1 ZK WASM implementation #90

Comments

aborg-dev commented Nov 17, 2023

aborg-dev commented Nov 20, 2023

aborg-dev commented Nov 21, 2023 • edited Loading

aborg-dev commented Nov 22, 2023 • edited Loading

nagisa commented Nov 22, 2023

aborg-dev commented Nov 23, 2023

MCJOHN974 commented Nov 23, 2023 • edited Loading

MCJOHN974 commented Nov 23, 2023

aborg-dev commented Nov 23, 2023

aborg-dev commented Nov 24, 2023 • edited Loading

aborg-dev commented Nov 24, 2023

aborg-dev commented Dec 1, 2023

aborg-dev commented Dec 5, 2023

mooori commented Mar 18, 2024

aborg-dev commented Mar 19, 2024

aborg-dev commented Nov 21, 2023 •

edited

Loading

aborg-dev commented Nov 22, 2023 •

edited

Loading

MCJOHN974 commented Nov 23, 2023 •

edited

Loading

aborg-dev commented Nov 24, 2023 •

edited

Loading