In our network, batch validators are going to verify computations performed by the real-time nodes and by other batch validators. As was shown earlier, this approach might be prone to the verifier's dilemma, where a validator has to choose between actually performing the validation and skipping it while stating correctness and receiving a reward. In the latter case, a validator might simply agree with whatever computation result was produced before.
Different approaches were proposed to mitigate this issue. One of them is the forced errors approach introduced by TrueBit. It makes the original solver introduce unpredictable mistakes in an effort to catch a validator that confirms every computation without verification. Instead of a single computation result, two results are returned – the original and a fake one. The validator doesn't know which one is actually in effect, which forces it to perform the computation independently.
However, it's unclear whether we can use this approach for the Fluence network, because it seems difficult to hide from the batch validator which result is the original one. The reason is that in the Fluence network real-time nodes return results to the client as they are computed, and then store them in Swarm for further validation. Without digging further into whether this does or doesn't prevent the use of forced errors, it might be fruitful to take a look at another approach named "Proof of Independent Execution", originally described on Ethereum Research by JustinDrake.
Here I'll try to add a bit of interpretation to the topic as it was discussed with @michailvoronov and @folex, throw in a few ideas, and describe its potential application in our environment. The credit for the original idea should go to JustinDrake and the ethresearch community – mostly I've just expanded on the parts that were not so easy for me to understand.
Introduction
Basic idea
Let's say we have source code which might be as simple as:
int i = 0;
while (i < 10) {
  i++;
}
This code could get translated into the following (pseudo-)WASM program:
00 block $0
01 loop $1
02 get_local $0
03 i32.const 9
04 i32.gt_s
05 br_if $0 ;; jumps to #12 if i > 9 – exits the loop
06 get_local $0
07 i32.const 1
08 i32.add
09 set_local $0
10 br $1 ;; unconditionally jumps to #02 – repeats the loop iteration
11 end label $1
12 end label $0
Before starting the program execution, an executor E generates a secret salt and computes a secret S = hash(salt || address_E). Now it can launch the program execution, which will essentially be a stream of instructions executed by the CPU. Let's call this stream an execution trace Tr. This trace Tr can be split into chunks, each chunk hashed with S as a salt, and the obtained hashes combined into the Merkle root MR.
;; ...
;; omitted the beginning of the execution until i == 9
;; <chunk k> ==> H_k = hash(S || chunk_k)
;;
02 get_local $0 ;; i == 9
03 i32.const 9
04 i32.gt_s
;; <chunk k + 1> ==> H_k+1 = hash(S || chunk_k+1)
;;
05 br_if $0 ;; don't exit the loop yet
06 get_local $0
07 i32.const 1
;; <chunk k + 2> ==> H_k+2 = hash(S || chunk_k+2)
;;
08 i32.add
09 set_local $0 ;; i := 10
10 br $1
;; <chunk k + 3> ==> ...
;;
02 get_local $0
03 i32.const 9
04 i32.gt_s
;; <chunk k + 4> ==> ...
;;
05 br_if $0 ;; exit the loop
To compute the hash of a chunk of executed instructions, we can treat it as data. For example, the chunk:
get_local $0
i32.const 9
i32.gt_s
after conversion into the WASM binary representation is equivalent to:
0x20 0x00
0x41 0x09
0x4A
in which case the chunk hash would be calculated as H_k = hash(S || 0x200041094A). Now, we can merkelize the obtained hashes and compute the Merkle root: MR = merkelize(H_0, ..., H_n-1), where n is the number of chunks in Tr.
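For illustration, here is a minimal sketch of the chunk hashing and merkelization, assuming SHA-256 as the hash function and a plain binary Merkle tree – the concrete hash function and tree construction are open choices:

import hashlib

def chunk_hash(S: bytes, chunk: bytes) -> bytes:
    # H_k = hash(S || chunk_k)
    return hashlib.sha256(S + chunk).digest()

def merkelize(hashes):
    # Pairwise-combine hashes until a single root remains;
    # an odd hash at the end of a level is carried up unchanged.
    level = list(hashes)
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Example with the chunk from above
S = hashlib.sha256(b"salt" + b"address_E").digest()   # illustrative salt and address
H_k = chunk_hash(S, bytes.fromhex("200041094A"))
MR = merkelize([H_k])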
Streaming Merkle root computation
Note that it's possible to compute the Merkle root in a streaming fashion, without storing the entire execution trace, which might be gigantic for certain computations. To do that, we need to keep no more than ceil(log2(n)) intermediate Merkle tree hashes.
Here is an example of an incomplete Merkle tree built for 7 executed trace chunks. Note that we need to store only three intermediate hashes instead of the entire tree:
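Sketched in text (H1:x-y denotes a first-level hash over chunks x..y, H2:x-y a second-level one), the state after 7 chunks looks roughly like this:

                H2:0-3
              /        \
         H1:0-1        H1:2-3          H1:4-5
         /    \        /    \          /    \
       H_0   H_1     H_2   H_3       H_4   H_5       H_6

stored intermediate hashes: H2:0-3, H1:4-5, H_6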
Once the eighth chunk appears, its hash H_7 can be combined with H_6, producing the first-level hash H1:6-7. Now, H1:6-7 can be combined with H1:4-5, producing the second-level hash H2:4-7, which in its turn, in combination with H2:0-3, will produce the Merkle root H3:0-7.
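A minimal sketch of this streaming computation, assuming the same pairing rule as in the sketch above; at any moment the stack holds at most ceil(log2(n)) partial hashes:

import hashlib

def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

class StreamingMerkle:
    def __init__(self):
        # stack[i] holds the root of a complete subtree covering 2**i chunks, or None
        self.stack = []

    def add_chunk_hash(self, h: bytes) -> None:
        # Combining two equal-size subtrees produces the next-level one (carry propagation)
        level = 0
        while level < len(self.stack) and self.stack[level] is not None:
            h = hash_pair(self.stack[level], h)
            self.stack[level] = None
            level += 1
        if level == len(self.stack):
            self.stack.append(None)
        self.stack[level] = h

    def root(self) -> bytes:
        # Fold the remaining partial hashes when n isn't a power of two
        acc = None
        for partial in self.stack:
            if partial is not None:
                acc = partial if acc is None else hash_pair(partial, acc)
        return acc

After seven chunks the stack holds exactly H_6, H1:4-5 and H2:0-3; adding the eighth chunk hash collapses them into H3:0-7, matching the description above.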
Computational overhead
There is a potential issue in the described approach: how much effort it takes to execute the program (cost_execute_trace) compared to the effort required to hash the execution trace (cost_hash_trace). For a really simple, non-cryptographically strong hash function it seems these costs are approximately equal: cost_execute_trace ≈ cost_hash_trace.
For example, let's use XOR as a "hash function". To execute a single instruction, it has to be loaded from memory (or the processor cache) into the CPU. After that, the instruction will update a few registers / memory cells. To use this instruction in the XOR "hash", it will also have to be loaded from memory, and then the XOR computation will update a few registers / memory cells. This means that even a simple XOR "hash" requires about as much effort as executing an instruction.
If we need to use a cryptographically strong hash function, it seems that cost_execute_chunk << cost_hash_chunk because of the numerous hashing rounds. Leaving open the question of whether we actually need a cryptographically strong hash function, this means we won't be able to hash the entire execution trace and keep reasonable processing performance.
Instead, it seems we need to sample the execution trace and hash only the sampled chunks, in which case we can achieve cost_hash_trace < cost_execute_trace even while using complex hash functions. But before we go further into how the trace sampling could work, let's take a look at how the proposed approach deals with the verifier's dilemma.
Validation
Let's imagine the executor A has already performed the computation and produced the execution Merkle root MR_A. Now, the executor B is going to repeat the computation and produce another Merkle root MR_B. Because MR_A depends on the secret S_A = hash(salt_A || address_A) and MR_B depends on the secret S_B = hash(salt_B || address_B), we can expect that no matter what salt B chooses, it won't be able to make MR_B equal to MR_A.
This means that B has to generate the computation Merkle root itself, and it can do so either by repeating the computation from scratch or by fetching execution trace chunks from somewhere (allegedly, the executor A). If we want B to always perform the computation independently, one way would be to make chunk sharing between different machines economically infeasible.
However, there are two possible obstacles to achieving that:
if the execution trace sampling is really sparse it could be easy for colluding machines to exchange chunks because of the small download size
if two executors share the same physical hardware there won't be too much transfer overhead
To get back to the validation: B doesn't only compute MR_B, it also computes MR_A and checks that it's equal to the corresponding Merkle root computed by A. However, if computing MR_A has too much overhead compared to the actual program execution, B might opt to just agree with whatever Merkle root the executor A has produced. This means we are still facing the verifier's dilemma, which we could fight by reducing the overhead of computing the Merkle root to a minimum using aggressive sampling.
Execution trace sampling
Hash modulo approach
We could use the following construction to sample the execution trace:
each chunk has a predefined constant size p
the gap between the previous chunk and the next chunk is selected by computing H_k % m, where H_k is the hash of the previous chunk and m is a predefined constant
In this case the sample trace will look like (note the variable gap size):
;; p = 3, m = 5
;; chunk_0 ==> H_0 % 5 = 1
;;
op #01
op #02
op #03
;; <gap> - size 1
op #04
;; chunk_1 ==> H_1 % 5 == 4
;;
op #05
op #06
op #07
;; <gap> - size 4
op #08
op #09
op #10
op #11
;; chunk_2 ==> ...
;;
op #12
op #13
op #14
Assuming a uniform distribution of H_k % m, the expected reduction in the number of chunk hashes to compute will be approximately 1 + m/(2p).
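A minimal sketch of this sampling, assuming the trace is available as a sequence of instructions already encoded as bytes and SHA-256 is used for chunk hashes – all names are illustrative:

import hashlib

def sample_chunk_hashes(S: bytes, trace, p: int, m: int):
    hashes = []
    i = 0
    while i + p <= len(trace):
        chunk = b"".join(trace[i:i + p])        # p consecutive instructions
        h = hashlib.sha256(S + chunk).digest()
        hashes.append(h)
        gap = int.from_bytes(h, "big") % m      # H_k % m decides the size of the next gap
        i += p + gap
    return hashes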
Sampling takeover attack
Intuitively, we also don't want an executor to have control over, or be able to predict, which chunks will be selected for sampling without actually executing the program. Let's consider the following trivial program:
00 i++
01 goto 00
If we execute it with sampling constants p = 2 and m = 4, and if hash(S || "i++" || "goto 00") % 4 == 2, then this loop will unroll into:
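;; p = 2, m = 4
;; chunk_0 ==> H_0 % 4 == 2
;;
00 i++
01 goto 00
;; <gap> - size 2
00 i++
01 goto 00
;; chunk_1 ==> H_1 % 4 == 2
;;
00 i++
01 goto 00
;; <gap> - size 2
00 i++
01 goto 00
;; chunk_2 ==> ...
;;
00 i++
01 goto 00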
Note that chunk_0, chunk_1 and chunk_2 are all identical and will have the same hashes. Remember that an executor has control over salt – and hence some control over the secret S = hash(salt || address_E). So, for certain simple enough programs and specific sampling constants, there could exist a salt that would allow producing the Merkle root without executing the program. Such a salt could be found by an executor through trial and error.
Sampling randomization
It seems there are a few potential ways to combat this issue.
One is to make the secret S depend not only on salt, but also on some externally chosen random prefix: S = hash(rnd_prefix || salt || address_E).
Another is to compute each chunk hash using the hash of the previous chunk and the (Merkle) hash of the memory state at the end of the previous chunk's execution: chunk_hash_k+1 = hash(S || chunk_hash_k || memory_k || chunk_k+1). Because for non-trivial, fully optimized programs it's impossible to predict the memory contents at the end of execution without actually executing the program (~halting problem), an executor won't be able to predict or control the sampling.
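Sketched in the same spirit, assuming a memory_k value is available as the Merkle root of the VM memory after chunk k (how exactly it is obtained is out of scope here):

import hashlib

def chained_chunk_hash(S: bytes, chunk_hash_k: bytes,
                       memory_k: bytes, chunk_k1: bytes) -> bytes:
    # chunk_hash_k+1 = hash(S || chunk_hash_k || memory_k || chunk_k+1)
    return hashlib.sha256(S + chunk_hash_k + memory_k + chunk_k1).digest()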
Execution trace fetching attack mitigation
As it was already described in § Validation, two colluding executors A and B might agree that only A performs the execution, and B fetches required execution trace chunks from A. Because with sampling only a tiny fraction of chunks is required to produce the Merkle root, this practice makes sense economically.
To make it economically infeasible, we could require that the computation performed by B is always launched after the computation performed by A. To do so, we could require the secret S_B to be computed as S_B = hash(salt_B || address_B || MR_A). This has a nice property: during its own execution of the program, A doesn't know which chunks B will need. So, to be able to serve chunk requests, A would need to store the entire execution trace.
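As a sketch, B's secret could then be derived like this (names are illustrative):

import hashlib

def secret_for_B(salt_B: bytes, address_B: bytes, MR_A: bytes) -> bytes:
    # S_B = hash(salt_B || address_B || MR_A); B can only start once MR_A is published
    return hashlib.sha256(salt_B + address_B + MR_A).digest()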
Now, it looks like for non-trivial programs the cost of storing the execution trace (cost_store_trace) is more than the cost of executing it from scratch (cost_execute_trace).
An upper estimate of the cost to execute the program is ||Tr|| * cost_execute_slowest_instruction, where ||Tr|| is the number of instructions in the trace. A lower estimate of the cost to store the trace is ||Tr|| * cost_store_smallest_instruction * time_execute_trace / 2 – assuming the first instruction in the trace needs to be stored for the entire time the program is running, and the last instruction doesn't need to be stored at all.
Assuming cost_execute_slowest_instruction < cost_store_smallest_instruction * time_execute_trace / 2 for non-trivial programs, we can derive cost_execute_trace < cost_store_trace. This means we can protect against the trace chunk fetching attack, but only if we require sequential program execution by different executors.
Open questions
It's unclear how to prevent collusion if multiple executors are running the computation at the same time. One of them might do the actual computation and serve the chunks to the rest. With execution trace sampling this might be economically attractive for executors.
It's unclear how to actually force the validator B to verify the Merkle root produced by A. By sampling the execution we are reducing the overhead of computing MR_A, which means it's not a big deal for B to also check it. However, nothing actually forces B to run the check except possibly losing a security deposit if the computation turns out to be incorrect. But to guarantee this we might need to introduce forced errors into the system, which is what we were trying to avoid all along...
Application in the Fluence network
I haven't thought about this deeply enough yet, but I think we could use Proof of Independent Execution in the following way in our network:
Real-time nodes of the same cluster could share the same (public) salt and secret and produce the same execution trace Merkle root. This seems different from the originally proposed idea, where an untimely secret publication would be a reason for slashing the executor's deposit. However, having the same salt would allow the real-time cluster to reach consensus on the execution trace Merkle root.
Batch validation of the history written by the real-time nodes will always be sequential, as described in § Execution trace fetching attack mitigation. First, this would prevent batch validators from fetching trace chunks from the real-time nodes. Second, for batch validation it seems we don't really need parallelism anyway – time-sensitive stuff happens in the real-time cluster.
What about making something like a global state for the salt at the cluster level, where each next salt used by an executor must depend on the previous one? This approach could be a solution to the first open question.
@michailvoronov but if I'm getting you right, this global state (and the next salt) will still be accessible to any node in the cluster, which kind of moots the point...
I'm not sure if the first open problem would actually hurt the real-time cluster: n nodes in it mostly mean a bigger security deposit rather than actual Byzantine Fault Tolerance – n will normally be around 4-7 anyways.