
Phase 1: block headers inside or outside of data committed to in the shard data root? #338

Closed
vbuterin opened this issue Dec 19, 2018 · 3 comments
Labels
general:RFC Request for Comments phase1

Comments

@vbuterin
Contributor

I see two options for exactly what data the shard data roots attested to in crosslinks should commit to. Note that each choice also influences what clients will need to check to gain assurance of validity and availability.

Proofs of custody on data roots only

Suppose the last crosslink for shard i was of a block at slot N, and the next crosslink is of a block at slot N+k. The crosslink commits to a Merkle root of the data roots of each block in the range [N ... N+k-1], where skipped slots are treated as containing empty data.
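As a sketch of this construction (helper names are illustrative, and a 32-byte zero hash is assumed as the stand-in for the empty data of skipped slots, with SHA-256 as the hash):

```python
import hashlib

EMPTY_DATA_ROOT = b"\x00" * 32  # assumed placeholder for a skipped slot's data

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def merkle_root(leaves):
    # Pad to the next power of two with empty roots, then hash pairwise upward.
    n = 1
    while n < len(leaves):
        n *= 2
    nodes = list(leaves) + [EMPTY_DATA_ROOT] * (n - len(leaves))
    while len(nodes) > 1:
        nodes = [h(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def crosslink_data_root(data_roots_by_slot, N, k):
    # data_roots_by_slot maps slot -> 32-byte data root; missing slots are skips.
    leaves = [data_roots_by_slot.get(s, EMPTY_DATA_ROOT) for s in range(N, N + k)]
    return merkle_root(leaves)
```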

To verify a chain, a client must:

  • Verify the beacon chain
  • Verify the header chains of every shard and check that they match what's in the crosslinks
  • If desired, perform data availability checks on the erasure coded data corresponding to the data included in the crosslink

The main weakness of the scheme is higher overhead for clients wishing to fully verify: you need to be a light client on every shard, which comes to roughly 400 bytes * 1024 shards / 6 seconds ≈ 67 kB/sec.

Proofs of custody on shard chains

Instead of one data root covering the full 16 kB block, each shard block has 5 data roots covering 15.5 kB of data, leaving the other 512 bytes empty. You can then calculate the "real data root" of a block as the Merkle root of the 512-byte space (with the header dropped in) concatenated with the 15.5 kB of data (the header itself and the 5 data roots suffice to compute this).

The crosslink commits to a Merkle root of these data roots.

To fully verify a chain, a client now only needs to verify the beacon chain and, if desired, perform data availability checks on the erasure-coded data behind the crosslink roots. Verifying the validity of the shard chain can now be done entirely with slashing conditions: you can declare a data root invalid by providing a Merkle branch into a block, verifying the block in-beacon-chain, and showing that the block is invalid (or by providing two adjacent blocks whose hashes mismatch).
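The "Merkle branch into a block" step is ordinary branch verification; a generic sketch (function names are illustrative, SHA-256 assumed):

```python
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def verify_branch(leaf: bytes, branch, index: int, root: bytes) -> bool:
    # Fold the leaf up the tree; each bit of `index` says whether the
    # sibling at that level sits on the left or the right.
    acc = leaf
    for sibling in branch:
        acc = h(sibling, acc) if index & 1 else h(acc, sibling)
        index >>= 1
    return acc == root

# Usage: prove leaf 2 is part of a 4-leaf tree.
leaves = [bytes([i]) * 32 for i in range(4)]
root = h(h(leaves[0], leaves[1]), h(leaves[2], leaves[3]))
assert verify_branch(leaves[2], [leaves[3], h(leaves[0], leaves[1])], 2, root)
```

A slashing condition would then check such a branch against the root committed in the crosslink before judging the revealed block invalid.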

The main weaknesses of the scheme are:

  • It slightly mixes checking availability and checking state transition validity, as it's checking validity of the headers
  • It requires a somewhat more awkward data root construction
@vbuterin vbuterin added the general:RFC Request for Comments label Dec 19, 2018
@djrtwo
Contributor

djrtwo commented Dec 19, 2018

I'm a bit confused about the construction of the 5 data roots.
Given 15.5 kB of block data, do we create 5 distinct roots by placing the 512 empty bytes at different locations in the data hash tree?

@vbuterin
Contributor Author

vbuterin commented Dec 19, 2018

Let M0 ... M31 be the 32 leaf chunks of the full Merkle tree and R[i, j] be the sub-root of M[i:j] (e.g. R[0, 32] is the root of the whole tree). The block header would include M1, R[2, 4], R[4, 8], R[8, 16], R[16, 32], and then you would calculate the root via R[0, 32] = hash(hash(hash(hash(hash(block_header, M1), R[2, 4]), R[4, 8]), R[8, 16]), R[16, 32]). So the block header effectively contains the Merkle branch that lets you compute the full data root of the block once you drop in the header.
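A sketch of this reconstruction (32 chunks of 512 bytes, SHA-256 assumed, chunk 0 playing the role of the header):

```python
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def merkle_root(leaves):
    # Standard binary Merkle root over a power-of-two number of leaves.
    nodes = list(leaves)
    while len(nodes) > 1:
        nodes = [h(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

chunks = [bytes([i]) * 512 for i in range(32)]  # M0 ... M31; M0 is the header slot
header = chunks[0]

# The 5 roots carried in the header: M1, R[2,4], R[4,8], R[8,16], R[16,32].
branch = [
    chunks[1],
    merkle_root(chunks[2:4]),
    merkle_root(chunks[4:8]),
    merkle_root(chunks[8:16]),
    merkle_root(chunks[16:32]),
]

# Drop the header in at position 0 and fold up the branch to get R[0, 32].
acc = header
for sibling in branch:
    acc = h(acc, sibling)

assert acc == merkle_root(chunks)  # matches the root over all 32 chunks
```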

@vbuterin
Contributor Author

Notes from a discussion with @JustinDrake:

  • We could also not store any information about the headers, because it's all either redundant info that could be recalculated or data that could be put into the data root, except for the signatures, which can be BLS aggregated into a single 96-byte sig.
  • Alternatively, the header of block N could be put into the body of block N+1
  • Instead of doing a single PoC (and erasure code) of unknown length, we can do it on a per-epoch basis, maintaining in the shard blocks an accumulator that tracks all historical super-data-roots for each epoch
  • Another approach is to build up two separate super-data-roots during each epoch: one for tx data, and one for headers + intermediate state roots + eventually possibly a STARK or something similar, then you can just combine together the roots at the end. This will lead to erasure coding a root with ~1.9x more data than necessary initially (because the size of the data root is much bigger) but eh whatever; FFT erasure coding is cheap
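The per-epoch accumulator idea might look roughly like this (an entirely hypothetical shape, not from any spec; SHA-256 assumed for the folding hash):

```python
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

class EpochRootAccumulator:
    """Hypothetical in-shard-block accumulator over historical per-epoch
    super-data-roots, so each epoch's PoC / erasure coding has a fixed target."""

    def __init__(self):
        self.commitment = b"\x00" * 32  # commitment over zero epochs
        self.count = 0

    def close_epoch(self, super_data_root: bytes):
        # Fold the finished epoch's super-data-root into the running commitment.
        self.commitment = h(self.commitment, super_data_root)
        self.count += 1
```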

@hwwhww hwwhww added the phase1 label Dec 23, 2018
vbuterin added a commit that referenced this issue Feb 7, 2019
See #338 and #529 for discussion.
JustinDrake pushed a commit that referenced this issue Feb 10, 2019