
Phase 1: block headers inside or outside of data committed to in the shard data root? #338

Closed
vbuterin opened this issue Dec 19, 2018 · 3 comments
Labels
general:RFC Request for Comments phase1

Comments

@vbuterin
Contributor

I see two options for exactly what data the shard data roots attested to in crosslinks should commit to. Note that each choice also influences what clients will need to check to gain assurance of validity and availability.

Proofs of custody on data roots only

Suppose the last crosslink for shard i was of a block at slot N, and the next crosslink is of a block at slot N+k. The crosslink commits to a Merkle root of the data roots of each block in the range [N ... N+k-1], where skipped slots are treated as containing empty data.
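As a sketch of this construction (helper names are illustrative, and a 32-byte zero hash is assumed as the stand-in for the empty data of skipped slots, with SHA-256 as the hash):

```python
import hashlib

EMPTY_DATA_ROOT = b"\x00" * 32  # assumed placeholder for a skipped slot's data

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def merkle_root(leaves):
    # Pad to the next power of two with empty roots, then hash pairwise upward.
    n = 1
    while n < len(leaves):
        n *= 2
    nodes = list(leaves) + [EMPTY_DATA_ROOT] * (n - len(leaves))
    while len(nodes) > 1:
        nodes = [h(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def crosslink_data_root(data_roots_by_slot, N, k):
    # data_roots_by_slot maps slot -> 32-byte data root; missing slots are skips.
    leaves = [data_roots_by_slot.get(s, EMPTY_DATA_ROOT) for s in range(N, N + k)]
    return merkle_root(leaves)
```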

To verify a chain, a client must:

  • Verify the beacon chain
  • Verify the header chains of every shard and check that they match what's in the crosslinks
  • If desired, perform data availability checks on the erasure coded data corresponding to the data included in the crosslink

The main weakness of the scheme is higher overhead for clients wishing to fully verify: you need to be a light client on every shard, which comes to roughly 400 bytes * 1024 shards / 6 seconds ≈ 67 kB/sec.

Proofs of custody on shard chains

Instead of one data root covering the full 16 kB block, each shard block has 5 data roots covering 15.5 kB of data, leaving the other 512 bytes empty. You can then calculate the "real data root" of a block as the Merkle root of the 512-byte space (with the header dropped in) concatenated with the 15.5 kB of data (the header itself and the 5 data roots suffice to compute this).

The crosslink commits to a Merkle root of these data roots.

To fully verify a chain, a client now only needs to verify the beacon chain and, if desired, perform data availability checks on the erasure-coded data behind the crosslink roots. Verifying the validity of the shard chain can now be done entirely with slashing conditions: you can declare a data root invalid by providing a Merkle branch into a block, verifying the block in-beacon-chain, and showing that the block is invalid (or by providing two adjacent blocks whose hashes mismatch).
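The "Merkle branch into a block" step is ordinary branch verification; a generic sketch (function names are illustrative, SHA-256 assumed):

```python
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def verify_branch(leaf: bytes, branch, index: int, root: bytes) -> bool:
    # Fold the leaf up the tree; each bit of `index` says whether the
    # sibling at that level sits on the left or the right.
    acc = leaf
    for sibling in branch:
        acc = h(sibling, acc) if index & 1 else h(acc, sibling)
        index >>= 1
    return acc == root

# Usage: prove leaf 2 is part of a 4-leaf tree.
leaves = [bytes([i]) * 32 for i in range(4)]
root = h(h(leaves[0], leaves[1]), h(leaves[2], leaves[3]))
assert verify_branch(leaves[2], [leaves[3], h(leaves[0], leaves[1])], 2, root)
```

A slashing condition would then check such a branch against the root committed in the crosslink before judging the revealed block invalid.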

The main weaknesses of the scheme are:

  • It slightly mixes checking availability and checking state transition validity, as it's checking validity of the headers
  • It requires a somewhat more awkward data root construction
@vbuterin vbuterin added the general:RFC Request for Comments label Dec 19, 2018
@djrtwo
Contributor

djrtwo commented Dec 19, 2018

I'm a bit confused about the construction of the 5 data roots.
Given 15.5 kB of block data, do we create 5 distinct roots by placing the 512 empty bytes at different locations in the data hash tree?

@vbuterin
Contributor Author

vbuterin commented Dec 19, 2018

Let M0 ... M31 be the 32 leaf chunks of the full Merkle tree and R[i, j] be the sub-root of M[i:j] (e.g. R[0, 32] is the root of the whole tree). The block header would include M1, R[2, 4], R[4, 8], R[8, 16], R[16, 32], and then you would calculate the root via R[0, 32] = hash(hash(hash(hash(hash(block_header, M1), R[2, 4]), R[4, 8]), R[8, 16]), R[16, 32]). So the block header effectively contains the Merkle branch that lets you compute the full data root of the block once you drop in the header.
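A sketch of this reconstruction (32 chunks of 512 bytes, SHA-256 assumed, chunk 0 playing the role of the header):

```python
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def merkle_root(leaves):
    # Standard binary Merkle root over a power-of-two number of leaves.
    nodes = list(leaves)
    while len(nodes) > 1:
        nodes = [h(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

chunks = [bytes([i]) * 512 for i in range(32)]  # M0 ... M31; M0 is the header slot
header = chunks[0]

# The 5 roots carried in the header: M1, R[2,4], R[4,8], R[8,16], R[16,32].
branch = [
    chunks[1],
    merkle_root(chunks[2:4]),
    merkle_root(chunks[4:8]),
    merkle_root(chunks[8:16]),
    merkle_root(chunks[16:32]),
]

# Drop the header in at position 0 and fold up the branch to get R[0, 32].
acc = header
for sibling in branch:
    acc = h(acc, sibling)

assert acc == merkle_root(chunks)  # matches the root over all 32 chunks
```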

@vbuterin
Contributor Author

Notes from a discussion with @JustinDrake:

  • We could also not store any information about the headers, because it's all either redundant info that could be recalculated or data that could be put into the data root, except for the signatures, which can be BLS aggregated into a single 96-byte sig.
  • Alternatively, the header of block N could be put into the body of block N+1
  • Instead of doing a single PoC (and erasure code) of unknown length, we can do it on a per-epoch basis, maintaining in the shard blocks an accumulator that tracks all historical super-data-roots for each epoch
  • Another approach is to build up two separate super-data-roots during each epoch: one for tx data, and one for headers + intermediate state roots + eventually possibly a STARK or something similar, then you can just combine together the roots at the end. This will lead to erasure coding a root with ~1.9x more data than necessary initially (because the size of the data root is much bigger) but eh whatever; FFT erasure coding is cheap
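The per-epoch accumulator idea might look roughly like this (an entirely hypothetical shape, not from any spec; SHA-256 assumed for the folding hash):

```python
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

class EpochRootAccumulator:
    """Hypothetical in-shard-block accumulator over historical per-epoch
    super-data-roots, so each epoch's PoC / erasure coding has a fixed target."""

    def __init__(self):
        self.commitment = b"\x00" * 32  # commitment over zero epochs
        self.count = 0

    def close_epoch(self, super_data_root: bytes):
        # Fold the finished epoch's super-data-root into the running commitment.
        self.commitment = h(self.commitment, super_data_root)
        self.count += 1
```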

@hwwhww hwwhww added the phase1 label Dec 23, 2018
vbuterin added a commit that referenced this issue Feb 7, 2019
See #338 and #529 for discussion.
JustinDrake pushed a commit that referenced this issue Feb 10, 2019