Tree hash functions for SSZ #54

vbuterin · 2018-10-05T15:19:47Z

The following is a general-purpose strategy for making all data structures in the beacon chain more light-client friendly. When (i) hashing the beacon chain active state, (ii) hashing the beacon chain crystallized state, or (iii) hashing beacon chain blocks, we instead use the following hash function specific to SSZ objects, where hash(x) is some underlying hash function with a 32-byte output (e.g. blake(x)[0:32]):

def hash_ssz_object(obj):
    if isinstance(obj, list):
        objhashes = [hash_ssz_object(o) for o in obj]
        return merkle_root(objhashes)
    elif not isinstance(obj, SSZObject):
        return hash(obj)
    else:
        o = b''
        for f in obj.fields:
            val = getattr(obj, f)
            o += hash_ssz_object(val)
        return hash(o)

Where merkle_root is defined as follows:

def merkle_root(objs):
    min_pow_of_2 = 1
    while min_pow_of_2 <= len(objs):
        min_pow_of_2 *= 2
    o = [0] * min_pow_of_2 + [len(objs).to_bytes(32, 'big')] + objs + [b'\x00'*32] * (min_pow_of_2 - len(objs))
    for i in range(min_pow_of_2 - 1, 0, -1):
        o[i] = hash(o[i*2] + o[i*2+1])
    return o[1]
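To try the two functions above end to end, here is a minimal runnable sketch. It makes several assumptions that are not in the post: `hashlib.blake2b` truncated to 32 bytes stands in for `hash`, the `SSZObject` base class and the `Checkpoint` type are hypothetical, and every base value is already a `bytes` object.

```python
from hashlib import blake2b

def hash(x):
    # Assumed stand-in for the post's hash(x): first 32 bytes of BLAKE2b.
    return blake2b(x).digest()[:32]

class SSZObject:
    # Hypothetical base class: subclasses name their fields in `fields`.
    fields = []

def merkle_root(objs):
    # Same layout as the post: slot 0 is unused, slots 1..min_pow_of_2-1 are
    # internal nodes, and the leaves start with the 32-byte list length.
    min_pow_of_2 = 1
    while min_pow_of_2 <= len(objs):
        min_pow_of_2 *= 2
    o = ([0] * min_pow_of_2 + [len(objs).to_bytes(32, 'big')] + objs
         + [b'\x00' * 32] * (min_pow_of_2 - len(objs)))
    for i in range(min_pow_of_2 - 1, 0, -1):
        o[i] = hash(o[i * 2] + o[i * 2 + 1])
    return o[1]

def hash_ssz_object(obj):
    if isinstance(obj, list):
        return merkle_root([hash_ssz_object(o) for o in obj])
    elif not isinstance(obj, SSZObject):
        return hash(obj)  # base values are assumed to be bytes already
    else:
        o = b''
        for f in obj.fields:
            o += hash_ssz_object(getattr(obj, f))
        return hash(o)

class Checkpoint(SSZObject):
    # Hypothetical container type, for illustration only.
    fields = ['slot', 'roots']

    def __init__(self, slot, roots):
        self.slot = slot
        self.roots = roots

cp = Checkpoint((5).to_bytes(8, 'big'), [b'\x01' * 32, b'\x02' * 32])
root = hash_ssz_object(cp)
print(root.hex())
```

The result is a single 32-byte digest: the list field contributes a Merkle root over its hashed elements, and the container hashes the concatenation of its per-field digests.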

Collision resistance is only guaranteed between objects of the same type, not objects of different types.
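As an illustration of that caveat, the sketch below builds two values of different types that hash to the same digest under the definitions above. The `hash` stand-in (BLAKE2b truncated to 32 bytes), the `SSZObject` base class, and the `Pair` type are assumptions made for the example.

```python
from hashlib import blake2b

def hash(x):
    # Assumed stand-in for the post's hash(x).
    return blake2b(x).digest()[:32]

class SSZObject:
    # Hypothetical base class: subclasses name their fields in `fields`.
    fields = []

def merkle_root(objs):
    min_pow_of_2 = 1
    while min_pow_of_2 <= len(objs):
        min_pow_of_2 *= 2
    o = ([0] * min_pow_of_2 + [len(objs).to_bytes(32, 'big')] + objs
         + [b'\x00' * 32] * (min_pow_of_2 - len(objs)))
    for i in range(min_pow_of_2 - 1, 0, -1):
        o[i] = hash(o[i * 2] + o[i * 2 + 1])
    return o[1]

def hash_ssz_object(obj):
    if isinstance(obj, list):
        return merkle_root([hash_ssz_object(o) for o in obj])
    elif not isinstance(obj, SSZObject):
        return hash(obj)
    else:
        o = b''
        for f in obj.fields:
            o += hash_ssz_object(getattr(obj, f))
        return hash(o)

class Pair(SSZObject):
    # Hypothetical two-field container.
    fields = ['a', 'b']

    def __init__(self, a, b):
        self.a = a
        self.b = b

pair = Pair(b'foo', b'bar')
# A plain byte string that happens to equal the concatenated field hashes.
raw = hash(b'foo') + hash(b'bar')

same_digest = hash_ssz_object(pair) == hash_ssz_object(raw)
print(same_digest)
```

The container hashes the concatenation of its field digests, and the 64-byte base value hashes the same 64 bytes directly, so the two digests coincide even though the types differ.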

Efficiency

Fundamentally, Merkle-hashing instead of regular hashing doubles the amount of data hashed, but because hash functions have fixed per-call costs, the overhead is higher. Here are some simulation results, using 111-byte objects for accounts because this is currently roughly the size of a beacon chain ValidatorRecord object:

>>> from hashlib import blake2b
>>> def hash(x): return blake2b(x).digest()[:32]
>>> import time
>>> accounts = [b'\x35' * 111 for _ in range(1000000)]
>>> a = time.time(); x = hash(b''.join(accounts)); print(time.time() - a)
0.42771387100219727
>>> a = time.time(); x = merkle_root(accounts); print(time.time() - a)
1.2481215000152588
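To see where the extra cost comes from, one can count hash invocations and bytes hashed instead of timing. The sketch below is a rough accounting under stated assumptions: it wraps an assumed `hash` (BLAKE2b truncated to 32 bytes) with counters, the `measure` helper is hypothetical, and it uses 1,000 objects rather than the 1,000,000 in the run above so that it finishes quickly.

```python
from hashlib import blake2b

calls = 0
bytes_hashed = 0

def hash(x):
    # Counting wrapper around the assumed hash function.
    global calls, bytes_hashed
    calls += 1
    bytes_hashed += len(x)
    return blake2b(x).digest()[:32]

def merkle_root(objs):
    min_pow_of_2 = 1
    while min_pow_of_2 <= len(objs):
        min_pow_of_2 *= 2
    o = ([0] * min_pow_of_2 + [len(objs).to_bytes(32, 'big')] + objs
         + [b'\x00' * 32] * (min_pow_of_2 - len(objs)))
    for i in range(min_pow_of_2 - 1, 0, -1):
        o[i] = hash(o[i * 2] + o[i * 2 + 1])
    return o[1]

def measure(fn):
    # Reset the counters, run fn, and report (hash calls, bytes hashed).
    global calls, bytes_hashed
    calls = bytes_hashed = 0
    fn()
    return calls, bytes_hashed

accounts = [b'\x35' * 111 for _ in range(1000)]  # smaller than the run above
flat_cost = measure(lambda: hash(b''.join(accounts)))
merkle_cost = measure(lambda: merkle_root(accounts))
print('flat:  ', flat_cost)
print('merkle:', merkle_cost)
```

With leaves this large, the byte count grows by well under 2x, while the number of hash invocations goes from one to roughly one per leaf. That is consistent with fixed per-call costs accounting for much of the measured slowdown.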


paulhauner · 2018-10-11T14:32:10Z

Very minor, but I suspect return hash(p) should be return hash(o).

Update: I was expecting hash_ssz_object to return a byte array (hash digest) but I'm getting an array with a mix of ints and byte arrays. Should I be expecting to see a hash digest?

vbuterin · 2018-10-11T18:00:51Z

Fixed both! Merkle root accidentally returned the entire tree.

sorpaas · 2018-11-02T11:22:34Z

In hash_ssz_object, for structs, it looks like we hash each field individually, combine the results, and then hash again. Is there a reason we need that many hashes? I don't see how they can be used by light clients, as they're not part of the Merkle tree. If we combined the fields first, it would save many hash rounds.

djrtwo · 2018-11-06T16:26:38Z

@vbuterin I'd like to get this merged soon and reduce state to one state root.

I believe @sorpaas is correct. I don't see the need to hash the SSZ base types in the following line

    elif not isinstance(obj, SSZObject):
        return hash(obj)

We can just return the raw obj data here.
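A minimal sketch of this suggestion, for comparison with the original, follows. The `hash_base_types` switch is a hypothetical parameter added only so both behaviours can be run side by side; the `hash` stand-in (BLAKE2b truncated to 32 bytes), the `SSZObject` base class, and the `Record` type are likewise assumptions.

```python
from hashlib import blake2b

calls = 0

def hash(x):
    # Counting wrapper around the assumed hash function.
    global calls
    calls += 1
    return blake2b(x).digest()[:32]

class SSZObject:
    # Hypothetical base class: subclasses name their fields in `fields`.
    fields = []

def merkle_root(objs):
    min_pow_of_2 = 1
    while min_pow_of_2 <= len(objs):
        min_pow_of_2 *= 2
    o = ([0] * min_pow_of_2 + [len(objs).to_bytes(32, 'big')] + objs
         + [b'\x00' * 32] * (min_pow_of_2 - len(objs)))
    for i in range(min_pow_of_2 - 1, 0, -1):
        o[i] = hash(o[i * 2] + o[i * 2 + 1])
    return o[1]

def hash_ssz_object(obj, hash_base_types=True):
    # hash_base_types is a hypothetical switch: True follows the original
    # post, False returns base values raw as suggested in this comment.
    if isinstance(obj, list):
        return merkle_root([hash_ssz_object(o, hash_base_types) for o in obj])
    elif not isinstance(obj, SSZObject):
        return hash(obj) if hash_base_types else obj
    else:
        o = b''
        for f in obj.fields:
            o += hash_ssz_object(getattr(obj, f), hash_base_types)
        return hash(o)

class Record(SSZObject):
    # Hypothetical container with three base-type fields.
    fields = ['pubkey', 'balance', 'status']

    def __init__(self, pubkey, balance, status):
        self.pubkey = pubkey
        self.balance = balance
        self.status = status

rec = Record(b'\x11' * 32, (32).to_bytes(8, 'big'), b'\x01')

calls = 0
root_original = hash_ssz_object(rec, hash_base_types=True)
calls_original = calls

calls = 0
root_raw = hash_ssz_object(rec, hash_base_types=False)
calls_raw = calls

print(calls_original, calls_raw)
```

For a container of base-type fields, the raw variant needs one hash call instead of one per field plus one. Note that in this sketch a base value passed at the top level comes back unhashed, so callers would have to pass containers or lists to get a digest.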

mkalinin · 2018-11-09T10:47:48Z

Did an evaluation in Java.
There is room for a tiny improvement that addresses a rare case (when min_pow_of_2 == len(objs); the original lines are commented out):

def merkle_root(objs):
    min_pow_of_2 = 1
    # while min_pow_of_2 <= len(objs):
    while min_pow_of_2 < len(objs):
        min_pow_of_2 *= 2
    # o = [0] * min_pow_of_2 + [len(objs).to_bytes(32, 'big')] + objs + [b'\x00'*32] * (min_pow_of_2 - len(objs))
    o = [len(objs).to_bytes(32, 'big')] + [0] * (min_pow_of_2 - 1) + objs + [b'\x00'*32] * (min_pow_of_2 - len(objs))
    for i in range(min_pow_of_2 - 1, 0, -1):
        o[i] = hash(o[i*2] + o[i*2+1])
    # return o[1]
    return o[0]

But the original algorithm with @sorpaas's correction also looks good to me.
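The effect of changing the loop condition can be checked in isolation. The sketch below compares only the padded leaf width chosen by each condition; the helper names `width_original` and `width_proposed` are hypothetical. It does not cover how the length is mixed into the root: in the listing as written, the loop stops at index 1, so `o[0]` still holds the length bytes when it is returned.

```python
def width_original(n):
    # Original condition: the length occupies a leaf slot, so the width
    # must be strictly greater than the number of objects.
    w = 1
    while w <= n:
        w *= 2
    return w

def width_proposed(n):
    # Proposed condition: the length is stored outside the leaves, so the
    # width only has to be at least the number of objects.
    w = 1
    while w < n:
        w *= 2
    return w

# List sizes for which the two conditions pick a different leaf width.
differing = [n for n in range(17) if width_original(n) != width_proposed(n)]
print(differing)
```

The two conditions disagree only when the list length is an exact power of two, in which case the proposed version uses a tree half as wide.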

vbuterin · 2018-11-09T14:26:11Z

I made a modified algorithm that has some efficiency improvements:

#120

hwwhww added the general:enhancement New feature or request label Oct 6, 2018

vbuterin mentioned this issue Oct 9, 2018

Merkleise beacon chain blocks #40

Closed

terencechain mentioned this issue Nov 1, 2018

Implement Tree Hash Functions for Simple Serialize prysmaticlabs/prysm#716

Closed

sorpaas mentioned this issue Nov 2, 2018

Add ssz-hash crate for tree hash functions paritytech/shasper#21

Merged

paulhauner mentioned this issue Nov 4, 2018

Implement tree hashing function sigp/lighthouse#70

Closed

mkalinin mentioned this issue Nov 12, 2018

Added tree hashing algorithm #120

Merged

djrtwo closed this as completed Nov 15, 2018
