SSZ TLDR #679

Closed
JustinDrake opened this issue Feb 24, 2019 · 5 comments
Labels
general:presentation Presentation (as opposed to content)

Comments

@JustinDrake
Collaborator

JustinDrake commented Feb 24, 2019

Below is all the SSZ logic needed for the phase 0 spec. We define types ("basic" and "composite"), serialisation of basic types, and hash_tree_root. The TLDR below incorporates tuples (see #665) and a chunk-size reduction to 32 bytes.

Basic types

  • uint64 (8-byte little-endian serialisation)
  • byte (1-byte serialisation)
  • bool (1-byte serialisation to 0x00 or 0x01)
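A minimal sketch of the basic-type serialisations above, in Python; the function names are illustrative, not part of the spec:

```python
# Illustrative helpers for the three basic types; names are not from the spec.

def serialize_uint64(value: int) -> bytes:
    return value.to_bytes(8, "little")    # 8-byte little-endian

def serialize_byte(value: int) -> bytes:
    return value.to_bytes(1, "little")    # single byte

def serialize_bool(value: bool) -> bytes:
    return b"\x01" if value else b"\x00"  # 0x01 or 0x00

assert serialize_uint64(1) == bytes.fromhex("0100000000000000")
```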

Composite types

  • Container (inhomogeneous, fixed size)
  • Tuple (homogeneous, fixed size)
  • List (homogeneous, variable size)

Helper functions

  • pack: Given ordered objects of the same basic type, serialise them, pack them into 32-byte chunks, right-pad the last chunk with zero bytes, and return the chunks.
  • merkleize: Given ordered 32-byte chunks, right-pad them with zero chunks to the closest power of two, Merkleize the chunks, and return the root.
  • mix_in_length: Given a Merkle root r and a length l (32-byte little-endian serialisation), return hash(r + l).
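A rough sketch of the three helpers, assuming SHA-256 as the hash function (an assumption for illustration; the spec fixes its own hash):

```python
from hashlib import sha256
from typing import List

BYTES_PER_CHUNK = 32

def hash(data: bytes) -> bytes:
    # Assumed hash function for illustration only.
    return sha256(data).digest()

def pack(serialized_items: List[bytes]) -> List[bytes]:
    # Concatenate the serialised objects, right-pad the last chunk with
    # zero bytes, and split into 32-byte chunks.
    data = b"".join(serialized_items)
    if len(data) % BYTES_PER_CHUNK != 0:
        data += b"\x00" * (BYTES_PER_CHUNK - len(data) % BYTES_PER_CHUNK)
    return [data[i:i + BYTES_PER_CHUNK] for i in range(0, len(data), BYTES_PER_CHUNK)]

def merkleize(chunks: List[bytes]) -> bytes:
    # Right-pad with zero chunks to the closest power of two, then hash
    # pairwise up to the root.
    padded_length = 1
    while padded_length < max(len(chunks), 1):
        padded_length *= 2
    layer = chunks + [b"\x00" * BYTES_PER_CHUNK] * (padded_length - len(chunks))
    while len(layer) > 1:
        layer = [hash(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def mix_in_length(root: bytes, length: int) -> bytes:
    # hash(r + l) with l as a 32-byte little-endian length.
    return hash(root + length.to_bytes(32, "little"))
```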

hash_tree_root

Let o be an object. We recursively define hash_tree_root(o) as:

  • merkleize(pack(o)) if o is a basic object or a tuple of basic objects
  • mix_in_length(merkleize(pack(o)), len(o)) if o is a list of basic objects
  • merkleize([hash_tree_root(element) for element in o]) if o is a tuple of composite objects or a container
  • mix_in_length(merkleize([hash_tree_root(element) for element in o]), len(o)) if o is a list of composite objects
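Putting the above together, a hedged sketch of the recursion, reusing pack/merkleize/mix_in_length from the previous sketch. The mapping of spec types onto Python here (uint64/byte/bool as int/bool, tuples and lists as Python tuples and lists, containers as dataclasses) is an illustrative assumption, not the spec's notation:

```python
from dataclasses import fields, is_dataclass

def is_basic(o) -> bool:
    # Illustrative: treat Python ints and bools as the basic types.
    return isinstance(o, (bool, int))

def serialize_basic(o) -> bytes:
    if isinstance(o, bool):
        return b"\x01" if o else b"\x00"
    return o.to_bytes(8, "little")  # all ints treated as uint64 in this sketch

def hash_tree_root(o) -> bytes:
    # merkleize(pack(o)) if o is a basic object or a tuple of basic objects
    if is_basic(o):
        return merkleize(pack([serialize_basic(o)]))
    if isinstance(o, tuple) and all(is_basic(e) for e in o):
        return merkleize(pack([serialize_basic(e) for e in o]))
    # mix_in_length(merkleize(pack(o)), len(o)) if o is a list of basic objects
    if isinstance(o, list) and all(is_basic(e) for e in o):
        return mix_in_length(merkleize(pack([serialize_basic(e) for e in o])), len(o))
    # merkleize of element roots for a tuple of composites or a container
    if isinstance(o, tuple):
        return merkleize([hash_tree_root(e) for e in o])
    if is_dataclass(o):
        return merkleize([hash_tree_root(getattr(o, f.name)) for f in fields(o)])
    # mix_in_length of element roots for a list of composites
    assert isinstance(o, list)
    return mix_in_length(merkleize([hash_tree_root(e) for e in o]), len(o))
```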
JustinDrake added the general:presentation label Feb 24, 2019
@vbuterin
Contributor

vbuterin commented Feb 25, 2019

I recall the 32 -> 128 change only reducing hashing time by ~1/3, so reducing back to 32 seems reasonable. I actually think that if we decide re-hashing the (max 32 MB) validator balances array every epoch is too much, we can eke out a better efficiency/ugliness tradeoff by shrinking validator balances from 8 bytes to 4 (or even storing the lower 4 bytes in the balances array and the upper 4 bytes in the registry; doing so would be a significant boon for light clients [1/3 Merkle proof size decrease] and may even be worth just doing now!)

@JustinDrake
Collaborator Author

> or even storing the lower 4 bytes in the balances array and the upper 4 bytes in the registry

I think we can get away with just 2 bytes in the balances array. The reason is that we can round rewards and penalties (applied at the end of every epoch) to the closest 2^-16 ETH (about $0.002). We can pick the rounding mechanism so that the expected drift over the long term is 0 (in particular, avoid always rounding up or always rounding down).
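One mechanism with zero expected drift is stochastic rounding: round up with probability equal to the fractional part. This is just one possible scheme, sketched here for illustration; nothing below is fixed by the spec:

```python
import math
import random

def unbiased_round(amount_in_increments: float) -> int:
    # Round to a whole number of 2^-16 ETH increments such that the expected
    # value of the result equals the input, so long-term drift averages to 0.
    # A real consensus implementation would need a deterministic source of
    # randomness shared by all validators, not random.random().
    whole = math.floor(amount_in_increments)
    frac = amount_in_increments - whole
    return whole + (1 if random.random() < frac else 0)
```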

@vbuterin
Contributor

vbuterin commented Feb 25, 2019

Summarizing my concerns about that:

In order to have room for different reward sizes (e.g. the 5 components of the base reward, the delay penalty, the inactivity penalties...) the average per-epoch reward probably needs to be >100x larger than the minimum increment. Hence, a 2-byte fractional balance buffer would overflow in <650 epochs, and so on average ~1/650 of the validator registry tree would get updated every epoch. Multiply by ~15 re-hashes in the Merkle branches, and we get ~1/40 of the tree being changed every epoch counting Merkle branches.

That feels like a very high load, given the larger size of the registry tree, possibly enough to cancel out the gains from cutting 4 bytes -> 2 bytes (or at least to reduce the gains to the point where it's not clear they override the "keep things simple and don't change too much stuff at this point" principle; this WILL require changing things, as e.g. balances would have to either not be calculated in gwei anymore, or be calculated and added in a batched form).
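Back-of-the-envelope arithmetic behind those numbers (all values approximate):

```python
increments_per_epoch = 100       # avg reward >100x the minimum increment
buffer_capacity = 2 ** 16        # a 2-byte fractional balance buffer
epochs_to_overflow = buffer_capacity // increments_per_epoch  # ~655 epochs
leaves_updated_per_epoch = 1 / epochs_to_overflow             # ~1/655 of leaves
rehashes_per_branch = 15         # ~15 re-hashes up each Merkle branch
print(leaves_updated_per_epoch * rehashes_per_branch)         # ~0.023, i.e. ~1/40
```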

Additionally, I was hoping that the non-fractional component of the balance could be directly reusable as the balance for fork-choice-rule purposes, which needs to update very infrequently because in many LMD GHOST implementations each balance update requires an expensive O(T) propagation through a saved-metadata tree; making it update once every <650 epochs per validator could have quite high costs.

JustinDrake added a commit that referenced this issue Feb 27, 2019
* Implement tuples and a chunk-size reduction to 32 bytes (see #665 and #679)
* Simplify presentation where appropriate. For example, deserialisation is implicit from serialisation (similar to `bls_sign` being implicit from `bls_verify`) and is left as an implementation exercise.
* Dramatically reduce spec size and hopefully improve readability.
@protolambda
Collaborator

I'm trying to implement it but am running into some trouble; the tuple type seems strange:

* `Tuple` (homogeneous, fixed size)

Or in your work-in-progress spec (taken from https://github.com/ethereum/eth2.0-specs/blob/JustinDrake-patch-4/specs/simple-serialize.md):

tuple: ordered fixed-length homogeneous collection of values

And here you consider tuples of composite objects (implicitly: variable-size items):

  • merkleize([hash_tree_root(element) for element in o]) if o is a tuple of composite objects or a container

Yet, you encode all types of tuples in a packed way (which is mostly good btw; it makes the encoding of a fixed-length byte array the same as that of a uintN, due to it being little-endian). But packing implies an expectation of item access by offset = i * size_of(my_arr[0]), which doesn't hold for a tuple of variable-length items.

Maybe you meant to say list? I.e.:

merkleize([hash_tree_root(element) for element in o]) if o is a list of composite objects or a container

Also, if so, does that mean that the encoding would not consider tuples of variable-length items to be valid? Can you make this explicit? (change tuple definition and add an encoding condition?)

Alternatively, can you make it explicit that offset = i * size_of(my_arr[0]) is not possible for encoded tuples?
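For concreteness, the access pattern in question (illustrative code, not from the spec): with fixed-size items, element i is readable at a computed offset; with variable-size items there is no single elem_size, so a reader needs explicit offsets or lengths to locate element i.

```python
def read_fixed_size_element(encoded: bytes, i: int, elem_size: int) -> bytes:
    # Works only when every element serialises to the same byte length:
    # offset = i * size_of(my_arr[0]).
    offset = i * elem_size
    return encoded[offset:offset + elem_size]

# Example: the 3rd uint64 in a packed tuple of uint64s.
# read_fixed_size_element(encoded, 2, 8)
```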

@JustinDrake
Collaborator Author

Implemented in f93e6fe
