[WIP] Initial SimpleSerialize (SSZ) spec #18
@@ -0,0 +1,230 @@
# SimpleSerialize (SSZ) Spec

***Work In Progress***

This is the work-in-progress document describing `simpleserialize`, the
serialization method currently selected for Ethereum 2.0 using the Beacon Chain.

This document specifies the general rules for serializing and
deserializing objects and data types.

## ToC

* [About](#about)
* [Terminology](#terminology)
* [Constants](#constants)
* [Overview](#overview)
  + [Serialize/Encode](#serializeencode)
    - [uint: 8/16/24/32/64/256](#uint-816243264256)
    - [Address](#address)
    - [Hash32](#hash32)
    - [Bytes](#bytes)
    - [List](#list)
  + [Deserialize/Decode](#deserializedecode)
    - [uint: 8/16/24/32/64/256](#uint-816243264256-1)
    - [Address](#address-1)
    - [Hash32](#hash32-1)
    - [Bytes](#bytes-1)
    - [List](#list-1)
* [Implementations](#implementations)

## About

`SimpleSerialize` was first proposed by Vitalik Buterin as the serialization
protocol for use in the Ethereum 2.0 Beacon Chain.

> Review comment: typo:
>
> Reply: Good catch! Completely missed the misspelling 😄 Thanks!

The core feature of `ssz` is the simplicity of the serialization with low
overhead.

## Terminology

| Term         | Definition                                                                                     |
|:-------------|:-----------------------------------------------------------------------------------------------|
| `big`        | Big Endian                                                                                     |
| `byte_order` | Specifies [endianness](https://en.wikipedia.org/wiki/Endianness): Big Endian or Little Endian. |
| `len`        | Length/Number of Bytes.                                                                        |
| `to_bytes`   | Convert to bytes. Should take parameters ``size`` and ``byte_order``.                          |
| `from_bytes` | Convert from bytes to object. Should take ``bytes`` and ``byte_order``.                        |
| `value`      | The value to serialize.                                                                        |
| `rawbytes`   | Raw serialized bytes.                                                                          |

> Review comment: Typo: "form" -> "from"

## Constants

| Constant       | Value | Definition                                                                                   |
|:---------------|:-----:|:---------------------------------------------------------------------------------------------|
| `LENGTH_BYTES` |   4   | Number of bytes used for the length prefix added before a variable-length serialized object. |

> NatoliChris marked this conversation as resolved.
>
> Review comment: maybe say 'before a variable-length serialized object'. To be clear at this point that not all items need a length prefix

## Overview

### Serialize/Encode

#### uint: 8/16/24/32/64/256

Convert the integer directly to a byte string whose length matches the size of
the integer type. (e.g. ``uint16 = 2 bytes``)

All integers are serialized as **big endian**.

| Check to perform                 | Code                    |
|:---------------------------------|:------------------------|
| Size is a whole number of bytes  | ``int_size % 8 == 0``   |
| Value is less than max           | ``2**int_size > value`` |

```python
# int_size is the integer width in bits (e.g. 16 for uint16).
# Integer division keeps buffer_size an int, as to_bytes requires.
buffer_size = int_size // 8

return value.to_bytes(buffer_size, 'big')
```
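As a minimal runnable sketch of the two checks and the conversion above, the fragment can be wrapped in a function. The name `serialize_uint` and the extra non-negativity check are assumptions made for illustration; they are not defined by this document.

```python
def serialize_uint(value: int, int_size: int) -> bytes:
    """Serialize an unsigned integer of `int_size` bits as big-endian bytes.

    `serialize_uint` is a hypothetical helper name used only in this sketch.
    """
    # Check 1: the size must be a whole number of bytes.
    assert int_size % 8 == 0, "int_size must be a multiple of 8"
    # Check 2: the value must fit in the declared size.
    # Rejecting negative values is an assumption added for unsigned types.
    assert 0 <= value < 2**int_size, "value out of range for int_size"
    return value.to_bytes(int_size // 8, 'big')


print(serialize_uint(258, 16).hex())  # 0102
print(serialize_uint(5, 24).hex())    # 000005
```

Because no length prefix is written, the decoder has to know the integer size in advance.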

#### Address

The address should already be provided in hash/byte format. Ensure that its
length is **20** bytes.

| Check to perform       | Code                 |
|:-----------------------|:---------------------|
| Length is correct (20) | ``len(value) == 20`` |

```python
assert len(value) == 20
return value
```

#### Hash32

The hash32 should already be in a 32-byte serialized data format. The safety
check ensures the 32 byte length is satisfied.

> Review comment: hash32 works for Keccak/Blake2b[0 ..< 32] not for BLS
>
> Reply: Yep, great point! Did we address in point above? I think we should have the
>
> Reply: I think like

| Check to perform       | Code                 |
|:-----------------------|:---------------------|
| Length is correct (32) | ``len(value) == 32`` |

```python
assert len(value) == 32
return value
```
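Address and Hash32 follow the same pattern: the value is already raw bytes, so serialization is only a length check. A small sketch of that shared pattern, assuming a hypothetical helper named `serialize_fixed` that is not part of this document:

```python
def serialize_fixed(value: bytes, length: int) -> bytes:
    """Return `value` unchanged after checking it is exactly `length` bytes.

    `serialize_fixed` is a hypothetical name used only for illustration.
    """
    assert len(value) == length, f"expected {length} bytes, got {len(value)}"
    return value


address = bytes(range(20))   # placeholder 20-byte address
hash32 = bytes(32)           # placeholder 32-byte hash (all zeros)

print(len(serialize_fixed(address, 20)))  # 20
print(len(serialize_fixed(hash32, 32)))   # 32
```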

#### Bytes

For general `bytes` type:
1. Get the length/number of bytes; encode it as a 4-byte big-endian integer.
2. Append the value to the length and return: ``[ length_bytes ] + [ value_bytes ]``

| Check to perform                     | Code                   |
|:-------------------------------------|:-----------------------|
| Length of bytes can fit into 4 bytes | ``len(value) < 2**32`` |

```python
# Prefix the raw bytes with their length as a 4-byte big-endian integer.
byte_length = len(value).to_bytes(4, 'big')
return byte_length + value
```
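A worked example may make the layout of the length prefix concrete. This is a sketch only; the function name `serialize_bytes` is an assumption, while `LENGTH_BYTES` is the constant from the Constants table.

```python
LENGTH_BYTES = 4  # size of the length prefix, from the Constants table


def serialize_bytes(value: bytes) -> bytes:
    """Prefix `value` with its length as a big-endian integer.

    `serialize_bytes` is a hypothetical name used only in this sketch.
    """
    # The length must be representable in LENGTH_BYTES bytes.
    assert len(value) < 2**(LENGTH_BYTES * 8), "value too long for length prefix"
    return len(value).to_bytes(LENGTH_BYTES, 'big') + value


print(serialize_bytes(b'abc').hex())  # 00000003616263
print(serialize_bytes(b'').hex())     # 00000000
```

The first four bytes (`00000003`) carry the length, followed by the three payload bytes.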

#### List

For lists of values, serialize each item in the list, then prefix the result
with its total length in bytes:
1. For each item in list:
   1. serialize.
   2. append to string.
2. Get size of serialized string. Encode into a 4 byte integer.
3. Prepend the encoded length to the serialized string and return.

> Review comment: This part is Python specific. This should read as
>
> #### List/Vectors
> 1. Get the number of raw bytes to serialize: it is `len(list) * sizeof(element)`.
>    Encode that as a 4-byte bigEndian uint32
> 2. Append your elements in a packed manner.
> * Note on efficiency: consider using a container that does not need to iterate over all elements to get its length. For example Python lists, C++ vectors or Rust Vec.
>
> Then your Python implementation notes. The perf notes are to address the problem highlighted by @AlexeyAkhunov, if the container does not keep track of its length (for example LinkedLists) prefixing the length is inefficient.
>
> Reply: Thanks! This is great. I think I'll add yours specifically, since it's more generic and should cater to a larger audience! Thanks for pointing me to the notes, I'll make sure to run through them and give a reference! 😄
>
> Reply: You can refer to this post: #2 (comment)
>
> Reply: Thanks @mratsim! 👍 I'll hold this PR open until we get more clarification/acceptance?
>
> Reply: I definitely don't want to block it, just thought it will be nice to have more eyes on it, like @mratsim. 👍👀 It looks good for me to merge it, as a clear document to be audited and to be discussed.
>
> Review comment (on `serialized_list_string = ''`): For python bytes, use

```python
# Accumulate into a bytes object, since serialize() returns bytes.
serialized_list_string = b''

for item in value:
    serialized_list_string += serialize(item)

# Encode the total byte length as a 4-byte big-endian prefix.
serialized_len = len(serialized_list_string).to_bytes(4, 'big')

return serialized_len + serialized_list_string
```
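The list steps can be sketched as a self-contained function. Passing the per-item serializer as a callable (`serialize_item`) is an assumption made to keep the example standalone; the fragment above instead relies on a generic `serialize` that dispatches on type. The names `serialize_uint16` and `serialize_list` are likewise illustrative.

```python
from typing import Callable, Iterable


def serialize_uint16(value: int) -> bytes:
    """Hypothetical item serializer: uint16 as 2 big-endian bytes."""
    assert 0 <= value < 2**16
    return value.to_bytes(2, 'big')


def serialize_list(items: Iterable, serialize_item: Callable[[object], bytes]) -> bytes:
    """Serialize each item, then prefix the body with its byte length."""
    # Step 1: serialize every item and concatenate the results.
    body = b''.join(serialize_item(item) for item in items)
    # Step 2: the prefix is the size of the serialized body in bytes,
    # not the number of items.
    assert len(body) < 2**32
    # Step 3: length prefix followed by the packed items.
    return len(body).to_bytes(4, 'big') + body


print(serialize_list([1, 2, 3], serialize_uint16).hex())
# 00000006000100020003
```

Three `uint16` items occupy six bytes, so the prefix is `00000006` rather than `00000003`.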

### Deserialize/Decode

The decoding requires knowledge of the type of the item to be decoded. When
performing decoding on an entire serialized string, it also requires knowledge
of the order in which the objects have been serialized.

> Review comment: "of the order in which the objects have been serialized"

Note: Each return will provide ``deserialized_object, new_index`` keeping track
of the new index.

At each step, the following checks should be made:

| Check Type               | Check                                                   |
|:-------------------------|:--------------------------------------------------------|
| Ensure sufficient length | ``len(rawbytes) >= current_index + deserialize_length`` |

#### uint: 8/16/24/32/64/256

Convert directly from bytes into an integer, reading as many bytes as the size
of the integer type. (e.g. ``uint16 == 2 bytes``)

All integers are interpreted as **big endian**.

```python
# int_size is the integer width in bits; slice by the width in bytes.
byte_length = int_size // 8
new_index = current_index + byte_length
return int.from_bytes(rawbytes[current_index:new_index], 'big'), new_index
```
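The following sketch combines the conversion with the sufficient-length check and shows how `new_index` threads through consecutive reads. The function name `deserialize_uint` is an assumption. The check is written with `>=` so that a value ending exactly at the end of the buffer is accepted.

```python
from typing import Tuple


def deserialize_uint(rawbytes: bytes, current_index: int, int_size: int) -> Tuple[int, int]:
    """Read an `int_size`-bit big-endian unsigned integer at `current_index`.

    Returns (value, new_index). `deserialize_uint` is a hypothetical name.
    """
    assert int_size % 8 == 0, "int_size must be a multiple of 8"
    byte_length = int_size // 8
    new_index = current_index + byte_length
    # Sufficient-length check: the slice must lie entirely inside rawbytes.
    assert len(rawbytes) >= new_index, "not enough bytes left"
    return int.from_bytes(rawbytes[current_index:new_index], 'big'), new_index


raw = bytes.fromhex('0102000005')             # a uint16 followed by a uint24
first, index = deserialize_uint(raw, 0, 16)
second, index = deserialize_uint(raw, index, 24)
print(first, second, index)  # 258 5 5
```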

#### Address

Return the 20 bytes.

```python
new_index = current_index + 20
return rawbytes[current_index:current_index+20], new_index
```

#### Hash32

Return the 32 bytes.

```python
new_index = current_index + 32
return rawbytes[current_index:current_index+32], new_index
```

#### Bytes

Read the 4-byte length prefix, then return that many bytes.

```python
bytes_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big')
new_index = current_index + 4 + bytes_length
return rawbytes[current_index+4:current_index+4+bytes_length], new_index
```
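A runnable sketch of the same steps, with length checks added for both the prefix and the payload. The name `deserialize_bytes` is an assumption, and the two assertions are an illustrative reading of the general sufficient-length check rather than text from this section.

```python
from typing import Tuple

LENGTH_BYTES = 4  # size of the length prefix, from the Constants table


def deserialize_bytes(rawbytes: bytes, current_index: int) -> Tuple[bytes, int]:
    """Read a length-prefixed byte string starting at `current_index`.

    Returns (value, new_index). `deserialize_bytes` is a hypothetical name.
    """
    start = current_index + LENGTH_BYTES
    # The prefix itself must be fully present.
    assert len(rawbytes) >= start, "not enough bytes for length prefix"
    bytes_length = int.from_bytes(rawbytes[current_index:start], 'big')
    new_index = start + bytes_length
    # The payload announced by the prefix must be fully present.
    assert len(rawbytes) >= new_index, "not enough bytes for payload"
    return rawbytes[start:new_index], new_index


raw = bytes.fromhex('00000003616263ff')   # b'abc' followed by one unrelated byte
value, index = deserialize_bytes(raw, 0)
print(value, index)  # b'abc' 7
```

The trailing `ff` byte is left untouched; the returned index points at it so the caller can continue decoding from there.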

#### List

Deserialize each object in the list.
1. Get the length of the serialized list.
2. Loop through deserializing each item in the list until you reach the
   entire length of the list.

> Review comment: Use "Check to perform" as in previous tables, and add the assertion in the sample code

| Check to perform                    | Code                                   |
|:------------------------------------|:---------------------------------------|
| rawbytes has enough left for length | ``len(rawbytes) >= current_index + 4`` |

```python
assert len(rawbytes) >= current_index + 4
total_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big')
new_index = current_index + 4 + total_length
item_index = current_index + 4
deserialized_list = []

while item_index < new_index:
    item, item_index = deserialize(rawbytes, item_index, item_type)
    deserialized_list.append(item)

return deserialized_list, new_index
```
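The list decoding loop can be sketched end to end for a list of `uint16` values. As an assumption for this example, the item decoder is passed in as a callable (`deserialize_item`) instead of the `item_type` argument used by the generic `deserialize` in the fragment above; `deserialize_uint16` and `deserialize_list` are illustrative names.

```python
from typing import Callable, List, Tuple


def deserialize_uint16(rawbytes: bytes, index: int) -> Tuple[int, int]:
    """Hypothetical item decoder: 2 big-endian bytes to int."""
    assert len(rawbytes) >= index + 2, "not enough bytes for uint16"
    return int.from_bytes(rawbytes[index:index + 2], 'big'), index + 2


def deserialize_list(rawbytes: bytes, current_index: int,
                     deserialize_item: Callable[[bytes, int], Tuple[object, int]]
                     ) -> Tuple[List[object], int]:
    """Decode a length-prefixed list; returns (items, new_index)."""
    # The 4-byte length prefix must be present.
    assert len(rawbytes) >= current_index + 4, "not enough bytes for length prefix"
    total_length = int.from_bytes(rawbytes[current_index:current_index + 4], 'big')
    item_index = current_index + 4
    new_index = item_index + total_length
    # The whole list body must be present before decoding items.
    assert len(rawbytes) >= new_index, "not enough bytes for list body"
    items = []
    # Decode items until the byte range announced by the prefix is consumed.
    while item_index < new_index:
        item, item_index = deserialize_item(rawbytes, item_index)
        items.append(item)
    return items, new_index


raw = bytes.fromhex('00000006000100020003')
items, index = deserialize_list(raw, 0, deserialize_uint16)
print(items, index)  # [1, 2, 3] 10
```

The loop terminates on byte position rather than item count, which matches a prefix that stores the serialized size in bytes.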

## Implementations

| Language | Implementation | Description |
|:--------:|----------------|-------------|
| Python   | [https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py) | Beacon chain reference implementation written in Python. |
| Rust     | [https://github.com/sigp/lighthouse/tree/master/ssz](https://github.com/sigp/lighthouse/tree/master/ssz) | SSZ maintained by Lighthouse (Rust Ethereum 2.0 node). |
| Nim      | [https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim](https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim) | Nim implementation of SSZ. |
| Rust     | [https://github.com/paritytech/shasper/tree/master/util/ssz](https://github.com/paritytech/shasper/tree/master/util/ssz) | Shasper implementation of SSZ maintained by ParityTech. |

> Review comment: With BLS we will have `Hash96` for the public keys, maybe `Hash97` if it comes with a recovery bit.
>
> Reply: @mratsim interesting point! I'll leave this open for discussion if we should tentatively add it in or leave it until we have more clarity on the point? I'll draft it up and maybe add it into a PR?
>
> Reply: a simple `Hash32 and other hash sizes` is probably enough at the moment