From 4c75cd0db29ec81ed685316c150f1cf49c03349c Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Mon, 24 Sep 2018 09:52:38 +1000 Subject: [PATCH 1/9] Initial SimpleSerialize spec --- specs/simpleserialize.md | 225 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 225 insertions(+) create mode 100644 specs/simpleserialize.md diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md new file mode 100644 index 0000000000..ef2bcf33f9 --- /dev/null +++ b/specs/simpleserialize.md @@ -0,0 +1,225 @@ +# SimpleSerialize (SSZ) Spec + +***Work In Progress*** + +This is the work in progress document to describe `simpleserialize`, the +current selected serialization method for Ethereum 2.0 using the Beacon Chain. + +This document specifies the general information for serializing and +deserializing objects and data types. + +## ToC + +* [About](#about) +* [Terminology](#terminology) +* [Constants](#constants) +* [Overview](#overview) + + [Serialize/Encode](#serializeencode) + - [int/uint: 8/16/24/32/64/256](#intuint-816243264256) + - [Address](#address) + - [Hash32](#hash32) + - [Bytes](#bytes) + - [List](#list) + + [Deserialize/Decode](#deserializedecode) + - [int/uint: 8/16/24/32/64/256](#intuint-816243264256-1) + - [Address](#address-1) + - [Hash32](#hash32-1) + - [Bytes](#bytes-1) + - [List](#list-1) +* [Implementations](#implementations) + +## About + +`SimpleSerialize` was first proposed by Vitalik Buterin as the serializaiton +protocol for use in the Ethereum 2.0 Beacon Chain. + +The core feature of `ssz` is the simplicity of the serialization with low +overhead. + +## Terminology + +| Term | Definition | +|:-------------|:-----------------------------------------------------------------------------------------------| +| `big` | Big Endian | +| `byte_order` | Specifies [endianness:](https://en.wikipedia.org/wiki/Endianness) Big Endian or Little Endian. | +| `len` | Length/Number of Bytes. | +| `to_bytes` | Convert to bytes. Should take parameters ``size`` and ``byte_order``. | +| `from_bytes` | Convert form bytes to object. Should take ``bytes`` and ``byte_order``. | +| `value` | The value to serialize. | +| `rawbytes` | Raw serialized bytes. | + +## Constants + +| Constant | Value | Definition | +|:---------------|:-----:|:------------------------------------------------------------------------| +| `LENGTH_BYTES` | 4 | Number of bytes used for the length added before the serialized object. | + + +## Overview + +### Serialize/Encode + +#### int/uint: 8/16/24/32/64/256 + +Convert directly to bytes the size of the int. (e.g. ``uint16 = 2 bytes``) + +All integers are serialized as **big endian**. + +| Check to perform | Code | +|:---------------------------------|:------------------------| +| Size is a byte integer | ``int_size % 8 == 0`` | +| Value is less than max | ``2**int_size > value`` | + +```python +buffer_size = int_size / 8 +return value.to_bytes(buffer_size, 'big') +``` + +#### Address + +The address should already come as a hash/byte format. Ensure that length is +**20**. + +| Check to perform | Code | +|:-----------------------|:---------------------| +| Length is correct (20) | ``len(value) == 20`` | + +```python +assert( len(value) == 20 ) +return value +``` + +#### Hash32 + +The hash32 should already be a 32 byte length serialized data format. The safety +check ensures the 32 byte length is satisfied. + +| Check to perform | Code | +|:-----------------------|:---------------------| +| Length is correct (32) | ``len(value) == 32`` | + +```python +assert( len(value) == 32 ) +return value +``` + +#### Bytes + +For general `byte` type: +1. Get the length/number of bytes; Encode into a 4 byte integer. +2. Append the value to the length and return: ``[ length_bytes ] + [ + value_bytes ]`` + +```python +byte_length = (len(value)).to_bytes(4, 'big') +return byte_length + value +``` + +#### List + +For lists of values, get the length of the list and then serialize the value +of each item in the list: +1. For each item in list: + 1. serialize. + 2. append to string. +2. Get size of serialized string. Encode into a 4 byte integer. + +```python +serialized_list_string = '' + +for item in value: + serialized_list_string += serialize(item) + +serialized_len = len(serialized_list_string) + +return serialized_len + serialized_list_string +``` + +### Deserialize/Decode + +The decoding requires knowledge of the type of the item to be decoded. When +performing decoding on an entire serialized string, it also requires knowledge +of what order the objects have been serialized in. + +Note: Each return will provide ``deserialized_object, new_index`` keeping track +of the new index. + +At each step, the following checks should be made: + +| Check Type | Check | +|:-------------------------|:----------------------------------------------------------| +| Ensure sufficient length | ``length(rawbytes) > current_index + deserialize_length`` | + +#### int/uint: 8/16/24/32/64/256 + +Convert directly from bytes into integer utilising the number of bytes the same +size as the integer length. (e.g. ``uint16 == 2 bytes``) + +All integers are interpreted as **big endian**. + +```python +byte_length = int_size / 8 +new_index = current_index + int_size +return int.from_bytes(rawbytes[current_index:current_index+int_size], 'big'), new_index +``` + +#### Address + +Return the 20 bytes. + +```python +new_index = current_index + 20 +return rawbytes[current_index:current_index+20], new_index +``` + +#### Hash32 + +Return the 32 bytes. + +```python +new_index = current_index + 32 +return rawbytes[current_index:current_index+32], new_index +``` + +#### Bytes + +Get the length of the bytes, return the bytes. + +```python +bytes_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big') +new_index = current_index + 4 + bytes_lenth +return rawbytes[current_index+4:current_index+4+bytes_length], new_index +``` + +#### List + +Deserailize each object in the list. +1. Get the length of the serialized list. +2. Loop through deseralizing each item in the list until you reach the +entire length of the list. + + +| Check type | code | +|:------------------------------------|:--------------------------------------| +| rawbytes has enough left for length | ``len(rawbytes) > current_index + 4`` | + +```python +total_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big') +new_index = current_index + 4 + total_length +item_index = current_index + 4 +deserialized_list = [] + +while item_index < new_index: + object, item_index = deserialize(rawbytes, item_index, item_type) + deserialized_list.append(object) + +return deserialized_list, new_index +``` + +## Implementations + +| Language | Implementation | Description | +|:--------:|--------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------| +| Python | [ https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py ](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py) | Beacon chain reference implementation written in Python. | +| Rust | [ https://github.com/sigp/lighthouse/tree/master/ssz ](https://github.com/sigp/lighthouse/tree/master/ssz) | Lighthouse (Rust Ethereum 2.0 Node) maintained SimpleSerialize. | + From b1c873c8f601dc53f2f01ef839cafc48063791a8 Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Tue, 2 Oct 2018 09:41:18 +1000 Subject: [PATCH 2/9] Remove int as per discussions, update implementations --- specs/simpleserialize.md | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index ef2bcf33f9..c6a4796be5 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -15,13 +15,13 @@ deserializing objects and data types. * [Constants](#constants) * [Overview](#overview) + [Serialize/Encode](#serializeencode) - - [int/uint: 8/16/24/32/64/256](#intuint-816243264256) + - [uint: 8/16/24/32/64/256](#uint-816243264256) - [Address](#address) - [Hash32](#hash32) - [Bytes](#bytes) - [List](#list) + [Deserialize/Decode](#deserializedecode) - - [int/uint: 8/16/24/32/64/256](#intuint-816243264256-1) + - [uint: 8/16/24/32/64/256](#uint-816243264256-1) - [Address](#address-1) - [Hash32](#hash32-1) - [Bytes](#bytes-1) @@ -59,7 +59,7 @@ overhead. ### Serialize/Encode -#### int/uint: 8/16/24/32/64/256 +#### uint: 8/16/24/32/64/256 Convert directly to bytes the size of the int. (e.g. ``uint16 = 2 bytes``) @@ -150,7 +150,7 @@ At each step, the following checks should be made: |:-------------------------|:----------------------------------------------------------| | Ensure sufficient length | ``length(rawbytes) > current_index + deserialize_length`` | -#### int/uint: 8/16/24/32/64/256 +#### uint: 8/16/24/32/64/256 Convert directly from bytes into integer utilising the number of bytes the same size as the integer length. (e.g. ``uint16 == 2 bytes``) @@ -193,9 +193,9 @@ return rawbytes[current_index+4:current_index+4+bytes_length], new_index #### List -Deserailize each object in the list. +Deserialize each object in the list. 1. Get the length of the serialized list. -2. Loop through deseralizing each item in the list until you reach the +2. Loop through deserializing each item in the list until you reach the entire length of the list. @@ -218,8 +218,9 @@ return deserialized_list, new_index ## Implementations -| Language | Implementation | Description | -|:--------:|--------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------| -| Python | [ https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py ](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py) | Beacon chain reference implementation written in Python. | -| Rust | [ https://github.com/sigp/lighthouse/tree/master/ssz ](https://github.com/sigp/lighthouse/tree/master/ssz) | Lighthouse (Rust Ethereum 2.0 Node) maintained SimpleSerialize. | - +| Language | Implementation | Description | +|:--------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------| +| Python | [ https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py ](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py) | Beacon chain reference implementation written in Python. | +| Rust | [ https://github.com/sigp/lighthouse/tree/master/ssz ](https://github.com/sigp/lighthouse/tree/master/ssz) | Lighthouse (Rust Ethereum 2.0 Node) maintained SimpleSerialize. | +| Nim | [ https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim ](https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim) | Nim Implemetnation maintained SimpleSerialize. | +| Rust | [ https://github.com/paritytech/shasper/tree/master/util/ssz ](https://github.com/paritytech/shasper/tree/master/util/ssz) | Shasper implementation of SSZ maintained by ParityTech. | From 0b0f618c61bd2790e7594816b1b2f69166a98056 Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Tue, 2 Oct 2018 10:36:58 +1000 Subject: [PATCH 3/9] Add check for byte serialization --- specs/simpleserialize.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index c6a4796be5..68d29d93d4 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -110,6 +110,10 @@ For general `byte` type: 2. Append the value to the length and return: ``[ length_bytes ] + [ value_bytes ]`` +| Check to perform | Code | +|:-------------------------------------|:-----------------------| +| Length of bytes can fit into 4 bytes | ``len(value) < 2**32`` | + ```python byte_length = (len(value)).to_bytes(4, 'big') return byte_length + value @@ -218,9 +222,9 @@ return deserialized_list, new_index ## Implementations -| Language | Implementation | Description | -|:--------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------| -| Python | [ https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py ](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py) | Beacon chain reference implementation written in Python. | -| Rust | [ https://github.com/sigp/lighthouse/tree/master/ssz ](https://github.com/sigp/lighthouse/tree/master/ssz) | Lighthouse (Rust Ethereum 2.0 Node) maintained SimpleSerialize. | -| Nim | [ https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim ](https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim) | Nim Implemetnation maintained SimpleSerialize. | -| Rust | [ https://github.com/paritytech/shasper/tree/master/util/ssz ](https://github.com/paritytech/shasper/tree/master/util/ssz) | Shasper implementation of SSZ maintained by ParityTech. | +| Language | Implementation | Description | +|:--------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------| +| Python | [ https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py ](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py) | Beacon chain reference implementation written in Python. | +| Rust | [ https://github.com/sigp/lighthouse/tree/master/ssz ](https://github.com/sigp/lighthouse/tree/master/ssz) | Lighthouse (Rust Ethereum 2.0 Node) maintained SSZ. | +| Nim | [ https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim ](https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim) | Nim Implementation maintained SSZ. | +| Rust | [ https://github.com/paritytech/shasper/tree/master/util/ssz ](https://github.com/paritytech/shasper/tree/master/util/ssz) | Shasper implementation of SSZ maintained by ParityTech. | From 6287573adc58c3b999d5c33900c49d605a65830a Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Tue, 2 Oct 2018 12:34:20 +1000 Subject: [PATCH 4/9] Update misspelling; Use `LENGTH_BYTES` variable; Update for comments --- specs/simpleserialize.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index 68d29d93d4..147f557964 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -30,7 +30,7 @@ deserializing objects and data types. ## About -`SimpleSerialize` was first proposed by Vitalik Buterin as the serializaiton +`SimpleSerialize` was first proposed by Vitalik Buterin as the serialization protocol for use in the Ethereum 2.0 Beacon Chain. The core feature of `ssz` is the simplicity of the serialization with low @@ -115,7 +115,7 @@ For general `byte` type: | Length of bytes can fit into 4 bytes | ``len(value) < 2**32`` | ```python -byte_length = (len(value)).to_bytes(4, 'big') +byte_length = (len(value)).to_bytes(LENGTH_BYTES, 'big') return byte_length + value ``` @@ -134,7 +134,7 @@ serialized_list_string = '' for item in value: serialized_list_string += serialize(item) -serialized_len = len(serialized_list_string) +serialized_len = (len(serialized_list_string).to_bytes(LENGTH_BYTES, 'big')) return serialized_len + serialized_list_string ``` @@ -190,9 +190,9 @@ return rawbytes[current_index:current_index+32], new_index Get the length of the bytes, return the bytes. ```python -bytes_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big') -new_index = current_index + 4 + bytes_lenth -return rawbytes[current_index+4:current_index+4+bytes_length], new_index +bytes_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big') +new_index = current_index + LENGTH_BYTES + bytes_lenth +return rawbytes[current_index + LENGTH_BYTES:current_index+ LENGTH_BYTES +bytes_length], new_index ``` #### List @@ -208,9 +208,9 @@ entire length of the list. | rawbytes has enough left for length | ``len(rawbytes) > current_index + 4`` | ```python -total_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big') -new_index = current_index + 4 + total_length -item_index = current_index + 4 +total_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big') +new_index = current_index + LENGTH_BYTES + total_length +item_index = current_index + LENGTH_BYTES deserialized_list = [] while item_index < new_index: From 78a830da278cf9488f1278d6ad8dbdf65b145766 Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Tue, 2 Oct 2018 23:33:11 +1000 Subject: [PATCH 5/9] Update Hash Types as per @mratsim's comments on #18 --- specs/simpleserialize.md | 111 ++++++++++++++++++++++++++++++++------- 1 file changed, 91 insertions(+), 20 deletions(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index 147f557964..11c1843ed3 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -14,18 +14,24 @@ deserializing objects and data types. * [Terminology](#terminology) * [Constants](#constants) * [Overview](#overview) - + [Serialize/Encode](#serializeencode) - - [uint: 8/16/24/32/64/256](#uint-816243264256) - - [Address](#address) - - [Hash32](#hash32) - - [Bytes](#bytes) - - [List](#list) - + [Deserialize/Decode](#deserializedecode) - - [uint: 8/16/24/32/64/256](#uint-816243264256-1) - - [Address](#address-1) - - [Hash32](#hash32-1) - - [Bytes](#bytes-1) - - [List](#list-1) + + [Serialize/Encode](#serializeencode) + - [uint: 8/16/24/32/64/256](#uint-816243264256) + - [Address](#address) + - [Hash](#hash) + * [Hash32](#hash32) + * [Hash96](#hash96) + * [Hash97](#hash97) + - [Bytes](#bytes) + - [List](#list) + + [Deserialize/Decode](#deserializedecode) + - [uint: 8/16/24/32/64/256](#uint-816243264256-1) + - [Address](#address-1) + - [Hash](#hash-1) + * [Hash32](#hash32-1) + * [Hash96](#hash96-1) + * [Hash97](#hash97-1) + - [Bytes](#bytes-1) + - [List](#list-1) * [Implementations](#implementations) ## About @@ -89,17 +95,61 @@ assert( len(value) == 20 ) return value ``` -#### Hash32 +#### Hash -The hash32 should already be a 32 byte length serialized data format. The safety -check ensures the 32 byte length is satisfied. +| Hash Type | Usage | +|:---------:|:------------------------------------------------| +| `hash32` | Hash size of ``keccak`` or `blake2b[0.. < 32]`. | +| `hash96` | BLS Public Key Size. | +| `hash97` | BLS Public Key Size with recovery bit. | -| Check to perform | Code | -|:-----------------------|:---------------------| -| Length is correct (32) | ``len(value) == 32`` | + +| Checks to perform | Code | +|:-----------------------------------|:---------------------| +| Length is correct (32) if `hash32` | ``len(value) == 32`` | +| Length is correct (96) if `hash96` | ``len(value) == 96`` | +| Length is correct (97) if `hash97` | ``len(value) == 97`` | + + +**Example all together** + +```python +if (type(value) == 'hash32'): + assert(len(value) == 32) +elif (type(value) == 'hash96'): + assert(len(value) == 96) +elif (type(value) == 'hash97'): + assert(len(value) == 97) +else: + raise TypeError('Invalid hash type supplied') + +return value +``` + +##### Hash32 + +Ensure 32 byte length and return the bytes. + +```python +assert(len(value) == 32) +return value +``` + +##### Hash96 + +Ensure 96 byte length and return the bytes. ```python -assert( len(value) == 32 ) +assert(len(value) == 96) +return value +``` + +##### Hash97 + +Ensure 97 byte length and return the bytes. + +```python +assert(len(value) == 97) return value ``` @@ -176,7 +226,9 @@ new_index = current_index + 20 return rawbytes[current_index:current_index+20], new_index ``` -#### Hash32 +#### Hash + +##### Hash32 Return the 32 bytes. @@ -185,6 +237,25 @@ new_index = current_index + 32 return rawbytes[current_index:current_index+32], new_index ``` +##### Hash96 + +Return the 96 bytes. + +```python +new_index = current_index + 96 +return rawbytes[current_index:current_index+96], new_index +``` + +##### Hash97 + +Return the 97 bytes. + +```python +new_index = current_index + 97 +return rawbytes[current_index:current_index+97], new_index +``` + + #### Bytes Get the length of the bytes, return the bytes. From 8521bd93ade2f2e88029979515a9c036e9fb85fc Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Tue, 2 Oct 2018 23:42:25 +1000 Subject: [PATCH 6/9] Update List/Vectors with comments on #18 --- specs/simpleserialize.md | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index 11c1843ed3..238d7699a5 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -22,7 +22,7 @@ deserializing objects and data types. * [Hash96](#hash96) * [Hash97](#hash97) - [Bytes](#bytes) - - [List](#list) + - [List/Vectors](#listvectors) + [Deserialize/Decode](#deserializedecode) - [uint: 8/16/24/32/64/256](#uint-816243264256-1) - [Address](#address-1) @@ -31,7 +31,7 @@ deserializing objects and data types. * [Hash96](#hash96-1) * [Hash97](#hash97-1) - [Bytes](#bytes-1) - - [List](#list-1) + - [List/Vectors](#listvectors-1) * [Implementations](#implementations) ## About @@ -169,14 +169,15 @@ byte_length = (len(value)).to_bytes(LENGTH_BYTES, 'big') return byte_length + value ``` -#### List +#### List/Vectors -For lists of values, get the length of the list and then serialize the value -of each item in the list: -1. For each item in list: - 1. serialize. - 2. append to string. -2. Get size of serialized string. Encode into a 4 byte integer. +1. Get the number of raw bytes to serialize: it is `len(list) * sizeof(element)`. + * Encode that as a `4-byte` **big endian** `uint32`. +2. Append your elements in a packed manner. + +* *Note on efficiency*: consider using a container that does not need to iterate over all elements to get its length. For example Python lists, C++ vectors or Rust Vec. + +**Example in Python** ```python serialized_list_string = '' @@ -266,7 +267,7 @@ new_index = current_index + LENGTH_BYTES + bytes_lenth return rawbytes[current_index + LENGTH_BYTES:current_index+ LENGTH_BYTES +bytes_length], new_index ``` -#### List +#### List/Vectors Deserialize each object in the list. 1. Get the length of the serialized list. From cd71c223d1dfb63ae7293ab7c1275e146af397df Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Tue, 2 Oct 2018 23:46:22 +1000 Subject: [PATCH 7/9] Add "WIP" to title to make it clear; @djrtwo's comment in #18 --- specs/simpleserialize.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index 238d7699a5..12d0c8dd4e 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -1,4 +1,4 @@ -# SimpleSerialize (SSZ) Spec +# [WIP] SimpleSerialize (SSZ) Spec ***Work In Progress*** From a2ad4bf6d5916b37e53dd2e7e65cdbb199f333f3 Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Wed, 3 Oct 2018 08:17:29 +1000 Subject: [PATCH 8/9] Add assertions in examples; Update checks from @djrtwo's comments --- specs/simpleserialize.md | 71 ++++++++++++++++++++++++++++------------ 1 file changed, 50 insertions(+), 21 deletions(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index 12d0c8dd4e..bba02cec84 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -50,15 +50,15 @@ overhead. | `byte_order` | Specifies [endianness:](https://en.wikipedia.org/wiki/Endianness) Big Endian or Little Endian. | | `len` | Length/Number of Bytes. | | `to_bytes` | Convert to bytes. Should take parameters ``size`` and ``byte_order``. | -| `from_bytes` | Convert form bytes to object. Should take ``bytes`` and ``byte_order``. | +| `from_bytes` | Convert from bytes to object. Should take ``bytes`` and ``byte_order``. | | `value` | The value to serialize. | | `rawbytes` | Raw serialized bytes. | ## Constants -| Constant | Value | Definition | -|:---------------|:-----:|:------------------------------------------------------------------------| -| `LENGTH_BYTES` | 4 | Number of bytes used for the length added before the serialized object. | +| Constant | Value | Definition | +|:---------------|:-----:|:--------------------------------------------------------------------------------------| +| `LENGTH_BYTES` | 4 | Number of bytes used for the length added before a variable-length serialized object. | ## Overview @@ -71,12 +71,12 @@ Convert directly to bytes the size of the int. (e.g. ``uint16 = 2 bytes``) All integers are serialized as **big endian**. -| Check to perform | Code | -|:---------------------------------|:------------------------| -| Size is a byte integer | ``int_size % 8 == 0`` | -| Value is less than max | ``2**int_size > value`` | +| Check to perform | Code | +|:-----------------------|:----------------------| +| Size is a byte integer | ``int_size % 8 == 0`` | ```python +assert(int_size % 8 == 0) buffer_size = int_size / 8 return value.to_bytes(buffer_size, 'big') ``` @@ -156,7 +156,7 @@ return value #### Bytes For general `byte` type: -1. Get the length/number of bytes; Encode into a 4 byte integer. +1. Get the length/number of bytes; Encode into a `4-byte` integer. 2. Append the value to the length and return: ``[ length_bytes ] + [ value_bytes ]`` @@ -165,26 +165,35 @@ For general `byte` type: | Length of bytes can fit into 4 bytes | ``len(value) < 2**32`` | ```python +assert(len(value) < 2**32) byte_length = (len(value)).to_bytes(LENGTH_BYTES, 'big') return byte_length + value ``` #### List/Vectors -1. Get the number of raw bytes to serialize: it is `len(list) * sizeof(element)`. +| Check to perform | Code | +|:--------------------------------------------|:----------------------------| +| Length of serialized list fits into 4 bytes | ``len(serialized) < 2**32`` | + + +1. Get the number of raw bytes to serialize: it is ``len(list) * sizeof(element)``. * Encode that as a `4-byte` **big endian** `uint32`. -2. Append your elements in a packed manner. +2. Append the elements in a packed manner. * *Note on efficiency*: consider using a container that does not need to iterate over all elements to get its length. For example Python lists, C++ vectors or Rust Vec. **Example in Python** ```python -serialized_list_string = '' + +serialized_list_string = b'' for item in value: serialized_list_string += serialize(item) +assert(len(serialized_list_string) < 2**32) + serialized_len = (len(serialized_list_string).to_bytes(LENGTH_BYTES, 'big')) return serialized_len + serialized_list_string @@ -194,16 +203,16 @@ return serialized_len + serialized_list_string The decoding requires knowledge of the type of the item to be decoded. When performing decoding on an entire serialized string, it also requires knowledge -of what order the objects have been serialized in. +of the order in which the objects have been serialized. Note: Each return will provide ``deserialized_object, new_index`` keeping track of the new index. At each step, the following checks should be made: -| Check Type | Check | -|:-------------------------|:----------------------------------------------------------| -| Ensure sufficient length | ``length(rawbytes) > current_index + deserialize_length`` | +| Check to perform | Check | +|:-------------------------|:-----------------------------------------------------------| +| Ensure sufficient length | ``length(rawbytes) >= current_index + deserialize_length`` | #### uint: 8/16/24/32/64/256 @@ -213,6 +222,7 @@ size as the integer length. (e.g. ``uint16 == 2 bytes``) All integers are interpreted as **big endian**. ```python +assert(len(rawbytes) >= current_index + int_size) byte_length = int_size / 8 new_index = current_index + int_size return int.from_bytes(rawbytes[current_index:current_index+int_size], 'big'), new_index @@ -223,6 +233,7 @@ return int.from_bytes(rawbytes[current_index:current_index+int_size], 'big'), ne Return the 20 bytes. ```python +assert(len(rawbytes) >= current_index + 20) new_index = current_index + 20 return rawbytes[current_index:current_index+20], new_index ``` @@ -234,6 +245,7 @@ return rawbytes[current_index:current_index+20], new_index Return the 32 bytes. ```python +assert(len(rawbytes) >= current_index + 32) new_index = current_index + 32 return rawbytes[current_index:current_index+32], new_index ``` @@ -243,6 +255,7 @@ return rawbytes[current_index:current_index+32], new_index Return the 96 bytes. ```python +assert(len(rawbytes) >= current_index + 96) new_index = current_index + 96 return rawbytes[current_index:current_index+96], new_index ``` @@ -252,6 +265,7 @@ return rawbytes[current_index:current_index+96], new_index Return the 97 bytes. ```python +assert(len(rawbytes) >= current_index + 97) new_index = current_index + 97 return rawbytes[current_index:current_index+97], new_index ``` @@ -261,10 +275,22 @@ return rawbytes[current_index:current_index+97], new_index Get the length of the bytes, return the bytes. +| Check to perform | code | +|:--------------------------------------------------|:-------------------------------------------------| +| rawbytes has enough left for length | ``len(rawbytes) > current_index + LENGTH_BYTES`` | +| bytes to return not greater than serialized bytes | ``len(rawbytes) > bytes_end `` | + ```python +assert(len(rawbytes) > current_index + LENGTH_BYTES) bytes_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big') -new_index = current_index + LENGTH_BYTES + bytes_lenth -return rawbytes[current_index + LENGTH_BYTES:current_index+ LENGTH_BYTES +bytes_length], new_index + +bytes_start = current_index + LENGTH_BYTES +bytes_end = bytes_start + bytes_length +new_index = bytes_end + +assert(len(rawbytes) >= bytes_end) + +return rawbytes[bytes_start:bytes_end], new_index ``` #### List/Vectors @@ -275,13 +301,16 @@ Deserialize each object in the list. entire length of the list. -| Check type | code | -|:------------------------------------|:--------------------------------------| -| rawbytes has enough left for length | ``len(rawbytes) > current_index + 4`` | +| Check to perform | code | +|:------------------------------------------|:----------------------------------------------------------------| +| rawbytes has enough left for length | ``len(rawbytes) > current_index + LENGTH_BYTES`` | +| list is not greater than serialized bytes | ``len(rawbytes) > current_index + LENGTH_BYTES + total_length`` | ```python +assert(len(rawbytes) > current_index + LENGTH_BYTES) total_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big') new_index = current_index + LENGTH_BYTES + total_length +assert(len(rawbytes) >= new_index) item_index = current_index + LENGTH_BYTES deserialized_list = [] From 03252637cbfc045fd82c30f1dc1fb56b8a9d9ceb Mon Sep 17 00:00:00 2001 From: NatoliChris Date: Wed, 3 Oct 2018 15:08:20 +1000 Subject: [PATCH 9/9] Add container todo stubs --- specs/simpleserialize.md | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/specs/simpleserialize.md b/specs/simpleserialize.md index bba02cec84..82b8a5e9a7 100644 --- a/specs/simpleserialize.md +++ b/specs/simpleserialize.md @@ -1,8 +1,6 @@ # [WIP] SimpleSerialize (SSZ) Spec -***Work In Progress*** - -This is the work in progress document to describe `simpleserialize`, the +This is the **work in progress** document to describe `simpleserialize`, the current selected serialization method for Ethereum 2.0 using the Beacon Chain. This document specifies the general information for serializing and @@ -23,6 +21,7 @@ deserializing objects and data types. * [Hash97](#hash97) - [Bytes](#bytes) - [List/Vectors](#listvectors) + - [Container (TODO)](#container) + [Deserialize/Decode](#deserializedecode) - [uint: 8/16/24/32/64/256](#uint-816243264256-1) - [Address](#address-1) @@ -32,6 +31,7 @@ deserializing objects and data types. * [Hash97](#hash97-1) - [Bytes](#bytes-1) - [List/Vectors](#listvectors-1) + - [Container (TODO)](#container-1) * [Implementations](#implementations) ## About @@ -199,6 +199,15 @@ serialized_len = (len(serialized_list_string).to_bytes(LENGTH_BYTES, 'big')) return serialized_len + serialized_list_string ``` +#### Container + +``` +######################################## + TODO +######################################## +``` + + ### Deserialize/Decode The decoding requires knowledge of the type of the item to be decoded. When @@ -321,6 +330,14 @@ while item_index < new_index: return deserialized_list, new_index ``` +#### Container + +``` +######################################## + TODO +######################################## +``` + ## Implementations | Language | Implementation | Description |