[WIP] Initial SimpleSerialize (SSZ) spec #18

NatoliChris · 2018-10-02T00:55:42Z

Initial specification outline for SSZ for #2.

Basic information about serializing types.
Basic information about deserializing types.
Checks to be performed for sanity.
Links to implementations.

TODO:

Ensure all implementations are listed.

There has been some discussion about ssz so if you notice any discrepancies would be great to have any input for changes 😄.

Also, if I've forgotten an implementation feel free to update the implementation list. There are some eth2.0 implementations that do not use ssz at this point in time (that I could see), but if I couldn't see it please make sure to add them.

Any help is appreciated, especially in terms of readability and making it easier to understand. We appreciate your time!

NatoliChris · 2018-10-02T01:52:11Z

I think we should also add the test format once it has been finalised.

Thoughts? 😄

djrtwo · 2018-10-02T02:24:54Z

Thank you! Will review in the morning.

I've been debating about where to put the test specifications. We need to provide the generic format that our tests formats will conform to, and then we need a spec for each test format.

I was thinking about making a dir specs/tests and have specs/tests/format.md and then specs/tests/ssz.md, etc, etc.

The argument for this would be to not clutter up the specs with test format info.
The argument against would be that maybe it's nice to have everything in the same place :)

I'll add the generic test format doc tomorrow and we can discuss from there.

hwwhww · 2018-10-02T02:20:27Z

specs/simpleserialize.md

+
+## About
+
+`SimpleSerialize` was first proposed by Vitalik Buterin as the serializaiton


typo: serializaiton -> serialization

Good catch! Completely missed the misspelling 😄 Thanks!

specs/simpleserialize.md

NatoliChris · 2018-10-02T02:35:29Z

Thanks @hwwhww 😄

mratsim

Nice work!

One thing that is missing are "container types" serialization, for example ActiveState

# ActiveState
fields = {
    # Attestations that have not yet been processed
    'pending_attestations': [AttestationRecord],
    # Most recent 2 * CYCLE_LENGTH block hashes, older to newer
    'recent_block_hashes': ['hash32']
}

mratsim · 2018-10-02T10:17:51Z

specs/simpleserialize.md

+ + [Serialize/Encode](#serializeencode)
+   - [uint: 8/16/24/32/64/256](#uint-816243264256)
+   - [Address](#address)
+   - [Hash32](#hash32)


With BLS we will have Hash96 for the public keys maybe Hash97 if it comes with a recovery bit.

@mratsim interesting point! I'll leave this open for discussion if we should tentatively add it in or leave it until we have more clarity on the point?

I'll draft it up and maybe add it into a PR?

a simple Hash32 and other hash sizes is probably enough at the moment

mratsim · 2018-10-02T10:19:24Z

specs/simpleserialize.md

+
+#### Hash32
+
+The hash32 should already be a 32 byte length serialized data format. The safety


hash32 works for Keccak/Blake2b[0 ..< 32] not for BLS

Yep, great point! Did we address in point above?

I think we should have the Hash97 and Hash96 separately or address similar to the way uint is serialized/deserialized?
Thoughts?

I think like uint

mratsim · 2018-10-02T10:27:35Z

specs/simpleserialize.md

+1. For each item in list:
+   1. serialize.
+   2. append to string.
+2. Get size of serialized string. Encode into a 4 byte integer.


This part is Python specific. This should read as

#### List/Vectors 1. Get the number of raw bytes to serialize: it is `len(list) * sizeof(element)`. Encode that as a 4-byte bigEndian uint32 2. Append your elements in a packed manner. * Note on efficiency: consider using a container that does not need to iterate over all elements to get its length. For example Python lists, C++ vectors or Rust Vec.

Then your Python implementation notes.

The perf notes are to address the problem highlighted by @AlexeyAkhunov, if the container does keep track of its length (for example LinkedLists) prefixing the length is inefficient.

Thanks! This is great.

I think I'll add yours specifically, since it's more generic and should cater to larger of an audience!

Thanks for pointing me to the notes, I'll make sure to run through them and give a reference! 😄

You can refer to this post: #2 (comment)

On length prefix vs length suffix

ethereum/beacon_chain#94 and ethereum/eth2.0-pm#8

The argument for moving at the end is because otherwise you would have to iterate on the data structure once to get the length then once again to get feed the data. ==> All languages even Haskell have a vector implementation that tracks length so you wouldn't need the first iteration to get length.

Furthermore putting length at the end means you need specific token to indicate end of data like ‘\0’ in cstring. This is causing a lot of issue in C code also this prevents reading by chunk into a buffer, you would have to read token by token (or read by chunk and backtrack).

Thanks @mratsim! 👍

I'll hold this PR open until we get more clarification/acceptance?

I think it's up to @djrtwo and @hwwhww, either we have all the discussions happening in this single PR or we kick the ball running by merging it and let the issues come 😄.

I definitely don't want to block it, just thought it will be nice to have more eyes on it, like @mratsim. 👍👀 It looks good for me to merge it, as a clear document to be audited and to be disucussed.

NatoliChris · 2018-10-02T11:34:41Z

Thanks @mratsim for the excellent comments!
I'll provide the updates shortly! 😄

djrtwo · 2018-10-02T13:44:00Z

I'm going to give it a look this morning and merge. Just need to make sure some sort of "work in progress" signal is at the top of the doc

NatoliChris · 2018-10-02T13:55:01Z

Added Updates

Changed title to include explicit WIP status.
Updated Hash types to include hash96 and hash97 for BLS.
- Note: It may be a point of discussion how this should be implemented. At this moment we went for simplicity and assumed it will be similar to the uint serialization technique.
Updated List to List/Vector and added @mratsim's suggestions.

There are currently 2 items that are to be discussed:

Container Serialization (@mratsim's comment):

Nice work!
One thing that is missing are "container types" serialization, for example ActiveState

# ActiveState
fields = {
    # Attestations that have not yet been processed
   'pending_attestations': [AttestationRecord],
    # Most recent 2 * CYCLE_LENGTH block hashes, older to newer
    'recent_block_hashes': ['hash32']
}

I feel like this should be an open discussion around the appropriate methods of serializing a container. @AlexeyAkhunov may also have input.

List/Vector Serialization

@mratsim's comment:

The perf notes are to address the problem highlighted by @AlexeyAkhunov, if the container does keep > track of its length (for example LinkedLists) prefixing the length is inefficient.

You can refer to this post: #2 (comment)

On length prefix vs length suffix

ethereum/beacon_chain#94 and ethereum/eth2.0-pm#8

The argument for moving at the end is because otherwise you would have to iterate on the data structure once to get the length then once again to get feed the data. ==> All languages even Haskell have a vector implementation that tracks length so you wouldn't need the first iteration to get length.

Furthermore putting length at the end means you need specific token to indicate end of data like ‘\0’ in cstring. This is causing a lot of issue in C code also this prevents reading by chunk into a buffer, you would have to read token by token (or read by chunk and backtrack).

It might be appropriate like @hwwhww suggested to make this a separate discussion.

Thanks for all the input 😄

djrtwo

Looks good! 👍
A handful of small edits for consistency/clarity

djrtwo · 2018-10-02T17:14:10Z

specs/simpleserialize.md

+| `byte_order` | Specifies [endianness:](https://en.wikipedia.org/wiki/Endianness) Big Endian or Little Endian. |
+| `len`        | Length/Number of Bytes.                                                                        |
+| `to_bytes`   | Convert to bytes. Should take parameters ``size`` and ``byte_order``.                          |
+| `from_bytes` | Convert form bytes to object. Should take ``bytes`` and ``byte_order``.                        |


Typo: "form" -> "from"

djrtwo · 2018-10-02T17:16:17Z

specs/simpleserialize.md

+
+| Constant       | Value | Definition                                                              |
+|:---------------|:-----:|:------------------------------------------------------------------------|
+| `LENGTH_BYTES` |   4   | Number of bytes used for the length added before the serialized object. |


maybe say 'before a variable-length serialized object'. To be clear at this point that not all items need a length prefix

specs/simpleserialize.md

djrtwo · 2018-10-02T17:25:02Z

specs/simpleserialize.md

+
+1. Get the number of raw bytes to serialize: it is `len(list) * sizeof(element)`.
+   * Encode that as a `4-byte` **big endian** `uint32`.
+2. Append your elements in a packed manner.


change "your" to "the"

specs/simpleserialize.md

djrtwo · 2018-10-02T17:29:00Z

specs/simpleserialize.md

+
+The decoding requires knowledge of the type of the item to be decoded. When
+performing decoding on an entire serialized string, it also requires knowledge
+of what order the objects have been serialized in.


"of the order in which the objects have been serialized"

djrtwo · 2018-10-02T17:32:01Z

specs/simpleserialize.md

+
+```python
+bytes_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big')
+new_index = current_index + LENGTH_BYTES + bytes_lenth


typo: bytes_lenth -> bytes_length

djrtwo · 2018-10-02T17:33:40Z

specs/simpleserialize.md

+```python
+bytes_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big')
+new_index = current_index + LENGTH_BYTES + bytes_lenth
+return rawbytes[current_index + LENGTH_BYTES:current_index+ LENGTH_BYTES +bytes_length], new_index


consider using two helper vars defined before the return statement for the indices -- bytes_start & bytes_end

djrtwo · 2018-10-02T17:35:54Z

specs/simpleserialize.md

+entire length of the list.
+
+
+| Check type                          | code                                  |


Use "Check to perform" as in previous tables, and add the assertion in the sample code

NatoliChris · 2018-10-02T22:20:32Z

Thanks @djrtwo! All applied 😄. Thanks for the excellent comments!

djrtwo · 2018-10-03T01:44:28Z

@NatoliChris Let's add a "TODO section/note" in the document for "container types" to make it clear it's missing.

After that, I'll merge this

NatoliChris · 2018-10-03T05:10:14Z

Done 😄
Thanks again @djrtwo !

djrtwo · 2018-10-03T13:34:48Z

Excellent. Thanks!

* Simplify Makefile * Add CircleCI configuration

NatoliChris added 3 commits October 2, 2018 09:43

Initial SimpleSerialize spec

4c75cd0

Remove int as per discussions, update implementations

b1c873c

Add check for byte serialization

0b0f618

hwwhww reviewed Oct 2, 2018

View reviewed changes

Update misspelling; Use LENGTH_BYTES variable; Update for comments

6287573

hwwhww added the general:RFC Request for Comments label Oct 2, 2018

mratsim reviewed Oct 2, 2018

View reviewed changes

NatoliChris changed the title ~~Initial SimpleSerialize (SSZ) spec~~ [WIP] Initial SimpleSerialize (SSZ) spec Oct 2, 2018

Update Hash Types as per @mratsim's comments on ethereum#18

78a830d

hwwhww approved these changes Oct 2, 2018

View reviewed changes

Update List/Vectors with comments on ethereum#18

8521bd9

Add "WIP" to title to make it clear; @djrtwo's comment in ethereum#18

cd71c22

mratsim approved these changes Oct 2, 2018

View reviewed changes

djrtwo reviewed Oct 2, 2018

View reviewed changes

Add assertions in examples; Update checks from @djrtwo's comments

a2ad4bf

NatoliChris force-pushed the master branch from aa3988f to a2ad4bf Compare October 2, 2018 22:21

Add container todo stubs

0325263

djrtwo merged commit 66e316d into ethereum:master Oct 3, 2018

hwwhww pushed a commit that referenced this pull request Jul 23, 2020

Add CircleCI configuration (#18)

823e8a5

* Simplify Makefile * Add CircleCI configuration

hwwhww pushed a commit that referenced this pull request Jul 23, 2020

Add CircleCI configuration (#18)

beca1e5

* Simplify Makefile * Add CircleCI configuration

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Initial SimpleSerialize (SSZ) spec #18

[WIP] Initial SimpleSerialize (SSZ) spec #18

NatoliChris commented Oct 2, 2018

NatoliChris commented Oct 2, 2018

djrtwo commented Oct 2, 2018

hwwhww Oct 2, 2018

NatoliChris Oct 2, 2018

NatoliChris commented Oct 2, 2018

mratsim left a comment

mratsim Oct 2, 2018

NatoliChris Oct 2, 2018

mratsim Oct 2, 2018 •

edited

Loading

mratsim Oct 2, 2018

NatoliChris Oct 2, 2018

mratsim Oct 2, 2018

mratsim Oct 2, 2018 •

edited

Loading

NatoliChris Oct 2, 2018

mratsim Oct 2, 2018

NatoliChris Oct 2, 2018

mratsim Oct 2, 2018

hwwhww Oct 2, 2018

NatoliChris commented Oct 2, 2018

djrtwo commented Oct 2, 2018

NatoliChris commented Oct 2, 2018

On length prefix vs length suffix

djrtwo left a comment

djrtwo Oct 2, 2018

djrtwo Oct 2, 2018

djrtwo Oct 2, 2018

djrtwo Oct 2, 2018

djrtwo Oct 2, 2018

djrtwo Oct 2, 2018

djrtwo Oct 2, 2018

NatoliChris commented Oct 2, 2018

djrtwo commented Oct 3, 2018

NatoliChris commented Oct 3, 2018

djrtwo commented Oct 3, 2018


		## About

		`SimpleSerialize` was first proposed by Vitalik Buterin as the serializaiton


		#### Hash32

		The hash32 should already be a 32 byte length serialized data format. The safety

[WIP] Initial SimpleSerialize (SSZ) spec #18

[WIP] Initial SimpleSerialize (SSZ) spec #18

Conversation

NatoliChris commented Oct 2, 2018

NatoliChris commented Oct 2, 2018

djrtwo commented Oct 2, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NatoliChris commented Oct 2, 2018

mratsim left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mratsim Oct 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mratsim Oct 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

On length prefix vs length suffix

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NatoliChris commented Oct 2, 2018

djrtwo commented Oct 2, 2018

NatoliChris commented Oct 2, 2018

Container Serialization (@mratsim's comment):

List/Vector Serialization

On length prefix vs length suffix

djrtwo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NatoliChris commented Oct 2, 2018

djrtwo commented Oct 3, 2018

NatoliChris commented Oct 3, 2018

djrtwo commented Oct 3, 2018

mratsim Oct 2, 2018 •

edited

Loading

mratsim Oct 2, 2018 •

edited

Loading