Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSZ spec: Merkleization - Typo in wording #1892

Closed
booleanfunction opened this issue Jun 15, 2020 · 13 comments
Closed

SSZ spec: Merkleization - Typo in wording #1892

booleanfunction opened this issue Jun 15, 2020 · 13 comments
Assignees
Labels
scope:SSZ Simple Serialize

Comments

@booleanfunction
Copy link
Contributor

In simple-serialize.md, within the Merkelization section it says:

We now define Merkleization hash_tree_root(value) of an object value recursively:

merkleize(pack(value)) if value is a basic object or a vector of basic objects.

This part should exclude bitvectors as they are dealt with separately.

A suggested wording to correct this typo could be:

merkleize(pack(value)) if value is a basic object or a vector of basic objects, excluding bitvectors.

@booleanfunction
Copy link
Contributor Author

The case for basic lists should also be amended to exclude bitlists, which are also dealt with separately. Hence
mix_in_length(merkleize(pack(value), limit=chunk_count(type)), len(value)) if value is a list of basic objects.
should read:
mix_in_length(merkleize(pack(value), limit=chunk_count(type)), len(value)) if value is a list of basic objects, excluding bit lists.

@hwwhww hwwhww added the scope:SSZ Simple Serialize label Jun 15, 2020
@protolambda
Copy link
Collaborator

Bitvectors and bitlists are not considered to be part of "basic" lists and vectors category. Bitlists have a special delimiter bit in serialization, and basic element types are all aligned to bytes. Adding "excluding" may clarify it, but at the same time it gives the impression they would be included otherwise, while they should not be.

@booleanfunction
Copy link
Contributor Author

@protolambda Thank you for the reply. My thinking was that when it says if value is a basic object ... that it was referring to a basic type of either a boolean or uintN, and so when the rest of the sentence said or a vector of basic objects I read it in the same way as if it said or a vector of booleans or uintNs - hence the suggestion to exclude. Can I check that the initial reference to a basic object is meant to be include a boolean? Such that a boolean would be packed using the pack function rather than pack_bits?

@protolambda
Copy link
Collaborator

There are two ways to represent a series of booleans: List[boolean, N] and Bitlist[N], these are different.

A List will pack basic element types, but does so on a byte level, without any special information about the element type except 1. if it's fixed-length or not, and 2. its byte length. One boolean is serialized as a single byte, so it takes one byte in the list. Also note that empty lists can be serialized as zero bytes thanks to this. A bitlist has a delimiter bit however, which forces a minimum of 1 byte. The one byte per element minimum in hash-tree-root of a regular list is mostly for consistency with serialization.

Because this is rather inefficient, and bitfields have other desired use cases as well (big integers, bitshifts, etc.), a separate bitlist type was introduced. And the same thing applies to Vector[boolean, N] vs Bitvector[N].

So yes, that's why List/Vector uses pack and Bitlist/Bitvector uses pack_bits

@booleanfunction
Copy link
Contributor Author

booleanfunction commented Jun 16, 2020

@protolambda Thank you for the clarification, the extra detail helps. When I thought about a List[boolean,N] I just assumed it would be implemented as a Bitlist[N] and so read the types as equivalent. Two final queries if that is ok. 1. Would a List[boolean, N] ever be used? Or rather, should it be considered redundant because only bitlists would be used? 2. I just wanted to check that a single boolean is its own type and not to be treated as a bitvector[1]?
Thank you.

@protolambda
Copy link
Collaborator

protolambda commented Jun 16, 2020

  1. Would a List[boolean, N] ever be used?

If you need to generalize something, some situaation where you can have any element type, and the list is small, the ability to use a List[boolean, N] helps avoid inconsistencies and allows you to make more generic assumptions about sizes (especially the 0-element list case).

Or rather, should it be considered redundant because only bitlists would be used?

No, not redundant. But definitely not preferred in most common situations.

Just wanted to check that a single boolean is its own type and not to be treated as a bitvector[1]

Serialization and merkleization of indiviudal boolean is the same as a Bitvector[1]. But this is more of a coincidence, and I prefer treating a boolean as a separate type, since it matches the boolean definitions in a lot of programming languages as well. E.g. a bool in Go is represented as a single byte, that can be either 0 or 1. Since it does not make sense to try and describe something with bits, when all memory addresses are in bytes. At the same time, it also does not make sense to support all lengths of a bitvector, except length 1. Thus both are valid.

@protolambda protolambda self-assigned this Jun 16, 2020
@booleanfunction
Copy link
Contributor Author

booleanfunction commented Jun 16, 2020

@protolambda Thank you again. This makes sense, I was assuming that say a List[boolean, N] would just (not sure if it is the correct word to use) default to a Bitlist[N], and similarly for vectors/bitvectors, even though I can appreciate that they would be implemented differently - but so knowing there is an intention to allow both makes the wording in the spec clearer. Thank you.

You mentioned in your comment above:

At the same time, it also does not make sense to support all lengths of a bitvector, except length 1.

I am not sure if I have missed something, but I wanted to ask if you could please clarify what lengths of a bitvector aren't supported? I read the spec as that for a Bitvector[N], N could be anything greater than 0? Have I missed something? I haven't been able to find anything in py-ssz to restrict N for Bitvectors, other than N > 0, and so I thought I should check.

I wanted to also note that I saw your new issue #1901 and I will post any further queries to that repo, but just thought it made sense to finish this thread here. I hope that is ok?

Thank you.

@protolambda
Copy link
Collaborator

clarify what lengths of a bitvector aren't supported?

Every length is supported, except 0 because bitvectors are fixed-length and we cannot have 0-length fixed-length data. Sorry if that previous comment was confusing, I was thinking of what it would be like if we tried to remove bitvector[1] in favor of boolean, but that would just be weird.

that is ok?

Yes no problem, feel free to make as many issues you like if you see more issues, the more critical ones may still be worth resolving before we shift to the new ssz specs repo. That said, any help with polishing the new repo, and going through the edge cases in the longer (but hopefully more clear) descriptions is appreciated. Then once there is a more complete transition plan, and ethereum/eth2.0-ssz or something similar is set up on github, we can move future SSZ discussion there.

@protolambda
Copy link
Collaborator

For now, while the future of the other SSZ repo is uncertain (painful discussion about formatting and spec style), shall we get the bitlist/vectors vs basic list/vector thing clarified in the current spec? I could open a PR, or do you want to give it a try, to get started with a small PR to the specs?

@booleanfunction
Copy link
Contributor Author

@protolambda great, thank you. It has taken me a little while to get to where I feel like I understand how the different parts fit together and so being able to ask some questions to clarify a few more things will be great - I am planning to add more notes to our formal verification project to help with understanding background/motivations/intentions - and so hopefully this will also be of help more broadly :)

And well, I was reading over one of your earlier replies and hoped to clarify a couple more things:

A List will pack basic element types, but does so on a byte level, without any special information about the element type except 1. if it's fixed-length or not, and 2. its byte length.

  1. I wasn't sure if I was reading this correctly. When you make the reference to special information about whether it is fixed-length or not I wasn't sure if this was still with reference to lists of basic element types or lists more generally? (as basic element types are always fixed-length??)

One boolean is serialized as a single byte, so it takes one byte in the list. Also note that empty lists can be serialized as zero bytes thanks to this.

  1. Can I just confirm - Is it just that empty lists can be serialized to zero bytes because that don't need anything extra, like the delimiter needed for bitlists? (i.e. I wasn't sure if I am reading the thanks to this reference correctly.)

Thank you :)

@protolambda
Copy link
Collaborator

as basic element types are always fixed-length??

You are right, I was talking about lists in general there. For lists with a basic element type, the element type is always fixed length, and so you only have to look at the element byte length to handle the serialization.

empty lists can be serialized to zero bytes because that don't need anything extra, like the delimiter needed for bitlists?

Exactly, the information of the "scope" (the available bytes to decode a value, or the length of the resulting bytes when encoding) is enough to say everything about a basic list: if each element is A bytes, and we have B elements in the list, then the result is simply A*B bytes long, which will be the "scope" when decoding it. And all we need for later decoding is to divide the "scope" by A to get B. For that to be valid we will have to check if the scope is a multiple A, otherwise the input data would be incomplete, and thus invalid. (And more decoding checks, such as checking the list limit, are all the same as with the other list-like types).

And to clarify some more: lists of complex but fixed-length elements work the same w.r.t. scope and length, except you just need to compute the length of the element type first. Merkleization is different however, since the non-basic types utilize the full 32 bytes of their hash-tree-root, so there is no such thing as packing them together into 32 bytes.

Long story short, SSZ serialization is really just recursive concatenation, sometimes some offsets to deal with the dynamic-length types, and a few type-specific things like bitlist delimiter bits, or selecting the value in an union.

@booleanfunction
Copy link
Contributor Author

For now, while the future of the other SSZ repo is uncertain (painful discussion about formatting and spec style), shall we get the bitlist/vectors vs basic list/vector thing clarified in the current spec? I could open a PR, or do you want to give it a try, to get started with a small PR to the specs?

@protolambda I think I was typing my comment when you posted this and so only just realised that I missed this comment!

Sure, that would be great. In some ways I can't believe that it didn't occur to me that the reason the spec is worded that way is because of both, say, Vector[boolean,N] and Bitvector[N] being valid, it seems obvious now!! And so yes, I would be happy to have a go at a PR if that is ok. I will keep it minimal, and certainly please let me know if you think there is a better way.

@djrtwo
Copy link
Contributor

djrtwo commented Jun 23, 2020

closed via #1912

@djrtwo djrtwo closed this as completed Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope:SSZ Simple Serialize
Projects
None yet
Development

No branches or pull requests

4 participants