Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Witness Specification #1

Merged
merged 17 commits into from
Mar 26, 2020
Merged

Witness Specification #1

merged 17 commits into from
Mar 26, 2020

Conversation

mandrigin
Copy link
Contributor

This is the first draft of a spec of the block witness format.

A couple of points about it:

(1) It is based on the real witness format that is currently used in turbo-geth (and its serialization was used to measure data in the latest blog post I wrote). There are a couple of minor differences to keep abstractions more clean (introduction of CodeNode and removal of one instruction). So using this format you can actually rebuild tries to run block on Ethereum mainnet.

(2) The witness execution process isn't implemented exactly as stated here, we use a stack machine to execute the witness. I will work on a reference implementation of this "substitution" approach now.

(3) I'm very open to any suggestions. The goal of this document is to start a discussion and to settle on a format that will work across multiple Ethereum node implementations.

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't claim to fully grok the whole thing but a first pass has me liking this. I think I'll probably shoot at writing up a small implementation to further understand this, at which point I'd probably have more valuable feedback.

witness.md Outdated

## End Criteria

The execution ends when there are no substitution rules applicable for this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "applicable" seems to suggest there could be rules remaining that just cannot be applied? Should this instead say

" The execution ends when there are no substitution rules remaining for this ..."

or maybe more explicit

The execution ends when all substitution rules have been applied

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I've read the GUARD stuff I see why the working is the way it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm still not sure what is the best layout of this doc:

  • to explain all the core concepts (types, rules, guards, etc) and then the execution flow (that has a downside that you have to just remember a lot of stuff before you put it all together);
  • to show the execution first and then explain the details of it (to me it is easier to grasp, but it leads to more questions in the beginning);

witness.md Outdated
- The execution state MUST match the End Criteria;
- The items that are left in the witness MUST follow this pattern: `(Node
NEW_TRIE ... Node)`
- Each `Node` element will be a root of a trie.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say - "Each Node element MUST be a root of a trie"?

witness.md Outdated

## Helper functions

### `MAKE_VALUES_ARRAY`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include the function signature?


### `MAKE_VALUES_ARRAY`

returns an array of 16 elements, where values from `values` are set to the indices where `mask` has bits set to 1. Every other place has `nil` value there.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems that mask could be more precisely defined in terms of bit width and endianness?

witness.md Outdated

### `BIT_TEST(number, n)`

`n` MUST NOT be negative.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a concept of negative integers anywhere in this document? Maybe we can get away with a clear definition of the type of n and rely on the type to properly restrict the value range.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, that is a leftover... we don't use any negative values anywhere, so I guess we can get rid of that...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case you should clarify that when introducing the Int type. Right now there is nothing in the spec that says that integers must be positive

Copy link
Contributor

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'm excited to see how the witness serializations can improve!

At first glance, the spec feels very big. Coming from a naive angle, I wonder why it can't be of similar complexity to NodeData or some other data serialization format. I'm defining the status quo here as: the rlp-encoded list of trie nodes needed to build the trie and verify the state root, such that you can look up all data used by a block.

Maybe it would help to add some discussion of the considered alternatives (including status quo), and what advantages the proposed approach has. Without that, I'm stuck wondering, can we make it work with a much simpler static definition?


Is it possible to split up this witness into parts, and validate the parts against the state root?

I think this is important for a few reasons:

  • Peers should have the option of returning partial results, to avoid DoS attacks on the network. If a partial result is not an option, and asking for a full witness is so big that a peer rejects the request, then everything grinds to a halt.
  • If peers can give you a partial result, but you can't verify it against the state root, then it's really easy for a peer to disrupt a stateless node by giving it bad partial results, stringing it along until the end.
  • Also, if you can get partial results, but can't verify them, then it's very problematic to request parts of the witness from multiple peers at once. If you can't attribute a bad piece of the witness to the peer that gave it to you, then you have to throw everything out and start over.

witness.md Outdated Show resolved Hide resolved
Copy link
Contributor

@poemm poemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am leaving my review as-is so that you can see what may be confusing on a first single linear pass. Some things are discovered later, but maybe you want to remove this early confusion.

Perhaps there should be both a text format and a binary format. One defined from the other, or both defined from some higher-level abstract syntax like you seem to have. The text format would be for analysis, and the binary format would be for size.

My main concerns are adversarial inputs and witness size.

I did not spend too much time reviewing the more complicated instructions, the helper functions, or the binary encoding. I would like to see concrete examples first.


`Any` - any data type. MUST NOT be `nil`.

`Int` - an integer value. We treat the domain of integers as infinite,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to stop parsing an Int. Seems that Int is not uniquely readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that was initially done to keep it up to an implementer to choose a data type for integers as long as they behave like integers... but I guess together with the binary serialization it brings more confusion


`Byte` - a single byte.

`Hash` - 32 byte value, representing a result of Keccak256 hashing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash is presumably big-endian.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I guess we need to put that all they binary data is big endian here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a section about endianness, because it is the same across the witness.

witness.md Outdated

`Hash` - 32 byte value, representing a result of Keccak256 hashing.

### Composite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The production rules seem to be for a high-level abstract syntax, not a text syntax. For example, production rules don't specify what is a terminal and non-terminal.

witness.md Outdated

`(Type...)` - an array of a type `Type`. MUST NOT be empty.

`{field:Type}` - an object with a field `field` of type `Type`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is an "object"? Can it contain multiple fields, each with a single associated value of arbitrary type? Is the ordering of fields arbitrary or specified by the syntax?


`()` - an empty array of arbitrary type.

`(Type...)` - an array of a type `Type`. MUST NOT be empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ellipses ... syntax is unclear to me. I guess that Type is replaced with a terminal symbol corresponding to the type, and the ... is replaced by an arbitrary length array of values of that type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm not a huge fan of that either, can you point me to a good example of a typed array syntax suitable for this type of spec?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you have is ok too, but need to define what ... means. Alternatively, I have seen the notation some_type* to mean zero or more occurrences of some_type, ref: https://en.wikipedia.org/wiki/Regular_expression#Basic_concepts .

Copy link
Contributor

@poemm poemm Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But postfix syntax like * may be ambiguous, depending on the rest of the syntax.

When in doubt, it never hurts to add extra syntax to remove ambiguity. For example the syntax array(some_type* ) would be for zero or more instances of some_type. But the word "array" is usually used for sequences with fixed known length, so maybe use array(Int,some_type* ) where Int is replaced by the length of the array.

Copy link
Member

@MrChico MrChico Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these seem to be non-empty, Type+ is probably the more appropriate use of Kleene notation here


Helper functions are functions that are used in GUARDs or substitution rules.

Helper functions MUST be pure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It remains to be shown that your helper functions are pure. Saying "MUST" is not enough.

witness.md Outdated
Replaces the instruction with a `ValueNode` wrapped with a `LeafNode`.

```
LEAF(key, raw_value) |=> LeafNode{key, ValueNode(raw_value)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible type: ValueNode should be followed by curly braces. There may be other instances of this typo, I am not sure so I will just comment on it once.

witness.md Outdated

Each block witness consists of a header followed by a list of instructions.

There is no length of witness specified anywhere, the code expects to just reach `EOF`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to prove that if a witness is encoded in the right way, then applying the rules must reach EOF. This is important because of possible adversarial inputs. Algorithms may require an witness bounds check at each opcode.

witness.md Outdated

Keys are also using custom encryption to make them more compact.

The nibbles of a key are encoded in a following way `[FLAGS NIBBLE1+NIBBLE2 NIBBLE3+NIBBLE4 NIBBLE5... ]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing that square brackets means concatenation. The Serialized Witness above using parentheses is also concatenation of bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that is a typo from the previous version, thanks for noticing it!

witness.md Outdated
*mask* defines which children are present
(e.g. `0000000000001011` means that children 0, 1 and 3 are present and the other ones are not)

encoded as `[ 0x02 CBOR(mask)...]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this is for hexary. Do you plan something similar for binary? If HASH and BRANCH each require a byte, and the mask encoded in CBOR is also a byte (or more?) then most 32 byte hashes will have three (or more?) bytes of overhead, which approaches 10% of the witness size. That is significant if witness size is the bottleneck.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, for binary tries we can theoretically encode the mask into the opcode itself, worst case we will just have 3 branch opcodes w/o changing the core structure (3 because there can't be branches with no children).
also the difference between a single-child branch and an extension becomes basically non-existent in the binary trie, so, that will require some thoughts

Co-Authored-By: Jason Carver <ut96caarrs@snkmail.com>
@mandrigin
Copy link
Contributor Author

@carver

At first glance, the spec feels very big. Coming from a naive angle, I wonder why it can't be of similar complexity to NodeData or some other data serialization format. I'm defining the status quo here as: the rlp-encoded list of trie nodes needed to build the trie and verify the state root, such that you can look up all data used by a block.

but that is basically it, it is essentially a minimal information that is needed to restore the trie structure (including which branches are nil which aren't, extensions, etc). The second part of this document describes, essentially, how to transform this flat list of node info into a trie.

@mandrigin
Copy link
Contributor Author

@carver about partial results, so we are on the same page: do you mean that one peer should be able to respond with only a part of a witness necessary to prove a block? so to actually prove a block I'll need to get different parts of the witness from different peers?

Copy link

@s1na s1na left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on the spec draft! I had implemented one of the earlier versions of turboproof from the programmer's guide doc. Seems there have been many improvements since then.

As others have mentioned I think the layout can be improved (no particular suggestions). Further as you yourself mentioned there are some ambiguities where the witness format meets rebuilding the trie (and verifying and hashing that trie).

witness.md Outdated
| AccountNode{nonce:Int balance:Int storage:nil|Node code:nil|CodeNode|HashNode}
| LeafNode{key:(Byte...) value:ValueNode|AccountNode}
| ExtensionNode{key:(Byte...) child:Node}
| BranchNode{child0:nil|Node child1:nil|Node child3:nil|Node ... child15:nil|Node}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MPT branch nodes have a value too, although it's never used when all keys have the same length. Not sure if it makes sense to include it (or a note why it was omitted).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, in our specific implementation we only support the same key lengths in the branch nodes, so naturally we never use a value. basically, omitting this invariant makes the spec slightly smaller and simpler IMO, but a note is a good point.

witness.md Outdated Show resolved Hide resolved
witness.md Outdated Show resolved Hide resolved
witness.md Outdated

- The execution state MUST match the End Criteria
- There MUST be only one item left in the witness
- This item MUST be of the type `Node`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be explicitly limited to LeafNode|ExtensionNode|BranchNode for extra strictness. Parsing the witness CODE c1 would be considered valid by this End Criteria but it can't be a valid trie root.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point, thanks!

witness.md Outdated
Every other end state is considered a FAILURE.


### Building a Forest
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if you have specific use-cases for this in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the semi-stateless sync (this experiment) actually builds a forest because we might need to attach more than one subtree to resolve the next block.

GUARD has_storage == true

HashNode(code) Node(storage_hash_node) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, storage_root, code}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LeafNode{key, AccountNode{nonce, balance, storage_root, code}}
LeafNode{key, AccountNode{nonce, balance, storage_hash_node, code}}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the verifier can distinguish when an account's code field is a hash vs when it's the full code (e.g. when the code length is 32)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our code it is distinguished by the type of the storage node: it can be either a HashNode or a ValueNode or a BranchNode

witness.md Outdated Show resolved Hide resolved
witness.md Outdated
```


### `RLP(value)`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used anywhere.

witness.md Outdated

returns the array w/o the first item

### `KECCAK(bytes)`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a leftover from the earlier version. good catch!

GUARD has_storage == true

Node(storage_root) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=>
LeafNode{key, AccountNode{nonce, balance, storage_root, nil}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe nil should be RLP(nil) to help with hashing the trie, but this doc seems to omit hashing so maybe it's not relevant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the first version had hashing, but then it just repeated the yellow paper, so we decided to omit that. in the implementation, we keep 2 lists of values: hashes and nodes

@mandrigin
Copy link
Contributor Author

@s1na yeah, I hope we simplified format a bit and it also produces slightly smaller witnesses (we do have raw data for this format, but we didn't publish it yet).

mandrigin and others added 5 commits February 18, 2020 15:38
Co-Authored-By: Sina Mahmoodi <1591639+s1na@users.noreply.github.com>
Co-Authored-By: Sina Mahmoodi <1591639+s1na@users.noreply.github.com>
Co-Authored-By: Sina Mahmoodi <1591639+s1na@users.noreply.github.com>
@carver
Copy link
Contributor

carver commented Feb 18, 2020

do you mean that one peer should be able to respond with only a part of a witness necessary to prove a block? so to actually prove a block I'll need to get different parts of the witness from different peers?

Right. So one way that comes to mind is to be able to request that you don't want the root, but that you want a position deeper in the trie. (Same thing for replying, that you only had the ability to return some sub-trie)

@mandrigin
Copy link
Contributor Author

Right. So one way that comes to mind is to be able to request that you don't want the root, but that you want a position deeper in the trie. (Same thing for replying, that you only had the ability to return some sub-trie)

Ah, taking into account that there is no witness exchange protocol defined in this spec, it is perfectly possible for a peer to generate a "partial" witness for a certain subtrie. Then, another peer can re-create this subtrie and what he needs to validate the result is to check the root hash of the subtrie.

I actually used it for the semi-stateless sync experiment I've wrote earlier. Basically, when we had a merkle path that was "collapsed" into a hash, we could re-create the subtrie from a stored witness, check that the subtrie's root hash matches the hash we expected and "attach" it to this position instead of a hash.

Again, a lot there is up to the actual exchange protocol, but the witness format doesn't forbid this. You can also return multiple subtries in one witness using the NEW_TRIE instruction.

@pipermerriam
Copy link
Member

pipermerriam commented Feb 19, 2020 via email

@mandrigin
Copy link
Contributor Author

@poemm re: witness size. The actual sizes of witnesses (in binary format, for hexary tries) that is described here was used to get these stats when it was running on a mainnet (it is a slight improvement from the previous iteration that we had).

re: input verification. There a couple of levels here where the input is validated:

  • deserialization of/reading a binary witness -- in our current version (with CBOR) apart from checking the instruction opcode we also can check types of some elements (integers, variable-length byte arrays) and if it's not what we expect we can just short-cut with an error there. I left it out of the spec because I didn't want to force too much implementation spec there.

  • substitution rules & guards. as soon as there are no rules that can be applied, we check against the "end criteria/success criteria" and for malformed witnesses there is a chance to be caught there. There is no need to wait until the witness if fully executed for that, "no applicable rules" situation can happen even immediately (i.e. for a witness from one element like BRANCH{0b11} the execution will stop immediately, because no rules can be applied).

  • hash checks. if a witness was parsed and the trie was rebuilt successfully, we can always check its root hash against what we were expecting (either the root hash of the block state or the hash value of a "collapsed" merkle path).

@poemm
Copy link
Contributor

poemm commented Feb 20, 2020

This is a response to a request for feedback on document structure.

The document structure is improving.

It is good that you start with base types like Byte = 0x00 | 0x01 | ... | 0xff and Int = 0 | 1 | 2 | .... Then you can define new types in terms of base types, like Hash = (Byte...). You can define notation you need like {field:Type} at the beginning, or as you go along. It may be challenging to choose convenient syntax.

Readers may start getting confused when you start defining Instructions, Nodes, and rewrite rules. The binary syntax may be overwhelming at first so I would use a text syntax for initial exposition, and the binary format can come later as a (hopefully simple) mapping from the text syntax. I would avoid things like finiteness and success criteria until later, maybe mention it as a high-level aside, but something so strict may be better after the full rewrite system is defined and understood. Also, the pseudocode doesn't define ApplyRule() so the reader doesn't know what it means == maybe put this with a full algorithm in the implementer's guide.

You have a tough job in explaining the rewrite system to unfamiliar readers. To them, it may be overwhelming and unclear how these huge definitions for Instructions, Nodes, and rewrite rules relate to each other. Perhaps you should spend some time on high-level, like explaining the goal of the rewrite system -- to formalize computing the merkle root of a given witness.

And maybe before defining them, explain the high-level mechanism of the rewrite system: in a single pass from left-to-write over a witness, small-step reduction rules are applied to each piece of the witness until the whole witness is rewritten to the root hash. I'm not sure, but it may make sense to give a visualization of a simple rewrite system, illustrating how there is an implicit stack of nodes to the left of where you are currently applying reduction rules. And to the right are instructions which are to-be-executed by applying rewrite rules for each of them. As you execute, you take the leftmost instruction, and possibly the rightmost operand(s) which are just to the left of the instruction, and you rewrite them into a node which is put on rightmost position of nodes, just left of the next opocde (pushing the node to the implicit node stack). Maybe this explanation will confuse things even more. I don't know how the best way to explain it.

It is good that you have a toy example to give intuition. Maybe start with an intro section defining a subset of Instructions, Nodes, and rewrite rules, just enough to do your example. After this, the reader will have context for why you are defining these huge things.

Copy link
Contributor

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, motivations is a really valuable piece of this writeup. Thanks for adding!

The witness format was picked to satisfy the following criteria.

**1. Witness Streaming w/o intermediate dynamic buffers**
It should be possible to basically stream-as-you-encode the the trie on one node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It should be possible to basically stream-as-you-encode the the trie on one node
It should be possible to basically stream-as-you-encode the trie on one node

### Witness -to-> Trie

Let's take a look on how to build a Merkle Trie from the witness.
the witness.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the witness.


`Bool` -a boolean value.

`Any` - any data type. MUST NOT be `nil`.
Copy link
Member

@MrChico MrChico Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far unused in this spec, can be removed?


### Basic data types

`nil` - an empty value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other types start with capital letters, can be a nice convention to adopt

read).

2. When building a single trie/a forest:
- When there are no rules applicable, the success criteria for a trie/forst MUST be met. It
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- When there are no rules applicable, the success criteria for a trie/forst MUST be met. It
- When there are no rules applicable, the success criteria for a trie/forest MUST be met. It

@MrChico
Copy link
Member

MrChico commented Mar 17, 2020

Great initiative! Besides my drive-by nits, here are some more thoughts:

Document structure

Right now, encoding of terms is interwoven with the definition of terms in a somewhat haphazard way. I think it could make sense to first specify the structure of witness terms, and leave the encoding to its own section to separate these concerns.

Syntax

I would consider adding Bytes as a primitive (basic) type, since it is so common.

I find the syntax
AccountNode{nonce:Int balance:Int storage:nil|Node code:nil|CodeNode|HashNode} a little confusing. You are essentially defining tree different non-terminals in one line. To me it would be clearer to define these constructions in their own right, as in:

type Node = AccountNode{nonce:Int balance:Int storage:Storage code:Code}
type Storage = nil | Node
type Code = nil | CodeNode | HashNode

You may also want to consider introducing a Maybe type, such that type Maybe A := A | nil, for all types A. (Alternatively Option or the Kleene ?.)

Semantics

I think this should be clarified that the substitution rules are not just specifying ways of rewriting a term by merely matching on the structure of that term, but rather by matching on every possible subterm of that term.

This means that there might be multiple ways of rewriting the same expression and it is not clear a priori that they produce the same result. You may want to make clear if you are suggesting a particular algorithm for rewriting, for example the one suggested by @poemm .

In the same vein, the evaluation criteria "until no further rules are applicable" is also somewhat unsatisfactory. In principle this could mean that in order to know whether I am done or not one would need to check every rule against every subterm of my current expression, which is quite suboptimal.

Another approach could be to specify what constitutes a result or canonical form, which already seems to be alluded to in the success-criteria paragraph.

@MrChico
Copy link
Member

MrChico commented Mar 17, 2020

Just another thought: while the single-pass stack algorithm might be the most straightforward one, since substitution rules can apply to any subterm there might be significant room for parallelizing algorithms as well

@mandrigin
Copy link
Contributor Author

@MrChico thanks for your feedback! Yeah, the substitution rules is not the implementation suggestion, it is an attempt:

  • to make it compatible with some kind of equation solver like MS Z3 to prove some properties like the rules
  • to make it very clear when some semantics changes what those changes are. Like moving from HEX to BIN tries will require us to remove 15 rules for branches and also remove the mask parameter, which is more obvious change then if we write it as an algorithm.

For implementers, I was kept some basic handbook with possible implementations, if there is a good parallel one, we can add it to. Though, what single-pass stack machine gives is streaming, you don't need to have all the witness to start processing it. It might be more valuable than parallel execution, but who knows.

@pipermerriam
Copy link
Member

@mandrigin one approach that I'd support for this PR is to 1) compile a list of issues that you know are not fully addressed, or just places where you are aware improvements could be made 2) open issues in this repository for each of those, or just one meta issue capturing all of them. 3) go ahead with merging as-is and then iterate from there towards a more perfect spec.

This will have the benefit of landing the document into master which then makes it a lot easier for other people to pose changes via pull requests.

@mandrigin
Copy link
Contributor Author

@pipermerriam yeah, that’s a good idea! I’ll do that early next week.

@mandrigin
Copy link
Contributor Author

the witness spec discussion to be continued in the github issues with the "witness" tag

@mandrigin mandrigin merged commit 5ca1bcd into master Mar 26, 2020
@mandrigin mandrigin deleted the witness-spec branch March 26, 2020 14:02
@mandrigin
Copy link
Contributor Author

to anyone landed here: feel free to create more issues if you feel some topics/issues aren't there. this PR is essentially locked for the discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants