WIP: Strict assembly. [Julia the IR] #2107

chriseth · 2017-04-03T17:17:49Z

No description provided.

gnidan · 2017-04-03T17:24:59Z

Note that "Julia" is the name of an existing language

gnidan

Just some thoughts about this, while my hands are still dirty from writing a basic parser for the grammar 😄

gnidan · 2017-04-03T23:35:29Z

docs/assembly.rst

+    Expression =
+        Identifier | Literal | FunctionCall
+    Switch =
+        'switch' Expression Case* ( 'default' ':' Block )?


For whatever it might matter, note that the following will be syntactically allowable:

switch 0 case 1 { ... }

Why should it not be syntactically allowed?

gnidan · 2017-04-03T23:35:38Z

docs/assembly.rst

+    Case =
+        'case' Expression ':' Block
+    ForLoop =
+        'for' Block Expression Block Block


Previously, the grammar permitted blocks or expressions for two of those blocks. The difference as far as I can tell is the inclusion / exclusion of braces. Is being able to exclude braces something that might be useful?

The reason for all this is the complicated scoping rules for the for-loop. We want to allow a new variable to be declared whose scope is only the for-loop itself, so the idiomatic form is basically:

for { let x := 1 } lt(x, 10) { x := add(x, 1) } { ...}

Of course it is weird that the scope of x extends beyond { let x := 1 } but it might be even weirder if we allowed for let x := 1 lt(x, 10) x := add(x, 1) { ... }, but this is perhaps a matter of taste.

Forcing braces for at least the first and last element introduces separators, but I'm fine with also allowing plain expressions there if you think that is useful.

👍 braces
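
A minimal Python sketch of the scoping rule discussed above, under the assumption that the init block opens a scope which the condition, post block and body all share. The `Scope` class and `for_loop_scopes` helper are hypothetical names used only for illustration.

```python
class Scope:
    """A chain of name sets; lookups walk outward through parents."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = set()

    def declare(self, name):
        # The draft disallows shadowing, so reject any name already visible.
        if self.is_visible(name):
            raise NameError(f"{name} already declared")
        self.names.add(name)

    def is_visible(self, name):
        scope = self
        while scope is not None:
            if name in scope.names:
                return True
            scope = scope.parent
        return False


def for_loop_scopes(outer, init_decls, body_decls):
    """Return (loop_scope, body_scope) for `for { init } cond { post } { body }`."""
    loop_scope = Scope(outer)        # holds variables declared in the init block
    for name in init_decls:
        loop_scope.declare(name)
    body_scope = Scope(loop_scope)   # the body sees everything in loop_scope
    for name in body_decls:
        body_scope.declare(name)
    return loop_scope, body_scope


outer = Scope()
loop_scope, body_scope = for_loop_scopes(outer, ["x"], ["tmp"])
print(body_scope.is_visible("x"))  # True: x from the init block reaches the body
print(outer.is_visible("x"))       # False: x does not leak out of the loop
```

Requiring braces around the init block gives the parser an explicit place to open `loop_scope`, which is the separator role mentioned above.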

gnidan · 2017-04-03T23:35:47Z

docs/assembly.rst

+
+     | 'dataSize' '(' Identifier ')' |
+        LinkerSymbol |
+        'bytecodeSize' |


I think these need to get moved up to Statement?

Also: what are your thoughts on pulling 'break'/'continue'/'bytecodeSize' into their own non-terminal(s)?

In my own parser implementation, I've found that the inconsistency between leaf/non-leaf nodes as possible resolutions for Statement has added a bit of complexity. This might be an implementation concern and not really relevant, but I think it might clean things up a bit to make a Break non-terminal, etc.

(This would mean that every Statement instance could be resolved to a container type with exactly one item.)

Follow-up thought from this: perhaps break/continue are categorizable as ControlOperations. I am not sure of behavior for these, but maybe also there is some generalization that can be done around dataSize/bytecodeSize/HexLiteral?

Just my 0.002Ξ, trying to make pretty ASTs

These here (dataSize, bytecodeSize, ...) do not need to be part of the specification (I still need to move them somewhere). As long as they can be written as functions and do not influence the internal control flow, they can be modelled as built-in functions similar to opcodes which are specific to the actual flavour of JULIA.

You are right, we should probably pull break and continue into their own nonterminals.
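
To illustrate what dedicated non-terminals buy on the AST side, here is a small Python sketch. The node classes and the partial `Statement` union are hypothetical and cover only the pieces discussed here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Break:
    """Wraps the 'break' keyword so it is a node like any other statement."""


@dataclass(frozen=True)
class Continue:
    """Wraps the 'continue' keyword."""


@dataclass(frozen=True)
class Block:
    statements: tuple


# Hypothetical, partial Statement union: every alternative is a node type,
# so a visitor never has to special-case bare keyword tokens.
Statement = Union[Block, Break, Continue]


def count_breaks(node: Statement) -> int:
    """Count Break nodes reachable from `node`."""
    if isinstance(node, Break):
        return 1
    if isinstance(node, Block):
        return sum(count_breaks(s) for s in node.statements)
    return 0


tree = Block((Break(), Block((Continue(), Break()))))
print(count_breaks(tree))  # 2
```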

gnidan · 2017-04-03T23:35:58Z

docs/assembly.rst

+        'bytecodeSize' |
+
+Restriction for Expression: Functions can only return single item,
+top level has to return nothing.


Is a workaround for this to define a tuple type?

This was written too briefly. What I actually wanted to say is: In f(g(), x), g can only return a single value. You can use functions that return multiple values, but then you have to resort to
let (a, b, c) := f().

I think that adding proper support for tuples might complicate everything, but I might also be wrong.
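
The restriction can be sketched as a small arity check in Python. The signature table, the tuple encoding of calls and the function names below are assumptions made for the example, not part of the draft.

```python
# Hypothetical signature table: name -> (number of parameters, number of return values).
SIGNATURES = {"f": (2, 1), "g": (0, 1), "h": (0, 3)}


def returns_of(expr):
    """Return how many values `expr` produces, checking nested calls.

    An expression is either a string (identifier or literal, one value)
    or a tuple ("call", name, [args]).
    """
    if isinstance(expr, str):
        return 1
    _, name, args = expr
    params, returns = SIGNATURES[name]
    if len(args) != params:
        raise TypeError(f"{name} expects {params} arguments, got {len(args)}")
    for arg in args:
        # A nested call used as an argument must produce exactly one value.
        if returns_of(arg) != 1:
            raise TypeError(f"argument of {name} must produce exactly one value")
    return returns


def check_declaration(targets, value):
    """`let (a, b, c) := value` needs as many targets as returned values."""
    if len(targets) != returns_of(value):
        raise TypeError("number of elements left and right must match")


# let (a, b, c) := h() is accepted because h returns three values.
check_declaration(["a", "b", "c"], ("call", "h", []))
# let y := f(g(), x) is accepted because g returns a single value.
check_declaration(["y"], ("call", "f", [("call", "g", []), "x"]))
```

Under this check, `f(h(), x)` is rejected because `h` returns three values in argument position.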

gnidan · 2017-04-03T23:36:09Z

docs/assembly.rst

+    FunctionCall =
+        Identifier '(' ( AssemblyItem ( ',' AssemblyItem )* )? ')'
+    IdentifierOrList = Identifier | '(' IdentifierList ')'
+    Identifier = [a-zA-Z_$] [a-zA-Z_0-9]*


Another thought: I think it might be worth specifying a list of keywords and defining Identifier to exclude those.

Without this, parsers need additional lookahead tokens, or for the rules to be ordered carefully.

Yes, we need a scanner grammar, too.
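
A minimal Python sketch of the scanner-level split between keywords and identifiers. The identifier pattern is copied from the quoted grammar; the keyword list is an assumption, collected from the keywords that appear in the grammar fragments in this thread.

```python
import re

# Hypothetical keyword list, collected from the quoted grammar fragments.
KEYWORDS = {"switch", "case", "default", "for", "let", "function",
            "break", "continue", "assembly"}

# Identifier pattern as written in the draft grammar.
IDENTIFIER = re.compile(r"[a-zA-Z_$][a-zA-Z_0-9]*")


def classify(word):
    """Return 'keyword', 'identifier' or 'invalid' for a single word."""
    if not IDENTIFIER.fullmatch(word):
        return "invalid"
    return "keyword" if word in KEYWORDS else "identifier"


for word in ["switch", "switcher", "$tmp", "9lives"]:
    print(word, classify(word))
```

With the keyword check done in the scanner, the parser can decide on the token kind alone and does not depend on rule ordering.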

gnidan · 2017-04-04T03:18:34Z

I've just reworked my parser to conform to this new grammar.

Two questions:

1. Have label definitions been removed?
2. What are the syntactic elements for the arguments to a function call? They're still listed as AssemblyItem, which has become Statement, but I wasn't sure if they should be Expression. (Expression would limit some things / have different meaning?)

chriseth · 2017-04-04T08:24:31Z

docs/assembly.rst

+    SubAssembly =
+        'assembly' Identifier Block
+    FunctionCall =
+        Identifier '(' ( AssemblyItem ( ',' AssemblyItem )* )? ')'


This should be Expression.

chriseth · 2017-04-04T08:27:31Z

Yes, labels have been removed and FunctionCall should be Identifier '(' ( Expression ( ',' Expression )* )? ')'.

Another question: If there is a function that does not take arguments (for example the built-in calldatasize()) - do we still require parentheses?

Argument in favour: This would clearly distinguish function call and variable access.
Argument against: It might be better read- and writeable if you can omit parentheses in that case.
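
As a rough illustration of the "argument in favour", here is a Python sketch of a recursive-descent parser for the corrected FunctionCall rule with mandatory parentheses. The tokenizer, the tuple-shaped nodes and the restriction to decimal literals are simplifications assumed for the example.

```python
import re

TOKEN = re.compile(r"\s*([a-zA-Z_$][a-zA-Z_0-9]*|[0-9]+|[(),])")


def tokenize(text):
    """Split `text` into identifiers, decimal literals and punctuation."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expression(tokens, pos=0):
    """Parse one Expression starting at `pos`; return (node, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    head = tokens[pos]
    if head.isdigit():
        return ("literal", int(head)), pos + 1
    if head in "(),":
        raise SyntaxError(f"unexpected {head!r}")
    # One token of lookahead: '(' marks a call, anything else a variable access.
    if pos + 1 < len(tokens) and tokens[pos + 1] == "(":
        args, pos = [], pos + 2
        if pos < len(tokens) and tokens[pos] == ")":
            return ("call", head, args), pos + 1
        while True:
            arg, pos = parse_expression(tokens, pos)
            args.append(arg)
            if pos < len(tokens) and tokens[pos] == ",":
                pos += 1
            elif pos < len(tokens) and tokens[pos] == ")":
                return ("call", head, args), pos + 1
            else:
                raise SyntaxError("expected ',' or ')'")
    return ("identifier", head), pos + 1


print(parse_expression(tokenize("calldatasize()"))[0])
print(parse_expression(tokenize("calldatasize"))[0])
```

With parentheses required, `calldatasize()` parses as a call and `calldatasize` as a plain identifier, so the two cases are told apart without consulting a symbol table.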

chriseth · 2017-04-04T12:25:12Z

Ok, this is a first take at specifying the whole thing. It still needs some work on the scopes, but I think it can actually be expressed more simply than I am currently doing it. Loops might create some trouble, especially break and continue, and of course sub-assemblies. I thought about perhaps breaking up the sub-assemblies a bit more, providing for some generic "data" area:

data "type" { ... } where type can be assembly or hex or whatever and it determines what is allowed inside the {}.

chriseth · 2017-04-04T12:29:31Z

docs/assembly.rst

+a node of the AST and returns a new global state, a new local state
+and a value (if the AST node is an expression).
+
+We use sequence numbers as a shorthand for the order of evaluation


This shorthand notation is actually not used.

axic · 2017-04-04T15:00:33Z

The width of variables is not defined, as the IR is agnostic of this, but it is assumed to be 256-bit by the higher-level language. That also determines what the target backend can expect.

I think it would be useful to include a way (such as variable:256) to define the desired width, supporting 32, 64 and 256 only. That way, a non-EVM target such as WebAssembly can efficiently support native types.

Likewise, it would perhaps make sense to define widths and conversions for literals:
int32(42), int64(42) and int256(42) to define the literal 42 as a 32-, 64- or 256-bit value.

Furthermore, it could raise a failure if the literal is out of bounds for that width.
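
A short Python sketch of the out-of-bounds check suggested above. It assumes unsigned values, which the proposal does not specify; `typed_literal` is a hypothetical helper name.

```python
ALLOWED_WIDTHS = (32, 64, 256)


def typed_literal(value, width):
    """Return (value, width) if `value` fits an unsigned `width`-bit integer.

    Assumption: literals are unsigned; a signed reading would need other bounds.
    """
    if width not in ALLOWED_WIDTHS:
        raise ValueError(f"unsupported width {width}")
    if not 0 <= value < 2 ** width:
        raise OverflowError(f"{value} does not fit into {width} bits")
    return value, width


print(typed_literal(42, 32))
```

For example, `typed_literal(2 ** 32, 32)` raises `OverflowError`, while the same value is accepted at width 64.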

axic · 2017-04-04T15:02:57Z

Also please consider a different name, as there's another fairly well known language called Julia 😉

I nominate Simone as it starts with the letter S and there was an AI sci-fi with the same name.

gnidan · 2017-04-04T15:32:24Z

Responding to responses:

Labels – does the removal of labels give some kind of benefit re:, say, no more stack-height bookkeeping? Otherwise I'd say they could be quite useful as valid syntax, as I would imagine the desugaring phase would make good use of them.

Function calls with no args – I'd vote for requiring parens; allowing/requiring them to be omitted seems like it adds a lot of complexity

data "type" { ... } / generic data mechanism – Definitely in favor of this being well-defined.

Will look at the spec part later on

chriseth · 2017-04-04T15:48:17Z

@axic I think we can add : Identifier to specify a named type (the names are from a namespace unrelated to the namespace of variables and functions). Types have to match for arguments to functions and assignments. Conversion functions can be built-in functions similar to the evm opcodes.

Concerning the name: I don't think there will be confusion.

@gnidan labels: desugaring towards EVM of course requires labels, but desugaring will also translate JULIA to a different language that has labels but does not have functions, for example. The benefit of not having labels is that the specification does not even need a stack (as you will see in the code). Labels and scoped variables also don't live well together.

axic · 2017-04-04T15:58:13Z

One more note: this language has an unlimited number of return values, while WebAssembly is limited to one. The WebAssembly target will need to wrap functions that return more than one value so that some of their results are stored in memory.

Maybe it would make sense to review the pros and cons of limiting it to a single return value for EVM. If it is reasonable to limit it even in EVM, that would reduce the complexity needed for the WebAssembly target.
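
The wrapping described above might look roughly like the following Python sketch. It is only a model: a dict stands in for linear memory, and `wrap_single_return` and `divmod_pair` are hypothetical names.

```python
def wrap_single_return(func, memory, base):
    """Adapt `func`, which returns a non-empty tuple, to return one value.

    The first result is returned directly; the remaining ones are written to
    `memory` (a dict standing in for linear memory) starting at `base`.
    """
    def wrapped(*args):
        first, *rest = func(*args)
        for offset, value in enumerate(rest):
            memory[base + offset] = value
        return first
    return wrapped


def divmod_pair(a, b):
    """Example function with two return values."""
    return a // b, a % b


memory = {}
div = wrap_single_return(divmod_pair, memory, base=0)
quotient = div(17, 5)
print(quotient, memory)  # 3 {0: 2}
```

The caller has to know where the extra results were stored, which is the additional complexity a single-return restriction would avoid.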

pirapira · 2017-04-05T06:22:28Z

docs/assembly.rst

-differs from standalone assembly and then specify assembly itself.
+JULIA is an intermediate language that can compile to various different backends
+(EVM 1.0, EVM 1.5 and eWASM are planned).
+Because of that, it is designed to be as featureless as possible.


I doubt it. You can remove switches and the language is still Turing-complete.

pirapira · 2017-04-05T06:28:01Z

docs/assembly.rst

+
+Scopes in JULIA are tied to Blocks and all declarations
+(``FunctionDefinition``, ``VariableDeclaration`` and ``SubAssembly``)
+introduce new identifiers into these scopes. Shadowing is disallowed


Does this imply that in the above power examples, the JULIA program requires that EVM does not have an opcode called power?

There are no opcodes. power is an (internal) function, and the EVM backend replaces it with a simple opcode.

Is any symbol from underlying virtual machines visible in JULIA programs?

pirapira · 2017-04-05T06:30:48Z

docs/assembly.rst

+Talk about identifiers across functions etc
+
+
+Restriction for Expression: Statements have to return empty tuple


Tuples are not expressions. I suspect some expressions can return values that are not expressible as expressions. I oppose this because in this way, the substitution model of program execution does not work.

pirapira · 2017-04-05T06:31:23Z

docs/assembly.rst

+
+
+Restriction for Expression: Statements have to return empty tuple
+Function arguments have to be single item


"Item"s have not been defined so far.

pirapira · 2017-04-05T06:33:05Z

docs/assembly.rst

+Restriction for Expression: Statements have to return empty tuple
+Function arguments have to be single item
+
+Restriction for VariableDeclaration and Assignment: Number of elements left and right needs to be the same


The right hand side of an assignment is always one expression. Is this always a single element? When is an expression multiple elements?

pirapira · 2017-04-05T06:33:50Z

docs/assembly.rst

+Restriction for VariableDeclaration and Assignment: Number of elements left and right needs to be the same
+continue and break only in for loop
+
+Literals have to fit 32 bytes


Does this refer to the length of literals in the text? In that case, does the count include 0x or ""?

I think it should be only expressed as a number (decimal or hexadecimal) with the limitation to fit into 256 bits.
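
A minimal Python sketch of the value-based reading proposed above, in which the limit applies to the number and not to its spelling. The accepted literal syntax (decimal, or hexadecimal with a `0x` prefix) is an assumption.

```python
import re

# Assumed literal syntax: decimal digits, or hex digits after a 0x prefix.
NUMBER = re.compile(r"0x[0-9a-fA-F]+|[0-9]+")


def literal_value(text):
    """Parse a decimal or 0x-prefixed literal and require it to fit 256 bits."""
    if not NUMBER.fullmatch(text):
        raise ValueError(f"not a number literal: {text!r}")
    value = int(text, 16) if text.startswith("0x") else int(text, 10)
    if value >= 2 ** 256:
        raise OverflowError("literal does not fit into 256 bits")
    return value


# Leading zeros make the text longer than 32 bytes' worth of digits,
# but the value still fits.
print(literal_value("0x" + "00" * 40 + "01"))  # 1
```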

pirapira · 2017-04-05T06:35:16Z

docs/assembly.rst

+
+The the evaluation function E takes a global state, a local state and
+a node of the AST and returns a new global state, a new local state
+and a value (if the AST node is an expression).


Is this a confirmation that an expression is always evaluated into a single element?

pirapira · 2017-04-05T06:36:20Z

docs/assembly.rst

+
+.. code::
+    E(G, L, <{St1, ..., Stn}>: Block) =
+        let L' be a copy of L that adds a new inner scope which contains


It was not mentioned that local states can contain scopes.

pirapira · 2017-04-05T06:39:07Z

docs/assembly.rst

+        let L'' be a copy of L'n where the innermost scope is removed
+        Gn, L''
+    E(G, L, <function fname (param1, ..., paramn) -> (ret1, ..., retm) block>: FunctionDefinition) =
+        G, L


Maybe not here, but somewhere, the function definition needs to be registered in L. Not just the name of the function, but also the parameters, return values and the body.
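
One way to picture this is the following Python sketch, which models the local state as a list of scopes and registers each function with its parameters, return variables and body when a block is entered. The representation and the helper names are assumptions, not the draft's definitions.

```python
def enter_block(L, function_defs):
    """Return a copy of L with a new innermost scope holding the block's functions.

    `L` is modeled as a list of dicts, outermost scope first. Each function is
    stored with its parameters, return variables and body, not just its name.
    """
    scope = {name: ("function", params, returns, body)
             for name, params, returns, body in function_defs}
    return L + [scope]


def leave_block(L):
    """Return a copy of L with the innermost scope removed."""
    return L[:-1]


def lookup(L, name):
    """Search scopes from innermost to outermost."""
    for scope in reversed(L):
        if name in scope:
            return scope[name]
    raise NameError(name)


L0 = [{"x": 1}]
L1 = enter_block(L0, [("power", ("base", "exponent"), ("result",), "<body>")])
print(lookup(L1, "power")[1])  # ('base', 'exponent')
print(leave_block(L1) == L0)   # True
```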

pirapira · 2017-04-05T06:40:21Z

docs/assembly.rst

+    E(G, L, <function fname (param1, ..., paramn) -> (ret1, ..., retm) block>: FunctionDefinition) =
+        G, L
+    E(G, L, <let (var1, ..., varn) := value>: VariableDeclaration) =
+        E(G, L, <(var1, ..., varn) := value>: Assignment)


Something needs to happen to the inactive flags in L.

pirapira · 2017-04-05T06:41:25Z

docs/assembly.rst

+    E(G, L, <let (var1, ..., varn) := value>: VariableDeclaration) =
+        E(G, L, <(var1, ..., varn) := value>: Assignment)
+    E(G, L, <(var1, ..., varn) := value>: Assignment) =
+        let G', L', v1, ..., vn = E(G, L, value)


What is the number of return values of E?

pirapira · 2017-04-05T06:43:33Z

docs/assembly.rst

+        E(G, L, <(var1, ..., varn) := value>: Assignment)
+    E(G, L, <(var1, ..., varn) := value>: Assignment) =
+        let G', L', v1, ..., vn = E(G, L, value)
+        let L'' be a copy of L' where L'[vi] = vi for i = 1, ..., n


It looks as if looking up v1 results in v1.

Also, the modified local state should be L'', not L'.
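
With both points applied, one possible reading of the Assignment rule is sketched below in Python: the right-hand side yields values v1, ..., vn, and the new local state L'' maps each variable name var_i to v_i. The dict-based state and the `eval_value` callback are assumptions made for the example.

```python
def eval_assignment(G, L, variables, eval_value):
    """Sketch of E(G, L, <(var1, ..., varn) := value>: Assignment).

    `eval_value` is a hypothetical stand-in for evaluating the right-hand side;
    it returns (G', L', [v1, ..., vn]).
    """
    G1, L1, values = eval_value(G, L)
    if len(values) != len(variables):
        raise ValueError("number of elements left and right must match")
    L2 = dict(L1)                  # L'' is a copy of L' ...
    for name, value in zip(variables, values):
        L2[name] = value           # ... where L''[var_i] = v_i
    return G1, L2


G, L = {}, {"a": 0, "b": 0}
G2, L2 = eval_assignment(G, L, ["a", "b"], lambda g, l: (g, l, [1, 2]))
print(L2)  # {'a': 1, 'b': 2}
print(L)   # {'a': 0, 'b': 0}
```

Indexing by the variable name rather than by the value avoids the reading where looking up v1 yields v1.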

pirapira · 2017-04-05T06:45:01Z

docs/assembly.rst

+The the evaluation function E takes a global state, a local state and
+a node of the AST and returns a new global state, a new local state
+and a value (if the AST node is an expression).
+


Around here, an introduction to local states is necessary: what information they contain and how to access it.

axic · 2017-04-10T11:25:59Z

docs/assembly.rst

+        G'', Ln, rv1, ..., rvm
+    E(G, L, l: HexLiteral) = G, L, hexString(l),
+        where hexString decodes l from hex and left-aligns in into 32 bytes
+    E(G, L, l: StringLiteral) = G, L, utf8EncodeLeftAligned(l),


Why do we need string literals?
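
For reference, the two literal rules in the quoted hunk can be sketched in Python as follows. Padding with zero bytes on the right and rejecting anything longer than 32 bytes are assumptions; the function names mirror `hexString` and `utf8EncodeLeftAligned` from the diff.

```python
def hex_string(literal):
    """Decode hex digits and left-align the bytes in a 32-byte word."""
    data = bytes.fromhex(literal)
    if len(data) > 32:
        raise ValueError("literal does not fit into 32 bytes")
    return data.ljust(32, b"\x00")   # assumption: pad with zero bytes


def utf8_encode_left_aligned(literal):
    """UTF-8 encode a string and left-align the bytes in a 32-byte word."""
    data = literal.encode("utf-8")
    if len(data) > 32:
        raise ValueError("literal does not fit into 32 bytes")
    return data.ljust(32, b"\x00")   # assumption: pad with zero bytes


print(hex_string("6162") == utf8_encode_left_aligned("ab"))  # True
```

In this model a string literal is a more readable spelling of a value that a hex literal can already express.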

axic · 2017-04-18T12:06:37Z

This PR changes assembly.rst and the current flow suggests that inline assembly is a subset of Julia. I think that is incorrect, and we still need to support "old inline assembly" even if we introduce direct Julia support from Solidity. Therefore I suggest moving the Julia-related documentation to julia.rst.

chriseth · 2017-07-12T18:10:54Z

Replaced by #2129

Describe Julia.

53aa56c

chriseth added the in progress label Apr 3, 2017

gnidan reviewed Apr 3, 2017

chriseth commented Apr 4, 2017

First take in formal specification.

870abf7

chriseth commented Apr 4, 2017

pirapira reviewed Apr 5, 2017

axic reviewed Apr 10, 2017

axic changed the title ~~WIP: Strict assembly.~~ WIP: Strict assembly. [Julia the IR] Apr 18, 2017

axic mentioned this pull request Apr 18, 2017

Initial Julia description #2129

Merged

chriseth closed this Jul 12, 2017

axic deleted the strictasm branch September 16, 2017 12:13
