WIP: Strict assembly. [Julia the IR] #2107

Closed
wants to merge 2 commits into from
Contributor

@chriseth chriseth commented Apr 3, 2017

No description provided.

Member

gnidan commented Apr 3, 2017

Note that "Julia" is the name of an existing language

Member

@gnidan gnidan left a comment

Just some thoughts about this, while my hands are still dirty from writing a basic parser for the grammar 😄

Expression =
    Identifier | Literal | FunctionCall
Switch =
    'switch' Expression Case* ( 'default' ':' Block )?
Member

For whatever it might matter, note that the following will be syntactically allowable:

switch 0 case 1 { ... }

Contributor Author

Why should it not be syntactically allowed?

Case =
    'case' Expression ':' Block
ForLoop =
    'for' Block Expression Block Block
Member

Previously, the grammar permitted blocks or expressions for two of those blocks. The difference as far as I can tell is the inclusion / exclusion of braces. Is being able to exclude braces something that might be useful?

Contributor Author

The reason for all this is the complicated scoping rules for the for-loop. We want to allow a new variable to be declared whose scope is only the for-loop itself, so basically the idiomatic

for { let x := 1 } lt(x, 10) { x := add(x, 1) } { ...}

Of course it is weird that the scope of x extends beyond { let x := 1 } but it might be even weirder if we allowed for let x := 1 lt(x, 10) x := add(x, 1) { ... }, but this is perhaps a matter of taste.

Forcing braces for at least the first and last element introduces separators, but I'm fine with also allowing plain expressions there if you think that is useful.
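
A minimal sketch of the scoping rule described above, written in Python rather than JULIA: declarations made in the init block stay visible to the condition, the post block and the body, and are gone once the loop ends. The `Scope` class and the `run_for` helper are hypothetical names for illustration and are not part of the proposed specification; giving the body its own inner scope is likewise an assumption.

```python
class Scope:
    """A chain of name -> value maps; lookups walk outward to the parent."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def declare(self, name, value):
        self.vars[name] = value

    def _owner(self, name):
        # Find the innermost scope that declares `name`.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope
            scope = scope.parent
        raise NameError(name)

    def get(self, name):
        return self._owner(name).vars[name]

    def set(self, name, value):
        self._owner(name).vars[name] = value


def run_for(outer, init, cond, post, body):
    """Model `for init cond post body`: one scope spans all four parts."""
    loop_scope = Scope(outer)        # opened by the init block, dropped after the loop
    init(loop_scope)
    while cond(loop_scope):
        body(Scope(loop_scope))      # assumption: the body Block gets an inner scope
        post(loop_scope)


# for { let x := 1 } lt(x, 10) { x := add(x, 1) } { ... }
top = Scope()
seen = []
run_for(
    top,
    init=lambda s: s.declare("x", 1),
    cond=lambda s: s.get("x") < 10,
    post=lambda s: s.set("x", s.get("x") + 1),
    body=lambda s: seen.append(s.get("x")),
)
```

After the loop, `top.get("x")` raises `NameError`, which is the "scope is only the for-loop itself" behaviour the braces are meant to delimit.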

Member

👍 braces


| 'dataSize' '(' Identifier ')' |
LinkerSymbol |
'bytecodeSize' |
Member

I think these need to get moved up to Statement?

Also: what are your thoughts on pulling 'break'/'continue'/'bytecodeSize' into their own non-terminal(s)?

In my own parser implementation, I've found that the inconsistency between leaf/non-leaf nodes as possible resolutions for Statement has added a bit of complexity. This might be an implementation concern and not really relevant, but I think it might clean things up a bit to make a Break non-terminal, etc.

(This would mean that every Statement instance could be resolved to a container type with exactly one item.)
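
To illustrate the idea, here is a rough Python sketch of an AST in which `break` and `continue` get their own node types, so that every `Statement` wraps exactly one child. The class names (`Break`, `Continue`, `FunctionCall`, `Statement`) and the `describe` helper are assumptions made for this example, not names taken from the grammar under review.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Break:
    """Dedicated node for the 'break' keyword (hypothetical name)."""


@dataclass
class Continue:
    """Dedicated node for the 'continue' keyword (hypothetical name)."""


@dataclass
class FunctionCall:
    name: str
    arguments: list


@dataclass
class Statement:
    """Container with exactly one child, so consumers never see a bare token."""
    item: Union[Break, Continue, FunctionCall]


def describe(stmt: Statement) -> str:
    # Uniform dispatch on the child's type; no special case for leaf keywords.
    return type(stmt.item).__name__


program = [Statement(FunctionCall("f", [])), Statement(Break())]
kinds = [describe(s) for s in program]
```

With this shape, a tree walker can treat leaf and non-leaf statements the same way.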

Member

Follow-up thought from this: perhaps break/continue are categorizable as ControlOperations. I am not sure of behavior for these, but maybe also there is some generalization that can be done around dataSize/bytecodeSize/HexLiteral?

Just my 0.002Ξ, trying to make pretty ASTs

Contributor Author

These here (dataSize, bytecodeSize, ...) do not need to be part of the specification (I still need to move them somewhere). As long as they can be written as functions and do not influence the internal control flow, they can be modelled as built-in functions similar to opcodes which are specific to the actual flavour of JULIA.

You are right, we should probably pull break and continue into nonterminals.

'bytecodeSize' |

Restriction for Expression: Functions can only return single item,
top level has to return nothing.
Member

Is a workaround for this to define a tuple type?

Contributor Author

This was written too briefly. What I actually wanted to say is: In f(g(), x), g can only return a single value. You can use functions that return multiple values, but then you have to resort to
let (a, b, c) := f().

I think that adding proper support for tuples might complicate everything, but I might also be wrong.
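
The restriction can be sketched as a small arity check in Python. The `RETURNS` table, the function names in it, and the tuple encoding of expressions are all assumptions for illustration; the point is only that a nested argument must yield exactly one value, while a declaration may unpack several.

```python
# Hypothetical table: function name -> number of return values.
RETURNS = {"g": 1, "three": 3}


def value_count(expr):
    """Number of values an expression yields.

    An expression is either a string (identifier or literal, one value)
    or a tuple (function_name, [argument expressions]).
    """
    if isinstance(expr, str):
        return 1
    name, args = expr
    for arg in args:
        if value_count(arg) != 1:
            raise TypeError(f"argument of {name} must yield exactly one value")
    return RETURNS[name]


def check_declaration(names, expr):
    """`let (names...) := expr` needs as many names as values."""
    if len(names) != value_count(expr):
        raise TypeError("number of elements left and right differs")
```

Here `check_declaration(["a", "b", "c"], ("three", []))` passes, whereas passing `three()` as an argument to `g` is rejected.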

FunctionCall =
    Identifier '(' ( AssemblyItem ( ',' AssemblyItem )* )? ')'
IdentifierOrList = Identifier | '(' IdentifierList ')'
Identifier = [a-zA-Z_$] [a-zA-Z_0-9]*
Member

Another thought: I think it might be worth specifying a list of keywords and defining Identifier to exclude those.

Without this, parsers need additional lookahead tokens, or for the rules to be ordered carefully.
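
As a sketch of what that could look like in a scanner, the Python snippet below applies the `Identifier` pattern from the grammar and then filters out reserved words. The keyword list is an assumption assembled from the terminals quoted in this review, and `classify` is a hypothetical helper name.

```python
import re

# Keyword list is an assumption drawn from the terminals quoted in this review.
KEYWORDS = {"let", "function", "switch", "case", "default", "for",
            "break", "continue", "assembly"}

# Identifier rule as written in the grammar under review.
IDENTIFIER = re.compile(r"[a-zA-Z_$][a-zA-Z_0-9]*")


def classify(word):
    """Return 'keyword', 'identifier', or None if `word` matches neither."""
    if IDENTIFIER.fullmatch(word) is None:
        return None
    return "keyword" if word in KEYWORDS else "identifier"
```

Once the scanner makes this decision, the parser can branch on the token kind alone, without extra lookahead or rule ordering.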

Contributor Author

Yes, we need a scanner grammar, too.

Member

gnidan commented Apr 4, 2017

I've just reworked my parser to conform to this new grammar.

Two questions:

  1. Have label definitions been removed?
  2. What are the syntactic elements for the arguments to a function call? They're still listed as AssemblyItem, which has become Statement, but I wasn't sure if they should be Expression. (Expression would limit some things / have different meaning?)

SubAssembly =
    'assembly' Identifier Block
FunctionCall =
    Identifier '(' ( AssemblyItem ( ',' AssemblyItem )* )? ')'
Contributor Author

This should be Expression.

Contributor Author

chriseth commented Apr 4, 2017

Yes, labels have been removed and FunctionCall should be Identifier '(' ( Expression ( ',' Expression )* )? ')'.
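
A minimal recursive-descent sketch of the corrected rule, in Python, assuming `Expression = Identifier | Literal | FunctionCall` and decimal-only literals. The `tokenize` and `parse_expression` names and the tuple-based node format are made up for this example.

```python
import re

# One token: an identifier, a decimal number, or one of ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z_$][A-Za-z_0-9]*|[0-9]+|[(),])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expression(tokens, pos=0):
    """Parse one Expression; returns (node, next_position)."""
    def peek(i):
        return tokens[i] if i < len(tokens) else None

    head = peek(pos)
    if head is None or head in {"(", ")", ","}:
        raise SyntaxError("expected an expression")
    pos += 1
    if head[0].isdigit():
        return ("literal", int(head)), pos
    if peek(pos) != "(":
        return ("identifier", head), pos
    pos += 1                          # consume '('
    args = []
    if peek(pos) != ")":
        while True:
            arg, pos = parse_expression(tokens, pos)   # arguments are Expressions
            args.append(arg)
            if peek(pos) != ",":
                break
            pos += 1                  # consume ','
    if peek(pos) != ")":
        raise SyntaxError("expected ')'")
    return ("call", head, args), pos + 1
```

In this sketch `calldatasize()` parses as a call and `calldatasize` as a plain identifier, so the parentheses alone carry the distinction.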

Another question: If there is a function that does not take arguments (for example the built-in calldatasize()) - do we still require parentheses?

Argument in favour: This would clearly distinguish function call and variable access.
Argument against: It might be better read- and writeable if you can omit parentheses in that case.

Contributor Author

chriseth commented Apr 4, 2017

Ok, this is a first take at specifying the whole thing. It still needs some work on the scopes, but I think it can actually be expressed more simply than I am currently doing it. Loops might create some trouble, especially break and continue, and of course sub-assemblies. I thought about perhaps breaking up the sub-assemblies a bit more, providing for some generic "data" area:

data "type" { ... } where type can be assembly or hex or whatever and it determines what is allowed inside the {}.

a node of the AST and returns a new global state, a new local state
and a value (if the AST node is an expression).

We use sequence numbers as a shorthand for the order of evaluation
Contributor Author

This shorthand notation is actually not used.

Member

axic commented Apr 4, 2017

Width of variables is not defined, as the IR is agnostic of this, but it is assumed to be 256-bit by the higher-level language. That also determines what the target backend can expect.

I think it would be useful to include a way (such as variable:256) to define the desired width, supporting only 32, 64 and 256. That way, a non-EVM target such as WebAssembly can efficiently support native types.

Likewise, it perhaps would make sense defining widths and conversions for literals:
int32(42), int64(42) and int256(42) to define the literal 42 as a 32-, 64- and 256-bit value, respectively.

Furthermore, it could raise a failure if the literal is out of bounds for that width.
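
The bounds check could look roughly like the Python sketch below. It treats values as unsigned, which is an assumption (the comment above does not say whether `int32` and friends are signed), and `typed_literal` is a hypothetical helper name.

```python
ALLOWED_WIDTHS = (32, 64, 256)


def typed_literal(value, width):
    """Return `value` if it fits an unsigned integer of `width` bits, else raise."""
    if width not in ALLOWED_WIDTHS:
        raise ValueError(f"unsupported width: {width}")
    # Assumption: unsigned range [0, 2**width).
    if not 0 <= value < 2 ** width:
        raise OverflowError(f"{value} does not fit into {width} bits")
    return value
```

For example, `typed_literal(42, 32)` succeeds, while `typed_literal(2 ** 32, 32)` fails and would need the 64-bit form instead.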

Member

axic commented Apr 4, 2017

Also please consider a different name, as there's another fairly well known language called Julia 😉

I nominate Simone as it starts with the letter S and there was an AI sci-fi with the same name.

Member

gnidan commented Apr 4, 2017

Responding to responses:

Labels – does the removal of labels give some kind of benefit re:, say, no more stack-height bookkeeping? Otherwise I'd say they could be quite useful as valid syntax, as I would imagine the desugaring phase would make good use of them.

Function calls with no args – I'd vote for requiring parens; allowing/requiring them to be omitted seems like it adds a lot of complexity

data "type" { ... } / generic data mechanism – Definitely in favor of this being well-defined.

Will look at the spec part later on

Contributor Author

chriseth commented Apr 4, 2017

@axic I think we can add : Identifier to specify a named type (the names are from a namespace unrelated to the namespace of variables and functions). Types have to match for arguments to functions and assignments. Conversion functions can be built-in functions similar to the evm opcodes.

Concerning the name: I don't think there will be confusion.

@gnidan labels: desugaring towards EVM of course requires labels, but desugaring will also translate JULIA to a different language that has labels but does not have functions, for example. The benefit of not having labels is that the specification does not even need a stack (as you will see in the code). Labels and scoped variables also don't live well together.

Member

axic commented Apr 4, 2017

One more note: this language has an unlimited number of return values, while WebAssembly is limited to one. The WebAssembly target will need to wrap functions which return more than one value so that some of their results are stored in memory.

Maybe it would make sense to review the pros and cons of limiting it to a single return value for EVM. If it is reasonable to limit it even in EVM, that would reduce the complexity needed for the WebAssembly target.

differs from standalone assembly and then specify assembly itself.
JULIA is an intermediate language that can compile to various different backends
(EVM 1.0, EVM 1.5 and eWASM are planned).
Because of that, it is designed to be as featureless as possible.
Member

I doubt it. You can remove switches and the language is still Turing-complete.


Scopes in JULIA are tied to Blocks and all declarations
(``FunctionDefinition``, ``VariableDeclaration`` and ``SubAssembly``)
introduce new identifiers into these scopes. Shadowing is disallowed
Member

Does this imply that in the above power examples, the JULIA program requires that EVM does not have an opcode called power?

Member

@axic axic Apr 10, 2017

There are no opcodes. power is an (internal) function, and the EVM backend replaces it with a simple opcode.

Member

Is any symbol from underlying virtual machines visible in JULIA programs?

Talk about identifiers across functions etc


Restriction for Expression: Statements have to return empty tuple
Member

Tuples are not expressions. I suspect some expressions can return values that are not expressible as expressions. I oppose this because in this way, the substitution model of program execution does not work.



Restriction for Expression: Statements have to return empty tuple
Function arguments have to be single item
Member

"Item"s have not been defined so far.

Restriction for Expression: Statements have to return empty tuple
Function arguments have to be single item

Restriction for VariableDeclaration and Assignment: Number of elements left and right needs to be the same
Member

The right hand side of an assignment is always one expression. Is this always a single element? When is an expression multiple elements?

Restriction for VariableDeclaration and Assignment: Number of elements left and right needs to be the same
continue and break only in for loop

Literals have to fit 32 bytes
Member

Does this refer to the length of literals in the source text? In that case, does the count include the 0x prefix or the "" quotes?

Member

@axic axic Apr 10, 2017

I think it should be only expressed as a number (decimal or hexadecimal) with the limitation to fit into 256 bits.


The evaluation function E takes a global state, a local state and
a node of the AST and returns a new global state, a new local state
and a value (if the AST node is an expression).
Member

Is this a confirmation that an expression is always evaluated into a single element?


.. code::
E(G, L, <{St1, ..., Stn}>: Block) =
let L' be a copy of L that adds a new inner scope which contains
Member

It was not mentioned that local states can contain scopes.

let L'' be a copy of L'n where the innermost scope is removed
Gn, L''
E(G, L, <function fname (param1, ..., paramn) -> (ret1, ..., retm) block>: FunctionDefinition) =
G, L
Member

Maybe not here, but somewhere, the function definition needs to be registered in L. Not just the name of the function, but also the parameters, return values and the body.

E(G, L, <function fname (param1, ..., paramn) -> (ret1, ..., retm) block>: FunctionDefinition) =
G, L
E(G, L, <let (var1, ..., varn) := value>: VariableDeclaration) =
E(G, L, <(var1, ..., varn) := value>: Assignment)
Member

Something needs to happen to the inactive flags in L.

E(G, L, <let (var1, ..., varn) := value>: VariableDeclaration) =
E(G, L, <(var1, ..., varn) := value>: Assignment)
E(G, L, <(var1, ..., varn) := value>: Assignment) =
let G', L', v1, ..., vn = E(G, L, value)
Member

What is the number of return values of E?

E(G, L, <(var1, ..., varn) := value>: Assignment)
E(G, L, <(var1, ..., varn) := value>: Assignment) =
let G', L', v1, ..., vn = E(G, L, value)
let L'' be a copy of L' where L'[vi] = vi for i = 1, ..., n
Member

It looks as if looking up v1 results in v1.

Also, the modified local state should be L'', not L'.
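
A small Python sketch of the assignment rule with both points applied: the names `var_i` (not the values `v_i`) are used as keys, and the updated copy is what gets returned. Modelling the local state as a flat `dict` and the right-hand side as an already-evaluated list are simplifying assumptions; `eval_expression` and `eval_assignment` are hypothetical names.

```python
def eval_expression(G, L, value):
    """Stand-in for E on expressions: returns (G', L', [v1, ..., vn]).

    Assumption: `value` is already a list of Python values, which keeps
    the sketch independent of the rest of the evaluator.
    """
    return G, L, list(value)


def eval_assignment(G, L, variables, value):
    """E(G, L, <(var1, ..., varn) := value>) with L''[var_i] = v_i."""
    G1, L1, values = eval_expression(G, L, value)
    if len(variables) != len(values):
        raise ValueError("number of elements left and right differs")
    L2 = dict(L1)                    # L'' is a copy of L'
    for var, v in zip(variables, values):
        L2[var] = v                  # bind the variable name, not the value
    return G1, L2                    # return the updated state L'', not L'
```

Calling `eval_assignment({}, {"a": 0}, ["a", "b"], [1, 2])` yields a local state of `{"a": 1, "b": 2}` and leaves the input state untouched.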

The evaluation function E takes a global state, a local state and
a node of the AST and returns a new global state, a new local state
and a value (if the AST node is an expression).

Member

Around here, an introduction of local states is necessary: what information they contain and how to access it.

G'', Ln, rv1, ..., rvm
E(G, L, l: HexLiteral) = G, L, hexString(l),
where hexString decodes l from hex and left-aligns it into 32 bytes
E(G, L, l: StringLiteral) = G, L, utf8EncodeLeftAligned(l),
Member

Why do we need string literals?
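
For reference, the two literal rules quoted above can be sketched in Python as follows. The sketch assumes that the hex literal arrives as bare hex digits (no prefix or quotes), that "left-aligned" means zero padding on the right, and that oversized literals are rejected; the function names are Python spellings of `hexString` and `utf8EncodeLeftAligned`.

```python
def _left_align(data):
    """Pad `data` with zero bytes on the right up to 32 bytes."""
    if len(data) > 32:
        raise ValueError("literal does not fit 32 bytes")
    return data.ljust(32, b"\x00")


def hex_string(literal):
    """Decode bare hex digits and left-align the result into 32 bytes."""
    return _left_align(bytes.fromhex(literal))


def utf8_encode_left_aligned(literal):
    """UTF-8 encode a string and left-align the result into 32 bytes."""
    return _left_align(literal.encode("utf-8"))
```

Under these assumptions both kinds of literal end up as the same 32-byte value shape, and differ only in how the source text is decoded.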

@axic axic changed the title from "WIP: Strict assembly." to "WIP: Strict assembly. [Julia the IR]" Apr 18, 2017
Member

axic commented Apr 18, 2017

This PR changes assembly.rst and the current flow suggests that inline assembly is a subset of Julia. I think that is incorrect and we still need to support "old inline assembly" even if we introduce direct Julia support from Solidity. Therefore I suggest moving the Julia-related documentation to julia.rst.

@axic axic mentioned this pull request Apr 18, 2017
Contributor Author

Replaced by #2129

@chriseth chriseth closed this Jul 12, 2017
@axic axic deleted the strictasm branch September 16, 2017 12:13