WIP: Strict assembly. [Julia the IR] #2107
Note that "Julia" is the name of an existing language
Just some thoughts about this, while my hands are still dirty from writing a basic parser for the grammar 😄
docs/assembly.rst
Outdated
Expression =
    Identifier | Literal | FunctionCall
Switch =
    'switch' Expression Case* ( 'default' ':' Block )?
For whatever it might matter, note that the following will be syntactically allowable:
switch 0 case 1 { ... }
Why should it not be syntactically allowed?
Case =
    'case' Expression ':' Block
ForLoop =
    'for' Block Expression Block Block
Previously, the grammar permitted blocks or expressions for two of those blocks. The difference as far as I can tell is the inclusion / exclusion of braces. Is being able to exclude braces something that might be useful?
The reason for all this is the complicated scoping rules for the for-loop. We want to allow a new variable to be declared whose scope is only the for-loop itself, so basically the idiomatic form is:
for { let x := 1 } lt(x, 10) { x := add(x, 1) } { ... }
Of course it is weird that the scope of `x` extends beyond `{ let x := 1 }`, but it might be even weirder if we allowed `for let x := 1 lt(x, 10) x := add(x, 1) { ... }`. This is perhaps a matter of taste.
Forcing braces for at least the first and last element introduces separators, but I'm fine with also allowing plain expressions there if you think that is useful.
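A rough sketch of the scoping described here, assuming a hypothetical `Scope` class and `for_loop_scopes` helper (neither comes from the PR): names declared in the init block live in a scope that spans the condition, the post block and the body, but are not visible outside the loop.

```python
class Scope:
    """Hypothetical scope: declared names plus a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = set()

    def declare(self, name):
        self.names.add(name)

    def is_visible(self, name):
        # Walk outwards through the enclosing scopes.
        scope = self
        while scope is not None:
            if name in scope.names:
                return True
            scope = scope.parent
        return False


def for_loop_scopes(outer, init_names, body_names):
    """Build the scopes for `for { init } cond { post } { body }`."""
    # Names declared in the init block live in a scope that spans the
    # whole loop (condition, post block and body), not just the braces.
    loop_scope = Scope(outer)
    for name in init_names:
        loop_scope.declare(name)
    # The body gets its own inner scope nested inside the loop scope.
    body_scope = Scope(loop_scope)
    for name in body_names:
        body_scope.declare(name)
    return loop_scope, body_scope


outer = Scope()
loop_scope, body_scope = for_loop_scopes(outer, ["x"], ["y"])
```

With this layout, `x` is visible from the body scope but not from `outer`, and `y` does not leak into the loop scope.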
👍 braces
    | 'dataSize' '(' Identifier ')' |
    LinkerSymbol |
    'bytecodeSize' |
I think these need to get moved up to Statement?
Also: what are your thoughts on pulling `'break'`/`'continue'`/`'bytecodeSize'` into their own non-terminal(s)?
In my own parser implementation, I've found that the inconsistency between leaf/non-leaf nodes as possible resolutions for `Statement` has added a bit of complexity. This might be an implementation concern and not really relevant, but I think it might clean things up a bit to make a `Break` non-terminal, etc.
(This would mean that every `Statement` instance could be resolved to a container type with exactly one item.)
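To illustrate the idea, here is a minimal sketch of such an AST shape. The class names `Break`, `Continue` and `Statement` follow the suggestion above, while `KEYWORD_NODES` and `statement_from_keyword` are hypothetical helpers made up for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Break:
    """Hypothetical non-terminal wrapping the 'break' keyword."""


@dataclass(frozen=True)
class Continue:
    """Hypothetical non-terminal wrapping the 'continue' keyword."""


@dataclass(frozen=True)
class Statement:
    # Every Statement holds exactly one child node, never a bare token.
    item: Union[Break, Continue]


# Keyword tokens that would otherwise be leaf resolutions of Statement.
KEYWORD_NODES = {"break": Break, "continue": Continue}


def statement_from_keyword(token):
    """Wrap a keyword token in its node so Statement stays uniform."""
    if token not in KEYWORD_NODES:
        raise ValueError(f"not a statement keyword: {token!r}")
    return Statement(KEYWORD_NODES[token]())
```

A consumer of the AST can then always inspect `statement.item` without special-casing raw keyword tokens.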
Follow-up thought from this: perhaps `break`/`continue` are categorizable as `ControlOperation`s. I am not sure of the behavior for these, but maybe there is also some generalization that can be done around dataSize/bytecodeSize/HexLiteral?
Just my 0.002Ξ, trying to make pretty ASTs
These here (`dataSize`, `bytecodeSize`, ...) do not need to be part of the specification (I still need to move them somewhere). As long as they can be written as functions and do not influence the internal control flow, they can be modelled as built-in functions similar to opcodes which are specific to the actual flavour of JULIA.
You are right, we should probably pull `break` and `continue` into nonterminals.
docs/assembly.rst
Outdated
    'bytecodeSize' |

Restriction for Expression: Functions can only return single item,
top level has to return nothing.
Is a workaround for this to define a tuple type?
This was written too briefly. What I actually wanted to say is: In `f(g(), x)`, `g` can only return a single value. You can use functions that return multiple values, but then you have to resort to `let (a, b, c) := f()`.
I think that adding proper support for tuples might complicate everything, but I might also be wrong.
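The restriction could be checked along these lines. This is only a sketch: the `RETURN_COUNTS` table, the `(name, args)` tuple encoding of calls and both function names are assumptions made up for the example, not part of the PR.

```python
# Hypothetical table: function name -> number of values it returns.
RETURN_COUNTS = {"f": 3, "g": 1, "h": 2}


def returned_values(call):
    """Return how many values `call` yields, checking nested arguments.

    A call is a (name, args) tuple; an argument is either a nested call
    or a plain identifier string.
    """
    name, args = call
    for arg in args:
        # A nested call used as an argument must yield exactly one value.
        if isinstance(arg, tuple) and returned_values(arg) != 1:
            raise TypeError(f"argument {arg[0]}() must return a single value")
    return RETURN_COUNTS[name]


def check_declaration(targets, call):
    """Check `let (targets...) := call`: one target per returned value."""
    count = returned_values(call)
    if count != len(targets):
        raise TypeError(f"{len(targets)} targets for {count} values")
```

Under these assumptions, `f(g(), x)` is accepted because `g` returns one value, while passing the two-valued `h()` as an argument is rejected; a multi-valued call is only usable on the right-hand side of a declaration with a matching number of targets.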
docs/assembly.rst
Outdated
FunctionCall =
    Identifier '(' ( AssemblyItem ( ',' AssemblyItem )* )? ')'
IdentifierOrList = Identifier | '(' IdentifierList ')'
Identifier = [a-zA-Z_$] [a-zA-Z_0-9]*
Another thought: I think it might be worth specifying a list of keywords and defining Identifier to exclude those.
Without this, parsers need additional lookahead tokens, or for the rules to be ordered carefully.
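As a sketch of what that could look like in a scanner: the identifier pattern below is the one from the grammar above, but the `KEYWORDS` set is only a guess assembled from the terminals that appear in this PR, and `classify` is a hypothetical helper.

```python
import re

# Assumed keyword list (not specified in the PR); collected from the
# quoted terminals and examples that appear in the grammar under review.
KEYWORDS = {
    "let", "function", "switch", "case", "default",
    "for", "break", "continue", "assembly",
}

# Identifier = [a-zA-Z_$] [a-zA-Z_0-9]*
IDENTIFIER_RE = re.compile(r"[a-zA-Z_$][a-zA-Z_0-9]*")


def classify(word):
    """Classify a word as a keyword or an identifier token."""
    if IDENTIFIER_RE.fullmatch(word) is None:
        raise ValueError(f"not a valid word: {word!r}")
    # Keywords are carved out of the identifier space up front, so the
    # parser never has to disambiguate them with extra lookahead.
    kind = "keyword" if word in KEYWORDS else "identifier"
    return (kind, word)
```

With the keywords excluded at the token level, the order of grammar rules no longer matters for telling `switch` apart from a variable named `switch`.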
Yes, we need a scanner grammar, too.
I've just reworked my parser to conform to this new grammar. Two questions:
docs/assembly.rst
Outdated
SubAssembly =
    'assembly' Identifier Block
FunctionCall =
    Identifier '(' ( AssemblyItem ( ',' AssemblyItem )* )? ')'
This should be `Expression`.
Yes, labels have been removed and FunctionCall should be
Another question: If there is a function that does not take arguments (for example the built-in
Argument in favour: This would clearly distinguish function call and variable access.
Ok, this is a first take at specifying the whole thing. It still needs some work about the scopes, but I think it can actually be expressed simpler than I am currently doing it. loops might create some trouble, especially
a node of the AST and returns a new global state, a new local state
and a value (if the AST node is an expression).

We use sequence numbers as a shorthand for the order of evaluation
This shorthand notation is actually not used.
Width of variables is not defined as the IR is agnostic of this, but it is assumed to be 256-bit by the higher level language. That also determines what the target backend can expect. I think it would be useful to include a way (such as
Likewise, it would perhaps make sense to define widths and conversions for literals:
Furthermore, it could raise a failure if the literal is out of bounds for that width.
Also please consider a different name, as there's another fairly well known language called Julia 😉 I nominate Simone as it starts with the letter S and there was an AI sci-fi with the same name.
Responding to responses:
Labels – does the removal of labels give some kind of benefit re:, say, no more stack-height bookkeeping? Otherwise I'd say they could be quite useful as valid syntax, as I would imagine the desugaring phase would make good use of them.
Function calls with no args – I'd vote for requiring parens; allowing/requiring them to be omitted seems like it adds a lot of complexity
Will look at the spec part later on
@axic I think we can add
Concerning the name: I don't think there will be confusion.
@gnidan labels: desugaring towards EVM of course requires labels, but desugaring will also translate JULIA to a different language that has labels but does not have functions, for example. The benefit of not having labels is that the specification does not even need a stack (as you will see in the code). Labels and scoped variables also don't live well together.
One more note: this language has an unlimited number of return values, while WebAssembly is limited to one. It will need to wrap functions which return more than one value to store some of their results in memory. Maybe it would make sense to review the pros and cons of limiting it to a single return value for EVM. If it is reasonable to limit it even in EVM, then it would reduce the complexity needed for the WebAssembly target.
differs from standalone assembly and then specify assembly itself.
JULIA is an intermediate language that can compile to various different backends
(EVM 1.0, EVM 1.5 and eWASM are planned).
Because of that, it is designed to be as featureless as possible.
I doubt it. You can remove switches and the language is still Turing-complete.
Scopes in JULIA are tied to Blocks and all declarations
(``FunctionDefinition``, ``VariableDeclaration`` and ``SubAssembly``)
introduce new identifiers into these scopes. Shadowing is disallowed
Does this imply that in the above `power` examples, the JULIA program requires that EVM does not have an opcode called `power`?
There are no opcodes. `power` is an (internal) function and by the EVM backend it is replaced with a simple opcode.
Is any symbol from underlying virtual machines visible in JULIA programs?
Talk about identifiers across functions etc

Restriction for Expression: Statements have to return empty tuple
Tuples are not expressions. I suspect some expressions can return values that are not expressible as expressions. I oppose this because in this way, the substitution model of program execution does not work.
Restriction for Expression: Statements have to return empty tuple
Function arguments have to be single item
"Item"s have not been defined so far.
Restriction for Expression: Statements have to return empty tuple
Function arguments have to be single item

Restriction for VariableDeclaration and Assignment: Number of elements left and right needs to be the same
The right hand side of an assignment is always one expression. Is this always a single element? When is an expression multiple elements?
Restriction for VariableDeclaration and Assignment: Number of elements left and right needs to be the same
continue and break only in for loop

Literals have to fit 32 bytes
Does this refer to the length of literals in the text? In that case, does the count include `0x` or `""`?
I think it should be only expressed as a number (decimal or hexadecimal) with the limitation to fit into 256 bits.
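A minimal sketch of that rule, assuming a hypothetical `parse_number_literal` helper: the limit applies to the numeric value, so the `0x` prefix and the number of characters in the source text play no role.

```python
# Largest value that fits into 256 bits.
MAX_WORD = 2**256 - 1


def parse_number_literal(text):
    """Parse a decimal or 0x-prefixed hexadecimal literal into an int."""
    base = 16 if text.startswith("0x") else 10
    value = int(text, base)
    # The bound is on the value, not on the textual length of the literal.
    if value < 0 or value > MAX_WORD:
        raise ValueError(f"literal does not fit into 256 bits: {text}")
    return value
```

For example, `0x` followed by 64 hex digits is always accepted, while a 65-digit hex literal starting with a non-zero digit is rejected.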
The the evaluation function E takes a global state, a local state and
a node of the AST and returns a new global state, a new local state
and a value (if the AST node is an expression).
Is this a confirmation that an expression is always evaluated into a single element?
.. code::
    E(G, L, <{St1, ..., Stn}>: Block) =
        let L' be a copy of L that adds a new inner scope which contains
It was not mentioned that local states can contain scopes.
        let L'' be a copy of L'n where the innermost scope is removed
        Gn, L''
    E(G, L, <function fname (param1, ..., paramn) -> (ret1, ..., retm) block>: FunctionDefinition) =
        G, L
Maybe not here, but somewhere, the function definition needs to be registered in `L`. Not just the name of the function, but also the parameters, return values and the body.
    E(G, L, <function fname (param1, ..., paramn) -> (ret1, ..., retm) block>: FunctionDefinition) =
        G, L
    E(G, L, <let (var1, ..., varn) := value>: VariableDeclaration) =
        E(G, L, <(var1, ..., varn) := value>: Assignment)
Something needs to happen to the `inactive` flags in `L`.
    E(G, L, <let (var1, ..., varn) := value>: VariableDeclaration) =
        E(G, L, <(var1, ..., varn) := value>: Assignment)
    E(G, L, <(var1, ..., varn) := value>: Assignment) =
        let G', L', v1, ..., vn = E(G, L, value)
What is the number of return values of `E`?
        E(G, L, <(var1, ..., varn) := value>: Assignment)
    E(G, L, <(var1, ..., varn) := value>: Assignment) =
        let G', L', v1, ..., vn = E(G, L, value)
        let L'' be a copy of L' where L'[vi] = vi for i = 1, ..., n
It looks as if looking up `v1` results in `v1`.
Also, the modified local state should be `L''`, not `L'`.
The the evaluation function E takes a global state, a local state and
a node of the AST and returns a new global state, a new local state
and a value (if the AST node is an expression).
Around here, an introduction to local states is necessary: what information they contain and how to access it.
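One possible reading, purely as a sketch and not taken from the PR: the local state `L` is a stack of scopes, each mapping variable names to values. The helper names below and the use of `None` for a declared but unassigned variable are assumptions.

```python
def enter_block(local_state, declared):
    """Return a copy of L with a new innermost scope for `declared` names."""
    # None marks a declared but not yet assigned variable (an assumption).
    return local_state + [{name: None for name in declared}]


def leave_block(local_state):
    """Return a copy of L with the innermost scope removed."""
    return local_state[:-1]


def lookup(local_state, name):
    """Find the value of `name`, searching from the innermost scope outwards."""
    for scope in reversed(local_state):
        if name in scope:
            return scope[name]
    raise KeyError(name)


def assign(local_state, names, values):
    """Return a copy L'' of L where L''[var_i] = v_i for every i."""
    if len(names) != len(values):
        raise ValueError("number of variables and values differs")
    new_state = [dict(scope) for scope in local_state]
    for name, value in zip(names, values):
        # Update the innermost scope that declares the variable.
        for scope in reversed(new_state):
            if name in scope:
                scope[name] = value
                break
        else:
            raise KeyError(name)
    return new_state
```

Each operation returns a fresh state instead of mutating its input, which matches the "let L' be a copy of L" phrasing in the evaluation rules.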
        G'', Ln, rv1, ..., rvm
    E(G, L, l: HexLiteral) = G, L, hexString(l),
        where hexString decodes l from hex and left-aligns in into 32 bytes
    E(G, L, l: StringLiteral) = G, L, utf8EncodeLeftAligned(l),
Why do we need string literals?
This PR changes
Replaced by #2129