copied in Github wiki entries on syntax and semantics, and completed …

…first-cut of overview, implementation, and usage
mmalvarez · Jun 19, 2019 · 965499d · 965499d
1 parent 505c3fe
commit 965499d
Show file tree

Hide file tree

Showing 6 changed files with 507 additions and 40 deletions.
diff --git a/docs/correctness.rst b/docs/correctness.rst
@@ -3,3 +3,6 @@
 ************
 Correctness
 ************
+
+In this entry, we will describe the theorem establishing
+the correctness of the Elle compiler, and sketch its proof.
diff --git a/docs/installation.rst b/docs/installation.rst
@@ -2,6 +2,8 @@
 Installation
 *************
 
+.. contents::
+
 Elle's installation is quite straightforward if
 all you want to do is use the FourL frontend as described in
 :doc:`usage`, and somewhat more involved if you want to
@@ -63,28 +65,10 @@ representation of the bytecode produced by the compiler for that smart contract
 messages). Files to run ``llllc`` on can be found in the ``eth-isabelle/elle/tests``
 directory.
 
-For example:
-
-::
-   
-   > cd eth-isabelle/elle/generated
-   # output snipped
-   > make
-   # output snipped
-   > ./llllc ../tests/if.lll
-   6001600a576003600d565b60025b
-
-Usage of ``llllc`` is described further in :doc:`usage`
+For more details on using ``llllc``, see :ref:`basic-usage`.
 
-To help inspect the output of ``llllc``, you may find it useful to use the
-EVM bytecode parser contained in eth-isabelle:
-`eth-isabelle/parser/hexparser.rb <https://github.com/mmalvarez/eth-isabelle/blob/master/parser/hexparser.rb>`
+.. _elle-isabelle-installation:
 
-You'll need an installation of Ruby (tested with 2.5.1p57) to use this tool. It takes hex bytecodes like those
-output by ``llllc`` (or other compilers for Ethereum, such as Solidity LLL) on standard input and outputs
-(on standard output)
-a series of mnemonics describing the opcodes in the input.
-
 Installation for Modifying and Examining Elle
 ==============================================
 
@@ -101,7 +85,6 @@ Once Isabelle is installed, the user will need to set up Lem, a framework used t
 Isabelle specifications used by Elle. In order to do this, first run the following, to update the
 Lem submodule contained in Elle's git repository:
 
-
 ::
    
    > cd eth-isabelle

diff --git a/docs/overview.rst b/docs/overview.rst
@@ -11,7 +11,7 @@ it is *foundationally verified*: that is, it is implemented inside of
 `Isabelle <https://isabelle.in.tum.de/>`_, a *proof assistant* that enables programmers to state
 and prove *mathematical theorems* about their code. In the case of
 Elle, a theorem exists stating that the behavior of the code output by the
-Elle compiler matches the behavior of its input (see :doc:`correctness`).
+Elle compiler matches the behavior of its input (see :doc:`Correctness`).
 
 Assuming we trust the model
 of Elle's source language (an LLL/Yul/WASM-like structured programming
@@ -21,19 +21,16 @@ programmer's intent: that is, that behave the same way that input programs are s
 to behave.
 
 In the rest of this documentation, we'll cover
-:doc:`how to install Elle <installation>`, and then
-:doc:`how to use Elle's FourL frontend <usage>` as
-to compile smart contracts written in the LLL language into EVM
+`:ref:how to install Elle <installation>`,
+`how to use Elle's FourL frontend <usage_>` as
+an end user to compile smart contracts written in the LLL language into EVM
 bytecode. Next, we'll dive into the details of Elle's source-level representation,
-covering its :doc:`syntax <syntax>` and its :doc:`formal semantics <semantics>`. Next, we'll talk about
-:doc:`the internals of the implementation <implementation>` of the Elle compiler
+covering its `syntax <syntax_>` and its `formal semantics <semantics_>`. Next, we'll talk about
+`the internals of the implementation <implementation_>` of the Elle compiler
 (along with the FourL frontend) as well
-as Elle's :doc:`correctness proof <correctness>`.
+as its `correctness proof <correctness_>`.
 
-Elle is intended to be supplanted by `Gazelle <https://github.com/mmalvarez/gazelle>`_, a
-generalized platform inspired by Elle for
-building verified compilers that allows for greater ease of adaptation
-than the Elle system.
+Elle is intended to be supplanted by `Gazelle <https://github.com/mmalvarez/gazelle>`_, a .
 As such, Elle itself is unlikely to see significant changes at this point. Nonetheless, Elle as it exists is a useful
 system: it can be used to compile real-world LLL smart contracts; namely Dan Ellison's
 `Echo <https://media.consensys.net/deploying-your-first-lll-contract-part-2-910d9eff497e>`_ smart contract,

diff --git a/docs/semantics.rst b/docs/semantics.rst
@@ -1,3 +1,185 @@
-*****
-Semantics
-*****
+***************
+Elle Semantics
+***************
+
+.. contents::
+
+(Note: this documentation page is adapted from
+`an entry in the Elle Github wiki <https://github.com/mmalvarez/eth-isabelle/wiki/Elle-Core-Semantics:-The-Meaning-of-Elle>`_
+
+========================================
+Goal: A Formal Meaning for Elle Programs
+========================================
+
+This document is intended to describe the semantics of the Elle-Core intermediate representation, the code that is translated to EVM via the Elle system using a verified procedure. What it means for this procedure to be “verified” is precisely that the behaviors of the source programs according to their semantics (described here) match the behaviors of the compiled EVM code, according to the EVM semantics described in Eth-Isabelle.
+
+Before continuing, it is worth noting that this semantics uses the original EVM semantics to describe the behavior of individual instructions. However, all higher-level control flow is described by the Elle semantics, capturing the key abstraction Elle is intended to provide (structured control flow).
+
+In the formal Isabelle development implementing Elle, this is actually described as a `*big-step operational semantics* <https://en.wikipedia.org/wiki/Operational_semantics#Big-step_semantics>`_. For clarity of exposition here (i.e., to make the presentation more comprehensible for people not already familiar with inductive semantics) I will describe it here as a "non-deterministic" interpreter. The big-step version of the semantics
+is given later in (TODO)
+
+Elle's semantics is nondeterministic
+"in theory" but not “in practice”. What I mean by this is that while
+source programs can describe nondeterministic behaviors, the Elle compiler will refuse to compile them into EVM programs (this is important, as the EVM is by necessity a deterministic virtual machine). Additionally, it is relatively straightforward to characterize which Elle programs are deterministic (a predicate called "valid3" captures precisely this condition).
+
+TODO link in valid3?
+
+Elle's interpreter uses a *logical program counter* which consists of an index into a syntax tree. In particular, this is expressed as a list of natural numbers, describing the path taken through the tree at each syntax node (that is, "[0,1,2]" means "the root's first child's second child's third child"; these paths are zero-indexed). In the codebase these lists are referred to as *childpaths* or *cp* for short. Most of the correctness proof revolves around showing that this logical program counter advances in such a way that the Elle program's behavior matches that of the EVM program with its standard (integer) program counter.
+
+In the code below, "@" is list concatenation.
+
+=================================
+Informal Description of Semantics
+=================================
+
+-------------
+Instructions
+-------------
+
+* Instruction nodes carry an EVM instruction that is guaranteed not to be a JUMP (i.e. no arbitrary modifications of the original program counter)
+* If an instruction node is at the end of the tree (there is no next node), we return the result of running that instruction in the EVM semantics
+* If the instruction is not at the end, we run the EVM instruction to get a new state, advance the logical program counter to the next instruction, and continue executing the Elle semantics from there
+
+-------
+Labels
+-------
+
+* Elle labels have the same semantics as an EVM JUMPDEST instruction
+
+------
+Jumps
+------
+
+* Jump nodes in Elle carry a "depth" parameter, pointing to a particular Sequence node a certain number of levels "up" in the tree.
+* Label nodes in Elle also carry a "depth" parameter, which also points "up" the tree a certain number of levels to a Sequence node for which that label is a corresponding jump target
+* For an Elle program to be valid, each Sequence node must have exactly one (or zero) label nodes "pointing up" to it
+
+    * For valid Elle programs, the semantics of a jump involves moving the logical program counter to the location of the single label corresponding to the target of the jump
+
+* Elle's semantics also describes execution of invalid programs, with more than one jump target
+
+    * In these cases, Elle's execution *nondeterministically splits*, creating "parallel" executions that separately move the program counter to *each* of the jump targets, and continue executing from there.
+    * This behavior cannot be realized by the EVM, and is prevented by checks in Elle's compilation process that assure that programs with multiple labels will not compile successfully to EVM
+
+------------------
+Conditional Jumps
+------------------
+
+* If the head of the EVM stack in the current state is zero (i.e. the conditional jump will not execute in EVM) and there is no next node in the tree, we are done and can return the current state (after subtracting gas for the jump
+  instruction)
+* If the head of the EVM stack is zero and there is a next node, we run the unsuccessful jump (subtracting the gas), increment the logical program counter to point to the next node in the tree, and then continue executing
+* If the head of the EVM stack is nonzero (the conditional jump will execute) the semantics are the same as that of a jump (other than the different gas cost) - see above.
+
+----------
+Sequences
+----------
+
+* If a sequence is empty (has no sub-nodes), and there is no next node *after* the current sequence node, we are already done executing and can return the current state
+* If a sequence is empty and there is a next node, we advance the logical program counter to that node and continue executing
+* If a sequence is non-empty, we begin executing from that sequence's first child
+
+
+
+========================
+Interpreter (Pseudo)Code
+========================
+
+::
+
+   Function get (root, cp) {
+     if (cp == []) return root;
+       else {
+         if (node_type(root) != Seq) return null;
+         else {
+           if(nth (root, head(cp)) = null) return null;
+           else return get(nth (root, head(cp)), tail(cp));
+         }
+     }
+   }
+
+   Function getnext (root, cp) {
+      if (cp == []) return null;
+      cp' = butlast (cp);
+      cpl = last (cp);
+      if (getnext (root, (cp'@(cpl+1))) = null) {
+        return getnext (root, cp');
+      }
+      else {
+          return (cp'@(cpl + 1));
+      }
+   }
+
+   // Function get_label_cp(root)
+   // returns locations of all labels pointing up to root
+
+   Function ellesem (root, cp, state) {
+     switch (get (root, cp)) {
+       case null: return emptyset;
+       
+       // instructions carry which EVM instruction to execute
+       case Inst(i):
+         if(getnext (root, cp) = null) return evm_sem i state;
+         else return ellesem (root, getnext (root, cp), evm_sem i state);
+	 
+       // labels are jumpdests
+       case Label(d):
+         if(getnext (root, cp) = null) return evm_sem JUMPDEST state;
+         else return ellesem (root, getnext(root, cp), evm_sem JUMPDEST state);
+	 
+       // jumps carry a depth - how many scopes up to jump to
+       case Jump(d):
+         ctxpath = take((length cp - d), cp); //take all but last d elements
+         context = get(root, ctxpath);
+         s = get_label_cp context;
+         return set{cp' | ellesem(root, cps, state)};
+	 
+       case JumpI(d):
+         // if we should jump
+         if(hd (evm_stack (st)) != 0) {
+           ctxpath = take((length cp - d), cp); //take all but last d elements
+	   context = get(root, ctxpath);
+	   s = get_label_cp context;
+	   return set{cp' | ellesem(root, cps, state)};
+         }
+         else {
+           if(getnext (root, cp) = null) return state;
+           else return ellesem(root, getnext(root, cp), state);
+         }
+	 
+       // sequences carry a list of sub-nodes
+       // we jump to all labels pointing to the Seq node "d" levels up
+       case Seq(l):
+         if(l == []) {
+           if(getnext (root, cp) = null) return state;
+           else return (ellesem(root, getnext(root, cp), state));
+          }
+          // if the list has children, run its first child nexts
+          else return ellesem(root, cp@[0], state);
+     }
+    
+   }
+
+    
+---------------------
+Notes on Interpreter
+---------------------
+
+The "jump" and "jumpI" cases in the above code explicitly return sets of states, which captures the nondeterminism of the semantics. All other (deterministic) cases can be considered to be implicitly returning singleton sets containing the single next state (for clarity I have left these implicit).
+
+
+=============================
+Inductive Semantics for Elle
+=============================
+
+(While this presentation of Elle's semantics may appear different from the
+interpreter, it should be possible to prove their equivalence. Indeed, this
+is a goal of the Gazelle project's semantics framework.)
+
+
+TODO: have the formal semantics (e.g. copy diagram from paper) in addition to interpreter
+
+TODOS
+
+Have a practical example of what the semantics do with a concrete program
+
+Say more about what the proof does? (Or at least link to :doc:`correctness`)