New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVM 2/wasm 105 #48

Open
wanderer opened this Issue Dec 22, 2015 · 12 comments

Comments

Projects
None yet
9 participants
@wanderer
Member

wanderer commented Dec 22, 2015

EVM 2.0

Abstract

This EIP proposes upgrading the VM by converting to use a subset of Webassembly (Wasm). Wasm is a new Assembly standard being built for the web. The main advantage of using wasm is performance (both speed and size). The main disadvantage would be the need to port our existing infrastructure to use a new ISA. Most of this EIP is taken from Wasm's design docs which should be referenced for futher details.

Motivation

To truly distinguish Ethereum as the World Computer we need to have a performant VM. The current architecture of the VM is one of the greatest blockers to raw performance. Being stack-based and the 256-bit word size make translation from EVM opcodes to hardware instructions more difficult than needed.

With an architecture that provides a closer mapping to hardware the VM will have a considerably enhanced performance which will effectively open the door to a much wider array of uses that require a much higher performance/throughput. Also, by choosing a common and more standardized architecture anyone will be able to compile C/C++, Solidity (, etc.) once, and the compiled code will run in multiple environments. Using the new Assembly standard will make running a program either directly on Ethereum, on a cloud hosting environment, or on one's local machine - a friction-less process.

Specification

Wasm is a kin to LLVM IR and Low Level Instruction Set Architectures (ISAs). It is an defined as an Abstract Syntax Tree (AST) which has a textual representation using s-expressions. As an AST it has some higher level semantics than pure hardware ISAs. Including functions and basic flow control. Wasm is not finished a specification and still in flux. This EIP suggest using a subset of wasm to allow us to tailor the semantics to Ethereum. This spec is designed so that the minimum amount of changes a wasm VM would need to be implemented. This should make existing implementations of wasm easy to integrate and provide a way to run Ethereum flavored wasm on a unmodified wasm VM.

A Wasm module expressed in s-expressions looks like

(module
  (func $add (param $x i64) (param $y i64) (result i64) (i64.add (get_local $x) (get_local $y)))
  (func $sub (param $x i64) (param $y i64) (result i64) (i64.sub (get_local $x) (get_local $y)))

  (export "add" $add)
  (export "sub" $sub)
  )

Eth Flavored wasm

(module
  (func $add (param $x) (param $y) (result) (add (get_local $x) (get_local $y)))
  (func $sub (param $x) (param $y) (result) (sub (get_local $x) (get_local $y)))

  (export "add" $add)
  (export "sub" $sub)
)

For Ethereum we can consider each contract as a module therefore we can have an intrinsic modules and we do not need to specify the module’s closure (we leave it now for clarity). Furthermore if we restrict the types to 64 bit then we can eliminate the type information from the locals definition.

Difference from WASM

  • A local variable limit
  • Metering
  • Shared memory for call returns
  • Limit types and functions to i64
  • No Datatype conversion
  • No select op
  • No comma op
  • No conditional op
  • No need for a explicit module definition

Metering

Metering can initial be accomplished by injecting the counting code into the AST then passing the modified AST to a wasm VM. Modifying the AST is done by traversing the AST and adding a gas check immediately after each branch condition and at the start of functions and loops. For a more performant version gas counting could possibly be done at the VM directly. But from initial trials injecting gas at the AST level does not greatly affect performance. Since the gas counting code must be protected it would have to be implemented in a separate module.

Ethereum Specific Opcodes

  • call_delegate
  • call_code
  • create
  • gas
  • logs
  • suicide
  • environmental information (blockhash, coinbase, etc)
  • general module imports

The above opcodes should be used with WASM's call. call_delegate would become call $call_delegate <params> and so on.

The open-ended nature of module imports allow them to be used to expose arbitrary host environment functionality to WebAssembly code, similar to a native syscall. For example, a shell environment could define a builtin stdio module with an export puts. referance

Whether the module system could be used for contracts in general is an open questions. For now we will not use it in favor of using the CALL, CALLCODE and CALLDELEGATE semantics. This could be revisited once wasm has support for runtime dlopen like dynamic linking.

Modified Ethereum operations

Elimination of PUSH, DUP, SWAP, POP, PUSH, JUMP, JUMPI

SUICIDE, CALL, CALLCODE, CALLDELEGATE now gets the to address from memory and are expressions.

note on CALLs

The module loading system can’t yet support call since exported function are only allowed to return one value and do not have shared memory access. When implementing SUICIDE, CALL, CALLCODE, CALLDELEGATE the host environment will have to load the return values into the specified location in linear memory.

Abstract Syntax Tree Semantics

For the current full definition see here I’m only going to post here the core semantics that we are intersted in. For more details see the accompanying wiki. A formal spec such for each opcode will be worked out if this EIP gains Momentum. Currently this EIP suggests limiting the functions to only 64 bit Integer operations.

Internal Function Calls

Each function has a signature, which consists of: Return types, which are a sequence of value types Argument types, which are a sequence of value types

WebAssembly doesn't support variable-length argument lists (aka varargs). Compilers targeting WebAssembly can instead support them through explicit accesses to linear memory. The length of the return types sequence may only be 0 or 1

Direct calls to a function specify the callee by index into a main function table.

  • call_func: call function directly
    A direct call to a function with a mismatched signature is a module verification error.

Integer Operations

  • add: sign-agnostic addition
  • sub: sign-agnostic subtraction
  • mul: sign-agnostic multiplication (lower 32-bits)
  • div_s: signed division (result is truncated toward zero)
  • div_u: unsigned division (result is floored)
  • rem_s: signed remainder (result has the sign of the dividend)
  • rem_u: unsigned remainder
  • and: sign-agnostic bitwise and
  • or: sign-agnostic bitwise inclusive or
  • xor: sign-agnostic bitwise exclusive or
  • shl: sign-agnostic shift left
  • shr_u: zero-replicating (logical) shift right
  • shr_s: sign-replicating (arithmetic) shift right
  • eq: sign-agnostic compare equal
  • ne: sign-agnostic compare unequal
  • lt_s: signed less than
  • le_s: signed less than or equal
  • lt_u: unsigned less than
  • le_u: unsigned less than or equal
  • gt_s: signed greater than
  • ge_s: signed greater than or equal
  • gt_u: unsigned greater than
  • ge_u: unsigned greater than or equal
  • clz: sign-agnostic count leading zero bits (All zero bits are considered leading if the value is zero)
  • ctz: sign-agnostic count trailing zero bits (All zero bits are considered trailing if the value is zero)
  • popcnt: sign-agnostic count number of one bits

Flow Control

  • block: a fixed-length sequence of expressions with a label at the end.
  • loop: a block with an additional label at the beginning which may be used to form loops
  • br: branch to a given label in an enclosing construct
  • br_if: conditionally branch to a given label in an enclosing construct
  • get_reg: gets a register. We constrain registers to be one byte. Therefor we have 255 registers
  • store_reg: stores a register.

The br and br_if constructs express low-level branching. Branches that exit a block or loop may take a subexpression that yields a value for the exited construct. Branches may only reference labels defined by an outer enclosing construct. This means that, for example, references to a block's label can only occur within the block's body. In practice, outer blocks can be used to place labels for any given branching pattern, except for one restriction: one can't branch into the middle of a loop from outside it. This restriction ensures all control flow graphs are well-structured

  • unreachable: An expression which can take on any type, and which, if executed, always traps. It is intended to be used for example after calls to functions which are known by the producer not to return (otherwise the producer would have to create another expression with an unused value to satisfy the type check). This trap is intended to be impossible for user code to catch or handle, even in the future when it may be possible to handle some other kinds of traps or exceptions.

Linear Memory

The main storage of a WebAssembly instance, called the linear memory, is a contiguous, byte-addressable range of memory spanning from offset 0 and extending for memory_size bytes which can be dynamically grown by grow_memory. The linear memory can be considered to be an untyped array of bytes, and it is unspecified how embedders map this array into their process' own virtual memory. The linear memory is sandboxed; it does not alias the execution engine's internal data structures, the execution stack, local variables, or other process memory. The initial state of linear memory is specified by the module.

Linear Memory Accesses

Linear memory access is accomplished with explicit load and store operators. Integer loads can specify a storage size which is smaller than the result type as well as a signedness which determines whether the bytes are sign- or zero- extended into the result type.

  • load8_s: load 1 byte and sign-extend i8 to i32
  • load8_u: load 1 byte and zero-extend i8 to i32
  • load16_s: load 2 bytes and sign-extend i16 to i32
  • load16_u: load 2 bytes and zero-extend i16 to i32
  • load: load 4 bytes as i32

Stores have an additional input operand which is the value to store to memory. Like loads, integer stores may specify a smaller storage size than the operand size in which case integer wrapping is implied.

  • store8: wrap i64 to i8 and store 1 byte
  • store16: wrap i64 to i16 and store 2 bytes
  • store32: wrap i64 to i32 and store 4 bytes
  • store: (no conversion) store 8 bytes

Local variables

Each function has a fixed, pre-declared number of local variables which occupy a single index space local to the function. Parameters are addressed as local variables. Local variables do not have addresses and are not aliased by linear memory. Local variables have value types and are initialized to the appropriate zero value for their type at the beginning of the function, except parameters which are initialized to the values of the arguments passed to the function. Local variables can be thought of as registers

  • get_local: read the current value of a local variable
  • set_local: set the current value of a local variable

Module start function

If the module has a start node defined, the function it refers should be called
by the loader after the instance is initialized and before the exported functions
are called.

  • The start function must not take any arguments or return anything
  • The function can also be exported
  • There can only be at most one start node per module

For example, a start node in a module will be:

(start $start_function)

or

(start 0)

In the first example, the environment is expected to call the function $start_function
before calling any other module function. In the second case, the environment is
expected to call the module function indexed 0.

A module can:

  • Only have at most a start node
  • If a module contains a start node, the function must be defined in the module
  • The start function will be called after module loading and before any call to the module
    function is done

Binary Specification

Further attention to the binary encode can be tracked here. Currently wasm only has an early binary encoding spec.
Its ends up being fairly easy to follow. But not all of the features are relevant to Ethereum so some iteration can be done there. An overview of binary encoding can be seen here.

Rationale

  • Faster. Here are the benchmarks results. The benchmarks are based of off wasm-to-llvm-prototype. It compares a Wasm JIT based on LLVM to a EVM JIT which is also uses LLVM.
  • Maybe Smaller. Using the code from the benchmarks I compared the binary size. The results are that WASM was 29% smaller. But there is not enough real world data to be confident of this result. Also It compares handwritten WASM to EVM generated by solidity.
  • Toolchain Compatibility. LLVM front-end.
  • A large development base. It is being developed by Mozilla, Google, Microsoft, and Apple. Wasm is already in the Chromium's code base and is targeted to be deployed in all the major web browsers. Which would result in it being one of the most widely deployed VM architecture.
  • Furthermore some of Wasm's top design goals are largely applicable to Ethereum

Define a portable, size- and load-time-efficient binary format to serve as a compilation target which can be compiled to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms, including mobile and IoT.

Detailed Rationale

Rationale For Registered Based ISA.

  • Register-based virtual machines are more like actual hardware.
  • Easier to JIT
  • Although most early computers used stack or accumulator-style architectures, virtually every new architecture designed after 1980 uses a load-store register architecture. The major reasons for the emergence of general-purpose register (GPR) computers are twofold. First, registers—like other forms of storage internal to the processor—are faster than memory. Second, registers are more efficient for a compiler to use than other forms of internal storage. For example, on a register computer the expression (A * B) – (B * C) – (A * D) may be evaluated by doing the multiplications in any order, which may be more efficient because of the location of the operands or because of pipelining concerns. Nevertheless, on a stack computer the hardware must evaluate the expression in only one order, since operands are hidden on the stack, and it may have to load an operand multiple times. More importantly, registers can be used to hold variables. When variables are allocated to registers, the memory traffic reduces, the program speeds up (since registers are faster than memory), and the code density improves (since a register can be named with fewer bits than can a memory location). Reference
  • (Java is stack based.) "Java byte-codes have additional disadvantages. Directly mapping byte-codes onto the underlying architecture is much more difficult than generating machine instructions from an abstract syntax-tree. Code generators that are based on a high-level representation do not have to deal with unfavorable peculiarities of Java byte-codes but can tailor their output towards advanced and specific processor features, such as special purpose instructions, size of register sets, and cache architectures. This is especially true for today's most common RISC processors which are less suited for byte-code's heavily used stack operations. Reference ftp://ftp.cis.upenn.edu/pub/cis700/public_html/papers/Kistler96.pdf
  • The design of the Inferno virtual machine
  • Virtual Machine Showdown: Stack Versus Registers

AST

  • compression benefits "use predictive compression algorithm that allows to encode recurring sub expressions in a program space efficiently while facilitating also fast decoding with simultaneous code-generation. Our compression scheme is based on adaptive methods such as LZW but has been tailored towards encoding abstract syntax trees rather than character streams. It takes advantage of the limited scope of variables in programming languages,which allows to deterministically prune entries from the compression dictionary, and uses prediction heuristics to achieve a denser encoding." Reference ftp://ftp.cis.upenn.edu/pub/cis700/public_html/papers/Franz97b.pdf
  • JITing benefits.
  • Notes on better JIT compilers using AST

64-bit vs 256-bit

A 64 bit word size would map to real world hardware. Most the time we don’t need 256 bit arithmetic. 256 was chosen to work with 256-bit hashes but we can easily store hash results in memory. Some opcodes such as CALL would have to load the to address from memory.

Implementation

While the complete EIP is not implemented many pieces of it are.

There are several Wasm implementations; here are a few.

References

@CJentzsch

This comment has been minimized.

Show comment
Hide comment
@CJentzsch

CJentzsch Dec 22, 2015

All upgrades must be downwardly compatible. How do you ensure that old contracts will still run the same way?

CJentzsch commented Dec 22, 2015

All upgrades must be downwardly compatible. How do you ensure that old contracts will still run the same way?

@wanderer

This comment has been minimized.

Show comment
Hide comment
@wanderer

wanderer Dec 22, 2015

Member

@CJentzsch You would have to transcompile all the existing EVM code to WASM. Something like this has been done before with the EVM JIT. Except that it targeted LLVM IR. There are a couple of opcodes that are not backwards compatible. gas and PC might have a different expected values

Member

wanderer commented Dec 22, 2015

@CJentzsch You would have to transcompile all the existing EVM code to WASM. Something like this has been done before with the EVM JIT. Except that it targeted LLVM IR. There are a couple of opcodes that are not backwards compatible. gas and PC might have a different expected values

@wanderer wanderer changed the title from EVM 2 105 to EVM 2/wasm 105 Dec 22, 2015

@karalabe

This comment has been minimized.

Show comment
Hide comment
@karalabe

karalabe Dec 30, 2015

Member

Although this idea does have its merits, I honestly don't think this will ever happen. We've just launched Ethereum and the EVM, and even getting a single opcode added to it is a horribly painful and error prone procedure. By replacing the entire EVM we might as well just launch a new network from scratch. I don't think we could rationalize any speed increases with the enormous costs associated with implementing the proposed WASM based VM in all official languages, battle testing that they work and also potentially having them audited.

Member

karalabe commented Dec 30, 2015

Although this idea does have its merits, I honestly don't think this will ever happen. We've just launched Ethereum and the EVM, and even getting a single opcode added to it is a horribly painful and error prone procedure. By replacing the entire EVM we might as well just launch a new network from scratch. I don't think we could rationalize any speed increases with the enormous costs associated with implementing the proposed WASM based VM in all official languages, battle testing that they work and also potentially having them audited.

@wanderer

This comment has been minimized.

Show comment
Hide comment
@wanderer

wanderer Dec 30, 2015

Member

@karalabe yep that is the key question. C++ wouldn't have much to implement since they would just have to wrap Intel's JIT. But it might be a lot of work for go and Python. Lucky WASM is taking a similar approach to testing as us. They have a standalone test repo that all implementation must pass. We wouldn't be doing everything from scratch.

Member

wanderer commented Dec 30, 2015

@karalabe yep that is the key question. C++ wouldn't have much to implement since they would just have to wrap Intel's JIT. But it might be a lot of work for go and Python. Lucky WASM is taking a similar approach to testing as us. They have a standalone test repo that all implementation must pass. We wouldn't be doing everything from scratch.

@bobsummerwill

This comment has been minimized.

Show comment
Hide comment
@bobsummerwill

bobsummerwill Jan 5, 2016

@wanderer Is there something specific to WASM which makes it a better fit than just using LLVM IR, which is already defined, and becoming very close to being a universal AST standard?

If it was feasible to use LLVM IR as the target then all the numerous existing LLVM backends could be used off-the-shelf (C++, C#, JS, and many more) right now, and you would still be able to target WASM as-and-when that settles down.

bobsummerwill commented Jan 5, 2016

@wanderer Is there something specific to WASM which makes it a better fit than just using LLVM IR, which is already defined, and becoming very close to being a universal AST standard?

If it was feasible to use LLVM IR as the target then all the numerous existing LLVM backends could be used off-the-shelf (C++, C#, JS, and many more) right now, and you would still be able to target WASM as-and-when that settles down.

@wanderer

This comment has been minimized.

Show comment
Hide comment
@wanderer

wanderer Jan 5, 2016

Member

@bobsummerwill I actually compiled some notes on this subject. Here the are. Let me know if you have anymore questions!

From LLVM Language Reference:

The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation.

Good

  • very tested
  • large community
  • was used by googles PNACL
  • widely deployed

Bad

  • not intrinsically portable
  • not stable
  • lage surface (ISA) that VM implementors would have to deal with

Response from Derek Schuff (one of the engineers for pNACL) from google on WASM vs LLVM

I'm guessing you are unfamiliar with PNaCl. This is more or less the approach taken by PNaCl; i.e. use LLVM as the starting point for a wire format. It turns out that LLVM IR/bitcode by itself is neither portable nor stable enough to be used for this purpose, and it is designed for compiler optimizations, it has a huge surface area, much more than is needed for this purpose. PNaCl solves these problems by defining a portable target triple (an architecture called "le32" used instead of e.g. i386 or arm), a subset of LLVM IR, and a stable frozen wire format based on LLVM's bitcode. So this approach (while not as simple as "use LLVM-IR directly") does work. However LLVM's IR and bitcode formats were designed (respectively) for use as a compiler IR and for temporary file serialization for link-time optimization. They were not designed for the goals we have, in particular a small compressed distribution format and fast decoding. We think we can do much better for wasm, with the experience we've gained from PNaCl

Further research on LLVM
LLVM IR is meant to make compiler optimizations easy to implement, and to represent the constructs and semantics required by C, C++, and other languages on a large variety of operating systems and architectures. This means that by default the IR is not portable (the same program has different representations for different architectures) or stable (it changes over time as optimization and language requirements change). It has representations for a huge variety of information that is useful for implementing mid-level compiler optimizations but is not useful for code generation (but which represents a large surface area for codegen implementers to deal with). It also has undefined behavior (largely similar to that of C and C++) which makes some classes of optimization feasible or more powerful, but which can lead to unpredictable behavior at runtime. LLVM's binary format (bitcode) was designed for temporary on-disk serialization of the IR for link-time optimization, and not for stability or compressibility (although it does have some features for both of those).

None of these problems are insurmountable. For example PNaCl defines a small portable subset of the IR with reduced undefined behavior, and a stable version of the bitcode encoding. It also employs several techniques to improve startup performance. However, each customization, workaround, and special solution means less benefit from the common infrastructure

Member

wanderer commented Jan 5, 2016

@bobsummerwill I actually compiled some notes on this subject. Here the are. Let me know if you have anymore questions!

From LLVM Language Reference:

The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation.

Good

  • very tested
  • large community
  • was used by googles PNACL
  • widely deployed

Bad

  • not intrinsically portable
  • not stable
  • lage surface (ISA) that VM implementors would have to deal with

Response from Derek Schuff (one of the engineers for pNACL) from google on WASM vs LLVM

I'm guessing you are unfamiliar with PNaCl. This is more or less the approach taken by PNaCl; i.e. use LLVM as the starting point for a wire format. It turns out that LLVM IR/bitcode by itself is neither portable nor stable enough to be used for this purpose, and it is designed for compiler optimizations, it has a huge surface area, much more than is needed for this purpose. PNaCl solves these problems by defining a portable target triple (an architecture called "le32" used instead of e.g. i386 or arm), a subset of LLVM IR, and a stable frozen wire format based on LLVM's bitcode. So this approach (while not as simple as "use LLVM-IR directly") does work. However LLVM's IR and bitcode formats were designed (respectively) for use as a compiler IR and for temporary file serialization for link-time optimization. They were not designed for the goals we have, in particular a small compressed distribution format and fast decoding. We think we can do much better for wasm, with the experience we've gained from PNaCl

Further research on LLVM
LLVM IR is meant to make compiler optimizations easy to implement, and to represent the constructs and semantics required by C, C++, and other languages on a large variety of operating systems and architectures. This means that by default the IR is not portable (the same program has different representations for different architectures) or stable (it changes over time as optimization and language requirements change). It has representations for a huge variety of information that is useful for implementing mid-level compiler optimizations but is not useful for code generation (but which represents a large surface area for codegen implementers to deal with). It also has undefined behavior (largely similar to that of C and C++) which makes some classes of optimization feasible or more powerful, but which can lead to unpredictable behavior at runtime. LLVM's binary format (bitcode) was designed for temporary on-disk serialization of the IR for link-time optimization, and not for stability or compressibility (although it does have some features for both of those).

None of these problems are insurmountable. For example PNaCl defines a small portable subset of the IR with reduced undefined behavior, and a stable version of the bitcode encoding. It also employs several techniques to improve startup performance. However, each customization, workaround, and special solution means less benefit from the common infrastructure

@bobsummerwill

This comment has been minimized.

Show comment
Hide comment
@bobsummerwill

bobsummerwill Jan 5, 2016

Thanks for the excellently detailed answer, @wanderer!

bobsummerwill commented Jan 5, 2016

Thanks for the excellently detailed answer, @wanderer!

@frozeman frozeman added the EIP label Jan 6, 2016

@chfast

This comment has been minimized.

Show comment
Hide comment
@chfast

chfast Jan 7, 2016

Contributor

I agree LLVM IR is not a choice. It is huge and unstable (intentionally). Even LLVM does not have any proper interpreter for LLVM IR (they used to have it but it stopped being maintained some time ago).

Contributor

chfast commented Jan 7, 2016

I agree LLVM IR is not a choice. It is huge and unstable (intentionally). Even LLVM does not have any proper interpreter for LLVM IR (they used to have it but it stopped being maintained some time ago).

@wanderer

This comment has been minimized.

Show comment
Hide comment
@wanderer

wanderer Mar 1, 2016

Member

update, the binary encoding has now been speficied https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md

Member

wanderer commented Mar 1, 2016

update, the binary encoding has now been speficied https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md

@wanderer

This comment has been minimized.

Show comment
Hide comment
@wanderer

wanderer Mar 8, 2016

Member

Future discussion should go here https://github.com/ethereum/ewasm-design

Member

wanderer commented Mar 8, 2016

Future discussion should go here https://github.com/ethereum/ewasm-design

@wanderer wanderer added the EVM label Apr 14, 2016

@axic

This comment has been minimized.

Show comment
Hide comment
@axic

axic Aug 7, 2016

Member

Would it make sense closing this issue and reopening it or opening a new issue, once an actual proposal can be made? The content here is really outdated, especially given eWASM depends on a final/stable version of WASM.

Member

axic commented Aug 7, 2016

Would it make sense closing this issue and reopening it or opening a new issue, once an actual proposal can be made? The content here is really outdated, especially given eWASM depends on a final/stable version of WASM.

@axic

This comment has been minimized.

Show comment
Hide comment
@axic

axic Jan 12, 2018

Member

@wanderer as agreed on AllCoreDev#31, can you please close this?

Member

axic commented Jan 12, 2018

@wanderer as agreed on AllCoreDev#31, can you please close this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment