Permalink
Fetching contributors…
Cannot retrieve contributors at this time
386 lines (302 sloc) 12 KB

Determining ewasm gas costs

The goal of this document is to describe the process of determining gas costs for ewasm instructions.

Each WASM opcode is assigned an appropriate Intel IA-32 (x86) opcode (or a series of opcodes). These opcodes have a fixed cycle count (called latency by Intel). We are selecting one specific CPU model from the Haswell architecture (family: 06, model 3C). This equals to CPUs produced in 2014.

Assumption 1: This specific Haswell CPU represents a good average of the Ethereum nodes. We assume that a 2.2 Ghz model is the average.

According to Intel, the 2.2 Ghz clock rate roughly equals to 2 200 000 000 cycles per second.

Assumption 2: 1 second of CPU execution equals to 10 million gas (i.e. 1 gas equals to 0.1 us).

This equals to 0.0045 gas per cycle. (10 000 000 / 2 200 000 000)

To put this in perspective, the average block gas limit as of August 2016 is around 4.7 million. With this assumption we allow contract execution to take up to 0.5 second of the 15 seconds block time, which also has to include other processing, including PoW and network roundtrip times.

Assumption 3: The gas costs are adjusted on a regular basis, at least every 3 years.

We assume that CPUs are continuously improving and the hardware for Ethereum nodes are upgraded every 3 years (as that matches usual depreciation rates).

The upgrade procedure is not part of this document, but it is plausible to expect a dedicated gas cost oracle contract will exist in the future, which can be used by the metering injection contract.

Gas vs. Particles

The current gas handling fields do not offer the precision needed assuming that running WASM opcodes takes significantly less processing power compared to EVM1 opcodes.

Rationale: EVM1 opcodes operate on 256-bits of data, while WASM opcodes are limited to at most 64-bits, which results in executing four instructions in the best case to match EVM1. Most arithmetic instructions in EVM1 cost 3 gas, which would amount to 0.75 gas for most 64-bit WASM instructions.

Internally, ewasm gas measurements should be recorded in a 64 bit variable with 4 decimal digits precision. We call this particles. It is a minor implementation detail actually using integers and converting the below gas costs appropriately.

When converting the particles count to Ethereum gas, it has to be divided by 10000 and has to be rounded up. If the result is less than 0, then it should equal to 1.

Gas costs of individual instructions

The formula for determining the gas cost is: <cycle count> * <gas per cycle>

Registers

Opcode Cycle IA-32 eqv. Gas
get_local 3 MOV 0.0135
set_local 3 MOV 0.0135
tee_local 3 MOV 0.0135
get_global 3 MOV 0.0135
set_global 3 MOV 0.0135

Memory

Opcode Cycle IA-32 eqv. Gas
i32.load8_s 3 MOV 0.0135
i32.load8_u 3 MOV 0.0135
i32.load16_s 3 MOV 0.0135
i32.load16_u 3 MOV 0.0135
i32.load 3 MOV 0.0135
i64.load8_s 3 MOV 0.0135
i64.load8_u 3 MOV 0.0135
i64.load16_s 3 MOV 0.0135
i64.load16_u 3 MOV 0.0135
i64.load32_s 3 MOV 0.0135
i64.load32_u 3 MOV 0.0135
i64.load 3 MOV 0.0135
grow_memory ? (breaking out from the VM)
current_memory ? (breaking out from the VM)

Flow Control

Opcode Cycle IA-32 eqv.
nop ?
block 0 (this is only a grouping)
loop 0 (same as block)
if 0 (only a grouping, then/else counts)
then 2 JMP (near)
else 2 JMP (near)
br 2 JMP (near)
br_if 3 CMP, JMP (near)
br_table 2 JMP (near)
return 2 RET or JMP

Calls

Opcode Cycle IA-32 eqv.
call 2 CALL (near)
call_indirect ? CALL (far) (breaking out from the VM)
call_import ? CALL (far) (breaking out from the VM)

Constants

Opcode Cycle IA-32 eqv.
i32.const 0
i64.const 0

32-bit Integer operators

Opcode Cycle IA-32 eqv. Gas
i32.add 1 ADD 0.0045
i32.sub 1 SUB 0.0045
i32.mul 3 MUL 0.0135
i32.div_s 80 DIV 0.36
i32.div_u 80 DIV 0.36
i32.rem_s 80 DIV 0.36
i32.rem_u 80 DIV 0.36
i32.and 1 AND 0.0045
i32.or 1 OR 0.0045
i32.xor 1 XOR 0.0045
i32.shl 1.5 SHL 0.0067
i32.shr_u 1.5 SHR 0.0067
i32.shr_s 1.5 SHR 0.0067
i32.rotl 2 ROL 0.0090
i32.rotr 2 ROR 0.0090
i32.eq 1 CMP 0.0045
i32.eqz 1 CMP 0.0045
i32.ne 1 CMP 0.0045
i32.lt_s 1 CMP 0.0045
i32.lt_u 1 CMP 0.0045
i32.le_s 1 CMP 0.0045
i32.le_u 1 CMP 0.0045
i32.gt_s 1 CMP 0.0045
i32.gt_u 1 CMP 0.0045
i32.ge_s 1 CMP 0.0045
i32.ge_u 1 CMP 0.0045
i32.clz 105 (clz) 0.4725
i32.ctz 105 (ctz) 0.4725
i32.popcnt ?

64-bit Integer operators

Opcode Cycle IA-32 eqv. Gas
i64.add 1 ADD 0.0045
i64.sub 1 SUB 0.0045
i64.mul 3 MUL 0.0135
i64.div_s 80 DIV 0.36
i64.div_u 80 DIV 0.36
i64.rem_s 80 DIV 0.36
i64.rem_u 80 DIV 0.36
i64.and 1 AND 0.0045
i64.or 1 OR 0.0045
i64.xor 1 XOR 0.0045
i64.shl 1.5 SHL 0.0067
i64.shr_u 1.5 SHR 0.0067
i64.shr_s 1.5 SHR 0.0067
i64.rotl 2 ROL 0.0090
i64.rotr 2 ROR 0.0090
i64.eq 1 CMP 0.0045
i64.eqz 1 CMP 0.0045
i64.ne 1 CMP 0.0045
i64.lt_s 1 CMP 0.0045
i64.lt_u 1 CMP 0.0045
i64.le_s 1 CMP 0.0045
i64.le_u 1 CMP 0.0045
i64.gt_s 1 CMP 0.0045
i64.gt_u 1 CMP 0.0045
i64.ge_s 1 CMP 0.0045
i64.ge_u 1 CMP 0.0045
i64.clz ?
i64.ctz ?
i64.popcnt ?

Datatype conversions, truncations, reinterpretations, promotions, and demotions

Opcode Cycle IA-32 eqv. Gas
i32.wrap/i64 3 MOV 0.0135
i64.extend_s/i32 ? (signextend)
i64.extend_u/i32 3 MOV 0.0135

Type-parametric operators.

Opcode Cycle IA-32 eqv. Gas
drop 3 MOV 0.0135
select 3 CMP, JMP 0.0135

Other

Opcode Cycle IA-32 eqv. Gas
unreachable 0 (breaking out from the VM - or INT3)

Notes about complex instructions

Some of the above instructions are complex and cannot be mapped 1-by-1 to a machine instruction. For them we assume the below algorithms and calculate gas based on the component costs.

i32.clz

Number of leading 0-bits within 32 bits.

(Based on Warren, Section 5-3: Counting Leading 0's:)

function clz(x)
    if x = 0 return 32
    n ← 0
    if (x & 0xFFFF0000) = 0: n ← n + 16, x ← x << 16
    if (x & 0xFF000000) = 0: n ← n +  8, x ← x <<  8
    if (x & 0xF0000000) = 0: n ← n +  4, x ← x <<  4
    if (x & 0xC0000000) = 0: n ← n +  2, x ← x <<  2
    if (x & 0x80000000) = 0: n ← n +  1
    return n

For an implementation in WebAssembly, see Appendix A.

Counting the individual instructions results in:

  • 6 if
  • 6 then
  • 2 return
  • 5 and
  • 5 eqz,
  • 11 get_local,
  • 11 set_local,
  • 5 add,
  • 5 shl

These amount to a total of 104.5 cycles in the worst case.

Note: clz exists natively on most common CPUs, including Intel and ARM CPUs. We could consider it native for the purposes of Ethereum and assign a lower gas cost.

i32.ctz

Number of trailing 0-bits within 32 bits.

(Based on Warren, Section 5-4: Counting Trailing 0's.:)

function ctz (x)
    if x = 0 return 32
    n ← 0
    if (x & 0x0000FFFF) = 0: n ← n + 16, x ← x >> 16
    if (x & 0x000000FF) = 0: n ← n +  8, x ← x >>  8
    if (x & 0x0000000F) = 0: n ← n +  4, x ← x >>  4
    if (x & 0x00000003) = 0: n ← n +  2, x ← x >>  2
    if (x & 0x00000001) = 0: n ← n +  1
    return n

This algorithm has the very same steps as i32.clz and therefore their cost equals.

Note: ctz does not exists natively on ARM CPUs. ARM should be considered an important platform for Ethereum light clients and therefore it is sensible to assign a gas cost based on a complex implementation.

i32.popcnt

Number of 1-bits within 32 bits.

TBD

Note: popcnt is natively supported by recent CPUs, mostly through SIMD extensions only.

i64.clz

Number of leading 0-bits within 64 bits.

TBD

i64.ctz

Number of trailing 0-bits within 64 bits.

TBD

i64.popcnt

Number of 1-bits within 64 bits.

TBD

i64.extend_s/i32

Sign extend i32 to i64.

Given WebAssembly supports both arithmetic (shr_s) and logical right shifts (shr_u) it is fairly simple to implement:

function signextend (x)
    return (x << 32) >>> 32

This cost equals the sum of i64.shl and i64.shr_s, ? cycles.

TBD

select

Ternary operator.

TBD

breaking out of the VM

Any instruction pausing the VM and transferring data between the ewasm contract and the host is breaking out of the VM.

These instructions include:

  • current_memory
  • grow_memory
  • call_indirect
  • call_import

TBD

Appendix A: i32.clz in WebAssembly

(func $clz
  (param $x i32)
  (result i32)

  (local $n i32)

  (if (i32.eqz (get_local $x))
    (then
      (return (i32.const 32))
    )
  )

  (set_local $n (i32.const 0))

  (if (i32.eqz (i32.and (get_local $x) (i32.const 0xffff0000))
    (then
      (set_local $n (i32.add (get_local $n) (i32.const 16)))
      (set_local $x (i32.shl (get_local $x) (i32.const 16)))
    )
  )

  (if (i32.eqz (i32.and (get_local $x) (i32.const 0xff000000))
    (then
      (set_local $n (i32.add (get_local $n) (i32.const 8)))
      (set_local $x (i32.shl (get_local $x) (i32.const 8)))
    )
  )

  (if (i32.eqz (i32.and (get_local $x) (i32.const 0xf0000000))
    (then
      (set_local $n (i32.add (get_local $n) (i32.const 4)))
      (set_local $x (i32.shl (get_local $x) (i32.const 4)))
    )
  )

  (if (i32.eqz (i32.and (get_local $x) (i32.const 0xc0000000))
    (then
      (set_local $n (i32.add (get_local $n) (i32.const 2)))
      (set_local $x (i32.shl (get_local $x) (i32.const 2))
    )
  )

  (if (i32.eqz (i32.and (get_local $x) (i32.const 0x80000000))
    (then
      (set_local $n (i32.add (get_local $n) (i32.const 1)))
    )
  )

  (return (get_local $n))
)

References