New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSTORE/SLOAD for byte arrays #97

Open
vbuterin opened this Issue Apr 26, 2016 · 6 comments

Comments

Projects
None yet
8 participants
@vbuterin
Copy link
Collaborator

vbuterin commented Apr 26, 2016

NOTE: this is split off from #86, and updated to incorporate what was agreed on the dev call on 2016.04.25.

Parameters

  • BASE_COST: 2500
  • BYTE_STORAGE_COST: 250
  • BYTE_CHANGE_COST: 40
  • MIN_COST: 2500
  • GSSIZE: 50
  • GSLOADBYTES: 50
  • METROPOLIS_FORK_BLKNUM: TBA

Specification

If block.number >= METROPOLIS_FORK_BLKNUM, then:

  1. Add an opcode SLOADBYTES at 0xe1, which takes three arguments off the stack: key, mstart, msize. Reads the storage of the account at the given key, and outputs the result into memory; unused memory is left unmodified, and memory is only extended to the point where it is needed (eg. if msize = 2**200 but the result only returns 1000 bytes, then you only pay gas for extending up to mstart + 1000). Gas cost GSLOADBYTES, plus GCOPY gas per 32 bytes (eg. 319 bytes -> 9 * GCOPY, 320 -> 10 * GCOPY, 321 -> 10 * GCOPY, just like CALLDATACOPY, CODECOPY and EXTCODECOPY)
  2. Add an opcode SSTOREBYTES at 0xe2, which takes three arguments off the stack: key, mstart, msize. Copies the given memory slice into that key in storage. Gas cost is computed according to the following schedule: total_cost = BASE_COST + BYTE_STORAGE_COST * (adjusted byte count of memory slice - adjusted byte count of previous contents) + BYTE_CHANGE_COST * (adjusted byte count of memory slice), gas_cost = max(total_cost, MIN_COST), refund = MIN_COST - min(total_cost, MIN_COST). Adjusted byte count = 32 + length if length is nonzero, otherwize 0. An SLOAD operation on contract data that was previously filled with SSTOREBYTES return the last 32 bytes of the data, right-zeropadded. Also, note that using SSTOREBYTES to save a slice of zero bytes actually saves the zero bytes; it does not delete the slot.
  3. Add an opcode SSIZE at 0xe3, which takes a key off the stack, and returns the length of the value in storage there. Gas cost GSSIZE.

Code for SSTOREBYTES

BASE_COST = 2500
BYTE_STORAGE_COST = 250
BYTE_CHANGE_COST = 40
MIN_COST = 2500
...
elif op == 'SSTOREBYTES':
    key, mstart, msize = stk.pop(), stk.pop(), stk.pop()
    if not mem_extend(mem, compustate, op, mstart, msize):
        return vm_exception('OOG EXTENDING MEMORY')
    prev_adjbyte_count = len(ext.get_storage_data(msg.to, key))
    if prev_adjbyte_count >= 0:
        prev_adjbyte_count += 32
    post_adjbyte_count = msize + (32 if msize else 0)
    gas_cost = BASE_COST + BYTE_STORAGE_COST * (post_adjbyte_count - prev_adjbyte_count) + BYTE_CHANGE_COST * post_adjbyte_count
    gas_payment = max(MIN_COST, gas_cost)
    refund = gas_payment - gas_cost
    if compustate.gas < gas_payment:
        return vm_exception('OUT OF GAS')
    compustate.gas -= gas_payment
    ext.set_storage_data(msg.to, ''.join(map(chr, mem[mstart: mstart + msize])))

Rationale

This allows contracts to store data more efficiently by reducing the number of trie/DB writes and reads needed for contract execution. Recent measurements show that trie and DB writes are a primary source of overhead, and serenity tests show a >2x speed improvement in some cases if code is written well. The canonical examples include:

  • Cases where a byte array or string needs to be saved to storage. One practical use case is the storage of 80-byte bitcoin block headers in btcrelay.
  • Cases where a struct needs to be saved to storage, and many elements in the struct are updated at the same time. One practical use case "order book" contracts where buyer, seller, amount, price, currency, etc are all entered at once.

Additionally, compiler logic may be simplified as storing data in sequential slots is no longer required. This does come at the cost of requiring updates to blockchain tools that currently assume that all storage values are either empty or exactly 32 bytes long.

The rationale behind adding a 32 byte cost to nonempty storage slots is that even a one-byte storage value will necessarily have overhead including key/value storage size costs, merkle tree branches/leaves, etc; additionally, an incentive is required to encourage saving a constant 320-byte chunk in a single key rather than splitting it up among 10 keys.

@wanderer

This comment has been minimized.

Copy link
Member

wanderer commented Apr 26, 2016

Possible implementation concern. Leveldb doesn't handle large values very well. To get around this, break large values into chunks and append nonce to the key. Since leveldb sorts key lexicographical it very efficient to stream the chunks. Leveldb has lg and lt operations, so the query needed here would be gt <key> and lt <key + 1> where +1 refers to the next value, not append.

@vbuterin

This comment has been minimized.

Copy link
Collaborator Author

vbuterin commented May 1, 2016

If you look at the bytes per second from that benchmark, you get 779k writes/sec * 100 bytes/write = 77.9 mbps for small values and 1100 writes/sec * 100000 bytes/write = 110 mbps for large values. So it doesn't seem to be fatal or even that problematic.

@chriseth

This comment has been minimized.

Copy link
Contributor

chriseth commented May 5, 2016

If this is adopted and should be usable from Solidity, it will add another layer of complexity. This "store as single blob" has to be an optional feature for structs and arrays, because otherwise, elements of the structs and arrays are not accessible as lvalues anymore.

Furthermore, the fact that storage size is dynamic will need some additional handling: If sloadbytes does not write to the full range of memory, we always have to check the size beforehand and optionally even throw if the size is not what we expect.

Finally, as already commented on the original issue, I think that having two "types" of storage slots will add similar complexity, although I think that this is not too big an issue.

@janx

This comment has been minimized.

Copy link
Member

janx commented May 6, 2016

elements of the structs and arrays are not accessible as lvalues anymore.

@chriseth can you give more details?

@Nashatyrev

This comment has been minimized.

Copy link
Member

Nashatyrev commented May 13, 2016

Are there any other rationales behind this EIP besides optimization?
If not than in my opinion this is the overkill to solve pure engineering performance problem with such a serious protocol change.
That change introduces a number of complexities (and potential vulnerabilities accordingly) for clients which are mostly relying on 32 bytes values. The change also complicates the protocol itself without introducing any new fundamental improvements.

@tawaren

This comment has been minimized.

Copy link

tawaren commented Feb 17, 2017

What is the status of this?
I assume the gas costs as stated now are not updated to include adaptions from #150

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment