
SSTORE/SLOAD for byte arrays #97

Closed
vbuterin opened this issue Apr 26, 2016 · 9 comments

@vbuterin
Contributor

vbuterin commented Apr 26, 2016

NOTE: this is split off from #86, and updated to incorporate what was agreed on the dev call on 2016.04.25.

Parameters

  • BASE_COST: 2500
  • BYTE_STORAGE_COST: 250
  • BYTE_CHANGE_COST: 40
  • MIN_COST: 2500
  • GSSIZE: 50
  • GSLOADBYTES: 50
  • METROPOLIS_FORK_BLKNUM: TBA

Specification

If block.number >= METROPOLIS_FORK_BLKNUM, then:

  1. Add an opcode SLOADBYTES at 0xe1, which takes three arguments off the stack: key, mstart, msize. It reads the storage of the account at the given key and outputs the result into memory; unused memory is left unmodified, and memory is only extended to the point where it is needed (eg. if msize = 2**200 but the result is only 1000 bytes long, then you only pay gas for extending up to mstart + 1000). Gas cost is GSLOADBYTES, plus GCOPY gas per 32 bytes, rounding up (eg. 319 bytes -> 10 * GCOPY, 320 -> 10 * GCOPY, 321 -> 11 * GCOPY, just like CALLDATACOPY, CODECOPY and EXTCODECOPY)
  2. Add an opcode SSTOREBYTES at 0xe2, which takes three arguments off the stack: key, mstart, msize. It copies the given memory slice into that key in storage. Gas cost is computed according to the following schedule: total_cost = BASE_COST + BYTE_STORAGE_COST * (adjusted byte count of memory slice - adjusted byte count of previous contents) + BYTE_CHANGE_COST * (adjusted byte count of memory slice), gas_cost = max(total_cost, MIN_COST), refund = MIN_COST - min(total_cost, MIN_COST). Adjusted byte count = 32 + length if length is nonzero, otherwise 0. An SLOAD operation on contract data that was previously filled with SSTOREBYTES returns the last 32 bytes of the data, right-zero-padded. Also, note that using SSTOREBYTES to save a slice of zero bytes actually saves the zero bytes; it does not delete the slot.
  3. Add an opcode SSIZE at 0xe3, which takes a key off the stack, and returns the length of the value in storage there. Gas cost GSSIZE.
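The schedule above can be sketched as a pair of pure gas functions. This is a minimal illustration, not normative: the opcode constants come from the Parameters section, while GCOPY = 3 is an assumption (the existing per-word copy cost used by CALLDATACOPY and friends).

```python
BASE_COST = 2500
BYTE_STORAGE_COST = 250
BYTE_CHANGE_COST = 40
MIN_COST = 2500
GSLOADBYTES = 50
GCOPY = 3  # assumption: the existing per-word copy cost

def adjusted_byte_count(length):
    # 32 + length if length is nonzero, otherwise 0
    return 32 + length if length else 0

def sloadbytes_gas(result_length):
    # flat GSLOADBYTES plus GCOPY per started 32-byte word, as for CALLDATACOPY
    return GSLOADBYTES + GCOPY * ((result_length + 31) // 32)

def sstorebytes_gas(prev_length, new_length):
    # returns (gas charged up front, refund credited afterwards)
    total_cost = (BASE_COST
                  + BYTE_STORAGE_COST * (adjusted_byte_count(new_length)
                                         - adjusted_byte_count(prev_length))
                  + BYTE_CHANGE_COST * adjusted_byte_count(new_length))
    return max(total_cost, MIN_COST), MIN_COST - min(total_cost, MIN_COST)
```

Note that shrinking or deleting a value makes total_cost negative, which surfaces as a refund rather than a lower up-front charge.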

Code for SSTOREBYTES

BASE_COST = 2500
BYTE_STORAGE_COST = 250
BYTE_CHANGE_COST = 40
MIN_COST = 2500
...
elif op == 'SSTOREBYTES':
    key, mstart, msize = stk.pop(), stk.pop(), stk.pop()
    if not mem_extend(mem, compustate, op, mstart, msize):
        return vm_exception('OOG EXTENDING MEMORY')
    # adjusted byte count = 32 + length if length is nonzero, otherwise 0
    prev_adjbyte_count = len(ext.get_storage_data(msg.to, key))
    if prev_adjbyte_count > 0:
        prev_adjbyte_count += 32
    post_adjbyte_count = msize + (32 if msize else 0)
    gas_cost = BASE_COST + BYTE_STORAGE_COST * (post_adjbyte_count - prev_adjbyte_count) + BYTE_CHANGE_COST * post_adjbyte_count
    gas_payment = max(MIN_COST, gas_cost)
    refund = gas_payment - gas_cost
    if compustate.gas < gas_payment:
        return vm_exception('OUT OF GAS')
    compustate.gas -= gas_payment
    ext.add_refund(refund)
    ext.set_storage_data(msg.to, key, ''.join(map(chr, mem[mstart: mstart + msize])))
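For comparison, here is a self-contained sketch of the SLOADBYTES semantics over a plain dict standing in for account storage. The function name, the dict model, and GCOPY = 3 are illustrative assumptions, not part of the spec; the point is the memory-extension rule (memory grows only as far as the result actually reaches) and the per-word copy charge.

```python
GSLOADBYTES = 50
GCOPY = 3  # assumption: the existing per-word copy cost

def sloadbytes(storage, key, mem, mstart, msize):
    """Copy the stored value at `key` into `mem` at `mstart`; return gas charged.

    `storage` is a dict of key -> byte string, `mem` is a bytearray."""
    data = storage.get(key, b'')
    # even if msize is huge (eg. 2**200), only extend memory up to the
    # point the result actually reaches
    copy_size = min(msize, len(data))
    if len(mem) < mstart + copy_size:
        mem.extend(b'\x00' * (mstart + copy_size - len(mem)))
    mem[mstart: mstart + copy_size] = data[:copy_size]
    # GCOPY is charged per started 32-byte word, like CALLDATACOPY
    return GSLOADBYTES + GCOPY * ((copy_size + 31) // 32)
```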

Rationale

This allows contracts to store data more efficiently by reducing the number of trie/DB writes and reads needed for contract execution. Recent measurements show that trie and DB writes are a primary source of overhead, and Serenity tests show a >2x speed improvement in some cases when code is written well. The canonical examples include:

  • Cases where a byte array or string needs to be saved to storage. One practical use case is the storage of 80-byte bitcoin block headers in btcrelay.
  • Cases where a struct needs to be saved to storage, and many elements in the struct are updated at the same time. One practical use case is "order book" contracts where buyer, seller, amount, price, currency, etc are all entered at once.

Additionally, compiler logic may be simplified as storing data in sequential slots is no longer required. This does come at the cost of requiring updates to blockchain tools that currently assume that all storage values are either empty or exactly 32 bytes long.

The rationale behind adding a 32-byte cost to nonempty storage slots is that even a one-byte storage value necessarily carries overhead, including key/value storage size costs, merkle tree branches/leaves, etc.; additionally, an incentive is required to encourage saving a constant 320-byte chunk under a single key rather than splitting it among 10 keys.
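The incentive can be made concrete with the schedule from the Specification section. Writing a fresh 320-byte chunk under one key versus ten 32-byte keys (all slots previously empty):

```python
# Worked example of the incentive described above, using the constants
# from the Parameters section.
BASE_COST, BYTE_STORAGE_COST, BYTE_CHANGE_COST = 2500, 250, 40

def fresh_write_cost(length):
    # adjusted byte count for a previously empty slot
    adj = 32 + length if length else 0
    return BASE_COST + BYTE_STORAGE_COST * adj + BYTE_CHANGE_COST * adj

single = fresh_write_cost(320)       # 2500 + 290 * 352 = 104580
split  = 10 * fresh_write_cost(32)   # 10 * (2500 + 290 * 64) = 210600
```

The single write costs roughly half as much as the split one, which is exactly the pressure the 32-byte surcharge is meant to create.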

@wanderer
Member

Possible implementation concern: leveldb doesn't handle large values very well. To get around this, large values can be broken into chunks, with a nonce appended to the key. Since leveldb sorts keys lexicographically, it is very efficient to stream the chunks. Leveldb has gt and lt operations, so the query needed here would be gt <key> and lt <key + 1>, where +1 refers to the next value, not an append.
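The chunking scheme can be sketched roughly as follows; the chunk size, the 4-byte big-endian chunk index, and the sorted dict standing in for leveldb are all illustrative assumptions.

```python
CHUNK = 4096  # assumed chunk size

def put_chunked(db, key, value):
    # split the value into CHUNK-sized pieces; the fixed-width big-endian
    # chunk index appended to the key sorts lexicographically in order
    for n, off in enumerate(range(0, max(len(value), 1), CHUNK)):
        db[key + n.to_bytes(4, 'big')] = value[off:off + CHUNK]

def get_chunked(db, key):
    # stand-in for a leveldb range scan: gt key, lt key+1
    lo = key + (0).to_bytes(4, 'big')
    hi = key + b'\xff' * 4
    return b''.join(db[k] for k in sorted(db) if lo <= k <= hi)
```

With a real leveldb binding the `sorted(db)` scan would become a bounded iterator over the key range, so reads stream the chunks without touching unrelated keys.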

@vbuterin
Contributor Author

vbuterin commented May 1, 2016

If you look at the bytes per second from that benchmark, you get 779k writes/sec * 100 bytes/write = 77.9 MB/s for small values and 1100 writes/sec * 100000 bytes/write = 110 MB/s for large values. So it doesn't seem to be fatal or even that problematic.
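The arithmetic, spelled out:

```python
# Throughput implied by the benchmark figures quoted above.
small = 779_000 * 100     # bytes/sec at 100-byte values
large = 1_100 * 100_000   # bytes/sec at 100 KB values
# small -> 77,900,000 bytes/sec (~77.9 MB/s)
# large -> 110,000,000 bytes/sec (110 MB/s)
```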

@chriseth
Contributor

chriseth commented May 5, 2016

If this is adopted and should be usable from Solidity, it will add another layer of complexity. This "store as a single blob" mode has to be an optional feature for structs and arrays, because otherwise elements of structs and arrays are no longer accessible as lvalues.

Furthermore, the fact that the storage size is dynamic will need some additional handling: if sloadbytes does not write to the full range of memory, we always have to check the size beforehand, and possibly even throw if the size is not what we expect.

Finally, as already commented on the original issue, I think that having two "types" of storage slots will add similar complexity, although I think that this is not too big an issue.

@janx
Member

janx commented May 6, 2016

elements of the structs and arrays are not accessible as lvalues anymore.

@chriseth can you give more details?

@Nashatyrev
Member

Nashatyrev commented May 13, 2016

Are there any other rationales behind this EIP besides optimization?
If not, then in my opinion this is overkill: solving a pure engineering performance problem with such a serious protocol change.
That change introduces a number of complexities (and, accordingly, potential vulnerabilities) for clients, which mostly rely on 32-byte values. It also complicates the protocol itself without introducing any fundamental new improvements.

@tawaren

tawaren commented Feb 17, 2017

What is the status of this?
I assume the gas costs as stated now have not been updated to include the adaptations from #150

@github-actions

There has been no activity on this issue for two months. It will be closed in a week if no further activity occurs. If you would like to move this EIP forward, please respond to any outstanding feedback or add a comment indicating that you have addressed all required feedback and are ready for a review.

@github-actions github-actions bot added the stale label Jan 16, 2022
@github-actions

This issue was closed due to inactivity. If you are still pursuing it, feel free to reopen it and respond to any feedback or request a review in a comment.

@qizhou
Contributor

qizhou commented Jun 16, 2022

Some thoughts on backward compatibility for SSTORE/SLOAD:

  • If SSTOREBYTES writes byte data with len < 32, then SLOAD will read the data right-padded with zeros to 32 bytes
  • If SSTOREBYTES writes byte data with len > 32, then SLOAD will return the data truncated from the left to its last 32 bytes
  • SSTORE = SSTOREBYTES with size = 32
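A minimal sketch of that SLOAD compatibility rule, reading the len > 32 case as "truncated from the left, leaving the last 32 bytes", which matches the original spec's "returns the last 32 bytes of the data, right-zeropadded":

```python
def sload_compat(stored):
    """What SLOAD would return for raw byte data written by SSTOREBYTES."""
    # for long data keep only the last 32 bytes; otherwise keep it all
    word = stored[-32:] if len(stored) > 32 else stored
    # right-pad with zeros to a full 32-byte word
    return word + b'\x00' * (32 - len(word))
```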

Further, to keep the IO loads of SSTORE/SLOAD as before, we could implement the storage layout by maintaining two mappings:

  • Original Mapping: 32-byte key => 32-byte or 36-byte original_value. If original_value is 32 bytes long (the same as the existing SSTORE/SLOAD), then the value size is 32 bytes. If original_value is 36 bytes long, then the last 4 bytes encode the size of the value. If the size < 32, then the value is the truncated version of original_value. If the size > 32, then the remaining size - 32 bytes are stored in the following mapping.
  • Extended Mapping: 32-byte key => arbitrary-size extended_value. This maps the remaining part of the value when its size exceeds 32 bytes.

SSIZE would simply read original_value from the Original Mapping and return 32 if the value is 32 bytes long, or otherwise the size decoded from its last 4 bytes.
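A hypothetical sketch of that two-mapping layout; the encoding details (zero-padding of the head, big-endian size suffix) are assumptions chosen for illustration.

```python
# Two dicts standing in for the two proposed mappings.
original, extended = {}, {}

def store(key, value):
    if len(value) == 32:
        # exactly the existing SSTORE layout: a bare 32-byte original_value
        original[key] = value
        extended.pop(key, None)
    else:
        # 36-byte original_value: 32-byte head plus 4-byte size suffix
        head = value[:32].ljust(32, b'\x00')
        original[key] = head + len(value).to_bytes(4, 'big')
        if len(value) > 32:
            extended[key] = value[32:]   # remainder beyond 32 bytes
        else:
            extended.pop(key, None)

def ssize(key):
    ov = original[key]
    return 32 if len(ov) == 32 else int.from_bytes(ov[32:], 'big')

def load(key):
    ov = original[key]
    if len(ov) == 32:
        return ov
    size = int.from_bytes(ov[32:], 'big')
    return (ov[:32] + extended.get(key, b''))[:size]
```

With this layout a value of exactly 32 bytes touches only the Original Mapping, so existing SSTORE/SLOAD traffic keeps its current IO profile.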

This should make SSTOREBYTES/SLOADBYTES/SSIZE a pure add-on feature of the existing EVM: existing SSTORE/SLOAD semantics are kept, and their IO costs are the same as before. However, if a contract mixes the use of fixed-size SSTORE/SLOAD and arbitrary-size SSTOREBYTES/SLOADBYTES/SSIZE, then the gas metering of all these opcodes needs further discussion.
