VIP: Improve storage variable layout #769

daejunpark · 2018-04-09T21:43:05Z

Preamble

VIP: #769
Title: Improve storage variable layout
Author: Daejun Park
Type: Standard Track
Status: Draft
Created: 2018-04-09
Requires (*optional): <VIP number(s)>
Replaces (*optional): <VIP number(s)>

Simple Summary

Improve the storage layout of dynamically-sized state variables by adopting Solidity's scheme.

Specification

A map entry m[k] is stored at the location hash(k . index(m)), where "." is byte-concatenation.

A nested map entry's location can be defined recursively. For example, m[k1][k2] is stored at hash(k2 . hash(k1 . index(m)))

A formal specification can be found here (in progress).

Below is a quick reference of loc that computes the storage location.

Lists:

loc(a[i])    =   #(a) + i
loc(a[i][j]) = #(#(a) + i) + j

Structs:

loc(s.x)   =   #(s) + x
loc(s.x.y) = #(#(s) + x) + y

Maps:

loc(m[k])    =   #(m . k)
loc(m[k][l]) = #(#(m . k) . l)

In general: (E is an arbitrary nested data structure, e.g., a list of structs of maps ...):

loc(E[i]) = #(loc(E)) + i    // list of ...
loc(E.x)  = #(loc(E)) + x    // struct of ...
loc(E[k]) = #(loc(E) . k)    // map of ...

loc(a) = a                   // list name
loc(s) = s                   // struct name
loc(m) = m                   // map name

NOTE:

# denotes the keccak256 hash.
. is byte-concatenation.
Assume there is the implicit index conversion for the list name a, the struct name s, the struct field names x and y, and the map name m.
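
The quick reference above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Vyper implementation: indexes, field offsets and map keys are all assumed to be encoded as 32-byte big-endian words, the function names are hypothetical, and hashlib.sha3_256 is used as a stand-in for keccak256 so that the sketch runs with the standard library alone (the two hashes use different padding and do not produce the same digests).

```python
import hashlib

WORD = 32          # bytes per storage word (assumption: everything is one word)
MOD = 2**256       # size of the storage address space


def h(data: bytes) -> int:
    # Stand-in for "#". NOTE: sha3_256 is NOT Ethereum's keccak256;
    # swap in a real keccak256 to reproduce on-chain locations.
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def word(n: int) -> bytes:
    # Encode an integer as a 32-byte big-endian word.
    return (n % MOD).to_bytes(WORD, "big")


def loc_list(base: int, i: int) -> int:
    # loc(E[i]) = #(loc(E)) + i
    return (h(word(base)) + i) % MOD


def loc_field(base: int, x: int) -> int:
    # loc(E.x) = #(loc(E)) + x
    return (h(word(base)) + x) % MOD


def loc_map(base: int, k: int) -> int:
    # loc(E[k]) = #(loc(E) . k)
    return h(word(base) + word(k))


# Hypothetical declaration indexes for a list `a` and a map `m`.
a, m = 0, 1
nested_list_slot = loc_list(loc_list(a, 1), 3)   # loc(a[1][3])
nested_map_slot = loc_map(loc_map(m, 7), 9)      # loc(m[7][9])
```

Nesting is just function composition: the location of the outer expression becomes the base of the inner one, which is what the "In general" rules say.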

Rationale:

The number of elements (or fields) of a list (or a struct) is expected to be much smaller than the size of the storage address space. Thus, even if the elements (or the fields) are stored in a consecutive region of the storage, it is very unlikely that the region overlaps with other regions, since the starting locations of the regions will be well distributed (well spread across the entire storage) by keccak256.
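
A rough back-of-envelope estimate of the rationale above, assuming (as an idealization of keccak256) that region starting locations are independent and uniformly distributed over the 2**256 address space. Under that assumption, two regions of s1 and s2 consecutive slots overlap with probability (s1 + s2 - 1) / 2**256, and a union bound over all pairs gives an upper bound for n regions. The function names are hypothetical.

```python
from fractions import Fraction

N = 2**256  # size of the storage address space


def pair_overlap(s1: int, s2: int) -> Fraction:
    # Probability that two regions with uniformly random starts overlap
    # (valid while s1 + s2 - 1 <= N).
    return Fraction(s1 + s2 - 1, N)


def union_bound(n: int, s: int) -> Fraction:
    # Upper bound on the probability that any two of n regions,
    # each s slots long, overlap.
    return Fraction(n * (n - 1), 2) * pair_overlap(s, s)


# A million regions of 2**32 slots each: the bound is still negligible.
small = union_bound(10**6, 2**32)

# Two regions of 2**255 slots each: overlap is almost certain.
huge = pair_overlap(2**255, 2**255)
```

The second case shows why the argument depends on regions being small relative to the address space.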

Motivation

The main difference from the current scheme is the use of . instead of + to compute the offset. This change is critical for avoiding potential collisions, since . is injective while + is not, because of modular arithmetic. (Note that this is orthogonal to potential hash collisions in keccak256.)
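
The injectivity claim can be checked with a small sketch. It illustrates only the difference between the two combining operations, not the exact formula of the old scheme, and it assumes both operands are encoded as fixed-width 32-byte words (concatenation of variable-width operands would not be injective). The helper names are hypothetical.

```python
WORD = 32
MOD = 2**256


def word(n: int) -> bytes:
    # Fixed-width 32-byte big-endian encoding.
    return n.to_bytes(WORD, "big")


def add_offset(m: int, k: int) -> int:
    # "+": addition modulo 2**256, as in EVM arithmetic.
    return (m + k) % MOD


def concat(m: int, k: int) -> bytes:
    # ".": byte-concatenation of two fixed-width words.
    return word(m) + word(k)


def split(b: bytes) -> tuple:
    # Inverse of concat: recovers (m, k), which is why concat is injective.
    return int.from_bytes(b[:WORD], "big"), int.from_bytes(b[WORD:], "big")


# Distinct (m, k) pairs map to the same value under "+" ...
assert add_offset(1, 2) == add_offset(2, 1)
assert add_offset(MOD - 1, 1) == add_offset(0, 0)   # wrap-around

# ... but never under ".", since the pair can always be recovered.
assert concat(1, 2) != concat(2, 1)
assert split(concat(MOD - 1, 1)) == (MOD - 1, 1)
```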

jacqueswww · 2018-04-09T22:21:06Z

As we've discussed this across many meetings, I have marked this as approved ;)

jacqueswww · 2018-04-11T14:11:36Z

@daejunpark I started working on the above. Can we just update the spec? The resulting LLL comes out as follows (only the order of the concatenation is different):

self.a[1][3] = 131313

 [sstore, [sha3_64, [sha3_64, 0 <self.a>, 1], 3], 131313],

Also, please mention that lists and structs also need to use the new method ;)

daejunpark · 2018-04-11T22:23:53Z

@jacqueswww that seems more natural to me. Indeed, I have no good idea why Solidity uses the swapped order. It would be great if you know Solidity developers and can ask them to confirm that there is no critical reason for choosing the swapped order.

I'll update the spec as you suggested. Thanks!

daejunpark · 2018-04-12T22:01:28Z

@jacqueswww I think you can keep the current layout scheme for lists and structs.

Below is a quick reference of loc that computes the storage location.

Lists:

loc(a[i])    =   #(a) + i
loc(a[i][j]) = #(#(a) + i) + j

Structs:

loc(s.x)   =   #(s) + x
loc(s.x.y) = #(#(s) + x) + y

Maps:

loc(m[k])    =   #(m . k)
loc(m[k][l]) = #(#(m . k) . l)

In general: (E is an arbitrary nested data structure, e.g., a list of structs of maps ...):

loc(E[i]) = #(loc(E)) + i    // list of ...
loc(E.x)  = #(loc(E)) + x    // struct of ...
loc(E[k]) = #(loc(E) . k)    // map of ...

loc(a) = a                   // list name
loc(s) = s                   // struct name
loc(m) = m                   // map name

NOTE:

# denotes the keccak256 hash.
Assume there is the implicit index conversion for the list name a, the struct name s, the struct field names x and y, and the map name m.

Rationale:

The number of elements (or fields) of a list (or a struct) is expected to be much smaller than the size of the storage address space. Thus, even if the elements (or the fields) are stored in a consecutive region of the storage, it is very unlikely that the region overlaps with other regions, since the starting locations of the regions will be well distributed (well spread across the entire storage) by keccak256.

Question.

Could you confirm that the above scheme for lists and structs is the same as the current implementation?

jacqueswww · 2018-04-13T13:49:11Z

@daejunpark I actually went and did the work on all structures yesterday. The other one to consider is a ByteArray, which uses the same scheme. Are we 100% sure it's safe for those structures not to use the #(#(m).k) approach? What happens when we store a list, bytearray or struct within a map? It feels like in that scenario the risk is the same as using a map of #(m) + k.
If we are 100% sure about using it for maps only, I'll adapt it ;) as there is obviously a higher gas cost.

daejunpark · 2018-04-13T17:25:39Z

@jacqueswww I discussed this with @yzhang90. We think that it seems to be safe, but we're not 100% sure, to be honest (because I'm not an expert on the keccak256 hash). Indeed, the above scheme is similar to that of Solidity.

I think we need to have an extra discussion and confirmation of its safety with other experts (e.g., Solidity developers). Do you think the Solidity gitter channel is a good place for it? If so, I can initiate the discussion there.

(BTW, you're so quick to develop. Sorry for making you go back and forth. I think you can leave what you already implemented for now, just in case you change the scheme again.)

jacqueswww · 2018-04-13T17:40:54Z

I agree, let's get this sorted before I develop it further ;) The Gitter channel should help, yes. Also look for @chriseth in our channel directly; I know he works on Solidity as well.

daejunpark · 2018-04-18T23:16:11Z

Just for the record, I'm copying the Gitter conversation here:

Daejun Park @daejunpark 12:16
Hi @chriseth
I have a question regarding the layout of state variables in storage:
http://solidity.readthedocs.io/en/v0.4.21/miscellaneous.html#layout-of-state-variables-in-storage

(It seems weird to ask about Solidity in the Vyper channel, but this will help to improve Vyper's layout scheme.)

The current scheme, for example, works as follows:

A dynamically-sized array element, a[i], is stored at keccak256(slot(a)) + i.
A mapping entry, m[k] is stored at keccak256(k . slot(m)), where . is concatenation.

Now, I have two questions:

Q1. Is there a critical reason that the location scheme of dynamic arrays is different from that of mappings? In other words, what will happen if a[i] is stored at keccak256(i . slot(a))? I know it will consume more gas, but will it affect security (or the hash collision probability)?

Q2. For a map entry location, is there any critical difference (in terms of security or hash distribution) between the following two?
keccak256(k . slot(m))
vs
keccak256(slot(m) . k)

chriseth @chriseth 13:21
@daejunpark dynamic arrays use a different scheme than mappings for efficiency. Using the same scheme as mappings reduces collision probability. We have to disallow large arrays because you can easily find collisions in storage otherwise. Another reason is also that if you use the same scheme, there is not a big reason to have arrays in general.

for the collisions see https://chriseth.github.io/notes/talks/safe_solidity/#/8

For Q2, I hope that keccak256 ensures that the order does not matter.

daejunpark · 2018-04-18T23:34:29Z

@jacqueswww It turns out (thanks to @chriseth ) that the current scheme for lists is somewhat necessary, otherwise there is no reason to have lists in addition to maps.

But we need to have a compile-time check to reject a very large list, which can be problematic for the same reason we discussed before.

It would be good to check the size of structs as well (although it is very unlikely to declare such a large struct).
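
A minimal sketch of what such a compile-time check could look like. Everything here is hypothetical: the threshold value, the exception type and the function name are not taken from this thread or from the Vyper code base. Each nesting level is checked separately because, under loc(a[i][j]) = #(#(a) + i) + j, every level occupies its own consecutive region.

```python
from typing import Sequence

# Hypothetical limit on the number of slots in one consecutive region.
MAX_REGION_SLOTS = 2**64


class LayoutError(Exception):
    """Raised when a declared list is too large for the hashed layout."""


def check_list_dims(name: str, dims: Sequence[int],
                    limit: int = MAX_REGION_SLOTS) -> None:
    # Reject any dimension that is non-positive or exceeds the limit.
    for depth, size in enumerate(dims):
        if size <= 0:
            raise LayoutError(
                f"{name}: dimension {depth} must be positive, got {size}")
        if size > limit:
            raise LayoutError(
                f"{name}: dimension {depth} has {size} elements, "
                f"limit is {limit}")


# A small nested list passes; an enormous one is rejected.
check_list_dims("a", [10, 20])
try:
    check_list_dims("big", [2**200])
except LayoutError as err:
    rejected = str(err)
```

The same per-region check would apply to the number of fields in a struct.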

So, are lists, structs, and maps all we have?

chriseth · 2018-04-19T07:14:31Z

In Solidity we issue warnings for large statically-sized arrays and we plan to remove the ability to arbitrarily increase the size of dynamically-sized arrays. If you have structures whose size in storage scales linearly with the amount of symbols required in source code, you should be fine, since there is still a lot of space in 2**256.

chriseth · 2018-04-19T07:29:13Z

Perhaps to clarify a little more: If you can only increase the length of a dynamically-sized structure by a single element, then the gas costs for that operation keep the structure small enough until the end of the universe.

daejunpark · 2018-04-23T15:25:09Z

Thanks @chriseth for your help! Now we're clear on what to do.

jacqueswww added the VIP: Approved VIP Approved label Apr 9, 2018

jacqueswww added this to Backlog in Vyper - Final Countdown via automation Apr 9, 2018

jacqueswww moved this from Backlog to In Progress in Vyper - Final Countdown Apr 11, 2018

daejunpark mentioned this issue Apr 23, 2018

Meeting 23rd April 2018 #788

Closed

jacqueswww mentioned this issue Apr 26, 2018

Improved storage variable layout #793

Merged

jacqueswww closed this as completed in #793 May 7, 2018

Vyper - Final Countdown automation moved this from In Progress to Done May 7, 2018

nrryuya mentioned this issue Nov 13, 2018

Modify hashedLocation rule of Vyper runtimeverification/evm-semantics#275

Closed

MrChico mentioned this issue Nov 15, 2019

Inefficient storage layout for statically sized arrays #1731

Closed

fubuloubu mentioned this issue Nov 19, 2019

VIP: Hash-based storage slots #1733

Open

iamdefinitelyahuman mentioned this issue Feb 16, 2021

Place re-entrancy lock immediately after used storage slots #2308

Merged

pcaversaccio mentioned this issue Aug 24, 2023

Add EIP: Namespaced Storage Layout ethereum/EIPs#7201

Merged
