Conceptual: Substrate Storage #255

shawntabrizi
Contributor

@shawntabrizi shawntabrizi commented Sep 24, 2019

  • Should this be in node or core?

  • Maybe needs a lot more content if there are relevant topics to touch on. Need storage expert to chime in.

  • More references

  • Diagram

@shawntabrizi shawntabrizi added A3-inprogress S0-Conceptual Documentation about learning T2-Page Documentation which should live as a detailed external page labels Sep 24, 2019
@shawntabrizi shawntabrizi changed the title Conceptual: Client Storage Conceptual: Substrate Storage Sep 24, 2019
@shawntabrizi
Contributor Author

Notes from @cheme

trie abstraction
I do not know if you are interested in technical details here, like what rocksdb inner storage looks like:

For the trie, for instance, there are some relevant points for people implementing this kind of thing:

All trie nodes are stored in RocksDB. Part of the trie state can get pruned (key-value pairs are deleted from the state once they fall out of the pruning window/range, for non-archive nodes).
There is no reference counting for trie nodes stored in RocksDB.
Trie nodes are encoded and stored in RocksDB at 'encoded trie path up to node ++ hash(encoded_node)'. This allows any kind of key while still avoiding key collisions.
A word about the fact that the trie is good at maintaining a history of values:
A nice thing to note is that this kind of structure allows storing the history of block state. Sharing state between blocks is inherent to the way the trie is defined: you do not have a state trie per block, but a trie root hash that points to nodes from the previous block's state.
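
The sharing of state between blocks can be sketched as follows. This is a minimal, std-only illustration and not Substrate's implementation: `NodeStore`, `commit` and the two-leaf "state" are hypothetical, and `DefaultHasher` stands in for a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in hash. Assumption: a real chain uses a cryptographic hasher;
/// `DefaultHasher` only keeps this sketch free of external crates.
fn node_hash(encoded: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    encoded.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical content-addressed node store: hash -> encoded node.
#[derive(Default)]
struct NodeStore {
    nodes: HashMap<u64, Vec<u8>>,
}

impl NodeStore {
    /// Store an encoded node under its hash; an identical node is stored once.
    fn insert(&mut self, encoded: Vec<u8>) -> u64 {
        let hash = node_hash(&encoded);
        self.nodes.entry(hash).or_insert(encoded);
        hash
    }

    /// Build a tiny two-leaf "state" and return its root hash.
    fn commit(&mut self, left: &[u8], right: &[u8]) -> u64 {
        let l = self.insert(left.to_vec());
        let r = self.insert(right.to_vec());
        // The branch node refers to its children by hash.
        let mut branch = l.to_le_bytes().to_vec();
        branch.extend_from_slice(&r.to_le_bytes());
        self.insert(branch)
    }
}

fn main() {
    let mut store = NodeStore::default();
    let root_1 = store.commit(b"alice=10", b"bob=5");
    let nodes_after_block_1 = store.nodes.len();
    // Block 2 only changes bob; the unchanged alice leaf is reused.
    let root_2 = store.commit(b"alice=10", b"bob=7");
    println!("block 1 root: {root_1:x}, nodes stored: {nodes_after_block_1}");
    println!("block 2 root: {root_2:x}, nodes stored: {}", store.nodes.len());
}
```

Each block has its own root, but the second block adds only the nodes that changed; both roots stay queryable because the old nodes are still in the store.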

Also, you could say that, other than the state trie, there are multiple places where tries are used (but not stored in RocksDB with their full state; only the key-value pairs are needed because we do not modify the trie): the block extrinsics root, change tries, and maybe others.

For storage this means: block storage (right now I do not know where blocks are stored; I should probably look at clientdb), non-canonical block execution data/delta storage (that is the state_db code, which does not use a trie), and storage for keys pending pruning (I am not sure, but this may also come from the state_db code, which does not use a trie).

Main trie
I would call it the 'state trie'; it actually has one root hash per block (in the block header).
It is used to verify the state at any block.

A technical detail: the RocksDB node content covers only the canonical chain (so no branches), and there is a 'state_db' layer that maintains trie state with reference counting, in memory only, for everything that is non-canonical (loaded from the stored deltas for blocks: see journaldb).

Not having reference counting in the persistent db is for performance reasons.

child trie
Child tries are identical to the main trie, except that their root is stored and updated in the main trie instead of the header.

Therefore, proving inclusion of a key-value pair at a given state involves:

proving inclusion of the child trie root in the main trie for a given block header hash
proving inclusion of the key-value pair in a trie with that child trie root as its root
Technical details are subject to change here (a PR is in progress to isolate the RocksDB key-values between child tries).
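
The two proof steps above can be sketched like this. It is a simplified model under loud assumptions: a binary Merkle tree replaces the real radix trie, `DefaultHasher` replaces a cryptographic hash, and `Step`, `verify` and `verify_child_value` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type H = u64;

/// Stand-in hashes (assumption: a real chain uses a cryptographic hasher).
fn hash_leaf(key: &[u8], value: &[u8]) -> H {
    let mut h = DefaultHasher::new();
    (0u8, key, value).hash(&mut h);
    h.finish()
}

fn hash_pair(left: H, right: H) -> H {
    let mut h = DefaultHasher::new();
    (1u8, left, right).hash(&mut h);
    h.finish()
}

/// One step of a Merkle path: the sibling hash and which side it sits on.
struct Step {
    sibling: H,
    sibling_is_left: bool,
}

/// Fold the path from the leaf upwards and compare with the expected root.
fn verify(root: H, key: &[u8], value: &[u8], path: &[Step]) -> bool {
    let mut acc = hash_leaf(key, value);
    for step in path {
        acc = if step.sibling_is_left {
            hash_pair(step.sibling, acc)
        } else {
            hash_pair(acc, step.sibling)
        };
    }
    acc == root
}

/// Step 1: the child root is included in the main trie under `child_key`.
/// Step 2: the key-value pair is included in the trie with that child root.
fn verify_child_value(
    state_root: H,
    child_key: &[u8],
    child_root: H,
    main_path: &[Step],
    key: &[u8],
    value: &[u8],
    child_path: &[Step],
) -> bool {
    verify(state_root, child_key, &child_root.to_le_bytes(), main_path)
        && verify(child_root, key, value, child_path)
}

fn main() {
    // Child trie with two entries.
    let c0 = hash_leaf(b"k0", b"v0");
    let c1 = hash_leaf(b"k1", b"v1");
    let child_root = hash_pair(c0, c1);
    // Main trie: the child root is just another value, next to one other entry.
    let m0 = hash_leaf(b"child:a", &child_root.to_le_bytes());
    let m1 = hash_leaf(b"other", b"x");
    let state_root = hash_pair(m0, m1);

    let main_path = [Step { sibling: m1, sibling_is_left: false }];
    let child_path = [Step { sibling: c0, sibling_is_left: true }];
    let ok = verify_child_value(
        state_root, b"child:a", child_root, &main_path, b"k1", b"v1", &child_path,
    );
    println!("child value proven: {ok}");
}
```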

The interesting point is that encoded trie values all use the same RocksDB collection, so the RocksDB keys need to be prefixed to avoid key collisions (this is related to the fact that we do not use reference counting).
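
A small sketch of why the prefix matters, assuming a plain `HashMap` in place of the RocksDB collection. `db_key`, the `child_a/` and `child_b/` prefixes and the `DefaultHasher` stand-in hash are all hypothetical; the real prefix described above is the encoded trie path up to the node.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in hash (assumption: a real chain uses a cryptographic hasher).
fn node_hash(encoded: &[u8]) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    encoded.hash(&mut hasher);
    hasher.finish().to_be_bytes()
}

/// Hypothetical key layout: prefix ++ hash(encoded_node).
fn db_key(prefix: &[u8], encoded_node: &[u8]) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(&node_hash(encoded_node));
    key
}

fn main() {
    let mut db: HashMap<Vec<u8>, Vec<u8>> = HashMap::new();
    // The same encoded node happens to exist in two different tries.
    let node = b"leaf:balance=42".to_vec();
    db.insert(db_key(b"child_a/", &node), node.clone());
    db.insert(db_key(b"child_b/", &node), node.clone());

    // Pruning the node from trie A must not remove it from trie B.
    // Without a prefix (and without reference counting) both tries would
    // share a single entry, and this removal would break trie B.
    db.remove(&db_key(b"child_a/", &node));
    println!(
        "trie B still has its node: {}",
        db.contains_key(&db_key(b"child_b/", &node))
    );
}
```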

I remember there was some discussion and benchmarking about the complexity and cost of using a trie.

So it is worth writing somewhere that access to trie data is costly: for a single key-value query we need to traverse all the nodes leading to the value in the trie. Therefore a key-value cache is suitable (and is in place, even if I have not had the occasion to verify that it works correctly).
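
To illustrate the point about caching, here is a toy cost model, not the actual cache in Substrate. `SlowTrie` pretends every lookup reads `depth` nodes, and `CachedStore` puts a key-value cache in front of it; both types and the `node_reads` counter are assumptions made up for this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a trie-backed store: every lookup is charged
/// `depth` node reads, mimicking the walk from the root to the value.
struct SlowTrie {
    data: HashMap<Vec<u8>, Vec<u8>>,
    depth: usize,
    node_reads: usize,
}

impl SlowTrie {
    fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        self.node_reads += self.depth;
        self.data.get(key).cloned()
    }
}

/// Key-value cache in front of the trie; misses (None) are cached too.
struct CachedStore {
    trie: SlowTrie,
    cache: HashMap<Vec<u8>, Option<Vec<u8>>>,
}

impl CachedStore {
    fn new(data: HashMap<Vec<u8>, Vec<u8>>, depth: usize) -> Self {
        CachedStore {
            trie: SlowTrie { data, depth, node_reads: 0 },
            cache: HashMap::new(),
        }
    }

    fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(hit) = self.cache.get(key) {
            return hit.clone();
        }
        // Cache miss: pay for the trie walk once, then remember the result.
        let value = self.trie.get(key);
        self.cache.insert(key.to_vec(), value.clone());
        value
    }
}

fn main() {
    let mut data = HashMap::new();
    data.insert(b"balance:alice".to_vec(), b"10".to_vec());
    let mut store = CachedStore::new(data, 6);
    for _ in 0..1000 {
        let _ = store.get(b"balance:alice");
    }
    println!("node reads for 1000 queries: {}", store.trie.node_reads);
}
```

A real cache also has to be invalidated on writes and kept consistent across blocks and forks, which this sketch leaves out.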

I think change tries could get their own description, but I do not think the feature is really used yet. There are two important things about them:

The tries are in memory only (only the key-value pairs are stored). The values are the indexes of key changes, so the change trie can be used to see which extrinsics touch a key in a block.
There is also a log digest built into the change trie for checking state. I won't try to describe its purpose (too complicated for me, but Rob did a presentation about it on the first day of EthCC).
There is also a bunch of metadata, but that may be a bit unrelated.
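
The "key to extrinsic indexes" idea can be sketched as a plain map. This only illustrates the key-value content described above, not the change trie itself; `build_change_index` and the string keys are hypothetical.

```rust
use std::collections::BTreeMap;

/// Build a per-block index: storage key -> indexes of the extrinsics that
/// changed it. `block_writes[i]` lists the keys written by extrinsic `i`.
fn build_change_index(block_writes: &[Vec<&str>]) -> BTreeMap<String, Vec<u32>> {
    let mut index: BTreeMap<String, Vec<u32>> = BTreeMap::new();
    for (extrinsic_index, written_keys) in block_writes.iter().enumerate() {
        let i = extrinsic_index as u32;
        for key in written_keys {
            let changed_by = index.entry(key.to_string()).or_default();
            // Record each extrinsic once per key, even if it wrote twice.
            if changed_by.last() != Some(&i) {
                changed_by.push(i);
            }
        }
    }
    index
}

fn main() {
    let block_writes = vec![
        vec!["balance:alice", "balance:bob"], // extrinsic 0
        vec!["nonce:carol"],                  // extrinsic 1
        vec!["balance:alice"],                // extrinsic 2
    ];
    let index = build_change_index(&block_writes);
    println!("balance:alice changed by {:?}", index.get("balance:alice"));
}
```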

A description of light clients and offchain workers could be interesting too, but I am really not up to date with the Substrate light client, and part of the offchain worker is still in progress (I am working on a storage for it that keeps track of blockchain changes, which is why, even if very inefficient, the trie storage property of keeping all its history indexed is really good).


## Trie Abstraction

One advantage of using a simple key-value store is that you are able to easily abstract other storage structures on top.
Contributor

Can we put a bit more content on how a trie is abstracted on top of a K-V store? I have worked with storage from the highest level (SRML) to the lowest (raw RPC to an encoded key) and, honestly, I never really felt like I was working with a trie here. It always looked more like a bare-bones KV store to me.

Contributor Author

Would need to know what that information would look like...

Implementation details like this could be above the level of "conceptual" docs, but I agree we would want this information in the reference docs.

I would say it is worth keeping in mind, as a runtime developer, that the trie is far more costly than a standard K-V store.
The general idea is that you should store a serialized struct with two fields at one storage location rather than each field at a different location, unless you want to reduce the size of the proof.
This becomes quite clear when you apply a storage cost (if you have a base cost plus a variable cost for the size stored).
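
A worked example of that cost argument, with made-up numbers: `BASE_COST` and `PER_BYTE_COST` are hypothetical constants, not values taken from any runtime.

```rust
/// Hypothetical fee model: a fixed cost per storage item plus a cost per byte.
const BASE_COST: u64 = 1_000;
const PER_BYTE_COST: u64 = 10;

fn write_cost(value_len: u64) -> u64 {
    BASE_COST + PER_BYTE_COST * value_len
}

fn main() {
    // Two 8-byte fields stored at two separate locations: base cost paid twice.
    let separate = write_cost(8) + write_cost(8);
    // The same fields serialized into one 16-byte struct: base cost paid once.
    let combined = write_cost(16);
    println!("separate: {separate}, combined: {combined}");
}
```

The trade-off goes the other way for proofs: a proof of one field of the combined struct has to carry the whole 16-byte value.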

@shawntabrizi
Contributor Author

@cheme Can you give a final approval when you are happy?

docs/conceptual/core/storage.md Outdated Show resolved Hide resolved
Contributor

@joepetrowski joepetrowski left a comment

A lot of confusing paragraphs and not always clear what the point is. The "Trie Abstraction" part starts out with a few paragraphs on how tries can be used to verify state agreement, but then suddenly starts talking about performance and pruning.

Still a lot of English to be cleaned up:

  • The Substrate uses a simple a key-value data store
  • Tries are important tool for
  • it is still easy to verify of the complete node state

Substrate has a single main trie, called the state trie, whose changing root
hash is placed in each block header. This is used to easily verify the state of
the blockchain and provide a basis for light clients to verify proofs.
Contributor

I would be interested to know more about the light clients and what exactly is proved to them. I know they only read block headers (not full blocks) so they get to see the state root before and after each block, but how do they actually know that the transactions in the block were executed correctly and lead to the resulting state root?

What is proved to the light client is the validity of an operation over a state of the chain. The light client does not have the full state, so it gets the info it needs together with a proof that it comes from a valid full state.

A simpler example:


Querying keys from the state trie, the request will be (from core/network/src/protocol/message.rs):

	#[derive(Debug, PartialEq, Eq, Clone, Encode, Decode)]
	/// Remote storage read request.
	pub struct RemoteReadRequest<H> {
		/// Unique request id.
		pub id: RequestId,
		/// Block at which to perform call.
		pub block: H,
		/// Storage key.
		pub keys: Vec<Vec<u8>>,
	}

then the reply will be

	#[derive(Debug, PartialEq, Eq, Clone, Encode, Decode)]
	/// Remote read response.
	pub struct RemoteReadResponse {
		/// Id of a request this response was made for.
		pub id: RequestId,
		/// Read proof.
		pub proof: Vec<Vec<u8>>,
	}

To make sense of this reply, we fetch the request by id and the state trie root of the requested block hash, then we query the keys over the proof.
The proof here is a subset of the memorydb that the state trie is built upon.
This subset contains only the trie nodes that the full client recorded when running the query on its side (for every key in the request).
The light client then runs the same key queries over a trie built from the block state root (which it knows from the CHT) and the trie nodes it just received (the proof field of the response). From this incomplete state trie it can get the resulting value of every requested key (and since every trie node refers to the others through a cryptographic hash, this proves the values are in the chain state).
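
The light client side of this can be sketched as follows. It is a simplified model, not the Substrate code: `Node` is a hypothetical two-variant trie node branching on one key byte per level, `DefaultHasher` stands in for a cryptographic hash, and `read_with_proof` is an illustrative name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified trie node (not Substrate's node encoding).
#[derive(Clone, Hash)]
enum Node {
    Leaf { key: Vec<u8>, value: Vec<u8> },
    /// Children are selected by the next key byte and referenced by hash.
    Branch { children: Vec<(u8, u64)> },
}

/// Stand-in hash (assumption: a real chain uses a cryptographic hasher).
fn node_hash(node: &Node) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    hasher.finish()
}

/// Light client side: rebuild a partial node db from the proof, then run the
/// same lookup a full node would, starting from the trusted state root.
fn read_with_proof(root: u64, proof: &[Node], key: &[u8]) -> Option<Vec<u8>> {
    // Nodes are indexed by their own hash, so a forged node cannot take the
    // place of the node that the root (or a parent) actually refers to.
    let db: HashMap<u64, &Node> = proof.iter().map(|n| (node_hash(n), n)).collect();
    let mut current = root;
    let mut depth = 0;
    loop {
        // A missing node means the proof is incomplete or was tampered with.
        let node: &Node = db.get(&current).copied()?;
        match node {
            Node::Leaf { key: leaf_key, value } => {
                return if leaf_key.as_slice() == key { Some(value.clone()) } else { None };
            }
            Node::Branch { children } => {
                let next_byte = *key.get(depth)?;
                current = children.iter().find(|(byte, _)| *byte == next_byte)?.1;
                depth += 1;
            }
        }
    }
}

fn main() {
    // Full node side: a tiny state trie with two leaves under one branch.
    let leaf_a = Node::Leaf { key: b"a1".to_vec(), value: b"10".to_vec() };
    let leaf_b = Node::Leaf { key: b"b7".to_vec(), value: b"20".to_vec() };
    let root_node = Node::Branch {
        children: vec![(b'a', node_hash(&leaf_a)), (b'b', node_hash(&leaf_b))],
    };
    let state_root = node_hash(&root_node); // the light client knows this from the header

    // The proof holds only the nodes touched while reading "a1".
    let proof = vec![root_node, leaf_a];
    println!("{:?}", read_with_proof(state_root, &proof, b"a1"));
}
```

Here `leaf_b` never leaves the full node: the light client learns nothing about it except its hash inside the branch node.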


But this approach of recording some operation on a full client and re-executing over that record on a light client can apply to many things (if you look in message.rs there are a few queries; call, for instance, uses the executor on the light blockchain).
A possible future design for the Substrate light client seems to be allowing evaluation of some Wasm code, which could be a better solution than chaining query results like it was done in Ethereum (better in that it reduces the number of query round trips).

I am not 100% sure on the following point, but proving the transition between two consecutive blocks is not something we do (I think); we just rely on the fact that the blocks were validated and that we know they are chained. A full client can certainly execute the transition by running a full block on its state, so it should be possible for it to send a light client all the db keys accessed during the full block execution and the next root calculation, and the light client would be able to execute the block on those keys and produce the next root. But I guess the proof would be really massive, so relying on the network having validated that state may be enough for the light client use case.

But you can already run any call (see RemoteCallRequest) or query the deltas between blocks through RemoteChangesRequest.

Contributor Author

I think a slightly simpler, but hopefully still fully accurate, explanation of light client proofs comes from Merkle proofs in general.

[image: Merkle proof diagram]

from here: https://www.quora.com/Cryptography-How-does-a-Merkle-proof-actually-work

The image hopefully shows that if you want to prove TX3 is in the trie, you don't need ALL the nodes, just the full branch that leads to TX3 and all the nodes adjacent to the nodes on that branch. Obviously that is way, way less data on big tries like the ones on blockchains.
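
A minimal sketch of such a proof for TX3, assuming a binary Merkle tree with a power-of-two number of leaves and `DefaultHasher` as a stand-in for a cryptographic hash. `root_and_proof` and `verify` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hashes (assumption: a real chain uses a cryptographic hasher).
fn hash_leaf(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Compute the root and the sibling path for the leaf at `index`.
/// Assumes `leaves` is non-empty and its length is a power of two.
fn root_and_proof(leaves: &[u64], mut index: usize) -> (u64, Vec<u64>) {
    let mut level = leaves.to_vec();
    let mut proof = Vec::new();
    while level.len() > 1 {
        // The sibling of position i is i with its lowest bit flipped.
        proof.push(level[index ^ 1]);
        level = level.chunks(2).map(|pair| hash_pair(pair[0], pair[1])).collect();
        index /= 2;
    }
    (level[0], proof)
}

/// Recompute the root from one leaf and its sibling path.
fn verify(root: u64, leaf: u64, mut index: usize, proof: &[u64]) -> bool {
    let mut acc = leaf;
    for &sibling in proof {
        acc = if index % 2 == 0 { hash_pair(acc, sibling) } else { hash_pair(sibling, acc) };
        index /= 2;
    }
    acc == root
}

fn main() {
    let txs = ["TX1", "TX2", "TX3", "TX4", "TX5", "TX6", "TX7", "TX8"];
    let leaves: Vec<u64> = txs.iter().map(|tx| hash_leaf(tx.as_bytes())).collect();
    // TX3 sits at index 2.
    let (root, proof) = root_and_proof(&leaves, 2);
    println!("hashes in proof: {} (out of {} leaves)", proof.len(), leaves.len());
    println!("TX3 verified: {}", verify(root, leaves[2], 2, &proof));
}
```

The proof grows with the depth of the tree rather than the number of leaves, which is where the savings on large tries come from.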

Full nodes act as the provider of the proofs that light clients need. They do so pretty selflessly, but there are thoughts of having light clients make micro-payments to full nodes for their services.

@JoshOrndorff
Contributor

I read the whole thing and learned while reading. I would really like to see an example of how and why to use a child trie. I guess the how doesn't go in conceptual docs, but it should go somewhere. The why probably does go here.

@shawntabrizi
Contributor Author

shawntabrizi commented Sep 27, 2019

I read the whole thing and learned while reading. I would really like to see an example of how and why to use a child trie. I guess the how doesn't go in conceptual docs, but it should go somewhere. The why probably does go here.

That makes sense. The why, afaik, is that you want your own trie with its own root hash that you can use to verify the state of that child trie.

A trie only has a single "root hash" which describes the whole trie. Subsections of the trie do not have a hash which represents their "sub-content". But maybe you want that, only for a subsection of data... so you make a child trie.

shawntabrizi and others added 3 commits September 28, 2019 02:09
Co-Authored-By: cheme <emericchevalier.pro@gmail.com>
Co-Authored-By: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
@shawntabrizi
Contributor Author

shawntabrizi commented Sep 28, 2019

@joepetrowski I would like to merge this in having addressed every issue except:

A lot of confusing paragraphs and not always clear what the point is. The "Trie Abstraction" part starts out with a few paragraphs on how tries can be used to verify state agreement, but then suddenly starts talking about performance and pruning.

I think the best option is to just remove this section:

All trie nodes are stored in RocksDB and part of the trie state can get pruned,
i.e. a key-value pair can be deleted from the storage when it is out of pruning
range for non archive nodes. We do not use reference counting for performance
reasons.

Which is entirely implementation details. But would want to hear from engineering that this data is not important to be taught.

Contributor

@joepetrowski joepetrowski left a comment

It's close. Mostly grammar nits and a few content suggestions. I should caution that tries/storage are a weak point in my CS understanding, so I think this is generally OK as in I learned something, but I'm not the authority on its correctness.

shawntabrizi and others added 12 commits September 28, 2019 23:27
Co-Authored-By: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
Contributor

@joepetrowski joepetrowski left a comment

As long as @cheme approves for accuracy, I'm good with it.

@shawntabrizi shawntabrizi merged commit 726befd into substrate-developer-hub:source Sep 29, 2019
@shawntabrizi shawntabrizi deleted the shawntabrizi-storage-doc branch September 29, 2019 11:23