
core, trie: rework trie database #26813

Merged: 2 commits into ethereum:master on Apr 24, 2023
Conversation

@rjl493456442 (Member) commented Mar 6, 2023:

This PR reworks the trie database a bit to simplify things and make it usable by the verkle tree.

The trie database uses a reference-counting mechanism to prune stale nodes in memory. It relies on the references
between parent nodes and child nodes to determine whether a node is already outdated.

However, the trie database is very MPT-specific. For example, it maintains the MPT nodes directly and iterates
over the children of nodes according to MPT rules. This gives some performance speedup by avoiding node
encoding/decoding, but it's not abstract enough and is very unfriendly to new trie types, e.g. the verkle tree.

In this PR, the trie database only maintains encoded node blobs together with additional metadata (e.g. the hashes
of a node's children). This way, the trie database is abstract enough to be used by different trie implementations.
The other huge benefit is that it unties the trie and the trie database, so we can move the trie database to a
separate package in the future.

The downside is that the trie database additionally needs to maintain the children hashes of nodes, which requires
more memory. We need to tweak the memory allowance for the dirty node set to a reasonable value.

trie/nodeset.go (outdated)

-	node     node          // Cached collapsed trie node, or raw rlp data, nil for deleted nodes
 	hash     common.Hash   // Node hash, computed by hashing rlp value, empty for deleted nodes
+	node     []byte        // Cached RLP-encoded trie node, nil for deleted nodes
+	children []common.Hash // The list of children hashes
@fjl (Contributor) commented Mar 7, 2023:

I think we could optimize the size of this struct by storing offsets into node instead. For example, we could have a field definition like:

children *[16]uint16

The values stored in the array would be the byte offsets where a child hash is located in the encoded node. This definition would have a memory overhead of 8 bytes (nil pointer) for nodes without any children, and 8 bytes (pointer) + 32 bytes (array) for nodes with children. That's smaller than the current slice-based children list even when only a single child is present!

Compare this to the overhead of the current definition

children []common.Hash

which is 24 bytes (slice header) for nodes with no children, and 24 + n*32 bytes for nodes with n children.

@fjl (Contributor) commented Mar 7, 2023:

Of course, the [16]uint16 is specific to MPT, with its 16 children and small maximum node size. We can make it work for the verkle tree using generics, i.e.

type childrenArray interface {
	getChild(i int, node []byte) []byte
}

type mptChildrenArray [16]uint16

func (c *mptChildrenArray) getChild(i int, node []byte) []byte {
	offset := c[i]
	return node[offset : offset+32]
}

type memoryNode[CA childrenArray] struct {
	hash     common.Hash // Node hash, computed by hashing rlp value, empty for deleted nodes
	node     []byte      // Cached RLP-encoded trie node, nil for deleted nodes
	children CA          // The children hashes, stored as offsets into node; nil when absent
}

func (n *memoryNode[CA]) child(i int) []byte {
	return n.children.getChild(i, n.node)
}

// Instantiated with the pointer type so the children field stays a
// single nil-able word for childless nodes.
type mptMemoryNode = memoryNode[*mptChildrenArray]

@rjl493456442 (Member, Author) commented:

Sounds cool!

Indeed, as you said, the children list can get very large, and taking the byte-offset approach would be much better. I will give it a try.

@fjl (Contributor) commented:

One problem will be getting the actual offsets. I think this will require a rewrite/duplication of the node decoder functions.
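
For illustration, the offsets of a branch node's children could in principle be recovered with geth's low-level RLP helpers. This is a minimal sketch only; childOffsets is a hypothetical name, not code from this PR:

import (
	"github.com/ethereum/go-ethereum/rlp"
)

// childOffsets scans the RLP encoding of a branch node and records the
// byte offset of each 32-byte child hash within the blob. A zero offset
// doubles as "no hash child here", since offset 0 is always occupied by
// the list header.
func childOffsets(node []byte) (*[16]uint16, error) {
	elems, _, err := rlp.SplitList(node) // strip the outer list header
	if err != nil {
		return nil, err
	}
	var offsets [16]uint16
	for i := 0; i < 16; i++ {
		kind, content, rest, err := rlp.Split(elems)
		if err != nil {
			return nil, err
		}
		// Only hash references are recorded; empty slots and small
		// embedded children keep the zero offset.
		if kind == rlp.String && len(content) == 32 {
			offsets[i] = uint16(len(node) - len(rest) - 32)
		}
		elems = rest
	}
	return &offsets, nil
}

A real version would also have to handle extension and leaf nodes, which is why it amounts to a partial rewrite/duplication of the decoder.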

@fjl (Contributor) commented:

Also, I do not advise implementing the generics-based thing right away, since that will further increase code complexity.

@rjl493456442 (Member, Author) commented:

Yeah, it's non-trivial to get these offsets, or to load the child correctly for a given index.

For now I've picked the interface approach, which defines a forEach function to iterate the children; see the sketch after this comment. It feels a bit simpler. Does this approach make sense to you?

I also have to think about it. Personally I think maintaining children []common.Hash is the cleanest and most general approach, but the metadata is too large. The interface approach still feels clean, but I am not sure if it's suitable.
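
For a sense of its shape, here's a rough sketch of such an interface, written as if inside package trie and reusing its decoded node types (shortNode, fullNode, hashNode, mustDecodeNode); childResolver and mptResolver are illustrative names, not necessarily what the PR settled on:

// childResolver abstracts how a trie flavour finds the children of an
// encoded node, so the database itself stays trie-agnostic.
type childResolver interface {
	forEach(node []byte, onChild func(common.Hash))
}

type mptResolver struct{}

// forEach decodes the blob and reports every hash-referenced child,
// recursing through embedded (inlined) nodes.
func (r mptResolver) forEach(blob []byte, onChild func(common.Hash)) {
	r.gather(mustDecodeNode(common.Hash{}.Bytes(), blob), onChild)
}

func (r mptResolver) gather(n node, onChild func(common.Hash)) {
	switch n := n.(type) {
	case *shortNode:
		r.gather(n.Val, onChild)
	case *fullNode:
		for _, child := range n.Children {
			if child != nil {
				r.gather(child, onChild)
			}
		}
	case hashNode:
		onChild(common.BytesToHash(n))
	}
}

A verkle implementation would satisfy the same interface with its own iteration rules, without the database knowing the difference.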

@rjl493456442 force-pushed the refactor-tdb branch 3 times, most recently from 6df66c6 to 09b0c89 on March 13, 2023 07:47

@rjl493456442 (Member, Author) commented:

[Screenshot 2023-03-13 16:15]

[Screenshot 2023-03-13 16:17]

After running for 7 days, this PR is slightly slower than the master branch, approximately 1.7% slower,
because of the additional RLP encoding/decoding overhead.

flushNext common.Hash // Next node in the flush-list
node []byte // Encoded node blob
parents uint32 // Number of live nodes referencing this one
external map[common.Hash]struct{} // The set of external children
Member:

Why the change from int to marker? Won't this lose a "count" of the reference?

Member:

I guess one example could be the meta-parent (common.Hash) referencing the same root hash multiple times (Clique networks, post-merge empty-block state). In that case, we'll not add the second reference this way and the dereference of the first might actually delete data still referenced later?

@rjl493456442 (Member, Author) commented:

I don't maintain the common.Hash -> root references anymore; I found them unnecessary.
The parents counter of the root node is already enough.

In the case you describe, the parent count of the root node is bumped each time that state is explicitly requested to be referenced. All nodes referenced by this root node are retained until the parent count drops to 0.
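
Roughly, the counting behaviour described above looks like this (a simplified sketch with illustrative names, not the PR's exact code):

type cachedNode struct {
	node    []byte // Encoded node blob
	parents uint32 // Number of live references to this node
}

type Database struct {
	dirties map[common.Hash]*cachedNode
}

// reference bumps the parent counter of a node. Referencing the same
// root twice (e.g. identical post-merge empty-block states) increments
// the counter twice, so dereferencing the first cannot free data the
// second still needs.
func (db *Database) reference(root common.Hash) {
	if n, ok := db.dirties[root]; ok {
		n.parents++
	}
}

// dereference drops one reference; the subtree is only pruned once the
// counter reaches zero.
func (db *Database) dereference(root common.Hash) {
	n, ok := db.dirties[root]
	if !ok {
		return
	}
	n.parents--
	if n.parents == 0 {
		// A full implementation would recursively dereference the
		// children before dropping the node.
		delete(db.dirties, root)
	}
}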

@rjl493456442 (Member, Author) commented:

We could even replace it with a common.Hash pointer, because in our case a node can have at most one external child (the storage root).
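
I.e. something along these lines (a sketch of the suggested alternative; the merged code may differ):

type cachedNode struct {
	node     []byte       // Encoded node blob
	parents  uint32       // Number of live nodes referencing this one
	external *common.Hash // The single possible external child (storage root), nil if absent
}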

trie/database.go (outdated, resolved)
@rjl493456442 marked this pull request as ready for review on March 14, 2023.
@karalabe (Member) commented:
Pls rebase

 	nhash = common.BytesToHash(hash)
 	mnode = &memoryNode{
 		hash: nhash,
-		node: simplifyNode(n),
-		size: uint16(size),
+		node: nodeToBytes(n),
Member:

I've got an issue with this, because in verkle trees, serializing a node is very expensive and can (and should) be done in parallel in order to factor some computation. I would very much prefer to keep a pointer to a non-serialized node.

Member:

Come to think of it, I could skip the node serialization and simply copy the commitment bytes. It would be faster, but a node's size would then be around 16K.

@rjl493456442 (Member, Author) commented:

I thought about it. After the change in this PR, there are some extra costs for verkle, mostly from node serialization and deserialization:

Scenario description

There are a few hundred accounts and a few thousand storage slots touched in a mainnet block. Let's use 500 accounts and 3.5K slots here for the calculation.

In verkle, the node width is 256 and the average tree depth is 4. If we need to touch 4K leaves in total, it means 4K leaf nodes and 585 (1 + 8 + 8^2 + 8^3) internal nodes will be accessed.
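
(Spelling the estimate out: 8^4 = 4096 ≈ 4K, so with ~4K touched leaves spread over a depth-4, width-256 tree, each internal node on a touched path sees roughly 8 distinct touched children. That gives about 1, 8, 8^2 = 64 and 8^3 = 512 touched internal nodes on levels 0-3, i.e. 585 internal nodes, or roughly 4.68K touched nodes in total.)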

Deserialization

It happens when a node is accessed. The major cost is the conversion from node hash to node commitment (finding the elliptic curve point by its X-coordinate). I wrote a benchmark and it turns out this is indeed extremely expensive.

// BenchmarkHashToCommitment
// BenchmarkHashToCommitment-8   	   45598	     25769 ns/op	      32 B/op	       1 allocs/op
func BenchmarkHashToCommitment(b *testing.B) {
	p := &Generator
	p.Add(p, p)
	hash := p.Bytes()

	b.ResetTimer()
	b.ReportAllocs()

	for i := 0; i < b.N; i++ {
		var elem Element
		elem.SetBytesTrusted(hash[:])
	}
}

If we need to do the conversion for 4.58K nodes, the overall cost will be roughly 112ms. Fortunately we have the trie prefetcher, which tries to load and decode nodes during EVM execution. I'm not sure how much of this can be hidden by it, though.

Serialization

Likewise, the major cost of serialization comes from the conversion of the node commitment to the node hash. It's possible to speed this up by running the conversions in a batch.

So the total cost of this conversion is about 4.5ms, and it might be possible to reduce it a lot.

// BenchmarkCommitmentToHash
// BenchmarkCommitmentToHash-8   	  740104	      1625 ns/op	       0 B/op	       0 allocs/op
func BenchmarkCommitmentToHash(b *testing.B) {
	p := &Generator
	p.Add(p, p)

	b.ResetTimer()
	b.ReportAllocs()

	for i := 0; i < b.N; i++ {
		p.Bytes()
	}
}

// nodes = 16: 384.625 ns per element on average
// BenchmarkElementsToBytes16
// BenchmarkElementsToBytes16-8   	  169038	      6154 ns/op	    1552 B/op	       4 allocs/op
func BenchmarkElementsToBytes16(b *testing.B) {
	benchmarkElementsToBytes(b, 16)
}

// nodes = 256: 250.41015625 ns per element on average
// BenchmarkElementsToBytes256
// BenchmarkElementsToBytes256-8   	   17974	     64105 ns/op	   24832 B/op	       4 allocs/op
func BenchmarkElementsToBytes256(b *testing.B) {
	benchmarkElementsToBytes(b, 256)
}

func benchmarkElementsToBytes(b *testing.B, children int) {
	var (
		point    = &Generator
		elements []*Element
	)
	for i := 0; i < children; i++ {
		point = point.Add(point, point)
		elements = append(elements, point)
	}
	b.ResetTimer()
	b.ReportAllocs()

	for i := 0; i < b.N; i++ {
		ElementsToBytes(elements)
	}
}

Summary

The cost of node serialization and deserialization is not trivial. But if we keep the node, or a pointer to the node, in memory, the trie database becomes MPT- or verkle-specific. I hope we can make it general enough for both trees.

@rjl493456442 (Member, Author) commented:

But as you said, if we change the serialization rules a bit, we might be able to get rid of the conversion cost. Currently, we use the X-coordinate as the node identifier, and this identifier is expected to be unique. The benefits of using the X-coordinate as the identifier are:

  • We can uniquely identify an elliptic curve point, which is the commitment of the node
  • The identifier itself is unique
  • The X-coordinate is small (32 bytes)

But what if we used a different method to calculate the node identifier? E.g. we could simply compute the sha256 hash of the node coordinates (identifier = sha256(x, y, z)), or the hash of the serialized node (identifier = sha256(serialized_node)); either can also uniquely represent a node.

During serialization, we can serialize the x, y, z coordinates as well. Each of them is 32 bytes, resulting in an extra 96 bytes, but compared with the 256 children this is still trivial.

In this scheme, we can get rid of the conversion, and node serialization and deserialization should be way faster.
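
For illustration, the alternative identifier could be computed like this (a sketch; the x, y, z parameters stand in for the commitment's serialized projective coordinates and are not an existing API):

import "crypto/sha256"

// nodeID derives a node identifier by hashing the commitment's raw
// coordinates, avoiding the expensive X-coordinate (de)compression.
func nodeID(x, y, z [32]byte) [32]byte {
	h := sha256.New()
	h.Write(x[:])
	h.Write(y[:])
	h.Write(z[:])
	var id [32]byte
	copy(id[:], h.Sum(nil))
	return id
}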

 }

 // insert inserts a simplified trie node into the memory database.
 // All nodes inserted by this function will be reference tracked
 // and in theory should only be used for **trie nodes** insertion.
-func (db *Database) insert(hash common.Hash, size int, node node) {
+func (db *Database) insert(hash common.Hash, node []byte) {
Member:

Any reason for using []byte and not rawNode?

@karalabe left a review comment:

SGTM, but really only time running will tell if everything's truly correct 🙃

@karalabe added this to the 1.11.7 milestone on Apr 24, 2023
@karalabe merged commit bbcb5ea into ethereum:master on Apr 24, 2023
1 check passed