Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use finalized block as the difflayer/dislayer indicator in pathdb #29822

Open
fynnss opened this issue May 23, 2024 · 7 comments
Open

Use finalized block as the difflayer/dislayer indicator in pathdb #29822

fynnss opened this issue May 23, 2024 · 7 comments

Comments

@fynnss
Copy link

fynnss commented May 23, 2024

Rationale

Why should this feature exist?

In the existing implementation, pathdb's difflayer stores 128 layers by default, which can facilitate fast memory rollback. However, the more diff layers, the deeper the call depth and the worse performance of getting Node will be.

Use finalized block as the difflayer/dislayer indicator in pathdb can reduce the number of difflayers in memory and improve efficiency of getting node.

pprof:

image

Implementation

Do you have ideas regarding the implementation of this feature?

func (db *Database) Update(root common.Hash, parentRoot common.Hash, block uint64, nodes *trienode.MergedNodeSet, states *triestate.Set) error {
	// Hold the lock to prevent concurrent mutations.
	db.lock.Lock()
	defer db.lock.Unlock()

	// Short circuit if the mutation is not allowed.
	if err := db.modifyAllowed(); err != nil {
		return err
	}
	if err := db.tree.add(root, parentRoot, block, nodes, states); err != nil {
		return err
	}
	// Keep 128 diff layers in the memory, persistent layer is 129th.
	// - head layer is paired with HEAD state
	// - head-1 layer is paired with HEAD-1 state
	// - head-127 layer(bottom-most diff layer) is paired with HEAD-127 state
	// - head-128 layer(disk layer) is paired with HEAD-128 state
	return db.tree.cap(root, maxDiffLayers)  // use head<->finalizedblock as the maxDifflayers
}

Are you willing to implement this feature?

@rjl493456442
Copy link
Member

This idea sounds reasonable, but unfortunately we have to keep the latest 128 states available for serving snap sync.

@fynnss
Copy link
Author

fynnss commented May 23, 2024

This idea sounds reasonable, but unfortunately we have to keep the latest 128 states available for serving snap sync.

AFAIK, snapshot data serves snap-sync. Will pathdb also be used? If it will be, maybe the history state can serve it.

@rjl493456442
Copy link
Member

Will pathdb also be used?

Yes, we need to construct the range proof for the specific state range, via trie

maybe the history state can serve it

The state history only records the flat state diffs, the historic trie node are not accessible unless we do the state rollback with the state diffs.

@fynnss
Copy link
Author

fynnss commented May 23, 2024

Yes, we need to construct the range proof for the specific state range, via trie

Thank you so much. I got it.

The snapshot use bloomfliter to reduce the call path of getAccount/Storage. But bloomfilter for path key seems not very suitable. Have you tried it before?

@rjl493456442
Copy link
Member

rjl493456442 commented May 27, 2024

Yeah, we also thought about using a Bloom filter in a trie database, but we haven't tried it yet.

The reason is bloom filters need to be rebuilt for each layer once the dirty cache in "diskLayer" is flushed, we need to measure the cost and gain.

Given that the number of trie nodes contained in the trie database far exceeds the number of states contained in the state snapshot. We assume the cost for bloom filter construction is non-trivial.

Besides, we could foresee that the frequency of disk layer flushing is relatively higher than state snapshot (as the dirty cache size is limited). So the overhead for maintaining bloom filter could be even higher.

However, I think it would be worthwhile to experiment it a bit.

@will-2012
Copy link

In some test scenarios, pbss 128difflayer may indeed be a potential bottleneck. Adding hashcache to difflayer may be an option. For details, see PR: #29991

@fynnss
Copy link
Author

fynnss commented Jun 18, 2024

@will-2012 Sounds more reasonable than bloomfilter with less overhead and better overall performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants