Skip to content

Conversation

@matkt
Copy link
Contributor

@matkt matkt commented May 27, 2025

PR description

This PR refactors the DB snapshot (thanks to @garyschulte's commit) to remove the use of transactions, as a snapshot is immutable by nature.
Additionally, it introduces an optional cache layer at the snapshot level to optimize repeated reads.

Tests :

  • Checked block building on a sepolia validator
  • Verified oldest_snapshot_seqno to be sure old snapshot are correctly closed
  • Tried calling eth_call in order to verify the performance with the snapshot cache

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

matkt and others added 7 commits May 22, 2025 18:20
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
Signed-off-by: garyschulte <garyschulte@gmail.com>
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
@matkt matkt force-pushed the feature/test-read-snapshot branch from d6c92d9 to ea564b2 Compare May 28, 2025 08:34
matkt added 2 commits May 28, 2025 10:39
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
@matkt matkt force-pushed the feature/test-read-snapshot branch from 8447c7a to 231399d Compare May 28, 2025 09:04
@matkt matkt marked this pull request as ready for review May 28, 2025 12:14
hidden = true,
paramLabel = "<BOOLEAN>",
description =
"Enable caching of reads during snapshot access to improve performance (default: ${DEFAULT-VALUE})")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here that we shouldn't use it with block processing ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the description feel free to tell me if it's better

throws RocksDBException {
final Bytes cacheKey = makeCacheKey(segmentId, key);
Optional<byte[]> cached = cache.getIfPresent(cacheKey);
//noinspection OptionalAssignedToNull
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure to understand the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because I'm checking if optional == null, but it's something we want to do, I will move it at the beginning of the class wuth SuppressWarnings

final byte[] segmentId,
final byte[] key,
final ColumnFamilyHandle handle,
final Cache<Bytes, Optional<byte[]>> cache)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea, maybe it is clear for everyone but I wonder if adding a comment that explains why the value in the cache is an optional, because we want to cache zero reads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment

Signed-off-by: Karim Taam <karim.t2am@gmail.com>
@matkt matkt requested a review from ahamlat May 28, 2025 12:53
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
.getWorldState(withBlockHeaderAndNoUpdateNodeHead(chainHeadBlockHeader))
.orElseThrow()) {
if (worldState instanceof BonsaiWorldState bonsaiWorldState) {
bonsaiWorldState.disableCacheMerkleTrieLoader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

WorldUpdater updater = getEffectiveWorldStateUpdater(ws);

if (ws instanceof BonsaiWorldState bonsaiWorldState) {
bonsaiWorldState.disableCacheMerkleTrieLoader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think we would do better to have segment specific caches though so we can leverage different update patterns

return snapTx.get(segment, key);
try (final OperationTimer.TimingContext ignored = metrics.getReadLatency().startTimer()) {
final ColumnFamilyHandle handle = columnFamilyMapper.apply(segment);
if (isReadCacheEnabledForSnapshots && segment.isEligibleToHighSpecFlag()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiring high spec would includes blockchain queries (which afaik should only be the corresponding body and header for this snapshot), but omits code. It might make sense to remove this condition entirely, since snapshots should only be used to access rpc-relevant things. the largest values are going to be block bodies, and they are not omitted by this check.

I can see how we might end up with multiple copies of the same code cached for each snapshot. ambivalent about this. it might be better to have a non-snapshot specific code cache, since the code hash key is deterministic.

having column family-specific read caches is probably a better approach in general since there are different update patterns for each. We can make better use of shared caches for several segments like blockchain and code

Copy link
Contributor

@ahamlat ahamlat Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be better to have a non-snapshot specific code cache, since the code hash key is deterministic.

Not sure to understand this comment. I agree with your comment related to blocks but in this case, it is not an issue, block data calls doesn't go through this code as far as know, I could be wrong. The idea was to exclude caching the result of get calls related to code. We don't want to cache it at all here.

@macfarla
Copy link
Contributor

any reason not to merge this @matkt ?

@macfarla macfarla enabled auto-merge (squash) May 29, 2025 07:46
@garyschulte garyschulte disabled auto-merge May 29, 2025 20:47
@garyschulte
Copy link
Contributor

re-enabling auto-merge after the failed metrics CLI passed locally. I still think we could do smarter caching here though.

@garyschulte garyschulte enabled auto-merge (squash) May 29, 2025 20:56
@garyschulte garyschulte merged commit 363324b into hyperledger:main May 29, 2025
48 checks passed
@matkt
Copy link
Contributor Author

matkt commented Jun 2, 2025

re-enabling auto-merge after the failed metrics CLI passed locally. I still think we could do smarter caching here though.

yes we can try to do a more optimized version if we really want to use this feature in mainnet or by default

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants