Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5319: Optimize virtual node cache flush strategy #5568

Merged
merged 14 commits into from May 3, 2023

Conversation

artemananiev
Copy link
Member

The fix for #5319. Summary of changes:

  • A new method is added to VirtualRoot, getEstimatedSize(), which is used by virtual pipeline to estimate how much memory a virtual root (mostly, its node cache) consumes
  • Implementation is based on MerkleDb serializers, which provide ways to get estimated sizes for keys and values, when they are serialized. Although serialized size and consumed Java heap size are different, I found strong correlation between them
  • Virtual roots are no longer automatically marked to be flushed every 20 rounds. There is still a way to enable flushing explicitly, this is very useful for tests and benchmarks
  • VirtualPipeline is changed to flush based on virtual root size rather than fast copy version. The main hash/merge/flush lifecycle job is simplified a bit. It still handles both explicitly requested to flush copies and copies, which should be flushed because they are too large
  • A new backpressure mechanism is implemented in the pipeline. Previously fast copy creation was artificially slowed down, if the number of copies to flush, but not yet flushed was more than 2 (or whatever the settings are). Now none of the copies are explicitly marked to flush, so this mechanism wouldn't work. Instead, pipelines check all the copies, which aren't merged or flushed, check their sizes, and if the size of all copies exceeds certain threshold, a delay is applied
  • A small change to when virtual map copies can be flushed to disk. Previously, one undestroyed copy was allowed to flush, and all newer copies were blocked (not flushed) until the oldest copy is released. Although it may be beneficial in some cases, in real worlds the bottleneck is different anyway (hashing), so I changed the pipeline to handle released copies only. It simplified VirtualPipeline methods quite a bit

Testing:

  • All existing unit tests pass
  • Some tests were corrected, e.g. to call release() before waiting until a virtual map is flushed
  • Two new unit tests are provided
  • A couple existing tests that were disabled are now enabled
  • CryptoTransfer benchmark was run in different configurations to check performance / stability

Fixes: #5319
Signed-off-by: Artem Ananev artem.ananev@swirldslabs.com

Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
@artemananiev artemananiev added P1 High priority issue, which must be completed in the milestone otherwise the release is at risk. CI:UnitTests Performance Issues related to performance concerns. Platform Virtual Map Platform Data Structures labels Mar 9, 2023
@artemananiev artemananiev self-assigned this Mar 9, 2023
@artemananiev artemananiev requested review from a team as code owners March 9, 2023 23:13
@github-actions
Copy link

github-actions bot commented Mar 9, 2023

Node: Unit Test Results

  1 339 files    1 339 suites   1h 28m 35s ⏱️
97 421 tests 97 413 ✔️ 8 💤 0
99 052 runs  99 044 ✔️ 8 💤 0

Results for commit e35f81c.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Mar 9, 2023

Node: Integration Test Results

    3 files      3 suites   14m 45s ⏱️
151 tests 151 ✔️ 0 💤 0
152 runs  152 ✔️ 0 💤 0

Results for commit e35f81c.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Mar 9, 2023

Node: E2E Test Results

    1 files      1 suites   17m 11s ⏱️
309 tests 309 ✔️ 0 💤 0
327 runs  327 ✔️ 0 💤 0

Results for commit e35f81c.

♻️ This comment has been updated with latest results.

…overflow during merges

Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
hendrikebbers
hendrikebbers previously approved these changes Mar 15, 2023
Copy link
Member

@hendrikebbers hendrikebbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from a platform-base perspective

⚠️ No idea about the business logic & internals

Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
@artemananiev artemananiev added this to the v0.39 milestone May 2, 2023
hendrikebbers
hendrikebbers previously approved these changes May 2, 2023
Copy link
Contributor

@OlegMazurov OlegMazurov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following classes must override getTypicalSerializedSize() in services:
UniqueTokenMerkleDbValueSerializer
VirtualBlobMerkleDbValueSerializer
OnDiskAccountMerkleDbValueSerializer
IterableContractMerkleDbValueSerializer

@artemananiev
Copy link
Member Author

The following classes must override getTypicalSerializedSize() in services: UniqueTokenMerkleDbValueSerializer VirtualBlobMerkleDbValueSerializer OnDiskAccountMerkleDbValueSerializer IterableContractMerkleDbValueSerializer

@OlegMazurov It can't be done yet, as a part of this PR. In the current version of the platform used by HS, there is no getTypicalSerializedSize() method in ValueSerializer interface.

@OlegMazurov
Copy link
Contributor

I couldn't build hedera-services without those changes. Of course, those new methods could be added to the classes. They can't use @Override, though. Maybe add it commented out with instructions to uncomment during migration to the new platform version?

OlegMazurov
OlegMazurov previously approved these changes May 2, 2023
@artemananiev
Copy link
Member Author

A follow-up issue is filed to implement getTypicalSerializedSize() method for all HS virtual value serializer classes: #6367.

poulok
poulok previously approved these changes May 2, 2023
Copy link
Member

@poulok poulok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the hashgraph owned file changes.

Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
OlegMazurov
OlegMazurov previously approved these changes May 3, 2023
Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
@sonarcloud
Copy link

sonarcloud bot commented May 3, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@github-actions
Copy link

github-actions bot commented May 3, 2023

Platform: JUnit Test Report

     501 files           1 errors  500 suites   28m 38s ⏱️
13 597 tests 13 560 ✔️ 37 💤 0
15 380 runs  15 343 ✔️ 37 💤 0

Results for commit e35f81c.

@sonarcloud
Copy link

sonarcloud bot commented May 3, 2023

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 8 Code Smells

78.6% 78.6% Coverage
0.1% 0.1% Duplication

@artemananiev artemananiev merged commit 8ae4451 into develop May 3, 2023
16 of 18 checks passed
@artemananiev artemananiev deleted the 06253-D-virtual-cache-size-based-flushes branch May 3, 2023 04:35
mhess-swl pushed a commit that referenced this pull request May 4, 2023
Fixes: #5319
Reviewed-by: Ivan Malygin <ivan@swirldslabs.com>, Kelly Greco <kelly@swirldslabs.com>, Oleg Mazurov <oleg.mazurov@swirldslabs.com>
Signed-off-by: Artem Ananev <artem.ananev@swirldslabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High priority issue, which must be completed in the milestone otherwise the release is at risk. Performance Issues related to performance concerns. Platform Data Structures Platform Virtual Map
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize virtual node cache flush strategy
8 participants