Fix slow saving of latest witnesses #11354
Conversation
Codecov Report: Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #11354      +/-   ##
==========================================
+ Coverage   71.08%   71.17%   +0.09%
==========================================
  Files         783      784       +1
  Lines      156833   157305     +472
  Branches   156833   157305     +472
==========================================
+ Hits       111477   111969     +492
+ Misses      40528    40493      -35
- Partials     4828     4843      +15
```

Flags with carried forward coverage won't be shown.
```diff
@@ -1040,6 +1040,9 @@ impl<'a> ChainStoreUpdate<'a> {
             DBCol::LatestChunkStateWitnesses => {
                 store_update.delete(col, key);
             }
+            DBCol::LatesWitnessesByIndex => {
```
typo: Latest
```diff
@@ -45,21 +47,22 @@ impl LatestWitnessesKey {
     /// `LatestWitnessesKey` has custom serialization to ensure that the binary representation
     /// starts with big-endian height and shard_id.
     /// This allows to query using a key prefix to find all witnesses for a given height (and shard_id).
-    pub fn serialized(&self) -> [u8; 64] {
-        let mut result = [0u8; 64];
+    pub fn serialized(&self) -> [u8; 72] {
```
If we're no longer using range iteration, does it matter what encoding we're using? Should we instead use borsh?
The encoding doesn't matter for the write path now, but it still matters for the read path. When someone queries for saved witnesses at a given height we can quickly find them using the prefix.
```shell
./neard view-state latest-witnesses --height 119372230 --shard-id 0
# create prefix (119372230, 0) and query `DBCol::LatestChunkStateWitnesses` with it
```
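To illustrate why the big-endian encoding matters for the read path, here is a simplified sketch of a key of this shape. The real `LatestWitnessesKey` has more fields and serializes to 72 bytes; the field names and sizes below are illustrative, not nearcore's. Because height and shard_id come first in big-endian form, the database's lexicographic key order matches numeric order, so a `(height, shard_id)` prefix selects exactly the witnesses for that height and shard:

```rust
// Simplified sketch (not the actual nearcore struct): a key whose binary
// form starts with big-endian height and shard_id.
struct LatestWitnessesKey {
    height: u64,
    shard_id: u64,
    random_suffix: u64, // stands in for the remaining key bytes
}

impl LatestWitnessesKey {
    fn serialized(&self) -> [u8; 24] {
        let mut result = [0u8; 24];
        // Big-endian so that lexicographic byte order == numeric order.
        result[0..8].copy_from_slice(&self.height.to_be_bytes());
        result[8..16].copy_from_slice(&self.shard_id.to_be_bytes());
        result[16..24].copy_from_slice(&self.random_suffix.to_be_bytes());
        result
    }

    /// Prefix that matches every key for a given (height, shard_id).
    fn prefix(height: u64, shard_id: u64) -> [u8; 16] {
        let mut p = [0u8; 16];
        p[0..8].copy_from_slice(&height.to_be_bytes());
        p[8..16].copy_from_slice(&shard_id.to_be_bytes());
        p
    }
}
```

With little-endian (or borsh's default integer encoding) the byte order would not follow numeric height order, and prefix scans over a height range would break.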
Ahhh, gotcha. That makes sense. Thanks.
```rust
let (key_bytes, witness_bytes) = item?;
store_update.delete(DBCol::LatestChunkStateWitnesses, &key_bytes);
// Go over witnesses with increasing indexes and remove them until the limits are satisfied.
while !info.is_within_limits() && info.lowest_index < info.next_witness_index {
```
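The eviction loop can be sketched in isolation. The field and function names below mirror the snippet above, but the struct layout is a guess and a plain `HashMap` stands in for the `DBCol::LatestWitnessesByIndex` column:

```rust
use std::collections::HashMap;

// Hypothetical bookkeeping struct; the real one lives in nearcore.
struct LatestWitnessesInfo {
    lowest_index: u64,       // index of the oldest witness still stored
    next_witness_index: u64, // index the next saved witness will receive
    count: u64,
    total_size: u64,
    max_count: u64,
    max_size: u64,
}

impl LatestWitnessesInfo {
    fn is_within_limits(&self) -> bool {
        self.count <= self.max_count && self.total_size <= self.max_size
    }
}

// Evict witnesses starting from the lowest index until the limits are
// satisfied. No column iteration is needed to find the eviction victim:
// `lowest_index` points directly at it.
fn evict_until_within_limits(
    info: &mut LatestWitnessesInfo,
    by_index: &mut HashMap<u64, Vec<u8>>, // index -> serialized witness
) {
    while !info.is_within_limits() && info.lowest_index < info.next_witness_index {
        if let Some(witness) = by_index.remove(&info.lowest_index) {
            info.count -= 1;
            info.total_size -= witness.len() as u64;
        }
        info.lowest_index += 1;
    }
}
```

Each iteration is a single keyed delete, so the cost is proportional to the number of witnesses evicted, not to the total number stored.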
What's the maximum number of old witnesses evicted this way? Is it (max witness size / min witness size)? If there are a bunch of small witnesses followed by a big witness, would that mean we have to look up quite a few witnesses when processing the large one? Should we place an upper limit on how many times this loop iterates, just to avoid blocking the main processing path?
Speaking of which, I wonder if it makes sense for this whole thing to be on a separate thread (as in, a separate actor). I guess it's a bit overkill but the fewer things on the main thread the better.
Oh wait I see that this is debug-only and not recommended for production use. Okay, maybe this is not a big deal then.
Yeah it could potentially take a long time if it needs to remove a lot of saved witnesses :/
In the average case it only removes one or two, but in the worst case scenario it could need to remove thousands of witnesses.
It's a debug-only optional feature, and I'm not sure anyone is really using it, so I didn't spend too much time optimizing. We can fix it in a follow-up PR if it becomes a problem.
Fixes: near#11258

Changes in this PR:
* Improved observability of saving latest witnesses:
  * Added metrics
  * Added a tracing span, which will be visible in span analysis tools
  * Added a printout in the logs with details about saving the latest witness
* Fixed the extreme slowness of `save_latest_chunk_state_witness`; the new solution doesn't iterate anything
* Started saving witnesses produced during shadow validation, which was needed to properly test the change

The previous solution used `store().iter()` to find the witness with the lowest height that needs to be removed to free up space, but it turned out that this takes a really long time, ~100ms!

The new solution doesn't iterate anything; instead it maintains a mapping from integer indexes to saved witnesses. The first observed witness gets index 0, the second one gets 1, the third gets 2, and so on. When it's time to free up space, we delete the witness with the lowest index. We maintain two pointers to the ends of this "queue" and move them accordingly as witnesses are added and removed.

This greatly improves the time needed to save the latest witness: with the new code, generating the database update usually takes under 1ms, and committing it takes under 6ms (on shadow validation):

![image](https://github.com/near/nearcore/assets/149345204/06f379d3-1a36-4aa0-8c5f-043bab7bc36c)

([view the metrics here](https://nearone.grafana.net/d/admakiv9pst8gd/save-latest-witnesses-stats?orgId=1&var-chain_id=mainnet&var-node_id=jan-mainnet-node&var-shard_id=All&from=1716234291000&to=1716241491000))

~7ms is still a non-negligible amount of time, but it's way better than the previous ~100ms. It's a debug-only feature, so 7ms might be acceptable.
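The two-pointer "queue" described in the PR can be sketched as follows. This is a minimal model under assumptions: the names are illustrative, and an in-memory `HashMap` stands in for the database column keyed by index:

```rust
use std::collections::HashMap;

// Sketch of the index queue: witnesses are numbered 0, 1, 2, ... as they
// arrive, and two pointers delimit the range currently stored.
struct WitnessQueue {
    lowest_index: u64,       // oldest stored witness
    next_witness_index: u64, // index assigned to the next witness
    by_index: HashMap<u64, Vec<u8>>,
    total_size: usize,
    max_size: usize,
}

impl WitnessQueue {
    // Saving needs no iteration: assign the next index, bump the pointer,
    // then pop from the front (lowest index) until the size limit holds.
    fn save(&mut self, witness: Vec<u8>) {
        self.total_size += witness.len();
        self.by_index.insert(self.next_witness_index, witness);
        self.next_witness_index += 1;
        while self.total_size > self.max_size && self.lowest_index < self.next_witness_index {
            if let Some(old) = self.by_index.remove(&self.lowest_index) {
                self.total_size -= old.len();
            }
            self.lowest_index += 1;
        }
    }
}
```

This is why the write path no longer depends on the encoding of the height-based keys: eviction victims are found by index, while the height-prefixed keys remain only for the debug query path.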