
Do periodic inbound cleaning for rooted slots #8436

Merged: 16 commits merged into master on Mar 3, 2020

Conversation

@ryoqun (Member) commented Feb 25, 2020

Problem

After much investigation, I've realized that just removing storage entries whose count is 0, or tweaking the purge logic, isn't enough.

The root cause of snapshot bloat is that we leak StorageEntry/AppendVec too easily in the first place. Even just running solana ping causes that situation.

In this regard, much like the zero-lamport-account RAM-exhaustion DoS attack, this behavior can easily be used to mount a DoS with a rather small amount of SOL.

Summary of Changes

We were too lazy about GCing older storage entries. In particular, we do literally nothing for account states that are no longer touched.

So, introduce somewhat eager compaction of non-zero-lamport accounts at snapshot generation intervals.
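The idea can be sketched like this. This is a deliberately simplified, hypothetical model, not the real AccountsDB API: the function name `clean_old_rooted_versions` and the index shape (pubkey mapped to `(slot, lamports)` versions) are illustrative only.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified index: pubkey -> versions as (slot, lamports) pairs.
// At each snapshot interval, accounts touched by a not-yet-cleaned root keep
// only their newest rooted version; the older versions are reclaimable.
fn clean_old_rooted_versions(
    index: &mut HashMap<String, Vec<(u64, u64)>>,
    uncleaned_roots: &mut HashSet<u64>,
) -> usize {
    let mut reclaimed = 0;
    for versions in index.values_mut() {
        // Only touch accounts updated in a root we haven't cleaned yet.
        if !versions.iter().any(|(slot, _)| uncleaned_roots.contains(slot)) {
            continue;
        }
        // The newest version supersedes the older ones.
        if let Some(latest) = versions.iter().map(|v| v.0).max() {
            let before = versions.len();
            versions.retain(|&(slot, _)| slot == latest);
            reclaimed += before - versions.len();
        }
    }
    uncleaned_roots.clear(); // every pending root has been visited this pass
    reclaimed
}

fn main() {
    let mut index = HashMap::new();
    index.insert("alice".to_string(), vec![(1, 10), (5, 20)]);
    index.insert("bob".to_string(), vec![(2, 7)]);
    let mut uncleaned = HashSet::from([5u64]);
    // "alice" was updated in uncleaned root 5, so her (1, 10) version goes away.
    println!("reclaimed {} old versions", clean_old_rooted_versions(&mut index, &mut uncleaned));
}
```

Because only accounts updated in the pending roots are examined, the work scales with the delta since the last pass rather than with the whole account set.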

Background

This is one of two PRs towards #8168. This one is inbound compaction; the other (#8337) is outbound compaction (when loading a snapshot). To really fix the enlarged snapshots on TdS, we need to land both.

Part of #8168

@ryoqun ryoqun added the security label Feb 25, 2020
@ryoqun ryoqun changed the title Inbound compaction Do periodic inbound compaction for rooted slots Feb 25, 2020
@ryoqun (Member, Author) commented Feb 25, 2020

Well, I just noticed that when snapshot generation is disabled, this new compaction and the existing purge_zero_lamports are never run... This could exhaust system resources.

@ryoqun ryoqun requested a review from sakridge Feb 25, 2020
```rust
) {
    let mut accounts_index = self.accounts_index.write().unwrap();
    let mut all_pubkeys: HashSet<Pubkey> = HashSet::<Pubkey>::default();
    for root in accounts_index.not_compacted_roots.iter() {
```

@ryoqun (Author, Member) commented Feb 25, 2020

I think the whole purge_zero_lamport pass just needs to check the delta of newly-rooted slots since the last purge, inspired by #8405. Currently, only this new non-zero-lamport account compaction is checked in that manner.

We can move to delta-purging for online processing once we finish the outbound compaction (#8337). Would that be a good direction?

@sakridge (Member) commented Feb 25, 2020

Delta purge sounds good. It's a bit challenging because of the cascading situation: an account that is not part of the delta set could be preventing a purge, so you would have to know to go and check whether that one could be purged as well.
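The cascade described here can be pictured as a fixpoint computation. This is an illustrative sketch, not code from the PR: `blockers[a]` is a hypothetical map listing accounts that must also be inspected before `a` can be purged (e.g. because they share a storage entry), and the candidate set is grown until it stops changing.

```rust
use std::collections::{HashMap, HashSet};

// Expand the delta set with every account that could be blocking a purge,
// transitively, until no new candidates appear.
fn expand_purge_candidates<'a>(
    delta: &HashSet<&'a str>,
    blockers: &HashMap<&'a str, Vec<&'a str>>,
) -> HashSet<&'a str> {
    let mut set = delta.clone();
    let mut changed = true;
    while changed {
        changed = false;
        // Snapshot the current set so we can mutate it while iterating.
        for key in set.clone() {
            for b in blockers.get(key).into_iter().flatten() {
                if set.insert(*b) {
                    changed = true;
                }
            }
        }
    }
    set
}

fn main() {
    let delta = HashSet::from(["a"]);
    // "a" is blocked by "b", which in turn is blocked by "c".
    let blockers = HashMap::from([("a", vec!["b"]), ("b", vec!["c"])]);
    let expanded = expand_purge_candidates(&delta, &blockers);
    println!("{} candidates to inspect", expanded.len());
}
```

This is exactly why a pure delta purge is not enough on its own: the set of accounts to inspect can grow beyond the delta.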

@ryoqun (Author, Member) commented Feb 26, 2020

Hmm, yeah, that's right... It looks harder than I originally thought.

```rust
    }
}
let mut reclaims = Vec::new();
for pubkey in all_pubkeys {
```

@ryoqun (Author, Member) commented Feb 25, 2020

I haven't measured how much this impacts the overall validator performance yet...

@ryoqun (Author, Member) commented Feb 25, 2020

Basically, this takes slot_vec write locks for the union of all accounts updated since the last check. Effectively, this means one more such lock for every account update.

@sakridge (Member) commented Feb 25, 2020

Yeah, it would be nice to see how it affects things. The main cost seems to be the accounts scan, though, so this might not be too much. I don't have a good idea of how many accounts would be purged this way, so maybe we should test it on a testnet.

@ryoqun (Member, Author) commented Feb 25, 2020

@sakridge Could you review this, please? I'm wondering whether there is a better solution, especially given the rather high CPU cost of the current compaction implementation.

@codecov bot commented Feb 25, 2020

Codecov Report

Merging #8436 into master will increase coverage by <.1%.
The diff coverage is 98.8%.

```text
@@           Coverage Diff            @@
##           master   #8436     +/-   ##
========================================
+ Coverage      80%     80%   +<.1%
========================================
  Files         256     256
  Lines       55594   55728    +134
========================================
+ Hits        44477   44607    +130
- Misses      11117   11121      +4
```
@ryoqun (Member, Author) commented Feb 26, 2020

@sakridge I've clarified some naming and avoided introducing another unneeded term: "compaction" is gone. How does this PR look now? :)

```rust
// Purge zero-lamport accounts for garbage collection purposes.
// Reclaim older states of rooted non-zero-lamport accounts as a general
// AccountsDB bloat mitigation and a preprocess for better zero-lamport purging.
fn purge_old_normal_accounts(&self, purges: &HashMap<Pubkey, Vec<(Slot, AccountInfo)>>) {
```

@sakridge (Member) commented Feb 26, 2020

Can we make just one more rename for this function? Maybe something like clean_old_account_roots.

@ryoqun (Author, Member) commented Feb 26, 2020

Nice catch!

Done: 6c3a696.

I kept the ..._accounts ending to avoid confusion with AccountsIndex::clean_dead_slot.

The former removes accounts and the latter removes actual slots. I think root as a noun connotes a (root) slot.

@sakridge (Member) commented Feb 26, 2020

> @sakridge I've done some naming clarification and unneeded another term introduction is avoided: compaction is gone. How does this PR look now? :)

Looks pretty good. I had just one more rename and can we get an idea of the performance impact?

@ryoqun (Member, Author) commented Feb 26, 2020

> > @sakridge I've done some naming clarification and unneeded another term introduction is avoided: compaction is gone. How does this PR look now? :)
>
> Looks pretty good. I had just one more rename and can we get an idea of the performance impact?

@sakridge I'll work on it. :) My first testnet perf investigation. :)

@sakridge (Member) left a comment

lgtm, thanks for the renames

@ryoqun ryoqun changed the title Do periodic inbound compaction for rooted slots Do periodic inbound cleaning for rooted slots Feb 26, 2020
@mergify mergify bot dismissed sakridge’s stale review Feb 27, 2020

Pull request has been modified.

@ryoqun (Member, Author) commented Feb 27, 2020

A somewhat disappointing yet interesting performance report, considering this is my first one. :)
grafana-testnet-monitor-edge-ryoqun-before.pdf
grafana-testnet-monitor-edge-ryoqun-after.pdf

Findings:

  • Overall, the critical section of snapshot creation got slower (+~1.8s, 3x).
    • Max confirmation time increased (1.5x for the 99th percentile, 2x for the max) due to a more noticeable stall at each snapshot creation.
  • Cleanup is working (AppendVecs are ~15% fewer).
  • However, overall TPS increased (+~10%).
    • I guess this is due to the reduced working set.
  • Lock contention is visible on the "Resource Usage" panel.

Considering the above, I've introduced a cheap parallelization: 34ab324

@sakridge What do you think about these observations? Although the confirmation-time spikes got worse, I don't think this blocks merging the PR, considering we're somewhat forced to address some kind of AccountsDB bloat for security reasons. Or are production validators expected to be operated in a round-robin fashion instead of on a single machine? In that case, online cleaning isn't a must; we can always rely on offline cleaning.
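The cheap parallelization lands in commit 34ab324 and, per the later commit list, uses chunked parallelism over the index. The following std-only sketch shows the same shape under stated assumptions: the function name, the chunking, and the stand-in per-key predicate are all hypothetical (the real code parallelizes with a thread-pool library rather than raw threads).

```rust
use std::thread;

// Split the dirty pubkeys into chunks, scan each chunk on its own thread, and
// merge the per-chunk results. The "work" here is a stand-in: count the keys a
// hypothetical predicate marks as purgeable.
fn parallel_clean_scan(pubkeys: Vec<u64>, num_threads: usize) -> usize {
    let threads = num_threads.max(1);
    let chunk_size = ((pubkeys.len() + threads - 1) / threads).max(1);
    let mut handles = Vec::new();
    for chunk in pubkeys.chunks(chunk_size) {
        let chunk = chunk.to_vec(); // move an owned copy into the thread
        handles.push(thread::spawn(move || {
            chunk.iter().filter(|&&k| k % 2 == 0).count()
        }));
    }
    // Joining preserves determinism: the sum is order-independent.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let n = parallel_clean_scan((0..10).collect(), 4);
    println!("{} purgeable entries found", n);
}
```

Chunking (rather than one thread per key) keeps the locking overhead bounded, which matches the "Reduce locks and real chunked parallelism" commit in this PR.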

@ryoqun (Member, Author) commented Feb 27, 2020

Commenting is TBD (just performance-report PDFs for now).

In short, no compelling optimization result was observed... ;)

grafana-testnet-monitor-edge-ryoqun-after-par-v1.pdf
grafana-testnet-monitor-edge-ryoqun-after-par-v2.pdf

@sakridge (Member) commented Feb 27, 2020

> A somewhat disappointing yet interesting performance reports, considering this is my first one. :) […] @sakridge What do you think about these observations?

1.8s latency is pretty scary actually, that's a huge hit. Can we push some things off async?

@sakridge (Member) commented Feb 27, 2020

> 1.8s latency is pretty scary actually, that's a huge hit. Can we push some things off async?

Or maybe sample the accounts so that we only clean up, say, 10% of them with each run.
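This sampling idea was not what the PR ultimately landed, but it can be sketched as a rotating partition of the key space. Everything here is illustrative: the function name and the modulo-based partitioning are assumptions, not the real AccountsDB design.

```rust
// Partition the key space so each pass cleans roughly 1/total_passes of the
// accounts; over `total_passes` successive runs every key gets cleaned once.
fn sample_for_pass(pubkeys: &[u64], pass: usize, total_passes: usize) -> Vec<u64> {
    let total = total_passes.max(1);
    pubkeys
        .iter()
        .copied()
        .filter(|k| (*k as usize) % total == pass % total)
        .collect()
}

fn main() {
    let keys: Vec<u64> = (0..10).collect();
    // With 10 passes, pass 0 gets keys ending in 0, pass 1 keys ending in 1, ...
    for pass in 0..10 {
        println!("pass {}: {:?}", pass, sample_for_pass(&keys, pass, 10));
    }
}
```

The trade-off is latency spread versus cleanup lag: each run is ~10x cheaper, but a given stale account may wait up to `total_passes` runs before being reclaimed.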

@ryoqun (Member, Author) commented Feb 28, 2020

> 1.8s latency is pretty scary actually, that's a huge hit. Can we push some things off async?

I see! OK, I'm learning about performance sensitivity in Solana... :)

Before we finally give up and consider async or sampling, I've done my next round of optimizations, with good results.

grafana-testnet-monitor-edge-ryoqun-after-par-v2.5.pdf (Introduced metrics)
grafana-testnet-monitor-edge-ryoqun-after-par-v3.pdf (Removed the storage scan step)

Compared to the base commit:

  • Overall, the critical section of snapshot creation got a bit slower (+~200ms, 1.2x).
    • Max confirmation time increased negligibly (+3% for the 99th percentile, +15% for the max) due to the somewhat longer stall at each snapshot creation.
  • Cleaning should be working, but it isn't observable (this is a bit odd).
  • Overall TPS is unchanged (within ~1%).
    • I guess this is because the count of AppendVecs didn't change.
  • Lock contention is no longer visible on the "Resource Usage" panel.

Finally, I think I've arrived at a compelling optimization result. Is the +200ms latency increase still a concern? If so, I'll consider further optimization, sampling, or async.
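The key optimization in this round was "Replace storage scan with optimized index scan" (per the commit list). A hedged, simplified sketch of that idea, with hypothetical names and a hypothetical index shape: instead of walking every storage entry, collect the dirty pubkeys directly from the index entries that touch a not-yet-cleaned root, so the scan cost tracks the index rather than the storage.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified index: pubkey -> slots the account was updated in.
// Only keys touching an uncleaned root need further work; everything else is
// skipped without ever opening a storage entry.
fn dirty_pubkeys_from_index(
    index: &HashMap<String, Vec<u64>>,
    uncleaned_roots: &HashSet<u64>,
) -> Vec<String> {
    let mut dirty: Vec<String> = index
        .iter()
        .filter(|(_, slots)| slots.iter().any(|s| uncleaned_roots.contains(s)))
        .map(|(key, _)| key.clone())
        .collect();
    dirty.sort(); // deterministic order for the caller
    dirty
}

fn main() {
    let index = HashMap::from([
        ("alice".to_string(), vec![1, 5]),
        ("bob".to_string(), vec![2]),
    ]);
    let uncleaned = HashSet::from([5u64]);
    // Only "alice" touched uncleaned root 5, so only she needs cleaning work.
    println!("{:?}", dirty_pubkeys_from_index(&index, &uncleaned));
}
```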

@ryoqun ryoqun requested a review from sakridge Feb 28, 2020
@ryoqun ryoqun force-pushed the ryoqun:inbound-compaction branch from 11ddaa1 to 851e530 Mar 3, 2020
@mergify mergify bot dismissed sakridge’s stale review Mar 3, 2020

Pull request has been modified.

@ryoqun ryoqun merged commit d861033 into solana-labs:master Mar 3, 2020
14 checks passed
Rule: v0.23 backport (backport): Backports have been created
Rule: v1.0 backport (backport): Backports have been created
Summary: 3 rules match and 2 potential rules
buildkite/solana: Build #19984 passed (1 hour, 26 minutes, 47 seconds)
buildkite/solana/bench: Passed (25 minutes, 9 seconds)
buildkite/solana/checks: Passed (2 minutes, 17 seconds)
buildkite/solana/coverage: Passed (15 minutes, 29 seconds)
buildkite/solana/local-cluster: Passed (23 minutes, 29 seconds)
buildkite/solana/move: Passed (5 minutes, 2 seconds)
buildkite/solana/pipeline-upload: Passed (3 seconds)
buildkite/solana/shellcheck: Passed (31 seconds)
buildkite/solana/stable: Passed (42 minutes, 9 seconds)
buildkite/solana/stable-perf: Passed (12 minutes, 27 seconds)
ci-gate: Pull Request accepted for CI pipeline
mergify bot pushed a commit that referenced this pull request Mar 3, 2020
* Do periodic inbound compaction for rooted slots

* Add comment

* nits

* Consider not_compacted_roots in cleanup_dead_slot

* Renames in AccountsIndex

* Rename to reflect expansion of removed accounts

* Fix a comment

* rename

* Parallelize clean over AccountsIndex

* Some niceties

* Reduce locks and real chunked parallelism

* Measure each step for sampling opportunities

* Just noticed par iter is maybe lazy

* Replace storage scan with optimized index scan

* Various clean-ups

* Clear uncleared_roots even if no updates

(cherry picked from commit d861033)

# Conflicts:
#	ledger/src/snapshot_utils.rs
#	runtime/src/accounts_db.rs
#	runtime/src/accounts_index.rs
#	runtime/src/bank.rs
mergify bot pushed a commit that referenced this pull request Mar 3, 2020
(same commit list as above; cherry picked from commit d861033)
solana-grimes pushed a commit that referenced this pull request Mar 3, 2020
automerge
ryoqun added a commit that referenced this pull request Mar 3, 2020
(same commit list as above)
@ryoqun ryoqun added automerge and removed automerge labels Mar 3, 2020
solana-grimes pushed a commit that referenced this pull request Mar 3, 2020
automerge
@ryoqun (Member, Author) commented Mar 3, 2020

Phew, I've finally finished all the backporting work!
