
wip: eliminate accounts index arc and rwlock #18452

Closed
wants to merge 1 commit

Conversation


@jeffwashington jeffwashington commented Jul 6, 2021

Problem

Prototyping a move to a disk-based accounts index, targeting 1B accounts. The disk-based accounts index model relies on higher-level account index locks instead of a lock per entry. Much more research remains to be done.

Summary of Changes

Eliminate the Arc and RwLock on each entry's slot list. On its own this increases lock contention, but we recently enabled binning of the accounts index, and more bins mean less contention per bin.
Fixes #
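As a hedged sketch of the data-structure change described above (type and function names here are illustrative, not the actual crate's):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Slot = u64;
type SlotList<T> = Vec<(Slot, T)>;

// Before: each entry owns an Arc<RwLock<SlotList>>, so every access takes a
// per-entry lock on top of the map lock.
type PerEntryBin<T> = RwLock<HashMap<[u8; 32], Arc<RwLock<SlotList<T>>>>>;

// After: the slot list is stored inline and the bin's RwLock guards all of
// its entries; more bins spread the contention.
type PerBinBin<T> = RwLock<HashMap<[u8; 32], SlotList<T>>>;

// One plausible bin-selection scheme: map the pubkey's leading byte onto
// num_bins buckets (num_bins <= 256 assumed).
fn bin_index(pubkey: &[u8; 32], num_bins: usize) -> usize {
    (pubkey[0] as usize * num_bins) / 256
}
```

With this shape, a lookup or update takes exactly one bin-level lock instead of a map lock plus a per-entry lock.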

@jeffwashington jeffwashington force-pushed the copies4 branch 2 times, most recently from b45eb86 to d44228a Compare Jul 7, 2021

jeffwashington commented Jul 7, 2021

Index generation metrics:
total_us and insertion_time_us go down; scan_stores_us even goes down, likely due to less thread over-scheduling.

lemond, this change with serial insertion (other pr):

```
generate_index total_us= 7587545i scan_stores_us=10500839i insertion_time_us=19661952i min_bin_size=4565247i max_bin_size=4577150i total_items=73129210i
```

master:

```
generate_index total_us=12805024i scan_stores_us=15308801i insertion_time_us=27794907i min_bin_size=4565247i max_bin_size=4577150i total_items=73129210i
```

64 bins, serial insertion:

```
generate_index total_us= 5661059i scan_stores_us= 9342401i insertion_time_us=14970163i min_bin_size=1139650i max_bin_size=1146464i total_items=73129210i
```


jeffwashington commented Jul 7, 2021

index READ metrics for reading every item in all storages in parallel:
Note that insertion_time_us is really "get" time in µs here; the code and metrics are overloaded for simplicity.
Reading all pubkeys takes 17s with this PR vs 23s for master (cumulative across the many threads doing this).
In wall-clock time, it takes 3.9s with this PR vs 4.9s for master.

this pr:

```
generate_index total_us=3869973i scan_stores_us=7386426i insertion_time_us=17054601i min_bin_size=4565247i max_bin_size=4577150i total_items=73129210i
```

master:

```
generate_index total_us=4916640i scan_stores_us=7522161i insertion_time_us=23235349i min_bin_size=4565247i max_bin_size=4577150i total_items=73129210i
```

64 bins:

```
generate_index total_us=3787284i scan_stores_us=7533124i insertion_time_us=16517258i min_bin_size=1139650i max_bin_size=1146464i total_items=73129210i
```
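To make concrete what this read benchmark measures, here is a hedged, self-contained sketch (not the actual accounts-db code; names and shapes are illustrative) of many threads taking bin-level read locks concurrently. With no per-entry Arc<RwLock>, each lookup is a single bin read lock plus a map probe:

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// Count how many of `keys` exist in the binned index, split across 4 threads.
fn parallel_reads(bins: &[RwLock<HashMap<u64, Vec<(u64, u64)>>>], keys: &[u64]) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = keys
            .chunks(keys.len() / 4 + 1)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .filter(|&&k| {
                            let bin = (k as usize) % bins.len();
                            // Bin-level read lock; many readers proceed in parallel.
                            bins[bin].read().unwrap().contains_key(&k)
                        })
                        .count()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```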

Code under review:

```rust
            true
        }
    };
    let w_account_maps = self.get_account_maps_write_lock(pubkey);
```

@carllin carllin Jul 8, 2021


Primary concerns here are:

  1. If a set of hotly updated pubkeys end up in the same bin, this could cause a performance degradation, especially if people are grinding for certain vanity pubkeys.
    A good benchmark of this would be to generate a bunch of keys with the same first byte so they end up in the same bin, then run a set of transfers from those keys to another set of keys, and compare the perf to master.

  2. Contention with lookups in clean (`match self.accounts_index.get(pubkey, None, max_clean_root) {`). The larger the active set grows, the more keys need to be cleaned at every root, which means a significant number of keys per bin will likely need to be cleaned each iteration. For example, if we assume the set of dirty keys per clean interval (100 slots) is 100,000, then we will have an estimated 100,000 / number_of_bins keys conflicting with the main replay index update here. This could be an issue if you are trying to update a key in a bin that clean is currently using, which should happen on roughly 1 / number_of_bins of updates, so likely at least once per slot while a clean is ongoing. Ideally replay would be prioritized over clean in these cases, but there is no guarantee of that.

  3. RPC lookups (not even scans) for existing accounts like get_balance or get_account can no longer run concurrently with updates, so a lot of RPC lookups may starve progress here. This would be another good thing to try with an RPC node. I think Stephen has set up a bench for this in the past to test RPC scan + index contention, which we could repurpose for other RPC queries.

The metric to look at for contention would be accounts_db_store_timings::update_index
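The benchmark suggested in (1) could be set up along these lines, as a hedged sketch with illustrative names, forcing every key into one bin by fixing the bin-selecting leading byte:

```rust
// Generate `count` distinct 32-byte keys that all share a leading byte, so
// that a first-byte binning scheme places them in the same bin.
fn keys_in_one_bin(count: usize, first_byte: u8) -> Vec<[u8; 32]> {
    (0..count)
        .map(|i| {
            let mut key = [0u8; 32];
            key[0] = first_byte; // the byte assumed to select the bin
            key[1..9].copy_from_slice(&(i as u64).to_le_bytes());
            key
        })
        .collect()
}
```

Running transfers among such keys and comparing `accounts_db_store_timings::update_index` against master would expose the worst-case contention on a single bin.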

@jeffwashington jeffwashington force-pushed the copies4 branch 6 times, most recently from c523204 to 2f3e99e Compare Jul 13, 2021

codecov bot commented Jul 14, 2021

Codecov Report

Merging #18452 (695d814) into master (f51d648) will decrease coverage by 0.0%.
The diff coverage is 97.5%.

```diff
@@            Coverage Diff            @@
##           master   #18452     +/-   ##
=========================================
- Coverage    82.8%    82.8%   -0.1%     
=========================================
  Files         444      444             
  Lines      126568   126614     +46     
=========================================
+ Hits       104859   104888     +29     
- Misses      21709    21726     +17     
```


stale bot commented Jul 21, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.


stale bot commented Aug 3, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Aug 3, 2021

stale bot commented Aug 10, 2021

This stale pull request has been automatically closed. Thank you for your contributions.

@stale stale bot closed this Aug 10, 2021

stale bot commented Aug 28, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Aug 28, 2021

stale bot commented Sep 4, 2021

This stale pull request has been automatically closed. Thank you for your contributions.

@stale stale bot closed this Sep 4, 2021