-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add accounts hard-link files into the bank snapshot directory #29496
Add accounts hard-link files into the bank snapshot directory #29496
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brooksprumo hit all & more of the things I would have commented on here
8de9cde
to
340f1c9
Compare
a45da45
to
379df31
Compare
0194e90
to
aa74ca6
Compare
bd94a92
to
196c679
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to go over this again, but submitting some initial thoughts + nits.
@xiangzhu70 I think one thing that would be really valuable for the PR (and for people looking back without as much context in the future) is if you added an example of what the new snapshot & accounts directories look like in the PR description - preferrably with multiple accounts_dirs.
d991fe5
to
72868f0
Compare
@@ -141,14 +141,16 @@ pub struct SnapshotRequestHandler { | |||
pub accounts_package_sender: Sender<AccountsPackage>, | |||
} | |||
|
|||
type AccountStorages = Vec<Arc<AccountStorageEntry>>; | |||
|
|||
impl SnapshotRequestHandler { | |||
// Returns the latest requested snapshot slot, if one exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is out of date; we also return the snapshot storages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually...is this comment even correct either? It returns blockheight
, not slot. @brooksprumo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related to the changes in this PR. not sure if I should make this change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing + updating the comment. I was originally suggesting it should be done here until I noticed the blockheight/slot mismatch - but better to have it fixed now that we've seen it!
/// Hard-link the files from accounts/ to snapshot/<bank_slot>/accounts/ | ||
/// This keeps the appendvec files alive and with the bank snapshot. The slot and id | ||
/// in the file names are also updated in case its file is a recycled one with inconsistent slot | ||
/// and id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing where are the slot and id in the file names updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great finding! I had the code which does the updating. Somehow it was lost when making the multi account_path changes. Now I get the code back. Thanks!
36f26b0
to
3b2dc90
Compare
d1478f9
to
5799adf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
heh, do you want me to re-review after I've approved the PR? |
Oh, I didn't notice your approval earlier. No need to re-review again. Many thanks for all the review work! |
…-labs#29496) * Add accounts hard-link files into the bank snapshot directory * Small adjustments and fixes. * Address some of the review issues * Fix compilation issues * Change the latest slot snapshot storage from VecDeque to Option * IoWithSourceAndFile and expanded comments on accounts * last_slot_snapshot_storages in return value * Update comments following the review input * rename dir_accounts_hard_links to hard_link_path * Add dir_full_state flag for add_bank_snapshot * Let appendvec files hardlinking work with multiple accounts paths across multiple partitions * Fixes for rebasing * fix tests which generates account_path without adding run/ * rebasing fixes * fix account path test failures * fix test test_concurrent_snapshot_packaging * review comments. renamed the path setup function * Addressed most of the review comments * update with more review comments * handle error from create_accounts_run_and_snapshot_dirs * fix rebasing duplicate * minor accounts_dir path cleanup * minor cleanup, remove commented code * misc review comments * build error fix * Fix test_incremental_snapshot_download_with_crossing_full_snapshot_interval_at_startup * fix build error on MAX_BANK_SNAPSHOTS_TO_RETAIN * rebase fix, update hardlink filename * minor comment spelling fix * rebasing fixes * fix rebase issues; with_extension * comments changes for review * misc minor review issues * bank.fill_bank_with_ticks_for_tests * error handling on appendvec path * fix use_jit * minor comments refining * Remove type AccountStorages * get_account_path_from_appendvec_path return changed to Option * removed appendvec_path.to_path_buf in create_accounts_run_and_snapshot_dirs * add test_get_snapshot_accounts_hardlink_dir * update last_snapshot_storages comment * update last_snapshot_storages comment * symlink map_err * simplify test_get_snapshot_accounts_hardlink_dir with fake paths * log last_snapshot_storages at the end of the loop
Problem
To construct a bank from the bank snapshot directory, we need to build up the snapshot directory to contain the full state of the snapshot. This is one of the major step, to add the accounts appendvec files into the snapshot as hardlinks.
Because hardlinking only works on the same file system partition, a hardlink directory snapshot/ is created for every slot and every account_path provided. Each <account_path>/snapshot// contains the hardlinks of the account files in run/. The bank snapshot accounts_hardlinks contain symlinks to these directories.
For example, if the accounts_paths vector has the following:
~/ledger/accounts/
/mnt/accounts/
The bank snapshot directory snapshot/ will have an accounts_hardlinks/ directory, which contains symlinks
account_path_0 -> ~/ledger/accounts/snapshot/. # hardlinks of files in ~/ledger/accounts/run
account_path_1 -> /mnt/accounts/snapshot # hardlinks of files in /mnt/accounts/run
This is a split of the PR #28745
Summary of Changes
In add_bank_snapshot, add the hard-links of the accounts/ appendvec files.
Hold the reference counts of the appendvecs for the latest snapshot, to prevent the appendvecs from being recycled. The recycling would change the content of the hardlinked files. For now, a vector of SnapshotStorages is used to hold the reference counts. It could be changed to use option. But some data corruption error happened. Will debug this issue later.
There are also some other small debugging info changes to help debug the appendvec change history.
Fixes #