Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add accounts hard-link files into the bank snapshot directory #29496

Merged

Conversation

xiangzhu70
Copy link
Contributor

@xiangzhu70 xiangzhu70 commented Jan 3, 2023

Problem

To construct a bank from the bank snapshot directory, we need to build up the snapshot directory to contain the full state of the snapshot. This is one of the major step, to add the accounts appendvec files into the snapshot as hardlinks.

Because hardlinking only works on the same file system partition, a hardlink directory snapshot/ is created for every slot and every account_path provided. Each <account_path>/snapshot// contains the hardlinks of the account files in run/. The bank snapshot accounts_hardlinks contain symlinks to these directories.

For example, if the accounts_paths vector has the following:
~/ledger/accounts/
/mnt/accounts/

The bank snapshot directory snapshot/ will have an accounts_hardlinks/ directory, which contains symlinks
account_path_0 -> ~/ledger/accounts/snapshot/. # hardlinks of files in ~/ledger/accounts/run
account_path_1 -> /mnt/accounts/snapshot # hardlinks of files in /mnt/accounts/run

This is a split of the PR #28745

Summary of Changes

In add_bank_snapshot, add the hard-links of the accounts/ appendvec files.
Hold the reference counts of the appendvecs for the latest snapshot, to prevent the appendvecs from being recycled. The recycling would change the content of the hardlinked files. For now, a vector of SnapshotStorages is used to hold the reference counts. It could be changed to use option. But some data corruption error happened. Will debug this issue later.
There are also some other small debugging info changes to help debug the appendvec change history.

Fixes #

@xiangzhu70 xiangzhu70 marked this pull request as ready for review January 4, 2023 04:57
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
runtime/src/accounts_db.rs Show resolved Hide resolved
runtime/src/append_vec.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brooksprumo hit all & more of the things I would have commented on here

runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_package.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to go over this again, but submitting some initial thoughts + nits.

@xiangzhu70 I think one thing that would be really valuable for the PR (and for people looking back without as much context in the future) is if you added an example of what the new snapshot & accounts directories look like in the PR description - preferrably with multiple accounts_dirs.

core/tests/snapshots.rs Outdated Show resolved Hide resolved
core/src/snapshot_packager_service.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
runtime/src/accounts_background_service.rs Outdated Show resolved Hide resolved
@xiangzhu70 xiangzhu70 force-pushed the add_accounts_to_bank_snapshot_dir branch 2 times, most recently from d991fe5 to 72868f0 Compare January 31, 2023 18:42
@@ -141,14 +141,16 @@ pub struct SnapshotRequestHandler {
pub accounts_package_sender: Sender<AccountsPackage>,
}

type AccountStorages = Vec<Arc<AccountStorageEntry>>;

impl SnapshotRequestHandler {
// Returns the latest requested snapshot slot, if one exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is out of date; we also return the snapshot storages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually...is this comment even correct either? It returns blockheight, not slot. @brooksprumo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to the changes in this PR. not sure if I should make this change here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing + updating the comment. I was originally suggesting it should be done here until I noticed the blockheight/slot mismatch - but better to have it fixed now that we've seen it!

runtime/src/snapshot_utils.rs Show resolved Hide resolved
/// Hard-link the files from accounts/ to snapshot/<bank_slot>/accounts/
/// This keeps the appendvec files alive and with the bank snapshot. The slot and id
/// in the file names are also updated in case its file is a recycled one with inconsistent slot
/// and id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing where are the slot and id in the file names updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great finding! I had the code which does the updating. Somehow it was lost when making the multi account_path changes. Now I get the code back. Thanks!

runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
@xiangzhu70 xiangzhu70 force-pushed the add_accounts_to_bank_snapshot_dir branch 3 times, most recently from 36f26b0 to 3b2dc90 Compare February 2, 2023 16:24
@xiangzhu70 xiangzhu70 force-pushed the add_accounts_to_bank_snapshot_dir branch from d1478f9 to 5799adf Compare February 15, 2023 15:58
@brooksprumo brooksprumo self-requested a review February 15, 2023 16:38
Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@brooksprumo
Copy link
Contributor

heh, do you want me to re-review after I've approved the PR?

@xiangzhu70
Copy link
Contributor Author

do you want me to re-review after I've approved the PR?

Oh, I didn't notice your approval earlier. No need to re-review again. Many thanks for all the review work!

@xiangzhu70 xiangzhu70 merged commit 4909267 into solana-labs:master Feb 15, 2023
nickfrosty pushed a commit to nickfrosty/solana that referenced this pull request Mar 12, 2023
…-labs#29496)

* Add accounts hard-link files into the bank snapshot directory

* Small adjustments and fixes.

* Address some of the review issues

* Fix compilation issues

* Change the latest slot snapshot storage from VecDeque to Option

* IoWithSourceAndFile and expanded comments on accounts

* last_slot_snapshot_storages in return value

* Update comments following the review input

* rename dir_accounts_hard_links to hard_link_path

* Add dir_full_state flag for add_bank_snapshot

* Let appendvec files hardlinking work with multiple accounts paths across multiple partitions

* Fixes for rebasing

* fix tests which generates account_path without adding run/

* rebasing fixes

* fix account path test failures

* fix test test_concurrent_snapshot_packaging

* review comments.  renamed the path setup function

* Addressed most of the review comments

* update with more review comments

* handle error from create_accounts_run_and_snapshot_dirs

* fix rebasing duplicate

* minor accounts_dir path cleanup

* minor cleanup, remove commented code

* misc review comments

* build error fix

* Fix test_incremental_snapshot_download_with_crossing_full_snapshot_interval_at_startup

* fix build error on MAX_BANK_SNAPSHOTS_TO_RETAIN

* rebase fix, update hardlink filename

* minor comment spelling fix

* rebasing fixes

* fix rebase issues; with_extension

* comments changes for review

* misc minor review issues

* bank.fill_bank_with_ticks_for_tests

* error handling on appendvec path

* fix use_jit

* minor comments refining

* Remove type AccountStorages

* get_account_path_from_appendvec_path return changed to Option

* removed appendvec_path.to_path_buf in create_accounts_run_and_snapshot_dirs

* add test_get_snapshot_accounts_hardlink_dir

* update last_snapshot_storages comment

* update last_snapshot_storages comment

* symlink map_err

* simplify test_get_snapshot_accounts_hardlink_dir with fake paths

* log last_snapshot_storages at the end of the loop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants