feat(state-sync): DB Snapshots #9090

Merged: 168 commits into master from nikurt-seize-the-state on Jun 20, 2023
Conversation

@nikurt (Contributor) commented May 22, 2023

FlatStorage provides a fast way to generate state parts, but it needs a consistent view of state at the beginning of the epoch.
This PR makes a snapshot of the whole DB and deletes unused columns. This gives the node a read-only database with exactly the view of Flat State that is needed for State Sync.
The snapshot is made in a separate actix Actor to avoid blocking ClientActor.

To improve iteration speed during development, this PR adds a testing mechanism: a config option that triggers state snapshots every N blocks. This makes no sense for state sync itself, but it is very useful for observing enough snapshot events.

Tested that a node can process blocks while snapshotting is in progress.

Without compaction, a snapshot can require ~100GB of extra disk space; enabling compaction reduces that overhead to ~10GB.
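To illustrate the approach, here is a minimal sketch of a dedicated snapshot actor that takes a RocksDB checkpoint without blocking `ClientActor`. All names are hypothetical and the `rocksdb` checkpoint usage is an assumed shape, not the PR's actual code:

```rust
// Hypothetical sketch, not the PR's code: a separate actix actor takes
// a RocksDB checkpoint so ClientActor keeps processing blocks.
use std::{path::PathBuf, sync::Arc};

use actix::prelude::*;
use rocksdb::{checkpoint::Checkpoint, DB};

#[derive(Message)]
#[rtype(result = "()")]
struct MakeSnapshot {
    destination: PathBuf, // directory for the read-only snapshot
}

struct StateSnapshotActor {
    db: Arc<DB>,
}

impl Actor for StateSnapshotActor {
    // SyncContext runs the actor on a dedicated thread pool, keeping
    // slow checkpoint/compaction work off the main arbiter.
    type Context = SyncContext<Self>;
}

impl Handler<MakeSnapshot> for StateSnapshotActor {
    type Result = ();

    fn handle(&mut self, msg: MakeSnapshot, _ctx: &mut Self::Context) {
        // A checkpoint hard-links SST files, so taking it is cheap;
        // dropping unused columns and compacting are the slow parts.
        match Checkpoint::new(&self.db) {
            Ok(cp) => {
                if let Err(e) = cp.create_checkpoint(&msg.destination) {
                    eprintln!("state snapshot failed: {e}");
                }
            }
            Err(e) => eprintln!("could not start checkpoint: {e}"),
        }
    }
}
```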

@nikurt nikurt requested a review from Longarithm May 24, 2023 12:32
@nikurt nikurt marked this pull request as ready for review May 24, 2023 12:32
@nikurt nikurt requested a review from a team as a code owner May 24, 2023 12:32
near-bulldozer bot pushed a commit that referenced this pull request May 25, 2023
Generate the inner part of a state part using flat storage, following the idea presented in #8984.

In short, if the flat storage head corresponds to the state root for which we sync state, it is enough to read only the boundary nodes; the inner trie part can be reconstructed from a range of KV pairs in flat state. The main logic for that is contained in `Trie::get_trie_nodes_for_part_with_flat_storage`.

It requires a couple of minor changes:
* we now allow creating "view" `Trie`s with flat storage as well; as before, we want to avoid creating non-view `Trie`s, because `TrieCache` accesses may block chunk processing
* `get_head_hash` and `shard_uid` methods for `FlatStorage`, allowing correct range queries to flat storage
* `FlatStateValue` moved to `primitives` to allow more general access


## TODO
* prometheus metrics
* integration test checking that flat storage is used during normal block processing on client (or wait for #9090)
 
## Testing

https://nayduck.near.org/#/run/3023

A big sanity test for `get_trie_nodes_for_part_with_flat_storage` covering all the scenarios I could think of:
* results with/without flat storage must match
* a result with incorrect flat storage must be an error
* a result with flat storage and a missing intermediate node should still be okay
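As a toy, self-contained illustration of why this works (stand-in types, not nearcore's): once the part's boundary nodes are read from the trie, every inner key-value pair comes from a single ordered range scan over the flat state, with no trie traversal:

```rust
use std::collections::BTreeMap;

/// Stand-in for flat storage: an ordered snapshot of key -> value.
type FlatState = BTreeMap<Vec<u8>, Vec<u8>>;

/// Collect the part's inner key-value pairs with one range scan.
/// The boundary keys themselves would come from the trie's boundary nodes.
fn inner_part_from_flat_state(
    flat: &FlatState,
    part_start: &[u8],
    part_end: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)> {
    flat.range(part_start.to_vec()..part_end.to_vec())
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut flat = FlatState::new();
    flat.insert(b"alice".to_vec(), b"1".to_vec());
    flat.insert(b"bob".to_vec(), b"2".to_vec());
    flat.insert(b"carol".to_vec(), b"3".to_vec());
    let inner = inner_part_from_flat_state(&flat, b"alice", b"carol");
    assert_eq!(inner.len(), 2); // "alice" and "bob"; "carol" is excluded
}
```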
@Longarithm (Member):
I like the code, it's very clear. But I have a general concern.

We shouldn't put `state_snapshot` fields in `NightshadeRuntime`. It is designed to be stateless and hold only the logic for transaction processing. My rule of thumb is: the fewer methods we have in `RuntimeAdapter`, the better, as it is already bloated, and there was a huge effort to separate `EpochManager` from it. Logic for making snapshots and getting state parts doesn't really fit in Runtime, because it is not related to transaction processing. I think that having `obtain_state_part` inside `RuntimeAdapter` was a mistake, which is causing this unnatural dependency of state-parts logic on Runtime.

I don't have an opinion yet on where this logic should live: maybe a separate struct like `StatePartManager` in Client, maybe part of `StateSync`. But it is more or less clear that it doesn't need to know about Runtime. As a short-term hack, we could pass `StateSnapshot` to `obtain_state_part` to make the code work. Happy to discuss it here or offline.

@Longarithm (Member):
Maybe this logic fits well into `ShardTries`: this struct owns state store updates and provides state views, so it looks like it already knows all the needed context.
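Sketching that suggested shape with stand-in types (not nearcore's real definitions): the snapshot handle hangs off the struct that already owns store access, and state-part generation reads from it rather than from Runtime:

```rust
use std::sync::{Arc, Mutex};

struct StateSnapshot; // would hold the read-only checkpoint DB handle

struct ShardTries {
    state_snapshot: Arc<Mutex<Option<StateSnapshot>>>,
}

impl ShardTries {
    /// Replace the previous snapshot (if any) with a fresh one.
    fn make_state_snapshot(&self) {
        *self.state_snapshot.lock().unwrap() = Some(StateSnapshot);
    }

    /// State-part generation reads from the snapshot, not from Runtime.
    fn obtain_state_part_from_snapshot(&self) -> Option<()> {
        self.state_snapshot.lock().unwrap().as_ref().map(|_| ())
    }
}
```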

nikurt pushed a commit that referenced this pull request May 31, 2023
@nikurt (Contributor, Author) commented Jun 1, 2023

@Longarithm Please take another look.
Moved the data and logic to `ShardTries`. Maybe this logic needs to be in a separate struct?
Also, I butchered some logic in `shard_tries.rs`, please advise.

Many tests are failing because they use an epoch length that is too short. It takes a couple of seconds to:

* delete an old snapshot
* checkpoint
* delete columns
* compact

If snapshots are requested too often, it's not happy. Handled that by ignoring consecutive requests if I can't lock flat storage.
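A minimal sketch of that fail-open handling, with hypothetical names (the real guard is the flat-storage updates mode, not a plain mutex):

```rust
use std::sync::Mutex;

struct SnapshotRequester {
    flat_state_lock: Mutex<()>,
}

impl SnapshotRequester {
    fn request_snapshot(&self, take_snapshot: impl FnOnce()) {
        match self.flat_state_lock.try_lock() {
            Ok(_guard) => take_snapshot(),
            // Lock held: a snapshot is already in progress, so this
            // too-frequent request is simply ignored. Block processing
            // continues either way.
            Err(_) => eprintln!("skipping state snapshot request"),
        }
    }
}
```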

@Longarithm (Member):
Looking.

> Many tests are failing because they use an epoch length that is too short.

I see... We can disable removing columns and compaction for tests, because state is very small there. Could that simplify the changes?

@nikurt (Contributor, Author) commented Jun 2, 2023

> I see... We can disable removing columns and compaction for tests, because state is very small there. Could that simplify the changes?

It'll help, but there are still limits to it, and I don't want to introduce more configuration flags.

@nikurt (Contributor, Author) commented Jun 5, 2023

@Longarithm Please take a look

Comment on lines 101 to 105
assert_ne!(
    flat_storage_manager.set_flat_state_updates_mode(false),
    Some(false),
    "Failed to lock flat state updates"
);
Member:
Assertion looks fine (would it even be possible to propagate an error from this actor to the main actor?).
But let's change the `set_flat_state_updates_mode` return type to `Result<bool, StorageError>`.

Contributor (Author):

Changed this to check whether `set_flat_state_updates_mode()` succeeded. If not, the snapshotting request is ignored.
I don't return an error, because the caller doesn't care about an error: the blocks need to be processed further. I prefer the fail-open (aka fail-safe) approach, because snapshots are not critical to any individual node, though they are critical to the network as a whole.

Comment on lines 169 to 171
Some(v) => assert_eq!(
    prev_shard_value, v,
    "All FlatStorage are expected to have the same value of `move_head_enabled`"
Member:
Similarly to the comment below `get_make_snapshot_callback`: let's make it an error and propagate it further. Looks like we can't return `None` from here anyway.

Contributor (Author):

Please see an updated version of this function.

Resolved review threads:
* core/store/src/trie/shard_tries.rs (outdated)
* nearcore/src/runtime/mod.rs (outdated)
* core/store/src/trie/state_parts.rs
* core/o11y/src/macros.rs (outdated)
* core/store/src/flat/manager.rs (outdated)
* core/store/src/trie/shard_tries.rs (outdated)
@@ -2547,11 +2568,40 @@ fn test_catchup_gas_price_change() {
    genesis.config.min_gas_price = min_gas_price;
    genesis.config.gas_limit = 1000000000000;
    let chain_genesis = ChainGenesis::new(&genesis);

    let tmp_dir1 = tempfile::tempdir().unwrap();
Member:

Why can't we reuse `real_epoch_managers` and `nightshade_runtimes` as before?

Contributor (Author):

`nightshade_runtimes()` initializes the runtime with home_dir `../../../../`, where I can't create a new directory. Therefore, I create a runtime manually.
`real_epoch_managers()` by itself is fine, but if I use it, I don't have an epoch manager to pass to the runtime I'm initializing manually.

Resolved review thread: integration-tests/src/tests/client/process_blocks.rs (outdated)
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Jun 8, 2023
@nikurt nikurt requested a review from Longarithm June 12, 2023 17:14
@nikurt (Contributor, Author) commented Jun 19, 2023

Nayduck tests are fine, but a few tests fail occasionally due to reasons unrelated to this PR.

near-bulldozer bot merged commit a5ede1d into master on Jun 20, 2023
near-bulldozer bot deleted the nikurt-seize-the-state branch on June 20, 2023 14:07
ppca added a commit that referenced this pull request Sep 19, 2023
marcelo-gonzalez added a commit to marcelo-gonzalez/nearcore that referenced this pull request Mar 15, 2024
This was added in near#9090 to provide a way to reduce the size of snapshots, as the commit message said. But that's not needed anymore now that we just drop the unneeded column families and have a smaller snapshot to begin with.