fix(state-viewer): print hash and memory usage of tree nodes #8919

Merged
merged 74 commits into master from nikurt-print-tree on Apr 25, 2023

Conversation

@nikurt (Contributor) commented Apr 18, 2023

  • makes `neard view-chain --format full` print the hash and memory usage of each node.
  • adds a `print` sub-sub-sub-command to print the sub-tree contained in a state part: `./neard view-state state-parts --root-dir /tmp --shard-id 0 apply --part-id 42 --action print current`
  • fixes `neard view-state-part dump` to print state records in a human-readable way.

@nikurt nikurt requested a review from a team as a code owner April 18, 2023 10:29
@nikurt nikurt requested a review from akhi3030 April 18, 2023 10:29
@nikurt (Contributor, Author) commented Apr 18, 2023

cc: @Longarithm

@akhi3030 (Collaborator) commented:

assigning @Longarithm as the reviewer.

@nikurt nikurt changed the title from "fix(state-viewer): print tree shows hash and memory usage" to "fix(state-viewer): print hash and memory usage of tree nodes" on Apr 24, 2023
khorolets and others added 20 commits April 25, 2023 14:49
My previous PR #8762 was reverted by #8780 because release automation was broken.

It looks like the cause of the broken automation was the `near-test-contract` crate. We had a discussion about this package and the reasons to publish it in the first place in the very first PR. Now we have proof that a dependency that is mentioned in the `dev-dependencies` section of `Cargo.toml` is not required to be published to crates.io.

Once I figured that out I created PR #8779, but it went unnoticed and instead all the changes were reverted in #8780.

As agreed with @jakmeier, I am re-creating the same changes, but with publishing disabled for `near-test-contract`, so that the necessary crates can be published in the future.
Replace the Children::encode_into and decode methods with implementations of
the BorshSerialize and BorshDeserialize traits and, with that, change
RawTrieNode into an enum for which the borsh implementations can be
derived.  While the first step is quite straightforward, the second
requires the RawTrieNode::Branch variant to be split into two:
BranchNoValue and BranchWithValue.

The main benefit of this is less code (the commit has a delta of -68
lines) and an API consistent with other serialised objects.  However,
this doesn’t come without downsides.

Firstly, having separate BranchNoValue and BranchWithValue variants
means that it’s now a bit more awkward to handle a branch as a single
case, with common code paths having to be refactored.

Secondly, one might imagine a future representation for nodes which
embeds the value.  For example, {4u8}{key_len:u32}{key}{value}
might be added for a leaf.  This would essentially mean that the value
comprises the rest of the encoded bytes.  The issue for borsh is that
it needs to know the length of {value} before it reads it.  Either
a redundant {value_len:u32} would need to be included or some hacks
employed (namely not using borsh for RawTrieNodeWithSize and then
consuming the rest of the reader when decoding RawTrieNode).

Lastly, I’m not certain about the performance implications.  Borsh might
actually be slower than a hand-written decoder which operates on
a slice; I haven’t measured whether that is the case.
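
To make the idea concrete, here is a minimal sketch of an enum with derived borsh traits and the branch split described above. The type and field layouts are simplified stand-ins, not the actual nearcore definitions.

```rust
use borsh::{BorshDeserialize, BorshSerialize};

// Simplified stand-in for the real Children type: 16 optional child hashes.
type ChildHashes = [Option<[u8; 32]>; 16];

// Every variant has a fixed, self-describing layout and its own tag, so borsh
// can derive both traits.  Splitting the branch into a value-less and a
// value-carrying variant keeps two distinct tags instead of an embedded
// Option flag (whether that exactly matches the on-disk format is an
// assumption of this sketch).
#[derive(BorshSerialize, BorshDeserialize, Debug, PartialEq)]
enum RawTrieNodeSketch {
    Leaf(Vec<u8>, Vec<u8>),
    BranchNoValue(ChildHashes),
    BranchWithValue(Vec<u8>, ChildHashes),
    Extension(Vec<u8>, [u8; 32]),
}

fn main() -> std::io::Result<()> {
    let node = RawTrieNodeSketch::BranchNoValue([None; 16]);
    // Round-trip through borsh: serialize into a Vec<u8>, then decode it back.
    let mut buf = Vec::new();
    node.serialize(&mut buf)?;
    assert_eq!(RawTrieNodeSketch::try_from_slice(&buf)?, node);
    Ok(())
}
```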
As it fails with:
```
thread 'tests::test_contract_precompilation' panicked at 'Compilation result should be non-empty', runtime/runtime/src/lib.rs:2550:14
```
presumably because the compilation cache is not populated on aarch64-apple.

Prior art: #8412
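
For reference, the usual way to gate a test off a platform in Rust looks like the sketch below; whether the commit used exactly this attribute is an assumption.

```rust
// Skip on aarch64 macOS, where the compilation cache is not populated and the
// test panics with "Compilation result should be non-empty".
#[cfg(not(all(target_arch = "aarch64", target_os = "macos")))]
#[test]
fn test_contract_precompilation() {
    // ... test body elided ...
}
```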
This PR re-introduces changes from #8805 together with #8892 after fixing #8908:

The PR addresses #8265 by introducing aggregation of compute usage across all operations performed during chunk application and limiting this compute usage to 1s.

This should not change the behavior of nodes in the short run because compute costs currently match gas costs, which is validated by an assert, so any discrepancy should be caught on canary nodes.
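
A minimal sketch of the aggregation-and-limit idea; the type and numbers below are illustrative only, and the real implementation expresses the limit in compute units corresponding to 1s rather than these made-up values.

```rust
/// Illustrative only: accumulate per-operation compute usage and report when
/// the per-chunk limit (standing in for ~1s of compute) has been reached.
struct ComputeUsageTracker {
    used: u64,
    limit: u64,
}

impl ComputeUsageTracker {
    fn new(limit: u64) -> Self {
        Self { used: 0, limit }
    }

    /// Record compute spent by one operation; returns true while under the limit.
    fn add(&mut self, compute: u64) -> bool {
        self.used = self.used.saturating_add(compute);
        self.used <= self.limit
    }
}

fn main() {
    // Hypothetical limit: 10^15 compute units standing in for 1s of compute.
    let mut tracker = ComputeUsageTracker::new(1_000_000_000_000_000);
    for op_cost in [300_000_000_000_000u64, 400_000_000_000_000, 500_000_000_000_000] {
        if !tracker.add(op_cost) {
            println!("compute limit reached, stop applying further operations");
            break;
        }
    }
}
```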
This closes up the near-vm work. I believe #8886 (comment) is the only remaining point to be checked in former PRs, in addition to all the code added in this PR.

Note that there are still things in #8323 that are not in this PR, because either I noticed they actually should not be merged (e.g. the wasmparser bump would be a protocol upgrade, so it will wait until we get limited replayability), or because they are orthogonal to this change.
Part of #8827.

Adds `flat_storage_creation_enabled` config flag to enable/disable flat storage creation. The default value is `true`.

This is needed for SREs to be able to manually enable/disable flat storage creation when taking a state backup.
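
For illustration, a default-true boolean flag of this kind is typically wired up with serde roughly as below; the struct and field placement are assumptions, not the actual neard config layout.

```rust
use serde::Deserialize;

fn default_true() -> bool {
    true
}

// Illustrative config fragment; the real flag lives inside neard's config
// structures, which are not reproduced here.
#[derive(Deserialize)]
struct StoreConfigSketch {
    /// When false, the node skips background flat storage creation.
    #[serde(default = "default_true")]
    flat_storage_creation_enabled: bool,
}

fn main() {
    // An empty config section falls back to the default of `true`.
    let cfg: StoreConfigSketch = serde_json::from_str("{}").unwrap();
    assert!(cfg.flat_storage_creation_enabled);
}
```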
…mmaries for PRs (#8926)

### Purpose

To reduce the effort involved in writing a pull request description. [Here](https://githubnext.com/projects/copilot-for-pull-requests) you can find more details about this feature.

Below is a copy of the GitHub Copilot for PRs tutorial:

### Get started

When you are ready to create a new pull request, you can use Copilot to suggest parts of the pull request description. To do this, simply include the relevant markers in your text and they will be expanded for you automatically when you save the description. You can edit your description as much as you like, and whatever markers are present when you save it will be expanded automatically for you. Watch for the 👀 👀 👀 reaction from Copilot to show that it's working!

### Markers

* `copilot:all`: all the content, in one go.
* `copilot:summary`: a one-paragraph summary of the changes in the pull request.
* `copilot:walkthrough`: a detailed list of changes, including links to the relevant pieces of code.
* `copilot:poem`: a poem about the changes in the pull request.

### Example

Here's an example for a pull request titled "Add edges for type assignability". Original description provided by author:

```
copilot:summary
​
We need this information to provide as input to the analysis. The compiler already knows it so it makes sense to extract it here.
```

Expanded description with Copilot's suggestions:

> 🤖 Generated by Copilot
>
> This pull request adds semantic analysis to the feature extraction plugin, which enhances the feature graph with information about types and assignability of expressions in Java code. It modifies the pom.xml file, the FeaturePlugin.java file, the DotOutput.java file, the FeatureGraph.java file, the FeatureGraphChecks.java file, and the graph.proto file to support the new feature. It also adds three new classes: AssignabilityAnalysis.java, TypeAnalysis.java, and TypeScanner.java, which implement the semantic analysis logic. Finally, it adds two new test classes: AssignabilityAnalysisTest.java and TypeScannerTest.java, which verify the correctness of the semantic analysis and the feature graph construction. ​ We need this information to provide as input to the analysis. The compiler already knows it so it makes sense to extract it here.

### Errors and omissions from the AI

The outputs might contain errors and omissions. Once they've been generated you can modify or remove them by editing the description as normal. We include an attribution line to highlight Copilot's suggestions. This can be removed if you wish.

### Rolling it out

We include Copilot markers in our pull request templates so that every pull request includes some standard content. Here's an example:

```
### WHAT
copilot:summary
​
copilot:poem

### WHY
<!-- author to complete -->

### HOW
copilot:walkthrough
```

Check out the GitHub guidance [on creating your own pull request template](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository).
Avoided clippy warnings that want me to rename enum values or change function signatures.
The clippy effort is tracked in #8145

Can't say that any of the findings were critical, but fixing the following warnings, in my personal opinion, makes the code more readable:
* Prefer `is_empty()`
* Derive `Default` for enums
* Use shorthand field initializers: `MyStruct { x }` instead of `MyStruct { x: x }`
* Prefer `sort_by_key()`
* Prefer `!x` to `x == false`
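
For illustration, a hypothetical snippet showing the preferred forms, with the replaced patterns noted in comments (this is not code from the PR):

```rust
struct Config {
    verbose: bool,
}

// Hypothetical examples of the lint fixes listed above.
fn build(verbose: bool, items: &[String]) -> Config {
    // `!items.is_empty()` instead of `items.len() > 0`,
    // and `!verbose` instead of `verbose == false`.
    if !items.is_empty() && !verbose {
        println!("{} items", items.len());
    }
    // Shorthand field init: `Config { verbose }` instead of `Config { verbose: verbose }`.
    Config { verbose }
}

fn main() {
    let cfg = build(false, &["a".to_string()]);
    assert!(!cfg.verbose);
}
```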
Borsh can now read from a `Read`er and write to a `Write`r, so use it
for handling database state dump files.  Negative delta FTW.
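
A sketch of the reader/writer pattern, assuming a borsh version that provides `BorshDeserialize::deserialize_reader` (older releases only expose slice-based helpers such as `try_from_slice`); the dump record type and file path here are invented for illustration.

```rust
use std::fs::File;
use std::io::{BufReader, BufWriter, Write};

use borsh::{BorshDeserialize, BorshSerialize};

// Hypothetical record type; the real dump format is not reproduced here.
#[derive(BorshSerialize, BorshDeserialize)]
struct DumpRecord {
    key: Vec<u8>,
    value: Vec<u8>,
}

fn main() -> std::io::Result<()> {
    // Stream records straight to the file instead of building a Vec<u8> first.
    let record = DumpRecord { key: b"k".to_vec(), value: b"v".to_vec() };
    let mut writer = BufWriter::new(File::create("/tmp/state_dump.bin")?);
    record.serialize(&mut writer)?;
    writer.flush()?;

    // And read it back without loading the whole file into memory.
    let mut reader = BufReader::new(File::open("/tmp/state_dump.bin")?);
    let restored = DumpRecord::deserialize_reader(&mut reader)?;
    assert_eq!(&restored.key[..], b"k");
    Ok(())
}
```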
We have two patterns in the codebase that result in the `clippy::derivable_impls` warning:
1. Enums implement `default()` explicitly when it can be derived. This seems to be an unambiguous readability improvement. Removing an explicit function definition is good. `#[default]` is clear enough.
2. Structs implement `default()` to initialize all fields to the default values of their corresponding types. This seems like an improvement too. @jakmeier please check whether it's a readability improvement for `QemuCommandBuilder`.

The PR was generated by running the following and adjusting the result manually:
```sh
cargo clippy --fix -- -A clippy::all -W clippy::derivable_impls
```

This is part of #8145
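
As an illustration of the enum case, with a hypothetical enum rather than one from the PR:

```rust
// Before: an explicit impl that clippy::derivable_impls flags as derivable.
//
// impl Default for SyncMode {
//     fn default() -> Self {
//         SyncMode::Headers
//     }
// }

// After: derive Default and mark the default variant (stable since Rust 1.62).
#[derive(Default, Debug, PartialEq)]
enum SyncMode {
    #[default]
    Headers,
    Blocks,
}

fn main() {
    assert_eq!(SyncMode::default(), SyncMode::Headers);
}
```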
Adds a message to the changelog of the next release about the compute costs protocol change.
This patch includes the final code parts of the meta-tx gas cost
estimations. It was delayed because the results were too flaky for
continuous estimations.

Specifically, this adds the estimations
1. `ActionDelegate` to estimate send + send + exec all together
2. `ActionDelegateSendNotSir`, `ActionDelegateSendSir`,
`ActionDelegateExec` to estimate each step separately 

Making these estimations stable is tricky due to the limit that only
one delegate action can be in a receipt. The check is only done in the
`send` step, so I was able to cheat my way around it for
`ActionDelegateExec`.
But I didn't find a way to do that for the send cost estimations.
Hence, in that case, we measure the cost of verifying a single action.

This results in high variation within the sample measurements, which
makes the average a poor statistic to look at. So instead, I report the
75th percentile of at least 12 measurements. The 75th percentile
was chosen over the median (50th percentile) to ensure we pick at least
a somewhat pessimistic value of the distribution.

Repeated estimations will still result in flaky results, so it's far from a 
perfect solution. But at least it removes outliers on both extremes.

In the current state, the `icount` estimations seem to be very stable,
which is what we need for continuous estimation.

For `time` based estimations, the `ActionDelegate` sometimes gets a
`HIGH-VARIANCE` or `SUBTRACTION-UNDERFLOW` uncertainty
marker. This is a common problem shared with other actions and I
don't intend to solve it in this PR. But the other three estimations
should be stable also in `time` measurements.
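
A sketch of the statistic described above, using the nearest-rank method on made-up measurements; this is illustrative only, not the estimator's actual code.

```rust
/// Return the value at the given percentile (0..=100) of the samples,
/// using the nearest-rank method on a sorted copy.
fn percentile(samples: &[u64], pct: f64) -> u64 {
    assert!(!samples.is_empty());
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn main() {
    // At least 12 measurements, as described above; the numbers are made up.
    let measurements = [90u64, 95, 100, 101, 102, 103, 105, 110, 140, 150, 300, 900];
    let p75 = percentile(&measurements, 75.0);
    // The 75th percentile damps the outliers (300, 900) that would skew the average.
    println!("75th percentile estimate: {p75}");
}
```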
…thEpochManagerAdapter. (#8917)

As a reminder of the overall goal, we want to split RuntimeWithEpochManagerAdapter, so that any code that needs the runtime will use an Arc<RuntimeAdapter>, and any code that uses the epoch manager will use the EpochManagerHandle.

This PR splits usages in state-viewer, so that it no longer uses RuntimeWithEpochManagerAdapter. It will use EpochManagerHandle wherever possible, but when needed, we fall back to EpochManagerAdapter for code that still needs to convert from RuntimeWithEpochManagerAdapter (since that cannot in general provide an EpochManagerHandle). When EpochManagerAdapter is no longer created from RuntimeWithEpochManagerAdapter, we can remove EpochManagerAdapter and always use EpochManagerHandle, a concrete type.
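
A simplified sketch of the target dependency shape; the traits and types below are stand-ins for illustration, not the real nearcore interfaces.

```rust
use std::sync::{Arc, RwLock};

// Stand-ins for the real traits and handles mentioned above.
trait RuntimeAdapter {
    fn apply_chunk(&self);
}

struct EpochManager;
impl EpochManager {
    fn epoch_height(&self) -> u64 {
        0
    }
}

/// Concrete, cloneable handle wrapping the epoch manager.
#[derive(Clone)]
struct EpochManagerHandle(Arc<RwLock<EpochManager>>);

// After the split, callers receive exactly the dependencies they use,
// instead of one combined RuntimeWithEpochManagerAdapter object.
fn view_state(runtime: Arc<dyn RuntimeAdapter>, epoch_manager: EpochManagerHandle) {
    runtime.apply_chunk();
    let _height = epoch_manager.0.read().unwrap().epoch_height();
}

struct DummyRuntime;
impl RuntimeAdapter for DummyRuntime {
    fn apply_chunk(&self) {}
}

fn main() {
    let runtime: Arc<dyn RuntimeAdapter> = Arc::new(DummyRuntime);
    let epoch_manager = EpochManagerHandle(Arc::new(RwLock::new(EpochManager)));
    view_state(runtime, epoch_manager);
}
```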
Before this change, the mock node keeps handles to the ClientActor
and ViewClientActor, and mocks the role of the PeerManagerActor. Here
we change things so that it instead just listens on a localhost socket
and sends messages to a node that has been started normally with
nearcore::start_with_config(). This has the advantage of allowing us,
in theory, to test/benchmark the whole node's code via only its public
API.
These are not currently supported and any decision to enable them should
be an intentional decision, not an accidental one.

Overall these changes are a no-op, since deserialization will fail
earlier when trying to deserialize with the pwasm deserializer, but if and
when we eventually get rid of it, it would be nice if these extensions
didn't accidentally get enabled.

This puts to rest the question raised in #8886 (comment).
Adding README documentation for setting up mirror tests from #8162, https://github.com/near/near-ops/pull/972.
Co-authored-by: nikurt <86772482+nikurt@users.noreply.github.com>
Count the number of chunks produced and expected to be produced so far for each individual shard.
When one chunk producer is assigned to multiple shards, we can see the produced / expected counts for each of the shards.

Tested on an RPC node in mainnet.
A paste from the metrics tab of a node in mainnet: [link](https://gist.github.com/VanBarbascu/26dc011291df7f128d609c8457d57469)
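
A sketch of a per-shard counter pair using the prometheus crate; the metric names below are invented for illustration and are not the metrics added by this commit.

```rust
use prometheus::{IntCounterVec, Opts};

fn main() {
    // One counter per shard via a "shard_id" label.
    let produced = IntCounterVec::new(
        Opts::new("example_chunks_produced_total", "Chunks produced, per shard"),
        &["shard_id"],
    )
    .unwrap();
    let expected = IntCounterVec::new(
        Opts::new("example_chunks_expected_total", "Chunks expected, per shard"),
        &["shard_id"],
    )
    .unwrap();

    // A chunk producer assigned to shards 0 and 2 increments each shard separately.
    expected.with_label_values(&["0"]).inc();
    produced.with_label_values(&["0"]).inc();
    expected.with_label_values(&["2"]).inc();

    println!(
        "shard 0: {}/{} produced/expected",
        produced.with_label_values(&["0"]).get(),
        expected.with_label_values(&["0"]).get()
    );
}
```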
nikurt and others added 18 commits April 25, 2023 15:43
Also adds an action 'print' to `neard view-state state-part apply` to print the sub-tree contained in a state part.

Also fixes `neard view-state-part dump` to print state record in a human readable way.
Also adds a unit test for state dump and an integration test for state dump and state sync from that dump
@nikurt nikurt requested a review from Longarithm April 25, 2023 15:01
@near-bulldozer near-bulldozer bot merged commit 50470fd into master Apr 25, 2023
@near-bulldozer near-bulldozer bot deleted the nikurt-print-tree branch April 25, 2023 15:35
nikurt added a commit that referenced this pull request Apr 28, 2023
* makes `neard view-chain --format full` print hash and node's memory usage.
* adds a `print` sub-sub-sub-command to print the sub-tree contained in a state part: `./neard view-state state-parts --root-dir /tmp --shard-id 0 apply --part-id 42 --action print current`
* fixes `neard view-state-part dump` to print state record in a human readable way.