
Add a way to artificially increase the size of ChunkStateWitness #11687

Closed
wants to merge 3 commits

Conversation

jancionear
Contributor

This PR adds a new feature to neard: artificial_witness_size.
When enabled, the node will read artificial_witness_size_to_add from config.json and add that many bytes of data to all produced witnesses. The data is added by injecting a dummy transaction whose random contract code has the desired size. The injected transaction is automatically removed before witness validation.

artificial_witness_size_to_add is a MutableConfigValue so it can be adjusted on the fly without restarting the node.
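A MutableConfigValue is, roughly, a shared lock-guarded cell that the dynamic-config reloader swaps when config.json is re-read. A minimal sketch of the idea (the MutableValue type below is illustrative, not nearcore's actual implementation):

```rust
use std::sync::{Arc, Mutex};

/// Illustrative stand-in for nearcore's MutableConfigValue: a value that can
/// be swapped at runtime without restarting the node.
#[derive(Clone)]
struct MutableValue<T: Copy>(Arc<Mutex<T>>);

impl<T: Copy> MutableValue<T> {
    fn new(value: T) -> Self {
        Self(Arc::new(Mutex::new(value)))
    }
    /// Callers re-read the value on every use, so updates take effect
    /// on the next produced witness.
    fn get(&self) -> T {
        *self.0.lock().unwrap()
    }
    /// Called by the dynamic-config reloader after config.json is re-read.
    fn set(&self, value: T) {
        *self.0.lock().unwrap() = value;
    }
}

fn main() {
    let size_to_add = MutableValue::new(0u64);
    size_to_add.set(50 * 1024 * 1024); // bump to 50 MB on the fly
    assert_eq!(size_to_add.get(), 50 * 1024 * 1024);
}
```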

The feature has to be enabled on all nodes in the network: chunk validators running without it will reject witnesses that contain the injected transaction.

Doing it by injecting transactions doesn't change the serialization format, so it's compatible with existing networks.

The extra transaction is only added when artificial_witness_size_to_add is larger than zero, so it's safe to roll back to binaries without this feature: first set artificial_witness_size_to_add to zero, then switch to the binary with the feature disabled.
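Roughly, the inject-and-strip flow looks like this (a simplified sketch: DummyTx, the helper names, and the use of the rand crate are illustrative, not the actual nearcore types; the added_artificial_witness_size signer is mentioned in the review below):

```rust
use rand::RngCore;

/// Simplified stand-in for a signed transaction carrying contract code.
struct DummyTx {
    signer_id: String,
    code: Vec<u8>,
}

const ARTIFICIAL_SIGNER: &str = "added_artificial_witness_size";

/// Producer side: append a dummy transaction with `size_to_add` bytes of
/// random contract code. A zero size leaves the witness untouched, which is
/// what makes rolling back to feature-less binaries safe.
fn maybe_inject(txs: &mut Vec<DummyTx>, size_to_add: u64) {
    if size_to_add == 0 {
        return;
    }
    let mut code = vec![0u8; size_to_add as usize];
    rand::thread_rng().fill_bytes(&mut code); // incompressible padding
    txs.push(DummyTx { signer_id: ARTIFICIAL_SIGNER.to_string(), code });
}

/// Validator side: remove the marker transaction before validating the chunk.
fn strip_injected(txs: &mut Vec<DummyTx>) {
    txs.retain(|tx| tx.signer_id != ARTIFICIAL_SIGNER);
}
```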

We can use it to test how witness distribution behaves at larger sizes and to simulate doomsday scenarios.

I manually tested that the feature works by setting up a 4-node/6-shard localnet and forcing it to send 100 MB witnesses. The network struggled, but it kept moving forward at 0.1 bps.

I increased the number of metric buckets for witness size because the current ones only go up to 28 MB. After the change they go up to 120 MB.
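For reference, widened histogram buckets like these can be produced with the prometheus crate's exponential_buckets; a sketch with illustrative parameters (the actual start/factor/count in this PR may differ):

```rust
fn main() {
    // exponential_buckets(start, factor, count): with these illustrative
    // parameters the top bucket reaches ~148 MB, comfortably above the
    // 120 MB target.
    let buckets = prometheus::exponential_buckets(100_000.0, 1.5, 19).unwrap();
    println!("top bucket: {:.0} bytes", buckets.last().unwrap());
}
```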

Refs: #11184

We don't want big witnesses to be rejected because of the default limits.
@jancionear jancionear added the A-stateless-validation Area: stateless validation label Jun 28, 2024
@jancionear jancionear requested a review from a team as a code owner June 28, 2024 11:36

codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 80.76923% with 5 lines in your changes missing coverage. Please review.

Project coverage is 71.73%. Comparing base (89fad46) to head (d5cb7d9).

Files                        Patch %   Lines
chain/client/src/client.rs   0.00%     4 Missing ⚠️
nearcore/src/dyn_config.rs   0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11687      +/-   ##
==========================================
- Coverage   71.74%   71.73%   -0.01%     
==========================================
  Files         790      790              
  Lines      161838   161862      +24     
  Branches   161838   161862      +24     
==========================================
+ Hits       116106   116111       +5     
- Misses      40689    40711      +22     
+ Partials     5043     5040       -3     
Flag Coverage Δ
backward-compatibility 0.23% <0.00%> (-0.01%) ⬇️
db-migration 0.23% <0.00%> (-0.01%) ⬇️
genesis-check 1.35% <3.84%> (+<0.01%) ⬆️
integration-tests 37.80% <80.76%> (-0.01%) ⬇️
linux 69.13% <34.61%> (-0.03%) ⬇️
linux-nightly 71.23% <80.76%> (-0.01%) ⬇️
macos 51.39% <34.61%> (-1.22%) ⬇️
pytests 1.58% <3.84%> (+<0.01%) ⬆️
sanity-checks 1.38% <3.84%> (+<0.01%) ⬆️
unittests 66.33% <80.76%> (+<0.01%) ⬆️
upgradability 0.28% <0.00%> (-0.01%) ⬇️


Collaborator

@nagisa nagisa left a comment


This seems alright, though to me it seems like appending a field, or perhaps otherwise making the witness serialization invalid from first principles while this feature is enabled, would be a safer bet. But I'm not quite sure why it is necessary to have this be compatible with existing networks...

(Making it invalid to deserialize normally might be just `length * 1024 || random_data || actual_witness_data` or something)
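A rough sketch of that framing (the prefix width and endianness are arbitrary illustrative choices, not part of the suggestion):

```rust
/// Prefix the real witness bytes with a KiB count and that much padding.
/// A node without the feature would try to deserialize the whole blob as a
/// witness and fail immediately, instead of failing later in validation.
fn frame(witness_bytes: &[u8], pad_kib: u32) -> Vec<u8> {
    let pad = pad_kib as usize * 1024;
    let mut out = Vec::with_capacity(4 + pad + witness_bytes.len());
    out.extend_from_slice(&pad_kib.to_le_bytes());
    out.resize(out.len() + pad, 0); // padding; random bytes in practice
    out.extend_from_slice(witness_bytes);
    out
}

/// Feature-enabled nodes skip the prefix and padding to recover the witness.
fn unframe(framed: &[u8]) -> Option<&[u8]> {
    let pad_kib = u32::from_le_bytes(framed.get(..4)?.try_into().ok()?);
    framed.get(4 + pad_kib as usize * 1024..)
}
```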

@@ -269,6 +269,9 @@ impl Client {
// wait for validation to finish.
self.send_state_witness_ack(&witness, &signer);

#[cfg(feature = "artificial_witness_size")]
Collaborator

@nagisa nagisa Jul 1, 2024


What would happen if a node with this compile feature sent a witness to a node without? Would the node reject the witness entirely, or would it try to operate on the now-present artificially generated entries and somehow fail at a later point (due to e.g. the fact that added_artificial_witness_size is not a real account)?

From what I can tell, these artificial witnesses are structurally valid; they just use well-known values in certain fields to differentiate them from the regular data.

Contributor Author


> What would happen if a node with this compile feature sent a witness to a node without? Would the node reject the witness entirely, or would it try to operate on the now-present artificially generated entries and somehow fail at a later point (due to e.g. the fact that added_artificial_witness_size is not a real account)?

I believe witness validation would fail: apply_chunk expects all transactions to be valid and fails if one of them isn't.

use std::str::FromStr;

// Convert the configured size (u64) into usize so it can be used as a buffer length.
let size_as_usize: usize = size_to_add.as_u64().try_into().expect("Can't convert u64 to usize");
// Allocate the buffer that will serve as the dummy contract code.
let mut random_data = vec![0; size_as_usize];
Collaborator


This can potentially add more than exactly the requested number of bytes, due to changing encoding overhead. Is that okay?

Contributor Author


> This can potentially add more than exactly the requested number of bytes, due to changing encoding overhead. Is that okay?

It's fine: we're interested in the difference between 10 MB and 15 MB, so a kilobyte of difference doesn't matter that much.
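For illustration, the overhead in question is on the order of a few length-prefix bytes per field; a small example assuming borsh 1.x with the derive feature (nearcore uses borsh for chain data structures; the FakeTx type is hypothetical):

```rust
use borsh::BorshSerialize;

#[derive(BorshSerialize)]
struct FakeTx {
    signer_id: String,
    code: Vec<u8>,
}

fn main() {
    let tx = FakeTx {
        signer_id: "added_artificial_witness_size".to_string(),
        code: vec![0u8; 1_000_000],
    };
    let bytes = borsh::to_vec(&tx).unwrap();
    // The 4-byte length prefixes for the string and the vec (plus the string
    // contents) push the encoded size slightly past the 1 MB payload.
    assert!(bytes.len() > 1_000_000);
    println!("overhead: {} bytes", bytes.len() - 1_000_000);
}
```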

Contributor

@pugachAG pugachAG left a comment


After discussing this with @jancionear we agreed to proceed with a slightly different approach that covers the underlying use case for this PR and also lets us test blockchain recovery with a large witness after increasing the block production time.

@jancionear
Contributor Author

> This seems alright, though to me it seems like appending a field, or perhaps otherwise making the witness serialization invalid from first principles while this feature is enabled, would be a safer bet. But I'm not quite sure why it is necessary to have this be compatible with existing networks...
>
> (Making it invalid to deserialize normally might be just `length * 1024 || random_data || actual_witness_data` or something)

Setting up a new network is a lot of work, so I wanted to make it compatible with existing networks (forknet-20, forknet-100, statelessnet). Changing the serialization format would require us to restart all of the nodes in a network at once, which would be disruptive.

I'm thinking about running a witness size test on statelessnet, as it's the most "real" and diverse test environment we have, and it wouldn't be possible to do a coordinated restart there.

@jancionear
Contributor Author

Closing in favor of #11703

@jancionear jancionear closed this Jul 3, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 5, 2024
This PR introduces a way to generate an arbitrarily large chunk state
witness as a result of processing a single receipt. This is needed to test
what happens to the blockchain if we missed something in the witness size
checks.
Part of #11656.

Large witness generation is triggered by a function call action with the
method name `internal_record_storage_garbage_<GARBAGE_SIZE>`, where
`<GARBAGE_SIZE>` is the number of megabytes of random data to be added to
the storage proof. For example, `internal_record_storage_garbage_20` results
in 20 MB of garbage data in the witness. Having the size as part of the
method name (and not as an argument) makes it easier to send a transaction
using near cli (`args` needs to be base64 encoded). Note that we don't need
to deploy any contracts; a failed receipt will still result in garbage added
to the witness. This makes the feature very easy to use: sending a
transaction with a single function call action is enough. The functionality
is enabled only when neard is compiled with `test_features` enabled.
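A sketch of how the size could be parsed out of the method name (illustrative, not necessarily the merged implementation):

```rust
/// Parse `internal_record_storage_garbage_<GARBAGE_SIZE>` and return the
/// number of megabytes of garbage to record, or None for regular methods.
fn parse_garbage_size_mb(method_name: &str) -> Option<u64> {
    method_name
        .strip_prefix("internal_record_storage_garbage_")?
        .parse::<u64>()
        .ok()
}

fn main() {
    assert_eq!(parse_garbage_size_mb("internal_record_storage_garbage_20"), Some(20));
    assert_eq!(parse_garbage_size_mb("transfer"), None);
}
```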

Alternative approaches considered:
* Adding a new feature that disables witness size checks. We could then
deploy a contract that reads/writes a lot of storage data to make the
storage proof large. We decided not to proceed with this approach since it
is considerably harder to use, and at some point we would still hit
compute/gas limits. Another benefit of the chosen approach is that it
doesn't increase apply-chunk latency, so it lets us test large witnesses in
isolation from slow chunk application.
* #11687. The resulting behaviour diverges significantly from what we would
expect to happen in the real world when a large witness is the result of
applying some chunk. For example, when testing the doomsday scenario we want
the next chunk producer to try distributing the same witness after the chunk
is missed. This PR also covers the underlying use case, #11184.

A large part of this PR is refactoring around runtime tests in chain to
make it easier to reuse the existing test setup code.
pugachAG added a commit that referenced this pull request Jul 9, 2024