
Stateless Validation launch doomsday scenarios #11656

Open
pugachAG opened this issue Jun 24, 2024 · 0 comments
Assignees: pugachAG
Labels: A-stateless-validation (Area: stateless validation)

@pugachAG (Contributor) commented:

This is a top-level issue to track potential failure scenarios for the Stateless Validation launch. It also covers Congestion Control, since that ships in the same release.

Congestion Control

cc @wacban

  • Significantly reduced transaction processing throughput due to unexpected Congestion Control behaviour. The extreme case is a deadlock, meaning no progress in processing transactions at all.
  • A large number of delayed receipts when Congestion Control is bootstrapped. This could cause slow chunk application, since we need to iterate over all delayed receipts.
  • Inconsistent behaviour across different nodes caused by a bug in the implementation.
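The deadlock case above can be sketched as a toy model (an illustration only, not nearcore code; the `Shard` struct and threshold are hypothetical): each shard rejects incoming receipts while congested, but can only decongest by forwarding receipts to the other shard, which is congested too.

```rust
// Toy model of the cross-shard deadlock risk, not actual nearcore code.
#[derive(Clone, Copy)]
struct Shard {
    buffered_receipts: u32, // receipts queued for the other shard
    capacity: u32,          // congestion threshold (hypothetical value)
}

impl Shard {
    fn congested(&self) -> bool {
        self.buffered_receipts >= self.capacity
    }
}

/// Deliver one buffered receipt from `src` to `dst`, but only if `dst`
/// is willing to accept it (i.e. `dst` is not congested).
fn try_forward(src: &mut Shard, dst: &Shard) -> bool {
    if src.buffered_receipts > 0 && !dst.congested() {
        src.buffered_receipts -= 1; // receipt delivered and applied
        true
    } else {
        false
    }
}

fn main() {
    // Both shards are exactly at capacity, so each rejects the other's receipts.
    let mut a = Shard { buffered_receipts: 5, capacity: 5 };
    let mut b = Shard { buffered_receipts: 5, capacity: 5 };
    let progress = try_forward(&mut a, &b) | try_forward(&mut b, &a);
    assert!(!progress); // deadlock: no transaction processing progress at all
    println!("deadlock reproduced in toy model");
}
```

In the toy model neither shard ever drains its buffer, which is the "no progress at all" extreme described above; the real mechanism is more nuanced, but this is the shape of the failure.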

Stateless Validation

Memtrie

  • A bug in the memtrie implementation resulting in an invalid state.
  • Inconsistent behaviour of memtrie on different nodes resulting in chunks not being approved.

State Sync

  • Nodes tracking a shard in the next epoch fail to load its state in time.

Chunk Endorsements

  • Missing chunks due to large state witness size.
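The failure mode above can be illustrated with a minimal sketch (an assumption for illustration, not the actual nearcore check; the limit value is hypothetical): if the encoded state witness exceeds a hard size cap, it cannot be distributed and endorsed in time, and the corresponding chunk goes missing.

```rust
// Toy illustration, not nearcore code: a hard cap on encoded witness size.
// The limit below is a hypothetical placeholder value.
const MAX_WITNESS_SIZE_BYTES: usize = 16 * 1024 * 1024;

/// Returns whether a witness of the given encoded size would be accepted.
fn witness_fits(encoded_len: usize) -> bool {
    encoded_len <= MAX_WITNESS_SIZE_BYTES
}

fn main() {
    assert!(witness_fits(1024 * 1024)); // a small witness is fine
    assert!(!witness_fits(64 * 1024 * 1024)); // an oversized witness means a missed chunk
    println!("ok");
}
```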
@pugachAG pugachAG added the A-stateless-validation Area: stateless validation label Jun 24, 2024
@pugachAG pugachAG self-assigned this Jun 24, 2024
@pugachAG pugachAG changed the title Stateless validation launch doomsday scenarios Stateless Validation launch doomsday scenarios Jun 24, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 5, 2024
This PR introduces a way to generate an arbitrarily large chunk state witness as a result of processing a single receipt. This is needed to test what happens to the blockchain if we missed something in the witness size checks.
Part of #11656.

Large witness generation is triggered by a function call action with a method name `internal_record_storage_garbage_<GARBAGE_SIZE>`, where `<GARBAGE_SIZE>` is the number of megabytes of random data to be added to the storage proof. For example, `internal_record_storage_garbage_20` results in 20MB of garbage data in the witness. Having the size as part of the method name (and not as an argument) makes it easier to send a transaction using the NEAR CLI (`args` needs to be base64-encoded). Note that we don't need to deploy any contracts: a failed receipt will still result in garbage being added to the witness. This makes it very easy to use; sending a transaction with a single function call action is enough. The functionality is enabled only when neard is compiled with `test_features` enabled.
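The naming convention could be handled roughly as follows (a hypothetical sketch; the actual nearcore implementation may differ in names and details): the garbage size in megabytes is parsed out of the method-name suffix.

```rust
// Hypothetical sketch of parsing the test-only method name; not the actual
// nearcore code. The method name encodes the requested garbage size in MB.
const GARBAGE_METHOD_PREFIX: &str = "internal_record_storage_garbage_";

/// Returns the number of megabytes of garbage requested by the method name,
/// or `None` if the name does not follow the convention.
fn parse_garbage_size_mb(method_name: &str) -> Option<u64> {
    method_name
        .strip_prefix(GARBAGE_METHOD_PREFIX)
        .and_then(|suffix| suffix.parse::<u64>().ok())
}

fn main() {
    assert_eq!(
        parse_garbage_size_mb("internal_record_storage_garbage_20"),
        Some(20)
    );
    assert_eq!(parse_garbage_size_mb("some_other_method"), None);
    println!("ok");
}
```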

Alternative approaches considered:
* Adding a new feature that disables witness size checks. Then we could deploy a contract that reads/writes a lot of storage data to make the storage proof large. We decided not to proceed with this approach since it is considerably harder to use, and at some point we would still hit compute/gas limits. Another benefit of the chosen approach is that it doesn't increase apply-chunk latency, so it allows us to test a large witness in isolation from slow chunk application.
* #11687. The resulting behaviour diverges significantly from what we would expect in the real world, where a large witness is the result of applying some chunk. For example, when testing the doomsday scenario we want the next chunk producer to try distributing the same witness after the chunk is missed. This PR also covers the underlying use case #11184.

A large part of this PR is refactoring of the runtime tests in chain to make it easier to reuse the existing test setup code.
pugachAG added a commit that referenced this issue Jul 9, 2024
(Same commit message as above.)
github-merge-queue bot pushed a commit that referenced this issue Jul 10, 2024
This PR introduces a nayduck test with a large chunk state witness; see the test description for more details.
Remote run: https://nayduck.nearone.org/#/run/179
Part of #11656.