
Stateless Validation launch doomsday scenarios #11656

Open
pugachAG opened this issue Jun 24, 2024 · 0 comments
Assignees: pugachAG
Labels: A-stateless-validation (Area: stateless validation)

@pugachAG (Contributor) commented:

This is a top-level issue to track potential failure scenarios for the Stateless Validation launch. It also covers Congestion Control, since that ships in the same release.

Congestion Control

cc @wacban

  • Significantly reduced transaction processing throughput due to unexpected Congestion Control behaviour. The extreme case is a deadlock, meaning no progress in processing transactions at all.
  • A large number of delayed receipts when Congestion Control is bootstrapped. This could cause slow chunk application, since we need to iterate over all delayed receipts.
  • Inconsistent behaviour across different nodes caused by a bug in the implementation.
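The deadlock case above can be sketched as a toy model (an illustration only, not nearcore code; the `Shard` struct and threshold are hypothetical): each shard rejects incoming receipts while congested, but can only decongest by forwarding receipts to the other shard, which is congested too.

```rust
// Toy model of the cross-shard deadlock risk, not actual nearcore code.
#[derive(Clone, Copy)]
struct Shard {
    buffered_receipts: u32, // receipts queued for the other shard
    capacity: u32,          // congestion threshold (hypothetical value)
}

impl Shard {
    fn congested(&self) -> bool {
        self.buffered_receipts >= self.capacity
    }
}

/// Deliver one buffered receipt from `src` to `dst`, but only if `dst`
/// is willing to accept it (i.e. `dst` is not congested).
fn try_forward(src: &mut Shard, dst: &Shard) -> bool {
    if src.buffered_receipts > 0 && !dst.congested() {
        src.buffered_receipts -= 1; // receipt delivered and applied
        true
    } else {
        false
    }
}

fn main() {
    // Both shards are exactly at capacity, so each rejects the other's receipts.
    let mut a = Shard { buffered_receipts: 5, capacity: 5 };
    let mut b = Shard { buffered_receipts: 5, capacity: 5 };
    let progress = try_forward(&mut a, &b) | try_forward(&mut b, &a);
    assert!(!progress); // deadlock: no transaction processing progress at all
    println!("deadlock reproduced in toy model");
}
```

In the toy model neither shard ever drains its buffer, which is the "no progress at all" extreme described above; the real mechanism is more nuanced, but this is the shape of the failure.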

Stateless Validation

Memtrie

  • A bug in the memtrie implementation resulting in an invalid state.
  • Inconsistent behaviour of memtrie on different nodes resulting in chunks not being approved.

State Sync

  • Nodes tracking a shard in the next epoch fail to load its state in time.

Chunk Endorsements

  • Missing chunks due to large state witness size.
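The failure mode above can be illustrated with a minimal sketch (an assumption for illustration, not the actual nearcore check; the limit value is hypothetical): if the encoded state witness exceeds a hard size cap, it cannot be distributed and endorsed in time, and the corresponding chunk goes missing.

```rust
// Toy illustration, not nearcore code: a hard cap on encoded witness size.
// The limit below is a hypothetical placeholder value.
const MAX_WITNESS_SIZE_BYTES: usize = 16 * 1024 * 1024;

/// Returns whether a witness of the given encoded size would be accepted.
fn witness_fits(encoded_len: usize) -> bool {
    encoded_len <= MAX_WITNESS_SIZE_BYTES
}

fn main() {
    assert!(witness_fits(1024 * 1024)); // a small witness is fine
    assert!(!witness_fits(64 * 1024 * 1024)); // an oversized witness means a missed chunk
    println!("ok");
}
```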
@pugachAG pugachAG added the A-stateless-validation Area: stateless validation label Jun 24, 2024
@pugachAG pugachAG self-assigned this Jun 24, 2024
@pugachAG pugachAG changed the title Stateless validation launch doomsday scenarios Stateless Validation launch doomsday scenarios Jun 24, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 5, 2024
This PR introduces a way to generate an arbitrarily large chunk state witness as a result of processing a single receipt. This is needed to test what happens to the blockchain if we missed something in the witness size checks.
Part of #11656.

Large witness generation is triggered by a function call action with a method name `internal_record_storage_garbage_<GARBAGE_SIZE>`, where `<GARBAGE_SIZE>` is the number of megabytes of random data to be added to the storage proof. For example, `internal_record_storage_garbage_20` results in 20MB of garbage data in the witness. Having the size as part of the method name (and not as an argument) makes it easier to send a transaction using the NEAR CLI (`args` needs to be base64-encoded). Note that we don't need to deploy any contracts: a failed receipt will still result in garbage being added to the witness. This makes it very easy to use; sending a transaction with a single function call action is enough. The functionality is enabled only when neard is compiled with `test_features` enabled.
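The naming convention could be handled roughly as follows (a hypothetical sketch; the actual nearcore implementation may differ in names and details): the garbage size in megabytes is parsed out of the method-name suffix.

```rust
// Hypothetical sketch of parsing the test-only method name; not the actual
// nearcore code. The method name encodes the requested garbage size in MB.
const GARBAGE_METHOD_PREFIX: &str = "internal_record_storage_garbage_";

/// Returns the number of megabytes of garbage requested by the method name,
/// or `None` if the name does not follow the convention.
fn parse_garbage_size_mb(method_name: &str) -> Option<u64> {
    method_name
        .strip_prefix(GARBAGE_METHOD_PREFIX)
        .and_then(|suffix| suffix.parse::<u64>().ok())
}

fn main() {
    assert_eq!(
        parse_garbage_size_mb("internal_record_storage_garbage_20"),
        Some(20)
    );
    assert_eq!(parse_garbage_size_mb("some_other_method"), None);
    println!("ok");
}
```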

Alternative approaches considered:
* Adding a new feature that disables witness size checks. Then we could deploy a contract that reads/writes a lot of storage data to make the storage proof large. We decided not to proceed with this approach since it is considerably harder to use, and at some point we would still hit compute/gas limits. Another benefit of the chosen approach is that it doesn't increase apply-chunk latency, so it allows us to test a large witness in isolation from slow chunk application.
* #11687. The resulting behaviour diverges significantly from what we would expect in the real world, where a large witness is the result of applying some chunk. For example, when testing the doomsday scenario we want the next chunk producer to try distributing the same witness after the chunk is missed. This PR also covers the underlying use case #11184.

A large part of this PR is refactoring of the runtime tests in chain to make it easier to reuse the existing test setup code.
pugachAG added a commit that referenced this issue Jul 9, 2024
(Same commit message as above.)
github-merge-queue bot pushed a commit that referenced this issue Jul 10, 2024
This PR introduces a nayduck test with a large chunk state witness; see the test description for more details.
Remote run: https://nayduck.nearone.org/#/run/179
Part of #11656.