[Runtime Epoch Split] (7/n) Split runtime everywhere else #8940

robin-near · 2023-04-21T19:12:54Z

This does the bulk of the remaining split, which is all a bit intertwined. Although there are a ton of changes, the transformations are simple for the most part.

RuntimeWithEpochManagerAdapter was the previous combined trait. There were two implementations of that trait: NightshadeRuntime and KeyValueRuntime.

Now, RuntimeWithEpochManagerAdapter is split into three: EpochManagerAdapter, ShardTracker, and RuntimeAdapter:

- EpochManagerAdapter is a direct abstraction of EpochManagerHandle (which is basically a Mutex<EpochManager>). We cannot use EpochManagerHandle directly instead of a trait because KeyValueRuntime cannot provide a concrete EpochManager implementation: it has a completely custom validator schedule that is not compatible at all. NightshadeRuntime, on the other hand, contains an embedded EpochManagerHandle.
- ShardTracker is just a TrackedConfig plus an Arc<dyn EpochManagerAdapter>. It is not part of EpochManagerAdapter because the TrackedConfig is not inherent to the protocol. ShardTracker is responsible for one thing: calculating whether this node itself cares or will care about a specific shard. To query whether some other validator is supposed to track a shard, use the EpochManagerAdapter, not ShardTracker.
- RuntimeAdapter is an abstraction of everything that has to do with the runtime and account storage, such as get_trie_for_shard. It does bleed a little bit into the EpochManagerAdapter responsibilities; more on that later.
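To illustrate why EpochManagerAdapter has to be a trait, here is a minimal sketch with toy stand-ins. The single method `block_producer`, the `fixed_producer` field, and the round-robin selection are assumptions made up for this example; the real trait and types in nearcore are much larger.

```rust
use std::sync::{Arc, Mutex};

/// Toy stand-in for the real epoch manager state (assumes a non-empty validator list).
struct EpochManager {
    validators: Vec<String>,
}

/// "Basically a Mutex<EpochManager>", as described above.
struct EpochManagerHandle(Mutex<EpochManager>);

/// Toy stand-in for the test-only manager with its own, incompatible schedule.
struct KeyValueEpochManager {
    fixed_producer: String,
}

/// Hypothetical one-method version of the trait that both must satisfy.
trait EpochManagerAdapter {
    fn block_producer(&self, height: u64) -> String;
}

impl EpochManagerAdapter for EpochManagerHandle {
    fn block_producer(&self, height: u64) -> String {
        let em = self.0.lock().unwrap();
        // Round-robin is only a placeholder for the real selection logic.
        em.validators[height as usize % em.validators.len()].clone()
    }
}

impl EpochManagerAdapter for KeyValueEpochManager {
    fn block_producer(&self, _height: u64) -> String {
        // Custom schedule: every block comes from one fixed account.
        self.fixed_producer.clone()
    }
}

/// Callers depend only on the trait object, so either implementation fits.
fn producers(em: &Arc<dyn EpochManagerAdapter>, heights: std::ops::Range<u64>) -> Vec<String> {
    heights.map(|h| em.block_producer(h)).collect()
}
```

Code that takes an Arc<dyn EpochManagerAdapter> does not need to know which of the two it was handed, which is the property the split relies on.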

How does this split work for the two implementations: NightshadeRuntime and KeyValueRuntime?

- NightshadeRuntime is split into EpochManagerHandle, ShardTracker, and NightshadeRuntime. So essentially, EpochManagerHandle is moved out, ShardTracker is moved out, and NightshadeRuntime itself now only implements RuntimeAdapter. The move of EpochManagerHandle is conceptual for now: NightshadeRuntime still keeps a reference to it because some runtime methods depend on it. Eventually that reference can be removed as well, with the EpochManagerAdapter passed in instead.
  - When constructing the NightshadeRuntime, we must first construct the EpochManagerHandle. We cannot have the NightshadeRuntime internally construct a different EpochManagerHandle, or else the two EpochManagers will not synchronize state, and things fail.
- KeyValueRuntime is split into KeyValueEpochManager and KeyValueRuntime. There's no ShardTracker specific to KeyValueXXX anymore (because ShardTracker is a concrete type to begin with). KeyValueRuntime takes a KeyValueEpochManager as a constructor argument, but here it's really just to help initialize some fields (num_shards, epoch_length, etc.).
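The construction-order constraint for NightshadeRuntime can be sketched as follows. All types here are toy stand-ins that only reuse the names from this description; `record_block`, `blocks_recorded`, and `blocks_seen_by_runtime` are hypothetical methods invented to make the shared state observable.

```rust
use std::sync::{Arc, Mutex};

/// Toy epoch manager state: just counts recorded blocks.
#[derive(Default)]
struct EpochManager {
    blocks_recorded: u64,
}

/// Toy handle wrapping the state in a Mutex.
struct EpochManagerHandle {
    inner: Mutex<EpochManager>,
}

impl EpochManagerHandle {
    fn new() -> Arc<Self> {
        Arc::new(Self { inner: Mutex::new(EpochManager::default()) })
    }

    fn record_block(&self) {
        self.inner.lock().unwrap().blocks_recorded += 1;
    }

    fn blocks_recorded(&self) -> u64 {
        self.inner.lock().unwrap().blocks_recorded
    }
}

/// Toy runtime that is handed an existing handle instead of building its own.
struct NightshadeRuntime {
    epoch_manager: Arc<EpochManagerHandle>,
}

impl NightshadeRuntime {
    fn new(epoch_manager: Arc<EpochManagerHandle>) -> Self {
        Self { epoch_manager }
    }

    fn blocks_seen_by_runtime(&self) -> u64 {
        self.epoch_manager.blocks_recorded()
    }
}
```

A runtime built from a clone of the same Arc observes every update made through the handle, while a runtime built around a freshly constructed handle stays at zero. That divergence is the failure mode the note above warns about.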

Next, here is how the users of these types are modified accordingly:

Wherever a struct used to keep a field runtime_adapter: Arc<dyn RuntimeWithEpochManagerAdapter>, it is now split into three fields, epoch_manager: Arc<dyn EpochManagerAdapter>, shard_tracker: ShardTracker, and runtime: Arc<dyn RuntimeAdapter>, but we can omit some of them if they are not actually used. Usages of the old field are individually changed to use only one of the new fields.
- Note that calls to self.runtime_adapter.cares_about_shard(..., ..., ..., is_me: true) are changed to self.shard_tracker.care_about_shard(..., ..., ..., true), whereas calls to self.runtime_adapter.cares_about_shard(..., ..., ..., is_me: false) are changed to self.epoch_manager.cares_about_shard_from_prev_block, because the latter does not actually need the ShardTracker.
- Wherever we used to convert a runtime_adapter into one of the subtraits, we now must construct the subtrait directly:
  - any Arc<dyn RuntimeWithEpochManagerAdapter> argument that was passed in is split into up to three arguments depending on which are actually used.
  - any direct construction of NightshadeRuntime is easily converted to separate constructions of EpochManager, NightshadeRuntime, and ShardTracker.
  - any direct construction of KeyValueRuntime is easily converted to separate constructions of KeyValueEpochManager, KeyValueRuntime, and ShardTracker.
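The call-site migration above can be sketched like this. The signatures are deliberately simplified assumptions (the real methods take more arguments, such as the parent block hash), and `Client`, `do_i_care`, `does_validator_care`, and `OneShardEach` are hypothetical names used only for this example.

```rust
use std::sync::Arc;

type ShardId = u64;

/// Simplified: the real method takes more arguments and returns a Result.
trait EpochManagerAdapter {
    fn cares_about_shard_from_prev_block(&self, account: &str, shard: ShardId) -> bool;
}

/// Node-local tracking config plus a handle to the epoch manager.
struct ShardTracker {
    track_all_shards: bool,
    epoch_manager: Arc<dyn EpochManagerAdapter>,
}

impl ShardTracker {
    /// Answers only "does this node care?", never "does some other validator care?".
    fn care_about_shard(&self, me: Option<&str>, shard: ShardId) -> bool {
        let assigned = me.map_or(false, |account| {
            self.epoch_manager.cares_about_shard_from_prev_block(account, shard)
        });
        assigned || self.track_all_shards
    }
}

/// Hypothetical struct that used to hold a single combined runtime_adapter field.
struct Client {
    me: Option<String>,
    epoch_manager: Arc<dyn EpochManagerAdapter>,
    shard_tracker: ShardTracker,
}

impl Client {
    /// Was: cares_about_shard(..., is_me: true)
    fn do_i_care(&self, shard: ShardId) -> bool {
        self.shard_tracker.care_about_shard(self.me.as_deref(), shard)
    }

    /// Was: cares_about_shard(..., is_me: false)
    fn does_validator_care(&self, account: &str, shard: ShardId) -> bool {
        self.epoch_manager.cares_about_shard_from_prev_block(account, shard)
    }
}

/// Toy assignment: account "shardN" validates shard N only.
struct OneShardEach;

impl EpochManagerAdapter for OneShardEach {
    fn cares_about_shard_from_prev_block(&self, account: &str, shard: ShardId) -> bool {
        account == format!("shard{}", shard)
    }
}
```

Questions about other validators go straight to the epoch manager, so a struct that never asks about itself can drop the shard_tracker field entirely.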
The TestEnv is one big part of this refactoring. The TestEnvBuilder used to take a .runtime_adapters(Vec<Arc<dyn RuntimeWithEpochManagerAdapter>>), but now we don't have the RuntimeWithEpochManagerAdapters anymore. We could split this into three, of course, but that would be quite inconvenient for any tests that call helpers like create_nightshade_runtimes(...). So, the logic is now the following:
- There are now .stores(...), .epoch_managers(...), .shard_trackers(...), and .runtimes(...) overrides available. These must be overridden in an appropriate partial order, where a > b means a must be set after b: epoch_managers > stores, shard_trackers > epoch_managers, runtimes > epoch_managers, and also stores > clients and epoch_managers > validators.
- Any override that is not explicitly provided will be constructed with a default (where N is the number of clients):
  - stores will be default-constructed as N elements of create_test_store().
  - epoch_managers will be default-constructed as N elements of KeyValueEpochManager using the stores we have (this is why the stores must be finalized first, either default-constructed or overridden; hence the partial order).
  - shard_trackers will be default-constructed as N elements of ShardTracker using the epoch_managers we have, with empty tracking config.
  - runtimes will be default-constructed as N elements of KeyValueRuntime using the stores and epoch_managers we have.
- In addition, there are helpers that allow overriding with something other than the default:
  - .track_all_shards(): constructs shard_trackers that use the epoch_managers we have but with AllShards tracking
  - .real_epoch_managers(): constructs epoch_managers as N Arc<EpochManagerHandle>s instead of KeyValueEpochManager
  - .nightshade_runtimes() (this is an extension method that lives in integration-tests, to avoid a dependency cycle): constructs runtimes as N Arc<NightshadeRuntime>s using the stores and epoch_managers we have.
- As a result, most of the constructions in integration-tests have been converted to using only these helper functions, as opposed to constructing runtimes directly in the tests. Some tests outside of integration-tests still construct directly, which is OK as they used to do that as well; we just now have to make sure to override stores, epoch_managers, and runtimes together so they are consistent.
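The defaulting and ordering rules can be sketched with a toy builder. This is not the real TestEnvBuilder: components are represented by plain enums, stores by integers, only three of the helpers are modeled, and the `ensure_*` methods are hypothetical names for the "finalize with a default" step.

```rust
#[derive(Debug, Clone, PartialEq)]
enum EpochManagerKind { KeyValue, Real }

#[derive(Debug, Clone, PartialEq)]
enum Tracking { Empty, AllShards }

#[derive(Debug, Clone, PartialEq)]
enum RuntimeKind { KeyValue, Nightshade }

struct TestEnv {
    stores: Vec<usize>,
    epoch_managers: Vec<EpochManagerKind>,
    shard_trackers: Vec<Tracking>,
    runtimes: Vec<RuntimeKind>,
}

struct TestEnvBuilder {
    num_clients: usize,
    stores: Option<Vec<usize>>,
    epoch_managers: Option<Vec<EpochManagerKind>>,
    shard_trackers: Option<Vec<Tracking>>,
    runtimes: Option<Vec<RuntimeKind>>,
}

impl TestEnvBuilder {
    fn new(num_clients: usize) -> Self {
        Self { num_clients, stores: None, epoch_managers: None, shard_trackers: None, runtimes: None }
    }

    /// Finalize stores with the default if they were not overridden.
    fn ensure_stores(&mut self) {
        let n = self.num_clients;
        self.stores.get_or_insert_with(|| (0..n).collect());
    }

    /// Finalize epoch managers (and therefore stores) with the KeyValue default.
    fn ensure_epoch_managers(&mut self) {
        self.ensure_stores();
        let n = self.num_clients;
        self.epoch_managers.get_or_insert_with(|| vec![EpochManagerKind::KeyValue; n]);
    }

    /// epoch_managers > stores: stores are finalized here, so set them earlier.
    fn real_epoch_managers(mut self) -> Self {
        assert!(self.epoch_managers.is_none(), "epoch managers already finalized");
        self.ensure_stores();
        self.epoch_managers = Some(vec![EpochManagerKind::Real; self.num_clients]);
        self
    }

    /// shard_trackers > epoch_managers.
    fn track_all_shards(mut self) -> Self {
        self.ensure_epoch_managers();
        self.shard_trackers = Some(vec![Tracking::AllShards; self.num_clients]);
        self
    }

    /// runtimes > epoch_managers.
    fn nightshade_runtimes(mut self) -> Self {
        self.ensure_epoch_managers();
        self.runtimes = Some(vec![RuntimeKind::Nightshade; self.num_clients]);
        self
    }

    fn build(mut self) -> TestEnv {
        self.ensure_epoch_managers();
        let n = self.num_clients;
        TestEnv {
            stores: self.stores.unwrap(),
            epoch_managers: self.epoch_managers.unwrap(),
            shard_trackers: self.shard_trackers.unwrap_or_else(|| vec![Tracking::Empty; n]),
            runtimes: self.runtimes.unwrap_or_else(|| vec![RuntimeKind::KeyValue; n]),
        }
    }
}
```

In this sketch, calling `.track_all_shards()` before `.real_epoch_managers()` panics, because the shard trackers have already forced the default epoch managers into existence. Whether the real builder fails the same way on a wrong order is not stated here.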

mzhangmzz

Overall the PR looks fine, since it's replacing runtime_adapter with epoch_manager and fixing tests. You don't need to split it more.

mzhangmzz · 2023-04-28T17:25:45Z

chain/chain/src/chain.rs

-    pub runtime_adapter: Arc<dyn RuntimeWithEpochManagerAdapter>,
+    pub epoch_manager: Arc<dyn EpochManagerAdapter>,
+    pub shard_tracker: ShardTracker,
+    pub runtime: Arc<dyn RuntimeAdapter>,


Could you rename this field back to runtime_adapter? I think that will decrease the number of changed lines in this PR, and I don't see a concrete reason for the rename.

mzhangmzz · 2023-04-28T19:01:16Z

chain/chain/src/test_utils.rs

 }

 pub fn setup_with_validators(
    vs: ValidatorSchedule,
    epoch_length: u64,
    tx_validity_period: NumBlocks,
-) -> (Chain, Arc<KeyValueRuntime>, Vec<Arc<InMemoryValidatorSigner>>) {
+) -> (Chain, Arc<KeyValueEpochManager>, Arc<KeyValueRuntime>, Vec<Arc<InMemoryValidatorSigner>>) {


Maybe MockEpochManager/SimpleEpochManager is a better name? It doesn't always need to be used together with KeyValueRuntime.

robin-near · 2023-05-01T19:56:51Z

Some Nayduck tests are failing. I'll look into why. https://nayduck.near.org/#/test/455339

robin-near · 2023-05-04T16:58:25Z

Latest run is Nayduck-neutral.

Longarithm · 2023-05-04T17:25:26Z

I didn't look into code, but I have questions based on PR description:

1. Who calls ShardTracker::care_about_shard with is_me = false? Looks like no one, can we omit this parameter then?
2. Getting rid of create_nightshade_runtimes and other helpers is awesome. Can we add partial order of epoch_managers, stores, shard_trackers, ... to the comments over TestEnvBuilder to make it obvious for future test writers?

Follow-up question: after this it should be easier to test protocol upgrades because we can use mock epoch managers, right?

robin-near · 2023-05-04T17:44:12Z

@Longarithm thanks for taking a look!

1. There's one place that uses is_me = false, but also with account_id = None. This... actually becomes equivalent to is_me = true and account_id = None. (ugh). I didn't bother changing this here, but I do plan to do another simple refactoring that changes:
   - ShardTracker::care_about_shard --> ShardTracker::care_about_shard_myself
   - EpochManagerAdapter::cares_about_shard_from_prev_block --> EpochManagerAdapter::validator_tracks_shard

   that way the naming is a lot clearer.
2. Sure. Let me add that in a separate PR, I've had enough of this sanity check nonsense lol
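For the is_me = false with account_id = None case mentioned above, here is one possible reading of why the flag stops mattering. This is a guess at the shape of the logic that is consistent with the comment, not code taken from the PR; the function and parameter names are hypothetical.

```rust
/// `account_tracks` is None when no account_id is passed; otherwise it holds
/// the epoch manager's answer for that account. `node_config_tracks` is the
/// answer derived from this node's own tracking config.
fn care_about_shard(account_tracks: Option<bool>, node_config_tracks: bool, is_me: bool) -> bool {
    if let Some(tracks) = account_tracks {
        if !is_me {
            // Asking about some other account: only the protocol assignment counts.
            return tracks;
        }
        if tracks {
            return true;
        }
    }
    // With no account, is_me was never consulted, so both values land here.
    node_config_tracks
}
```

Under this reading, is_me only changes the answer when an account is supplied, which would explain why the planned rename can drop the parameter from the ShardTracker side.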

We should NOT use MockEpochManager. It's just soooo incorrect. Same with KeyValueRuntime. Please avoid them. The real EpochManager is actually pretty easy to use because it doesn't have many dependencies and only needs a few store columns, and we can still make it select a predictable set of block producers and chunk-only producers. Let me know if I can help there!

Longarithm · 2023-05-04T17:52:50Z

> We should NOT use MockEpochManager. It's just soooo incorrect. Same with KeyValueRuntime. Please avoid them.

Ow, it wasn't obvious from description. Thank you!
In protocol upgrades tests I want to trigger protocol upgrade when I want instead of waiting for some magical number of blocks, but it is a separate issue then.

robin-near · 2023-05-04T17:56:56Z

> In protocol upgrades tests I want to trigger protocol upgrade when I want instead of waiting for some magical number of blocks

I see, you can change the epoch length to be, say, 5, and then it shouldn't take very long. But in general, forcing the protocol to do something that it isn't supposed to do will just break a hundred other things.

robin-near added 8 commits April 25, 2023 21:26

[Runtime Epoch Split] (7/n) More splitting around Chain.

70abe87

[Runtime Epoch Split] (8/n) Split Chain's runtime_adapter field.

aba3677

Fix undo-tool

33bca45

Fix VM tests which is unstable under cargo fix.

f97314f

Split runtime in Client and affected code.

cd23a8a

Split out KeyValueEpochManager from KeyValueRuntime.

8f848ed

Make NightshadeRuntime no longer implement EpochManager

8e70205

Fix after rebase.

b5d36d5

robin-near force-pushed the epoch9 branch from 0662a4d to b5d36d5 April 26, 2023 21:46

Minor fix.

c16f459

robin-near changed the title ~~[Runtime Epoch Split] (7/n) Split runtime in Chain and Client.~~ [Runtime Epoch Split] (7/n) Split runtime everywhere else Apr 27, 2023

mzhangmzz reviewed Apr 28, 2023


robin-near added 3 commits May 1, 2023 09:38

Rename runtime back to runtime_adapter for client and chain.

e1fa6a6

Rename KeyValueEpochManager to MockEpochManager

4888419

Merge remote-tracking branch 'origin/master' into epoch9

5810f5d

robin-near marked this pull request as ready for review May 1, 2023 17:08

robin-near requested a review from a team as a code owner May 1, 2023 17:08

robin-near requested a review from mzhangmzz May 1, 2023 17:08

robin-near added 3 commits May 1, 2023 10:32

Fix compilation failures behind features.

9ee616a

Fix formatting.

c9930a9

Lol. More formatting correction.

73e6c76

robin-near and others added 7 commits May 3, 2023 13:49

Fix setup difference introduced by KeyValueRuntime splitting.

cea21cc

Merge remote-tracking branch 'origin/master' into epoch9

7ad64c2

Fix sanity check.

b2f2e14

Fix another small bug after rebasing.

8a34f87

Fix more sanity checks. When does this ever end? :(

39300b6

Why doesn't clippy just emit all the errors at once?

2ad72bb

Merge branch 'master' into epoch9

4afd10e

mzhangmzz approved these changes May 8, 2023


robin-near and others added 4 commits May 9, 2023 09:03

Merge branch 'master' into epoch9

86cb343

Correct new issue.

ad6b6eb

Merge remote-tracking branch 'origin/master' into epoch9

7116e99

Merge branch 'master' into epoch9

66796cf

robin-near added the S-automerge label May 11, 2023

near-bulldozer bot merged commit e32db71 into near:master May 11, 2023
3 checks passed

robin-near linked an issue Jul 19, 2023 that may be closed by this pull request

Separate EpochManager from Runtime #8515

Closed

2 tasks
