Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Runtime Epoch Split] (7/n) Split runtime everywhere else #8940

Merged
merged 26 commits into from
May 11, 2023

Conversation

robin-near
Copy link
Contributor

@robin-near robin-near commented Apr 21, 2023

This does the bulk of the remaining split, which is all a bit intertwined. Although there are a ton of changes, the transformations are simple for the most part.

RuntimeWithEpochManagerAdapter was the previous combined trait. There were two implementations of that trait: NightshadeRuntime and KeyValueRuntime.

Now, RuntimeWithEpochManagerAdapter is split into three: EpochManagerAdapter, ShardTracker, and RuntimeAdapter:

  • EpochManagerAdapter is a direct abstraction of EpochManagerHandle (which is basically a Mutex<EpochManager>). The reason why we cannot use EpochManagerHandle directly instead of a trait is because KeyValueRuntime cannot provide a concrete EpochManager implementation. It has a completely custom validator schedule that is not compatible at all. NightshadeRuntime, on the other hand, contains an embedded EpochManagerHandle.
  • ShardTracker is just a TrackedConfig plus an Arc<dyn EpochManagerAdapter>. It doesn't belong to EpochManagerAdapter because the TrackedConfig is not inherent to the protocol. ShardTracker is responsible for one thing: calculating whether this node itself cares or will care about a specific shard. For querying whether some other validator is supposed to track a shard, use the EpochManagerAdapter, not ShardTracker.
  • RuntimeAdapter is an abstraction of everything that has to do with the runtime and account storage, such as get_trie_for_shard. It does bleed a little bit into the EpochManagerAdapter responsibilities, more on that later.

How does this split work for the two implementations: NightshadeRuntime and KeyValueRuntime?

  • NightshadeRuntime is split into EpochManagerHandle, ShardTracker, and NightshadeRuntime. So essentially, EpochManagerHandle is moved out (well, conceptually, but it still keeps a reference to it because some runtime methods depend on it - eventually this can be moved out as well and have the EpochManagerAdapter be passed in), ShardTracker is moved out, and NightshadeRuntime itself now only implements RuntimeAdapter.
    • When constructing the NightshadeRuntime, we must first construct the EpochManagerHandle. This is because we cannot have the NightshadeRuntime internally construct a different EpochManagerHandle, or else the two EpochManagers will not synchronize state, and things fail.
  • KeyValueRuntime is split into KeyValueEpochManager, and KeyValueRuntime. There's no ShardTracker specific to KeyValueXXX anymore (because it's a concrete type to begin with). KeyValueRuntime takes as a constructor argument KeyValueEpochManager; but here it's really just to help initialize some fields (num_shards, epoch_length, etc.).

Now, we talk about how users of this split are modified accordingly:

  • Wherever a struct used to keep a field runtime_adapter: Arc<dyn RuntimeWithEpochManagerAdapter>, it is now split into three fields, epoch_manager: Arc<dyn EpochManagerAdapter>, shard_tracker: ShardTracker, and runtime: Arc<dyn RuntimeAdapter>, but we can omit some of them if they are not actually used. Usages of the old field are individually changed to use only one of the new fields.
    • Note that queries to self.runtime_adapter.cares_about_shard(..., ..., ..., is_me: true) is changed to self.shard_tracker.care_about_shard(..., ..., ..., true) whereas self.runtime_adapter.cares_about_shard(..., ..., ..., is_me: false) is changed to self.epoch_manager.cares_about_shard_from_prev_block, because the latter does not actually need the ShardTracker.
    • Wherever we used to convert a runtime_adapter into one of the subtraits, we now must construct the subtrait directly:
      • any Arc<dyn RuntimeWithEpochManagerAdapter> argument that was passed in is split into up to three arguments depending on which are actually used.
      • any direct construction of NightshadeRuntime is easily converted to separate constructions of EpochManager, NightshadeRuntime, and ShardTracker.
      • any direct construction of KeyValueRuntime is easily converted to separate constructions of KeyValueEpochManager, KeyValueRuntime, and ShardTracker.
  • The TestEnv is one big part of this refactoring. The TestEnvBuilder used to take a .runtime_adapters(Vec<Arc<dyn RuntimeWithEpochManagerAdapter>>), but now we don't have the RuntimeWithEpochManagerAdapters anymore. We could split this into three, of course, but this would present a pretty bad inconvenience to any tests that call helpers like create_nightshade_runtimes(...). So, the logic is now the following:
    • There are now .stores(...), .epoch_managers(...), .shard_trackers(...), and .runtimes(...) overrides available. These must be overridden in an appropriate partial order (epoch_managers > stores, shard_trackers > epoch_managers, runtimes > epoch_managers, and also, stores > clients, epoch_managers > validators).
    • Any override that is not explicitly provided will be constructed with a default (where N is the number of clients):
      • stores will be default-constructed as N elements of create_test_store().
      • epoch_managers will be default-constructed as N elements of KeyValueEpochManager using the stores we have (this is why the stores must be finalized, either default-constructed or overridden, first, thus the partial order)
      • shard_trackers will be default-constructed as N elements of ShardTracker using the epoch_managers we have, with empty tracking config.
      • runtimes will be default-constructed as N elements of KeyValueRuntime using the stores and epoch_managers we have.
    • In addition, there are helpers that allow overriding with something other than the default:
      • .track_all_shards(): constructs shard_trackers that use the epoch_managers we have but with AllShards tracking
      • .real_epoch_managers(): constructs epoch_managers as N Arc<EpochManagerHandle>s instead of KeyValueEpochManager
      • .nightshade_runtimes() (this is an extension method that lives in integration-tests, to avoid a dependency cycle): constructs runtimes as N Arc<NightshadeRuntime>s using the stores and epoch_managers we have.
    • As a result, most of the constructions in integration-tests have been converted to using only these helper functions, as opposed to constructing runtimes directly in the tests. Some tests outside of integration-tests still construct directly, which is OK as they used to do that as well; we just now have to make sure to override stores, epoch_managers, and runtimes together so they are consistent.

@robin-near robin-near changed the title [Runtime Epoch Split] (7/n) Split runtime in Chain and Client. [Runtime Epoch Split] (7/n) Split runtime everywhere else Apr 27, 2023
Copy link
Contributor

@mzhangmzz mzhangmzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the PR looks fine, since it's replacing runtime_adapter with epoch_manager and fixing tests. You don't need to split it more.

pub runtime_adapter: Arc<dyn RuntimeWithEpochManagerAdapter>,
pub epoch_manager: Arc<dyn EpochManagerAdapter>,
pub shard_tracker: ShardTracker,
pub runtime: Arc<dyn RuntimeAdapter>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename this field back to runtime_adapter? I think that will decrease the number of changed lines in this PR and I don't see a concrete reason for the rename

}

pub fn setup_with_validators(
vs: ValidatorSchedule,
epoch_length: u64,
tx_validity_period: NumBlocks,
) -> (Chain, Arc<KeyValueRuntime>, Vec<Arc<InMemoryValidatorSigner>>) {
) -> (Chain, Arc<KeyValueEpochManager>, Arc<KeyValueRuntime>, Vec<Arc<InMemoryValidatorSigner>>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe MockEpochManager/SimpleEpochManager is a better name? It doesn't always need to be used together with KeyValueRuntime.

@robin-near robin-near marked this pull request as ready for review May 1, 2023 17:08
@robin-near robin-near requested a review from a team as a code owner May 1, 2023 17:08
@robin-near robin-near requested a review from mzhangmzz May 1, 2023 17:08
@robin-near
Copy link
Contributor Author

Some Nayduck tests are failing. I'll look into why. https://nayduck.near.org/#/test/455339

@robin-near
Copy link
Contributor Author

Latest run is Nayduck-neutral.

@Longarithm
Copy link
Member

Longarithm commented May 4, 2023

I didn't look into code, but I have questions based on PR description:

  1. Who calls ShardTracker::care_about_shard with is_me = false? Looks like no one, can we omit this parameter then?
  2. Getting rid of create_nightshade_runtimes and other helpers is awesome. Can we add partial order of epoch_managers, stores, shard_trackers, ... to the comments over TestEnvBuilder to make it obvious for future test writers?

Follow-up question: after this it should be easier to test protocol upgrades because we can use mock epoch managers, right?

@robin-near
Copy link
Contributor Author

@Longarithm thanks for taking a look!

  1. There's one place that uses is_me = false, but also with account_id = None. This... actually becomes equivalent to is_me = true and account_id = None. (ugh). I didn't bother changing this here, but I do plan to do another simple refactoring that changes:

    • ShardTracker::care_about_shard --> ShardTracker::care_about_shard_myself
    • EpochManagerAdapter::cares_about_shard_from_prev_block --> EpochManagerAdapter::validator_tracks_shard

    that way the naming is a lot clearer.

  2. Sure. Let me add that in a separate PR, I've had enough of this sanity check nonsense lol

We should NOT use MockEpochManager. It's just soooo incorrect. Same with KeyValueRuntime. Please avoid them. The real EpochManager is actually pretty easy to use because it doesn't have many dependencies and only needs a few store columns, and we can still make it select a predictable set of block producers and chunk-only producers. Let me know if I can help there!

@Longarithm
Copy link
Member

We should NOT use MockEpochManager. It's just soooo incorrect. Same with KeyValueRuntime. Please avoid them.

Ow, it wasn't obvious from description. Thank you!
In protocol upgrades tests I want to trigger protocol upgrade when I want instead of waiting for some magical number of blocks, but it is a separate issue then.

@robin-near
Copy link
Contributor Author

robin-near commented May 4, 2023

In protocol upgrades tests I want to trigger protocol upgrade when I want instead of waiting for some magical number of blocks

I see, you can change the epoch height to be say 5, and then it shouldn't take very long. But in general forcing the protocol to do something that it isn't supposed to do will just break a hundred other things.

@near-bulldozer near-bulldozer bot merged commit e32db71 into near:master May 11, 2023
3 checks passed
@robin-near robin-near linked an issue Jul 19, 2023 that may be closed by this pull request
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Separate EpochManager from Runtime
3 participants