test: testloop stub for resharding v3 #12156
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #12156 +/- ##
==========================================
- Coverage 71.61% 71.60% -0.02%
==========================================
Files 824 824
Lines 165348 165507 +159
Branches 165348 165507 +159
==========================================
+ Hits 118412 118507 +95
- Misses 41803 41870 +67
+ Partials 5133 5130 -3
// I *think* this is not relevant anymore, since we download
// already the next epoch's state.
// let shards_to_split = self.get_shards_to_split(sync_hash, &state_sync_info, &me)?;
let shards_to_split = HashMap::new();
Can `shards_to_split` be removed from this function altogether, then?
It's quite amusing how many people are touching the state sync code simultaneously 🤯
I'm down for deleting all of it and re-adding whatever is necessary on top of Robin's state sync rewrite.
cc @marcelo-gonzalez WDYT?
Is it fine if we keep this change for now and remove it as part of a future PR?
That's what I plan to do, yes. This line is not triggered now anyway.
)
};

let epoch_config_store = EpochConfigStore::test(BTreeMap::from_iter(vec![(
nice!
}

let epoch_config = if chain_id.starts_with("test-chain-") {
// We still do this for localnet as nayduck depends on it.
Mocknet would also depend on this (for now the mocknet configs are commented out; also, mocknet does protocol transitions, so it needs to support different epoch configs).
This particular block of code won't impact mainnet, because its chain name doesn't start with `test-chain-`, right?
For mocknet, I have two ideas:
- Change the constructor every time we introduce a new mocknet. Always derive the base config from the mainnet config of the latest version.
- Add a new folder to `epoch_configs/` for each separate mocknet instance. It shouldn't take up much space in the repo, because we need to support only the latest mainnet config and a couple of protocol transitions on top of that.
I would vote for option 1: use the latest mainnet config as the base config and update it based on the specific mocknet setup (e.g. num seats). Otherwise, we will keep updating the mocknet configs in `epoch_configs`, which should be based on mainnet with a fixed set of overrides anyway.
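A minimal sketch of what option 1 could look like, assuming a trimmed-down config type. `EpochConfigSketch`, `MocknetOverrides`, `mocknet_epoch_config` and the field names are hypothetical stand-ins for illustration, not the real nearcore types:

```rust
/// Hypothetical, trimmed-down stand-in for an epoch config. The field names
/// are illustrative assumptions, not the real `EpochConfig` definition.
#[derive(Clone, Debug, PartialEq)]
struct EpochConfigSketch {
    epoch_length: u64,
    num_block_producer_seats: u64,
}

/// Hypothetical fixed set of overrides for one mocknet setup.
/// `None` means "keep the mainnet value".
#[derive(Default)]
struct MocknetOverrides {
    epoch_length: Option<u64>,
    num_block_producer_seats: Option<u64>,
}

/// Start from the latest mainnet config and apply only the overridden fields.
fn mocknet_epoch_config(
    mainnet_latest: &EpochConfigSketch,
    overrides: &MocknetOverrides,
) -> EpochConfigSketch {
    let mut config = mainnet_latest.clone();
    if let Some(epoch_length) = overrides.epoch_length {
        config.epoch_length = epoch_length;
    }
    if let Some(seats) = overrides.num_block_producer_seats {
        config.num_block_producer_seats = seats;
    }
    config
}
```

With this shape, a new mocknet setup only needs a new `MocknetOverrides` value; everything not mentioned there keeps following mainnet.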
core/primitives/src/epoch_manager.rs
Outdated
config_store: Some(epoch_config_store),
chain_id: chain_id.to_string(),
epoch_length,
// The fields below must be DEPRECATED.
cool. can you add a TODO referring to the issue id, so we do not forget it? (same for other todos and deprecations)
self.config_store.as_ref().unwrap().get_config(protocol_version).as_ref().clone();
// TODO(#11265): epoch length is overridden in many tests so we
// need to support it here. Consider removing `epoch_length` from
// EpochConfig.
I don't get why `epoch_length` is special here. It is already overridden through `test_epoch_config`, so it will be in the configs returned by the `EpochConfigStore`, right? So why do we need to keep a local field for `epoch_length` here?
Many tests have a setup pattern like `TestEnvBuilder::new().epoch_length(5)...`.
This invalidates the epoch lengths given in the default `EpochConfigStore`s.
So the override must happen somewhere. We could pass the epoch length to `new` or remove it from `EpochConfig`, but that means more changed lines, which I try to minimise.
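To illustrate where such an override could sit, here is a rough sketch using simplified stand-in types. `AllEpochConfigSketch`, `EpochConfigSketch`, the `Option<u64>` override field and the "latest config at or below the requested version" lookup are all assumptions made for this example, not the actual nearcore implementation:

```rust
use std::collections::BTreeMap;

// Assumption: protocol versions are plain integers.
type ProtocolVersion = u32;

/// Hypothetical one-field stand-in for an epoch config.
#[derive(Clone, Debug, PartialEq)]
struct EpochConfigSketch {
    epoch_length: u64,
}

/// Hypothetical wrapper: a store of configs keyed by protocol version,
/// plus an optional epoch length set by the test builder.
struct AllEpochConfigSketch {
    config_store: BTreeMap<ProtocolVersion, EpochConfigSketch>,
    epoch_length: Option<u64>,
}

impl AllEpochConfigSketch {
    /// Picks the config with the highest version not above `protocol_version`
    /// (an assumed lookup rule), then applies the test override, if any.
    fn for_protocol_version(
        &self,
        protocol_version: ProtocolVersion,
    ) -> Option<EpochConfigSketch> {
        let (_, config) = self.config_store.range(..=protocol_version).next_back()?;
        let mut config = config.clone();
        if let Some(epoch_length) = self.epoch_length {
            config.epoch_length = epoch_length;
        }
        Some(config)
    }
}
```

Keeping the override in one place means every config read through the wrapper sees the test's epoch length, whichever protocol version it was stored under.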
got it, we can address epoch length later, it is not a blocker for this PR from my side.
@@ -43,7 +44,7 @@ use super::utils::network::{chunk_endorsement_dropper, partial_encoded_chunks_dr

 pub(crate) struct TestLoopBuilder {
     test_loop: TestLoopV2,
-    genesis: Option<Genesis>,
+    genesis_and_epoch_config_store: Option<(Genesis, EpochConfigStore)>,
Can we keep these fields separate (even though the same function generates them)?
Again, that's more lines of code... I can do that, once the general idea is approved by more reviewers.
@@ -97,8 +98,11 @@ impl TestLoopBuilder {
     }

     /// Set the genesis configuration for the test loop.
-    pub(crate) fn genesis(mut self, genesis: Genesis) -> Self {
-        self.genesis = Some(genesis);
+    pub(crate) fn genesis_and_epoch_config_store(
same, can we set these separately?
LGTM (with primary focus on the test itself)
/// The fields below are DEPRECATED.
/// Epoch config must be controlled by `config_store` only.
/// TODO(#11265): remove these fields.
nice (deprecating use production config)
core/primitives/src/shard_layout.rs
Outdated
@@ -79,7 +79,7 @@ pub struct ShardLayoutV1 {
     /// Each shard contains a range of accounts from one boundary account to
     /// another - or the smallest or largest account possible. The total
     /// number of shards is equal to the number of boundary accounts plus 1.
-    boundary_accounts: Vec<AccountId>,
+    pub boundary_accounts: Vec<AccountId>,
Why do you need this public access? Same for version.
let accounts =
    (0..8).map(|i| format!("account{}", i).parse().unwrap()).collect::<Vec<AccountId>>();
let clients = accounts.iter().cloned().collect_vec();
let block_and_chunk_producers = (0..8).map(|idx| accounts[idx].as_str()).collect_vec();
It would be nice to also set up some chunk-validator-only nodes.
let base_shard_layout = base_epoch_config.shard_layout.clone();
let base_num_shards = base_shard_layout.shard_ids().count() as ShardId;
let mut epoch_config = base_epoch_config.clone();
let ShardLayout::V1(ShardLayoutV1 { mut boundary_accounts, version, .. }) =
I would be tempted to make it use V2
(0..base_num_shards - 1).map(|i| vec![i]).collect();
shards_split_map.push(vec![base_num_shards - 1, base_num_shards]);
boundary_accounts.push(AccountId::try_from("x.near".to_string()).unwrap());
epoch_config.shard_layout = ShardLayout::v1(boundary_accounts, Some(shards_split_map), version);
It's pretty cool that you set up the shard layout programmatically.
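For readers following along, the split map built in the snippet above can be reproduced in isolation. This is a simplified sketch with plain integers and strings; `split_last_shard` and `add_boundary` are hypothetical helper names, and `ShardId` as `u64` and accounts as `String` are assumptions made to keep the example self-contained:

```rust
// Assumption: shard ids are plain integers in this sketch.
type ShardId = u64;

/// Builds the parent -> children map in which every shard maps to itself,
/// except the last one, which is split into two children.
fn split_last_shard(base_num_shards: ShardId) -> Vec<Vec<ShardId>> {
    assert!(base_num_shards > 0, "need at least one shard to split");
    let mut shards_split_map: Vec<Vec<ShardId>> =
        (0..base_num_shards - 1).map(|i| vec![i]).collect();
    shards_split_map.push(vec![base_num_shards - 1, base_num_shards]);
    shards_split_map
}

/// Appends one boundary account. This sketch assumes (without checking)
/// that the new boundary sorts after all existing ones.
fn add_boundary(mut boundary_accounts: Vec<String>, new_boundary: &str) -> Vec<String> {
    boundary_accounts.push(new_boundary.to_string());
    boundary_accounts
}
```

For a base layout with 4 shards, `split_last_shard(4)` yields `[[0], [1], [2], [3, 4]]`, and adding one boundary account brings the shard count to 5, which matches the "number of boundary accounts plus 1" rule quoted in the `ShardLayoutV1` doc comment.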
client.epoch_manager.get_epoch_height_from_prev_block(&tip.prev_block_hash).unwrap();
assert!(epoch_height < 5);
let epoch_config = client.epoch_manager.get_epoch_config(&tip.epoch_id).unwrap();
return epoch_config.shard_layout.shard_ids().count() == base_num_shards as usize + 1;
Just curious, why `count`? I see `len` more often, is it not available here? Is it because it's an iterator?
Yeah, because it's an iterator.
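A small standalone illustration of the point, using a hypothetical `Layout` type rather than the real `ShardLayout`. In Rust, `len()` on iterators comes from the `ExactSizeIterator` trait, so an opaque `impl Iterator` return type only offers `count()`:

```rust
/// Hypothetical stand-in for a layout that exposes shard ids only as an iterator.
struct Layout {
    num_shards: u64,
}

impl Layout {
    /// The return type promises only `Iterator`, so callers cannot call
    /// `.len()` on it; that method belongs to `ExactSizeIterator`.
    fn shard_ids(&self) -> impl Iterator<Item = u64> {
        0..self.num_shards
    }
}

/// Counts the shards by consuming the iterator.
fn num_shards(layout: &Layout) -> usize {
    layout.shard_ids().count()
}
```

The trade-off is that `count()` walks the whole iterator, which is fine for a handful of shards in a test.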
My only remaining comment is about splitting `genesis_and_epoch_config_store`, but that could be addressed later. This PR already achieves a lot. Thanks. LGTM.
/// After uncommenting panics with
/// StorageInconsistentState("Failed to find root node ... in memtrie")
#[test]
// #[ignore]
nit: can remove this "ignore"
oh, I need to comment that back. I couldn't make the test pass
Goal
Write a stub test for the resharding v3 switch. For this, I want the chain to switch between shard layouts.
For that, I switch to `EpochConfigStore` as much as I can, which implies skipping `use_production_config`, overrides like `AllEpochConfig::config_max_kickout_stake`, and `EpochConfig` generation from `GenesisConfig`. This is a big step towards #11265.

The most visible changes are:
- TestLoop now generates `genesis_and_epoch_config_store` instead of just `genesis`. Later we should have a separate `EpochConfigStoreBuilder` which may accept some data shared between genesis and epoch configs, e.g. the validator set. This is done to minimise changes.
- `EpochManager::new_arc_handle` is how the epoch manager is constructed on production nodes. Its logic is changed as follows:
  - `EpochConfigStore::for_chain_id` is used for getting epoch configs.
  - If `chain_id.starts_with("test-chain-")`, we use only `EpochConfig::from(genesis_config)` (see below!).
  - `Genesis::test_epoch_config`. It doesn't use any genesis data; it just stays in this crate for now for convenience. This is for simple tests in a single module.

Achievements
- `test_fix_min_stake_ratio` tests exactly what we want: we take `EpochConfigStore::for_chain_id("mainnet")` and see that it allows including a small validator after the protocol upgrade.
- In `test_resharding_v3` we define old and new shard layouts, and test the switch explicitly without hidden overrides.
- `EpochManager::new_from_genesis_config_with_test_overrides` is removed.
- `for_epoch_id` for custom mocknet chain name.

Failures
- Nayduck often configures the epoch config through genesis, e.g. by setting `block_producer_kickout_threshold` to 0. It is much more work to change this configuration, so I add a hack: if chain_id starts with `test-chain-` (the name which nayduck uses), the epoch config is derived from genesis. Many old integration tests use this chain id as well. However, the improvement here is that we generate only one epoch config, without any overrides.
- `epoch_length` is sometimes taken from `ChainGenesis`, not from `EpochConfig`. To be safe, I set the epoch length in both genesis and epoch configs.

This still lacks testing on a live node. Using this on canary or forknet could be insightful.