New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: reduce places where we assume shard ids are contiguous #10230
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #10230 +/- ##
==========================================
+ Coverage 71.80% 71.82% +0.01%
==========================================
Files 707 707
Lines 141904 141944 +40
Branches 141904 141944 +40
==========================================
+ Hits 101896 101953 +57
+ Misses 35290 35272 -18
- Partials 4718 4719 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -417,6 +417,10 @@ impl EpochManagerAdapter for MockEpochManager { | |||
Ok(self.num_shards) | |||
} | |||
|
|||
fn shard_ids(&self, _epoch_id: &EpochId) -> Result<Vec<ShardId>, EpochError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating here runs a risk of making some hot loops that check for shard_ids quite a bit more expensive. I would advocate for a SmallVec
or perhaps returning an impl Iterator<Item=ShardId>
(insert the result wrapper where you think it is more appropriate.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps returning an impl Iterator<Item=ShardId> (insert the result wrapper where you think it is more appropriate.)
This is a trait method so AFAIU, it cannot return an iterator
. Is there some way to doing that in rust now?
Allocating here runs a risk of making some hot loops that check for shard_ids quite a bit more expensive. I would advocate for a
SmallVec
Are you worried about the performance for the test code or the production code as well? I am asking because your comment is on the test code.
We could choose to return a SmallVec
or we could alternatively add an additional function fn is_valid_shard_id()
to handle those use cases.
I am also wondering if we might be doing some premature optimisation. Do you have specific checks in mind that will call this in a hot loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you worried about the performance for the test code or the production code as well? I am asking because your comment is on the test code.
My concern is largely with the production code. e.g. chain/chain/src/chain.rs calls this function frequently, and I don’t have a good intuition how frequently some of the functions there get called (especially those that are more nebulous like the has_all_receipts
). Anyway, I just grepped for the first definition and wrote my comment there.
This is a trait method so AFAIU, it cannot return an iterator. Is there some way to doing that in rust now?
Yeah traits make impl Iterator
harder. Still possible with associated types like so:
trait Foo {
type ShardIdIterator = ...;
fn shard_ids(&self) -> Self::ShardIdIterator { ... }
}
but unfortunately that requires an unstable feature if you can’t name the type being returned easily.
I am also wondering if we might be doing some premature optimisation. Do you have specific checks in mind that will call this in a hot loop?
A fair point. It is unfortunate we do not have a good automated performance evaluation tool, and it is also true we’ve landed other major changes to performance sensitive areas in the past without too much ceremony.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but unfortunately that requires an unstable feature if you can’t name the type being returned easily.
I am excited to see this feature land in the near future.
A fair point. It is unfortunate we do not have a good automated performance evaluation tool, and it is also true we’ve landed other major changes to performance sensitive areas in the past without too much ceremony.
I am leaning towards not worrying too much about the potential performance impact then. I believe we do do some performance testing before deploying to mainnet. So hopefully issues would be caught there.
@@ -64,7 +64,7 @@ pub fn proposals_to_epoch_info( | |||
&mut chunk_producer_proposals, | |||
max_cp_selected, | |||
min_stake_ratio, | |||
num_shards, | |||
shard_ids.len() as NumShards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should still use epoch_config.shard_layout.num_shards()
here I feel, or remove that method altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this PR is merged, I plan on removing the num_shards()
function all together. I wanted to do it piecemeal. Doing it in a single PR made the PR quite big.
@@ -417,6 +417,10 @@ impl EpochManagerAdapter for MockEpochManager { | |||
Ok(self.num_shards) | |||
} | |||
|
|||
fn shard_ids(&self, _epoch_id: &EpochId) -> Result<Vec<ShardId>, EpochError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you worried about the performance for the test code or the production code as well? I am asking because your comment is on the test code.
My concern is largely with the production code. e.g. chain/chain/src/chain.rs calls this function frequently, and I don’t have a good intuition how frequently some of the functions there get called (especially those that are more nebulous like the has_all_receipts
). Anyway, I just grepped for the first definition and wrote my comment there.
This is a trait method so AFAIU, it cannot return an iterator. Is there some way to doing that in rust now?
Yeah traits make impl Iterator
harder. Still possible with associated types like so:
trait Foo {
type ShardIdIterator = ...;
fn shard_ids(&self) -> Self::ShardIdIterator { ... }
}
but unfortunately that requires an unstable feature if you can’t name the type being returned easily.
I am also wondering if we might be doing some premature optimisation. Do you have specific checks in mind that will call this in a hot loop?
A fair point. It is unfortunate we do not have a good automated performance evaluation tool, and it is also true we’ve landed other major changes to performance sensitive areas in the past without too much ceremony.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fantastic, thanks!
@@ -242,8 +240,8 @@ mod tests { | |||
|
|||
tracing::info!("checking the pool after resharding"); | |||
{ | |||
let num_shards = new_shard_layout.num_shards(); | |||
for shard_id in 0..num_shards { | |||
let shard_ids: Vec<_> = new_shard_layout.shard_ids().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: collect_vec from itertools is an alternative to explicit type. It's just preference, up to you.
@@ -1273,7 +1276,7 @@ impl ShardsManager { | |||
}; | |||
} | |||
|
|||
if header.shard_id() >= self.epoch_manager.num_shards(&epoch_id)? { | |||
if !self.epoch_manager.shard_ids(&epoch_id)?.contains(&header.shard_id()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit slower than before and it's hinting that the shards should actually be stored as a set rather then a vector. Totally not for this PR, just general thoughts for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I noticed the same thing. In general, I wonder how often we can expect number of shards of change and if the information could be cached. Still, we should do some performance analysis before introducing such complexities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it shard layout could change every epoch or every two epochs - quite a long time to begin with. In practice it's more likely to be much less frequent. The reason why it cannot change more often is becuase it's tied to the protocol version which can only change at epoch boundary.
) | ||
}) | ||
.collect(); | ||
let shards_to_sync: Vec<_> = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's prettier now :)
@@ -194,7 +194,7 @@ impl ShardLayout { | |||
/// Returns error if `shard_id` is an invalid shard id in the current layout | |||
/// Panics if `self` has no parent shard layout | |||
pub fn get_parent_shard_id(&self, shard_id: ShardId) -> Result<ShardId, ShardLayoutError> { | |||
if shard_id > self.num_shards() { | |||
if !self.shard_ids().any(|id| id == shard_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: contains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shard_ids()
is returning an iterator here, not a vector. AFAICT, there is no contains
function on iterators.
@@ -38,7 +38,7 @@ pub fn csv_to_json_configs(home: &Path, chain_id: String, tracked_shards: Vec<Sh | |||
// Verify that key files exist. | |||
assert!(home.join(NODE_KEY_FILE).as_path().exists(), "Node key file should exist"); | |||
|
|||
if tracked_shards.iter().any(|shard_id| *shard_id >= NUM_SHARDS) { | |||
if tracked_shards.iter().any(|&shard_id| shard_id >= NUM_SHARDS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should check if every tracked shard_id is in the shard_ids vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I wasn't sure what to do here. We have only defined NUM_SHARDS
here and not a vector of shard ids. I think in a separate change, we can have a global vector of shard ids instead of NUM_SHARDS and that I will clean things up more. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized it's outside of the regular node operation. Yeah it's hard to tell what's the num shards without access to chain, epoch and the current shard layout. I'm totally fine leaving it as is. Do you know where is this genesis-tool used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea what the purpose of this tool is sadly.
let mut all_proposals = vec![]; | ||
let mut all_receipts = vec![]; | ||
for i in 0..num_shards { | ||
for shard_id in shard_ids { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback folks. I will merge this as is and look into doing some additional works in separate PRs.
@@ -194,7 +194,7 @@ impl ShardLayout { | |||
/// Returns error if `shard_id` is an invalid shard id in the current layout | |||
/// Panics if `self` has no parent shard layout | |||
pub fn get_parent_shard_id(&self, shard_id: ShardId) -> Result<ShardId, ShardLayoutError> { | |||
if shard_id > self.num_shards() { | |||
if !self.shard_ids().any(|id| id == shard_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shard_ids()
is returning an iterator here, not a vector. AFAICT, there is no contains
function on iterators.
@@ -38,7 +38,7 @@ pub fn csv_to_json_configs(home: &Path, chain_id: String, tracked_shards: Vec<Sh | |||
// Verify that key files exist. | |||
assert!(home.join(NODE_KEY_FILE).as_path().exists(), "Node key file should exist"); | |||
|
|||
if tracked_shards.iter().any(|shard_id| *shard_id >= NUM_SHARDS) { | |||
if tracked_shards.iter().any(|&shard_id| shard_id >= NUM_SHARDS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I wasn't sure what to do here. We have only defined NUM_SHARDS
here and not a vector of shard ids. I think in a separate change, we can have a global vector of shard ids instead of NUM_SHARDS and that I will clean things up more. WDYT?
…on (#10238) Instead of defining the number of shards and assuming they are contiguous, define a list of shard ids. As future work, we could try putting holes in the shard id list and see what tests / code breaks. Also see #10230 (comment).
) See #10230; #10242; etc. for prior works. I am slowly trying to remove all uses of the `num_shards()` function and improving the code quality in process. Changes made in this PR: - Use better names for variables to improve readability. - Before `shard_receipts` was a vector but it is cleaner for it to be a hashmap instead. - clarify that `account_id_to_shard_id_map` is just a simple cache. And use more idiomatic rust around accessing it. CC: @wacban
There are many places in the repo where we assume that the valid shard ids are in the range [0, num_shards). This PR is an attempt to improve the current state of affairs.
fn shard_ids()
which still employs the above assumption.Future work:
fn num_shards()
. It can be derived from the previous function. It should be removed for consistency reasons.