Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: reduce places where we assume shard ids are contiguous #10230

Merged
merged 2 commits into from Nov 23, 2023

Conversation

akhi3030
Copy link
Collaborator

There are many places in the repo where we assume that the valid shard ids are in the range [0, num_shards). This PR is an attempt to improve the current state of affairs.

  • ShardLayout introduces a new method: fn shard_ids() which still employs the above assumption.
  • All instances above assumption are moved to calls to the above function so that the assumption is centralised in a single place.

Future work:

  • If we have a function that returns a list of shard ids, we do not need fn num_shards(). It can be derived from the previous function. It should be removed for consistency reasons.

@akhi3030 akhi3030 marked this pull request as ready for review November 21, 2023 16:20
@akhi3030 akhi3030 requested a review from a team as a code owner November 21, 2023 16:20
Copy link

codecov bot commented Nov 21, 2023

Codecov Report

Attention: 56 lines in your changes are missing coverage. Please review.

Comparison is base (2318c3a) 71.80% compared to head (c3270a8) 71.82%.
Report is 1 commits behind head on master.

Files Patch % Lines
chain/client/src/client_actor.rs 0.00% 16 Missing ⚠️
tools/amend-genesis/src/lib.rs 0.00% 6 Missing and 1 partial ⚠️
tools/state-viewer/src/epoch_info.rs 0.00% 7 Missing ⚠️
chain/client/src/debug.rs 0.00% 6 Missing ⚠️
chain/chain/src/chain.rs 33.33% 0 Missing and 4 partials ⚠️
chain/epoch-manager/src/validator_selection.rs 78.94% 4 Missing ⚠️
chain/chunks/src/lib.rs 70.00% 0 Missing and 3 partials ⚠️
tools/state-viewer/src/state_changes.rs 0.00% 2 Missing ⚠️
chain/chain/src/flat_storage_creator.rs 50.00% 0 Missing and 1 partial ⚠️
chain/chain/src/test_utils.rs 0.00% 1 Missing ⚠️
... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10230      +/-   ##
==========================================
+ Coverage   71.80%   71.82%   +0.01%     
==========================================
  Files         707      707              
  Lines      141904   141944      +40     
  Branches   141904   141944      +40     
==========================================
+ Hits       101896   101953      +57     
+ Misses      35290    35272      -18     
- Partials     4718     4719       +1     
Flag Coverage Δ
backward-compatibility 0.08% <0.00%> (-0.01%) ⬇️
db-migration 0.08% <0.00%> (-0.01%) ⬇️
genesis-check 1.23% <0.00%> (-0.01%) ⬇️
integration-tests 36.16% <35.25%> (+<0.01%) ⬆️
linux 71.71% <59.42%> (+<0.01%) ⬆️
linux-nightly 71.59% <59.71%> (+0.03%) ⬆️
macos 55.71% <47.82%> (+1.80%) ⬆️
pytests 1.46% <0.00%> (-0.01%) ⬇️
sanity-checks 1.26% <0.00%> (-0.01%) ⬇️
unittests 68.16% <50.35%> (+0.01%) ⬆️
upgradability 0.13% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -417,6 +417,10 @@ impl EpochManagerAdapter for MockEpochManager {
Ok(self.num_shards)
}

fn shard_ids(&self, _epoch_id: &EpochId) -> Result<Vec<ShardId>, EpochError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating here runs a risk of making some hot loops that check for shard_ids quite a bit more expensive. I would advocate for a SmallVec or perhaps returning an impl Iterator<Item=ShardId> (insert the result wrapper where you think it is more appropriate.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps returning an impl Iterator<Item=ShardId> (insert the result wrapper where you think it is more appropriate.)

This is a trait method so AFAIU, it cannot return an iterator. Is there some way to doing that in rust now?

Allocating here runs a risk of making some hot loops that check for shard_ids quite a bit more expensive. I would advocate for a SmallVec

Are you worried about the performance for the test code or the production code as well? I am asking because your comment is on the test code.

We could choose to return a SmallVec or we could alternatively add an additional function fn is_valid_shard_id() to handle those use cases.

I am also wondering if we might be doing some premature optimisation. Do you have specific checks in mind that will call this in a hot loop?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried about the performance for the test code or the production code as well? I am asking because your comment is on the test code.

My concern is largely with the production code. e.g. chain/chain/src/chain.rs calls this function frequently, and I don’t have a good intuition how frequently some of the functions there get called (especially those that are more nebulous like the has_all_receipts). Anyway, I just grepped for the first definition and wrote my comment there.


This is a trait method so AFAIU, it cannot return an iterator. Is there some way to doing that in rust now?

Yeah traits make impl Iterator harder. Still possible with associated types like so:

trait Foo {
    type ShardIdIterator = ...;
    fn shard_ids(&self) -> Self::ShardIdIterator { ... }
}

but unfortunately that requires an unstable feature if you can’t name the type being returned easily.


I am also wondering if we might be doing some premature optimisation. Do you have specific checks in mind that will call this in a hot loop?

A fair point. It is unfortunate we do not have a good automated performance evaluation tool, and it is also true we’ve landed other major changes to performance sensitive areas in the past without too much ceremony.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but unfortunately that requires an unstable feature if you can’t name the type being returned easily.

I am excited to see this feature land in the near future.

A fair point. It is unfortunate we do not have a good automated performance evaluation tool, and it is also true we’ve landed other major changes to performance sensitive areas in the past without too much ceremony.

I am leaning towards not worrying too much about the potential performance impact then. I believe we do do some performance testing before deploying to mainnet. So hopefully issues would be caught there.

@@ -64,7 +64,7 @@ pub fn proposals_to_epoch_info(
&mut chunk_producer_proposals,
max_cp_selected,
min_stake_ratio,
num_shards,
shard_ids.len() as NumShards,
Copy link
Collaborator

@nagisa nagisa Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should still use epoch_config.shard_layout.num_shards() here I feel, or remove that method altogether.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this PR is merged, I plan on removing the num_shards() function all together. I wanted to do it piecemeal. Doing it in a single PR made the PR quite big.

@@ -417,6 +417,10 @@ impl EpochManagerAdapter for MockEpochManager {
Ok(self.num_shards)
}

fn shard_ids(&self, _epoch_id: &EpochId) -> Result<Vec<ShardId>, EpochError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried about the performance for the test code or the production code as well? I am asking because your comment is on the test code.

My concern is largely with the production code. e.g. chain/chain/src/chain.rs calls this function frequently, and I don’t have a good intuition how frequently some of the functions there get called (especially those that are more nebulous like the has_all_receipts). Anyway, I just grepped for the first definition and wrote my comment there.


This is a trait method so AFAIU, it cannot return an iterator. Is there some way to doing that in rust now?

Yeah traits make impl Iterator harder. Still possible with associated types like so:

trait Foo {
    type ShardIdIterator = ...;
    fn shard_ids(&self) -> Self::ShardIdIterator { ... }
}

but unfortunately that requires an unstable feature if you can’t name the type being returned easily.


I am also wondering if we might be doing some premature optimisation. Do you have specific checks in mind that will call this in a hot loop?

A fair point. It is unfortunate we do not have a good automated performance evaluation tool, and it is also true we’ve landed other major changes to performance sensitive areas in the past without too much ceremony.

Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fantastic, thanks!

@@ -242,8 +240,8 @@ mod tests {

tracing::info!("checking the pool after resharding");
{
let num_shards = new_shard_layout.num_shards();
for shard_id in 0..num_shards {
let shard_ids: Vec<_> = new_shard_layout.shard_ids().collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: collect_vec from itertools is an alternative to explicit type. It's just preference, up to you.

@@ -1273,7 +1276,7 @@ impl ShardsManager {
};
}

if header.shard_id() >= self.epoch_manager.num_shards(&epoch_id)? {
if !self.epoch_manager.shard_ids(&epoch_id)?.contains(&header.shard_id()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit slower than before and it's hinting that the shards should actually be stored as a set rather then a vector. Totally not for this PR, just general thoughts for the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I noticed the same thing. In general, I wonder how often we can expect number of shards of change and if the information could be cached. Still, we should do some performance analysis before introducing such complexities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it shard layout could change every epoch or every two epochs - quite a long time to begin with. In practice it's more likely to be much less frequent. The reason why it cannot change more often is becuase it's tied to the protocol version which can only change at epoch boundary.

)
})
.collect();
let shards_to_sync: Vec<_> = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's prettier now :)

@@ -194,7 +194,7 @@ impl ShardLayout {
/// Returns error if `shard_id` is an invalid shard id in the current layout
/// Panics if `self` has no parent shard layout
pub fn get_parent_shard_id(&self, shard_id: ShardId) -> Result<ShardId, ShardLayoutError> {
if shard_id > self.num_shards() {
if !self.shard_ids().any(|id| id == shard_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: contains

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shard_ids() is returning an iterator here, not a vector. AFAICT, there is no contains function on iterators.

@@ -38,7 +38,7 @@ pub fn csv_to_json_configs(home: &Path, chain_id: String, tracked_shards: Vec<Sh
// Verify that key files exist.
assert!(home.join(NODE_KEY_FILE).as_path().exists(), "Node key file should exist");

if tracked_shards.iter().any(|shard_id| *shard_id >= NUM_SHARDS) {
if tracked_shards.iter().any(|&shard_id| shard_id >= NUM_SHARDS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should check if every tracked shard_id is in the shard_ids vector

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I wasn't sure what to do here. We have only defined NUM_SHARDS here and not a vector of shard ids. I think in a separate change, we can have a global vector of shard ids instead of NUM_SHARDS and that I will clean things up more. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized it's outside of the regular node operation. Yeah it's hard to tell what's the num shards without access to chain, epoch and the current shard layout. I'm totally fine leaving it as is. Do you know where is this genesis-tool used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what the purpose of this tool is sadly.

let mut all_proposals = vec![];
let mut all_receipts = vec![];
for i in 0..num_shards {
for shard_id in shard_ids {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Collaborator Author

@akhi3030 akhi3030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback folks. I will merge this as is and look into doing some additional works in separate PRs.

@@ -194,7 +194,7 @@ impl ShardLayout {
/// Returns error if `shard_id` is an invalid shard id in the current layout
/// Panics if `self` has no parent shard layout
pub fn get_parent_shard_id(&self, shard_id: ShardId) -> Result<ShardId, ShardLayoutError> {
if shard_id > self.num_shards() {
if !self.shard_ids().any(|id| id == shard_id) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shard_ids() is returning an iterator here, not a vector. AFAICT, there is no contains function on iterators.

@@ -38,7 +38,7 @@ pub fn csv_to_json_configs(home: &Path, chain_id: String, tracked_shards: Vec<Sh
// Verify that key files exist.
assert!(home.join(NODE_KEY_FILE).as_path().exists(), "Node key file should exist");

if tracked_shards.iter().any(|shard_id| *shard_id >= NUM_SHARDS) {
if tracked_shards.iter().any(|&shard_id| shard_id >= NUM_SHARDS) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I wasn't sure what to do here. We have only defined NUM_SHARDS here and not a vector of shard ids. I think in a separate change, we can have a global vector of shard ids instead of NUM_SHARDS and that I will clean things up more. WDYT?

@akhi3030 akhi3030 added this pull request to the merge queue Nov 23, 2023
Merged via the queue into near:master with commit 431ae41 Nov 23, 2023
16 of 17 checks passed
@akhi3030 akhi3030 deleted the iter-shard-ids branch November 23, 2023 16:52
github-merge-queue bot pushed a commit that referenced this pull request Nov 24, 2023
github-merge-queue bot pushed a commit that referenced this pull request Nov 24, 2023
…on (#10238)

Instead of defining the number of shards and assuming they are
contiguous, define a list of shard ids.

As future work, we could try putting holes in the shard id list and see
what tests / code breaks.

Also see
#10230 (comment).
github-merge-queue bot pushed a commit that referenced this pull request Nov 24, 2023
github-merge-queue bot pushed a commit that referenced this pull request Nov 27, 2023
)

See #10230;
#10242; etc. for prior works.

I am slowly trying to remove all uses of the `num_shards()` function and
improving the code quality in process.

Changes made in this PR:
- Use better names for variables to improve readability.
- Before `shard_receipts` was a vector but it is cleaner for it to be a
hashmap instead.
- clarify that `account_id_to_shard_id_map` is just a simple cache. And
use more idiomatic rust around accessing it.

CC: @wacban
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants