
NEPs on how to split states of shards for resharding #241

Merged: 14 commits merged into master from split-states on Sep 14, 2022

Conversation

@mzhangmzz (Contributor)

No description provided.

@mzhangmzz (Contributor, Author)

Most of the spec is done. I still need to fill in some analysis sections, such as Rationale and alternatives, Unresolved questions, etc.

@bowenwang1996 (Collaborator) left a comment

Looks good. One small thing: we prefer to have one sentence per line.

@Longarithm (Member) left a comment

Thank you, it is highly readable!

> …is not a reason to accept the current or a future NEP. Such notes should be
> in the section on motivation or rationale in this or subsequent NEPs.
> The section merely provides additional information.
>
> None
Collaborator:

Well, we've discussed possibly switching to on-chain state updates once we introduce challenges...

@MaksymZavershynskyi (Contributor)

Thank you for the thorough NEP and for following https://gov.near.org/t/quality-control-for-protocol-api-changes/1941#the-format-4, especially for talking about drawbacks and alternatives! Can we also add a list of affected projects (maybe @bowenwang1996 can help since he has a lot of context) and a pre-mortem (imagine the most likely scenario in which this design change could lead to a failure)? Thanks!

Min Zhang added 2 commits July 27, 2021 21:22
…ersion. Addressed some comments but not all. Still need to finish the Future section
@frol (Collaborator) commented Jul 29, 2021

I am sorry for jumping into the discussion late, but reviewing near/nearcore#4554, I feel that from the end-user point of view they should care about some "shard id" (it does not matter much whether it is unique within an epoch or across epochs). Here are the questions I suggest we ask ourselves:

  • Why do we believe the outside world should care about the shard ord rather than a unique shard id?
  • If the outside world should care about the ord, why should they care that it is called "ord" rather than "id"?

If we want some unique shard id, we may just introduce ShardUniqueId (ShardUId) instead of going through this massive renaming effort.

@mzhangmzz (Contributor, Author) commented Jul 29, 2021

> I am sorry for jumping into the discussion late, but reviewing near/nearcore#4554, I feel that from the end-user point of view they should care about some "shard id" […]

@frol thanks for the detailed comment! Regarding your questions:

> Why do we believe the outside world should care about the shard ord rather than a unique shard id?

That's a great question. I think the unique shard id is a concept that is closely tied to how we choose to represent shards internally in the state, so it may be subject to change in the future if the implementation changes. By hiding it from the protocol level and the outside world, such a change would not trigger a protocol-level upgrade in the future.

> If the outside world should care about the ord, why should they care that it is called "ord" rather than "id"?

You are right on this. Initially I didn't think renaming would be such an invasive change, and I wanted the naming to highlight the difference between "ord" and "id". If you think the renaming creates a lot of trouble on the API side, we can definitely skip it and just create a new name for the new id, as you suggested.

@bowenwang1996 (Collaborator) left a comment

One question that occurred to me today: what happens to nonvalidator nodes that specify the shards they track? It seems that, according to what is proposed here, those nodes will start tracking all the shards that the original shard split into. The side effect is that what the node actually tracks will no longer match what is specified in config.json, and if a node stops and restarts after the split is done, it will stop tracking the shards whose ids are not in tracked_shards. This could be confusing for people who need to run nonvalidator nodes, and we should at least document this peculiar behavior. It would certainly be better if we could make this experience smoother for nonvalidator nodes as well.

@mzhangmzz (Contributor, Author)

> One question that occurred to me today: what happens to nonvalidator nodes that specify the shards they track? […]

That's a good point. If we want to make the process smoother for nonvalidator nodes, then when specifying shards to track we can have them supply the epoch_id along with the shard_id; at startup, ShardTracker can then figure out which shards in the current layout those shards have become.

@bowenwang1996 (Collaborator)

> That's a good point. If we want to make the process smoother for nonvalidator nodes, then when specifying shards to track we can have them supply the epoch_id along with the shard_id […]

Well, the main problem is this: let's say someone wants to track all shards. Today they would specify tracked_shards: [0] to do that. But after the split of shards is done, if they stop and restart their node, they will stop tracking every shard other than shard 0, which is suboptimal. To address this, I think we have the following options:

  • Allow a node to specify in config.json that it wants to track all shards, something like tracked_shards: all.
  • When a node stops, overwrite config.json with the shards it actually tracks. This might not be ideal if a node only wants to track one shard, even if that shard is split into two.

@mzhangmzz (Contributor, Author)

> Well, the main problem is this: let's say someone wants to track all shards. Today they would specify tracked_shards: [0] to do that. But after the split of shards is done, if they stop and restart their node, they will stop tracking every shard other than shard 0 […]

Yeah, I understand what you mean. I think the issue, again, is that the meaning of a shard id changes when resharding happens. One way to solve that is to be more specific than just providing the shard_id: the config file would include the epoch_id (or protocol version) along with the shard id. For example, say a node starts from genesis tracking shard 0. Then the config file would have tracked_shards: [0], tracked_shards_epoch: genesis_epoch_id. Later, when the node restarts, ShardTracker knows that this shard 0 is not the current shard 0 but the genesis shard 0, and it can derive which shards it should track now from the EpochInfos stored in the database.
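To make the idea concrete, here is a minimal sketch of how such a config could be resolved at startup. The field tracked_shards_epoch and the EpochManager helper get_current_shards_for are hypothetical names used only for illustration; they are not existing nearcore APIs.

```rust
use std::collections::HashSet;

type ShardId = u64;
type EpochId = [u8; 32];

// Hypothetical config fields pinning the tracked shards to a specific epoch's layout.
struct TrackingConfig {
    tracked_shards: Vec<ShardId>,  // e.g. [0]
    tracked_shards_epoch: EpochId, // e.g. the genesis epoch id
}

trait EpochManager {
    /// Returns the shards in the current layout that descend from `shard_id`
    /// as it existed in `epoch_id` (derived from the stored EpochInfos).
    fn get_current_shards_for(&self, epoch_id: &EpochId, shard_id: ShardId) -> Vec<ShardId>;
}

// At startup, ShardTracker maps each pinned shard forward through all reshardings.
fn resolve_tracked_shards(cfg: &TrackingConfig, em: &dyn EpochManager) -> HashSet<ShardId> {
    cfg.tracked_shards
        .iter()
        .flat_map(|&s| em.get_current_shards_for(&cfg.tracked_shards_epoch, s))
        .collect()
}
```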

@mzhangmzz (Contributor, Author) commented Jul 29, 2021

As I am implementing the change regarding ShardId and ShardOrd, I find that having two different ids referring to shards is quite confusing and could be error prone. We need to be extremely careful to use the correct type whenever we use ShardId or ShardOrd, and since ShardId occurs in so many places in the code, it is painstaking to get this exactly right. It may also introduce problems in the future if we are not careful and mix the two up; since they are the same type (u64), it is quite easy to do so. Thus, I am having second thoughts on whether we should keep them separate or just use ShardId (or ShardUId, as @frol calls it) everywhere.

As I mentioned, the downside of moving away from ShardOrd and using a unique number (whether we call it ShardId or ShardUId) to refer to a shard is that the unique number could be an implementation detail that is subject to change in the future. Thus, we may want to think about how we would approach resharding once challenges are enabled, as well as dynamic resharding, to see whether we want to switch the protocol-level code to also use ShardId instead of ShardOrd.

@bowenwang1996 @abacabadabacaba what are your thoughts?

@bowenwang1996 (Collaborator)

@mzhangmzz using the same type does not really prevent us from mixing them up, does it? It seems that we need some internal representation that is different anyway, so introducing a strictly different type may be helpful (this could be purely an implementation detail).
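For illustration, a minimal sketch of the newtype idea being discussed: wrapping the raw integers in distinct types makes the compiler reject any place where one kind of shard identifier is passed where the other is expected. The field layout below is illustrative, not the actual nearcore definition.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ShardOrd(u64); // position of a shard within one epoch's layout

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ShardUId {
    version: u32,  // bumped on every resharding
    shard_id: u32, // unique within that version
}

fn load_trie_state(shard_uid: ShardUId) {
    // look up the trie keyed by the unique id
    let _ = shard_uid;
}

fn example(ord: ShardOrd, uid: ShardUId) {
    let _ = ord;
    load_trie_state(uid);
    // load_trie_state(ord); // does not compile: expected `ShardUId`, found `ShardOrd`
}
```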

@frol (Collaborator) commented Aug 3, 2021

> Today they would specify tracked_shards: [0]

I suggest we take a step back and ask ourselves: why would anyone care about shard number X, and how would they know that they want to track exactly shard number X? Maybe they care about a set of specific accounts, and we should track based on a list of account ids instead of the shard ord/id/uid?

@mzhangmzz (Contributor, Author)

> I suggest we take a step back and ask ourselves: why would anyone care about shard number X […] Maybe they care about a set of specific accounts, and we should track based on a list of account ids instead of the shard ord/id/uid?

I think that's a very good point. Maybe the outside world should not have information about the internal shard layout at all, since we want the sharding implementation to be as transient as possible. We can provide two options: tracking all shards, or tracking the shards derived from a set of accounts.

@bowenwang1996 (Collaborator)

> Maybe they care about a set of specific accounts, and we should track based on a list of account ids instead of the shard ord/id/uid?

It seems that there are the following use cases:

  • Tracking all shards.
  • Tracking a set of accounts.

Today we already support tracking a set of accounts through tracked_accounts. We just need to make sure that this continues to work after having multiple shards. In addition, we need to make sure that people who want to track all shards have a way to do that.
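As a side note, a short sketch of why account-based tracking composes well with resharding: the tracked accounts are re-mapped to shards through whatever layout the current epoch uses, so the tracked shard set updates itself after a split. The account_id_to_shard_id name mirrors the function mentioned in this thread; the signature and ShardLayout contents here are illustrative.

```rust
use std::collections::HashSet;

type AccountId = String;
type ShardId = u64;

struct ShardLayout {
    // boundary accounts, version, etc. (elided)
}

// Maps an account to the shard that owns it under the given layout (stubbed here).
fn account_id_to_shard_id(account_id: &AccountId, layout: &ShardLayout) -> ShardId {
    let _ = (account_id, layout);
    unimplemented!()
}

// Recomputed each epoch, so a split shard is picked up automatically.
fn shards_for_tracked_accounts(
    tracked_accounts: &[AccountId],
    current_layout: &ShardLayout,
) -> HashSet<ShardId> {
    tracked_accounts
        .iter()
        .map(|a| account_id_to_shard_id(a, current_layout))
        .collect()
}
```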

mzhangmzz added a commit to near/nearcore that referenced this pull request Aug 14, 2021
This changeset implements ShardLayout and ShardConfig in near/NEPs#241
mzhangmzz added a commit to near/nearcore that referenced this pull request Aug 23, 2021
This PR implements ShardUId as specified in near/NEPs#241 so that trie states are indexed by `shard_uid` instead of `shard_id`.

The main change is to the interface around trie storage, which now uses `shard_uid` instead of `shard_id`. The database columns ColState and ColTrieChanges are now indexed by `shard_uid` instead of `shard_id`.

Changes that this PR does not include:

  • changing account_id_to_shard_id to account for a changing shard layout
  • gc

Test Plan:
The main source of errors is places where shard_id and shard_uid are mixed up, or where the wrong shard_version is used for shard_uid, which may result in a "DB not found" error. To test for the first type, we change a wide range of existing unit and integration tests to use a shard layout with shard_version = 1, so that shard_id and shard_uid cannot be mixed up.

Co-authored-by: Bowen Wang <bowenwang1996@users.noreply.github.com>
Comment on lines +285 to +286
> In `apply_chunks`, after processing each chunk, the state changes in `apply_results` are sorted into changes to new shards.
> At the end, we apply these changes to the new shards.
Collaborator:

Ideally this should also be done in a different thread, but for this specific upgrade I think it is fine
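A minimal sketch of the step quoted in this excerpt, assuming the account-to-new-shard mapping is supplied by the new shard layout; type names and signatures are illustrative rather than the actual nearcore code.

```rust
use std::collections::HashMap;

type AccountId = String;
type ShardId = u64;

// One state change produced while applying a chunk of the old shard.
struct StateChange {
    account_id: AccountId,
    // trie key/value payload elided
}

/// Bucket the state changes from one `apply_results` entry by the new shard
/// that will own the affected account after the split.
fn split_state_changes(
    changes: Vec<StateChange>,
    account_to_new_shard: impl Fn(&AccountId) -> ShardId,
) -> HashMap<ShardId, Vec<StateChange>> {
    let mut by_new_shard: HashMap<ShardId, Vec<StateChange>> = HashMap::new();
    for change in changes {
        let shard = account_to_new_shard(&change.account_id);
        by_new_shard.entry(shard).or_default().push(change);
    }
    by_new_shard
}
```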

@bowenwang1996 (Collaborator) left a comment

Thanks for the update!

Comment on lines +212 to +216
> ### Database storage
> The following database columns are stored with `ShardId` as part of the database key, it will be replaced by `ShardUId`
> - ColState
> - ColChunkExtra
> - ColTrieChanges
Collaborator:

Nice!

> #### `TrieCachingStorage`
> Trie storage will be contruct database key from `ShardUId` and hash of the trie node.
> Trie storage will contruct database key from `ShardUId` and hash of the trie node.
Collaborator:

Suggested change:
- Trie storage will contruct database key from `ShardUId` and hash of the trie node.
+ Trie storage will construct database key from `ShardUId` and hash of the trie node.
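A minimal sketch of the key construction described in this excerpt, assuming the ShardUId is serialized as its version plus shard id and prefixed to the 32-byte trie node hash; the exact byte encoding is illustrative, not the nearcore column layout.

```rust
#[derive(Clone, Copy)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

type CryptoHash = [u8; 32];

// ColState key: ShardUId bytes followed by the trie node hash.
fn trie_node_db_key(shard_uid: ShardUId, node_hash: &CryptoHash) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + node_hash.len());
    key.extend_from_slice(&shard_uid.version.to_le_bytes());
    key.extend_from_slice(&shard_uid.shard_id.to_le_bytes());
    key.extend_from_slice(node_hash);
    key
}
```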

near-bulldozer bot pushed a commit to near/nearcore that referenced this pull request Oct 13, 2021
Move SimpleNightshade from nightly to stable.

This feature enables the transition from one shard to four shards. For more details, see https://near.org/blog/near-launches-simple-nightshade-the-first-step-towards-a-sharded-blockchain/ and near/NEPs#241
@frol added the WG-protocol label (Protocol Standards Work Group should be accountable) on Sep 5, 2022
@frol (Collaborator) commented Sep 5, 2022

@mzhangmzz @bowenwang1996 Is it ready to be merged?

@bowenwang1996 (Collaborator)

This was implemented more than a year ago, so let's merge the pull request. @mzhangmzz could you resolve conflicts and merge the PR? Thanks!

@mzhangmzz requested a review from a team as a code owner on September 14, 2022 15:36
@mzhangmzz merged commit 9e8603d into master on Sep 14, 2022
@mzhangmzz deleted the split-states branch on September 14, 2022 15:58