Chunk and block producer selection #167

Merged: 4 commits, Nov 21, 2022

Conversation

birchmd
Contributor

@birchmd birchmd commented Mar 5, 2021

Summary

Write-up of the new chunk and block producer selection algorithms for Simple Nightshade, based on the discussion on the forum. Fixes #156

Motivation

Simple Nightshade is a stepping stone toward the full sharding solution. In Simple Nightshade, block producers track all shards (because challenges are not ready yet), and there is a new role, "chunk-only producer", for validators which track just one shard and only produce chunks. The purpose of this new role is to maintain decentralization: block producers will need to run more expensive hardware to track all shards, which is not accessible to everyone.

Therefore, we need to specify how chunk and block producers get selected taking into account the separation between chunk and block producers.

Guide-level explanation

There are two separate proposal sets: one for block producers and one for chunk-only producers. This ensures that nodes which are not running the proper hardware to track all shards will never accidentally become a block producer. The top N (in terms of stake) proposals from the block producer set become the block producers in the next epoch. The top N + M proposals from the combined chunk-only and block producer set become chunk producers (note: this means block producers are also chunk producers, hence the need for the "chunk-only" nomenclature when referring to validators which are not block producers). The chunk producers are divided (approximately) evenly between all shards. At each height, a specific block producer is chosen at random (weighted by stake) to produce the block for that height. Additionally, a random chunk producer (weighted by stake) is chosen in each shard to produce the chunk for that shard at that height. We also enforce the condition that the block producer at h + 1 will produce the chunk for some shard at h (to reduce network overhead).
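
To make the selection concrete, here is a minimal Python sketch of the proposal-to-validator mapping described above. All names (`select_validators`, the `Proposal` tuple, `num_shards`, etc.) are illustrative assumptions, not the identifiers used in the spec; the real algorithms in this PR also handle stake thresholds, seat pricing, and deterministic tie-breaking, which are omitted here.

```python
from typing import Dict, List, Tuple

Proposal = Tuple[str, int]  # (account_id, stake) -- illustrative, not the spec's type


def select_validators(
    bp_proposals: List[Proposal],          # proposals from nodes that track all shards
    chunk_only_proposals: List[Proposal],  # proposals from chunk-only nodes
    num_block_producers: int,              # N
    num_extra_chunk_producers: int,        # M
    num_shards: int,
) -> Tuple[List[Proposal], Dict[int, List[Proposal]]]:
    def by_stake(proposals: List[Proposal]) -> List[Proposal]:
        return sorted(proposals, key=lambda p: p[1], reverse=True)

    # Top N block-producer proposals (by stake) become block producers.
    block_producers = by_stake(bp_proposals)[:num_block_producers]

    # Top N + M of the combined set become chunk producers, so every
    # block producer is also a chunk producer.
    combined = by_stake(bp_proposals + chunk_only_proposals)
    chunk_producers = combined[: num_block_producers + num_extra_chunk_producers]

    # Chunk producers are divided (approximately) evenly between shards.
    shard_assignment = {
        shard: chunk_producers[shard::num_shards] for shard in range(num_shards)
    }
    return block_producers, shard_assignment
```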

Rewards given to validators are also separated into two pools: one for block producers and one for chunk-only producers. A fraction f of the total rewards is given to the block producers, and the remainder is given to the chunk-only producers. Within each pool the rewards are split in proportion to stake. f should be greater than 1/2 since block producers have more expensive hardware.
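
As a rough illustration of the reward split, the sketch below divides a total reward into the two pools and then splits each pool in proportion to stake. The function name, types, and the assumption of a single lump-sum `total_reward` are mine for illustration; the spec's actual reward formula (including any uptime conditions) is defined in the PR itself.

```python
from typing import Dict, List, Tuple


def split_rewards(
    total_reward: float,
    block_producers: List[Tuple[str, int]],        # (account_id, stake)
    chunk_only_producers: List[Tuple[str, int]],   # (account_id, stake)
    f: float,                                      # block-producer fraction, f > 1/2
) -> Dict[str, float]:
    rewards: Dict[str, float] = {}
    pools = [
        (block_producers, f * total_reward),
        (chunk_only_producers, (1.0 - f) * total_reward),
    ]
    for pool, pool_reward in pools:
        pool_stake = sum(stake for _, stake in pool)
        for account, stake in pool:
            # Within each pool, the reward is proportional to stake.
            rewards[account] = pool_reward * stake / pool_stake
    return rewards
```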

Reference-level explanation

All algorithms are given in full detail in the changes to the spec done in this PR.


```python
# Concatenates the bytes of the epoch seed with the height,
# then computes the sha256 hash.
```
Collaborator

Why do we need a cryptographically secure hash here?

Contributor Author

I think it is needed so that it becomes computationally infeasible to construct an epoch_seed which gives rise to a pre-defined block/chunk producer order. Otherwise, an attacker may try to influence the VRF entropy in order to get a particular sequence, e.g. to have members of a byzantine cartel make many blocks in a row to cause some problems in the network.

Collaborator

That is quite difficult in practice. Given that each block producer has one bit of influence on the vrf output, if an attacker wants to achieve something meaningful, they need to have a large amount of stake, which would allow them to initiate other attacks more easily. In addition, you cannot predict when the epoch is going to end until shortly before it does, and therefore it is quite difficult for an attacker to influence the vrf output of the last block of an epoch.

Contributor Author

> each block producer has one bit of influence on the vrf output

It's one bit per block, right? And since BPs will be chosen in proportion to their stake, this means they have influence proportional to their stake as well. E.g. if a cartel had 1/8 (12.5%) of the total stake then they could influence ~32 bits in the next epoch seed. Obviously this would not be enough to control the entire sequence of BPs in the next epoch, but I assume it means it would be possible to control a non-trivial sub-sequence (say 10 blocks long), where that sub-sequence may occur at any point during the next epoch (i.e. I don't think you could control the first 10 blocks, for example, that is too narrow, but controlling some run of 10 blocks seems feasible). I haven't done any experiments to try this for myself, this is only my intuition, and so may not be at all correct.

> That is quite difficult in practice

I agree. Though, I view that argument as falling in the category of "security by obscurity", which I don't think is a very strong guarantee. If something is hard then there will need to be a large incentive for someone to attempt it. But on the blockchain, especially a successful one, there do tend to be large incentives, and it might be that someone thinks the payout for pulling off a difficult attack is worth the effort.

Is your main concern with using a cryptographic hash performance? If so, we do not need to use sha256. We could use a faster cryptographic hash function; I only chose sha256 because it is what we use everywhere else in the system.

Collaborator

Actually using sha256 is fine. We can just cache the result.
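
For context, here is a sketch of the kind of per-height randomness the quoted comment refers to: hash the epoch seed together with the height, then use the digest to make a stake-weighted choice. The function names, byte width, and endianness below are assumptions for illustration, not the exact encoding in the spec, and as noted above the digest can be cached per height.

```python
import hashlib
from typing import List, Tuple


def height_seed(epoch_seed: bytes, height: int) -> bytes:
    # Concatenate the epoch seed bytes with the height, then sha256-hash the
    # result (the 8-byte little-endian encoding is an assumption of this sketch).
    return hashlib.sha256(epoch_seed + height.to_bytes(8, "little")).digest()


def weighted_pick(seed: bytes, validators: List[Tuple[str, int]]) -> str:
    # Stake-weighted selection driven by the per-height seed; the spec may use
    # a different sampling scheme, this only illustrates the idea.
    total_stake = sum(stake for _, stake in validators)
    r = int.from_bytes(seed, "little") % total_stake
    for account, stake in validators:
        if r < stake:
            return account
        r -= stake
    raise AssertionError("unreachable: r < total_stake")
```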

```python
# Ensure the block producer for the next block also produces one of the shards.
# `bp` could already be in the result because block producers are also
# chunk producers (see algorithm for selecting chunk producers from proposals).
if bp not in result:
```
Collaborator

I am not sure whether this is a good idea. Block producers need to track all shards, and only reducing the latency on one shard is not going to help a lot. Plus, this can result in some weird dynamics where a chunk-only producer for some shard may not get an opportunity to produce any chunk, thereby losing their reward.

Contributor Author

Sure, I'm fine to drop this if we think it does more harm than good.

@bowenwang1996
Collaborator

bowenwang1996 commented Jul 10, 2021

@abacabadabacaba pointed out that with this change, the chunk part distribution is not proportional to stake. Rather, every block producer is treated equally, which may be problematic. I did some research and found the following: if an attacker wants to break data availability, they need to corrupt at least 2/3 of the total number of validators, which is ~67 validators with the current config. Today there are 58 validators, and it makes sense to assume that they will continue being validators after this change, which means that at least the bottom 25 validators need to be corrupted. That already amounts to 87m NEAR. For the remaining 42 validators, if we assume an average stake of 1m (which would imply that the threshold is likely much lower than 1m), this means that the total amount of stake that needs to be corrupted is 115m NEAR, which is about 28% of the total stake today. Regardless of percentage, it is hardly reasonable to think that an attacker can amass that much NEAR to perform such an attack.

Another way to think about it is to assume that the maximum number of validators is what we have today (58). In that case, even if chunk parts are distributed equally, corrupting 2/3 of the total number of validators requires about 52% of the total stake we have today, and if someone controls that much stake, they can easily stall the network. As a result, I don't think this change will have any material impact on the security of the protocol.

near-bulldozer bot pushed a commit to near/nearcore that referenced this pull request Sep 8, 2021
Chunk-only producers are an important stepping stone towards sharding in mainnet. See https://gov.near.org/t/block-and-chunk-producer-selection-algorithm-in-simple-nightshade/66 for more details. Also see near/NEPs#167 for the spec this work is based on.

This PR does most of the work towards landing this feature. Much of the work in this PR was updating tons of tests because they assumed that validators produce blocks/chunks in a cyclic order. That is no longer true because the randomness is computed on the fly at each height instead of when processing the proposals.

This PR is not yet suitable for merging to master; missing items are listed below:

- [ ] Nayduck failures (looks like some tests are failing -- http://nayduck.eastus.cloudapp.azure.com:3000/#/run/1452)
- [ ] Writing a new pytest to see this feature working end-to-end. This PR adds some new tests, and fixes a lot of old tests, so it probably works, but it's always nice to see an integration test.

List of (possible) Nayduck failures to be addressed:
```
expensive nearcore test_rejoin test::test_4_20_kill1_two_shards	
pytest sanity/one_val.py	
pytest sanity/rpc_state_changes.py	
pytest sanity/staking2.py	
pytest sanity/staking_repro1.py	
pytest sanity/state_sync2.py	
pytest sanity/sync_chunks_from_archival.py	
pytest stress/stress.py 3 3 3 0 staking transactions local_network packets_drop	
pytest stress/stress.py 3 3 3 0 staking transactions node_restart packets_drop	
pytest stress/stress.py 3 3 3 0 staking transactions node_restart wipe_data	
pytest sanity/gc_after_sync.py
pytest sanity/gc_sync_after_sync.py swap_nodes
```
@frol frol added the WG-protocol Protocol Standards Work Group should be accountable label Sep 5, 2022
@bowenwang1996
Collaborator

Already implemented. @mm-near @mzhangmzz could we merge this PR?

@frol frol added the S-approved A NEP that was approved by a working group. label Sep 29, 2022
@frol
Collaborator

frol commented Sep 29, 2022

I am adding the S-approved tag just to indicate that it was implicitly approved through the implementation. Yet, if there are any changes still needed to reflect the implementation, it would be great to fix them before merging.

@ori-near ori-near added the A-NEP A NEAR Enhancement Proposal (NEP). label Oct 13, 2022
@frol frol requested a review from a team as a code owner November 21, 2022 13:47
Collaborator

@frol frol left a comment

As NEPs moderator, I am approving this PR based on the previous discussion.

@frol frol merged commit 111c56f into near:master Nov 21, 2022
frol pushed a commit that referenced this pull request Jan 11, 2023
#167 is merged, but the final implementation we went with for chunk-only producers actually diverged from what was described there. This PR updates the block/chunk producers section so our documentation is up to date.
Labels
- A-NEP: A NEAR Enhancement Proposal (NEP).
- S-approved: A NEP that was approved by a working group.
- WG-protocol: Protocol Standards Work Group should be accountable.
Projects
Status: APPROVED NEPs
Development

Successfully merging this pull request may close these issues.

Design new block and chunk producer selection process
4 participants