Chunk and block producer selection #167

Merged: 4 commits, Nov 21, 2022

Conversation

birchmd
Contributor

@birchmd birchmd commented Mar 5, 2021

Summary

Write-up of the new chunk and block producer selection algorithms for Simple Nightshade, based on the discussion on the forum. Fixes #156

Motivation

Simple Nightshade is a stepping stone toward the full sharding solution. In Simple Nightshade, block producers track all shards (because challenges are not ready yet), and there is a new role, "chunk-only producer", for validators which track just one shard and only produce chunks. The purpose of this new role is to maintain decentralization: block producers will need to run more expensive hardware to track all shards, which is not accessible to everyone.

Therefore, we need to specify how chunk and block producers get selected taking into account the separation between chunk and block producers.

Guide-level explanation

There are two separate proposal sets: one for block producers and one for chunk-only producers. This ensures that nodes which are not running the proper hardware to track all shards will never accidentally become a block producer. The top N (in terms of stake) proposals from the block producer set become the block producers in the next epoch. The top N + M proposals from the combined chunk-only and block producer set become chunk producers (note: this means block producers are also chunk producers, hence the need for the "chunk-only" nomenclature when referring to validators which are not block producers). The chunk producers are divided (approximately) evenly between all shards. At each height, a specific block producer is chosen at random (weighted by stake) to produce the block for that height. Additionally, a random chunk producer (weighted by stake) is chosen in each shard to produce the chunk for that shard at that height. We also enforce the condition that the block producer at h + 1 will produce the chunk for some shard at h (to reduce network overhead).
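
To make the selection concrete, here is a minimal Python sketch of the proposal-to-validator mapping described above. All names (`select_validators`, the `Proposal` tuple, `num_shards`, etc.) are illustrative assumptions, not the identifiers used in the spec; the real algorithms in this PR also handle stake thresholds, seat pricing, and deterministic tie-breaking, which are omitted here.

```python
from typing import Dict, List, Tuple

Proposal = Tuple[str, int]  # (account_id, stake) -- illustrative, not the spec's type


def select_validators(
    bp_proposals: List[Proposal],          # proposals from nodes that track all shards
    chunk_only_proposals: List[Proposal],  # proposals from chunk-only nodes
    num_block_producers: int,              # N
    num_extra_chunk_producers: int,        # M
    num_shards: int,
) -> Tuple[List[Proposal], Dict[int, List[Proposal]]]:
    def by_stake(proposals: List[Proposal]) -> List[Proposal]:
        return sorted(proposals, key=lambda p: p[1], reverse=True)

    # Top N block-producer proposals (by stake) become block producers.
    block_producers = by_stake(bp_proposals)[:num_block_producers]

    # Top N + M of the combined set become chunk producers, so every
    # block producer is also a chunk producer.
    combined = by_stake(bp_proposals + chunk_only_proposals)
    chunk_producers = combined[: num_block_producers + num_extra_chunk_producers]

    # Chunk producers are divided (approximately) evenly between shards.
    shard_assignment = {
        shard: chunk_producers[shard::num_shards] for shard in range(num_shards)
    }
    return block_producers, shard_assignment
```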

Rewards given to validators are also separated into two pools: one for block producers and one for chunk-only producers. A fraction f of the total rewards is given to the block producers, and the remainder is given to the chunk-only producers. Within each pool the rewards are split in proportion to stake. f should be greater than 1/2 since block producers have more expensive hardware.
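
As a rough illustration of the reward split, the sketch below divides a total reward into the two pools and then splits each pool in proportion to stake. The function name, types, and the assumption of a single lump-sum `total_reward` are mine for illustration; the spec's actual reward formula (including any uptime conditions) is defined in the PR itself.

```python
from typing import Dict, List, Tuple


def split_rewards(
    total_reward: float,
    block_producers: List[Tuple[str, int]],        # (account_id, stake)
    chunk_only_producers: List[Tuple[str, int]],   # (account_id, stake)
    f: float,                                      # block-producer fraction, f > 1/2
) -> Dict[str, float]:
    rewards: Dict[str, float] = {}
    pools = [
        (block_producers, f * total_reward),
        (chunk_only_producers, (1.0 - f) * total_reward),
    ]
    for pool, pool_reward in pools:
        pool_stake = sum(stake for _, stake in pool)
        for account, stake in pool:
            # Within each pool, the reward is proportional to stake.
            rewards[account] = pool_reward * stake / pool_stake
    return rewards
```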

Reference-level explanation

All algorithms are given in full detail in the changes to the spec done in this PR.


```python
# Concatenates the bytes of the epoch seed with the height,
# then computes the sha256 hash.
```
Collaborator

Why do we need a cryptographically secure hash here?

Contributor Author

I think it is needed so that it becomes computationally infeasible to construct an epoch_seed which gives rise to a pre-defined block/chunk producer order. Otherwise, an attacker may try to influence the VRF entropy in order to get a particular sequence, e.g. to have members of a byzantine cartel make many blocks in a row to cause some problems in the network.

Collaborator

That is quite difficult in practice. Given that each block producer has one bit of influence on the vrf output, if an attacker wants to achieve something meaningful, they need to have a large amount of stake, which would allow them to initiate other attacks more easily. In addition, you cannot predict when the epoch is going to end until shortly before it does, and therefore it is quite difficult for an attacker to influence the vrf output of the last block of an epoch.

Contributor Author

> each block producer has one bit of influence on the vrf output

It's one bit per block, right? And since BPs will be chosen in proportion to their stake, this means they have influence proportional to their stake as well. E.g. if a cartel had 1/8 (12.5%) of the total stake then they could influence ~32 bits in the next epoch seed. Obviously this would not be enough to control the entire sequence of BPs in the next epoch, but I assume it means it would be possible to control a non-trivial sub-sequence (say 10 blocks long), where that sub-sequence may occur at any point during the next epoch (i.e. I don't think you could control the first 10 blocks, for example, that is too narrow, but controlling some run of 10 blocks seems feasible). I haven't done any experiments to try this for myself, this is only my intuition, and so may not be at all correct.

> That is quite difficult in practice

I agree. Though, I view that argument as falling in the category of "security by obscurity", which I don't think is a very strong guarantee. If something is hard then there will need to be a large incentive for someone to attempt it. But on the blockchain, especially a successful one, there do tend to be large incentives, and it might be that someone thinks the payout for pulling off a difficult attack is worth the effort.

Is your main concern with using a cryptographic hash performance? If so, we do not need to use sha256. We could use a faster cryptographic hash function; I only chose sha256 because it is what we use everywhere else in the system.

Collaborator

Actually using sha256 is fine. We can just cache the result.
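
For context, here is a sketch of the kind of per-height randomness the quoted comment refers to: hash the epoch seed together with the height, then use the digest to make a stake-weighted choice. The function names, byte width, and endianness below are assumptions for illustration, not the exact encoding in the spec, and as noted above the digest can be cached per height.

```python
import hashlib
from typing import List, Tuple


def height_seed(epoch_seed: bytes, height: int) -> bytes:
    # Concatenate the epoch seed bytes with the height, then sha256-hash the
    # result (the 8-byte little-endian encoding is an assumption of this sketch).
    return hashlib.sha256(epoch_seed + height.to_bytes(8, "little")).digest()


def weighted_pick(seed: bytes, validators: List[Tuple[str, int]]) -> str:
    # Stake-weighted selection driven by the per-height seed; the spec may use
    # a different sampling scheme, this only illustrates the idea.
    total_stake = sum(stake for _, stake in validators)
    r = int.from_bytes(seed, "little") % total_stake
    for account, stake in validators:
        if r < stake:
            return account
        r -= stake
    raise AssertionError("unreachable: r < total_stake")
```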

```python
# Ensure the block producer for the next block also produces one of the shards.
# `bp` could already be in the result because block producers are also
# chunk producers (see algorithm for selecting chunk producers from proposals).
if bp not in result:
```
Collaborator

I am not sure whether this is a good idea. Block producers need to track all shards, and only reducing the latency on one shard is not going to help a lot. Plus, this can result in some weird dynamics where a chunk-only producer for some shard may not get an opportunity to produce any chunk, thereby losing their reward.

Contributor Author

Sure, I'm fine to drop this if we think it does more harm than good.

@bowenwang1996
Collaborator

bowenwang1996 commented Jul 10, 2021

@abacabadabacaba pointed out that with this change, the chunk part distribution is not proportional to stake. Rather, every block producer is treated equally, which may be problematic. I did some research and found the following: if an attacker wants to break data availability, they need to corrupt at least 2/3 of the total number of validators, which is ~67 validators with the current config. Today there are 58 validators, and it makes sense to assume that they will continue being validators after this change, which means that at least the bottom 25 validators need to be corrupted. That already amounts to 87m NEAR. For the remaining 42 validators, if we assume an average stake of 1m (which would imply that the threshold is likely much lower than 1m), this means that the total amount of stake that needs to be corrupted is 115m NEAR, which is about 28% of the total stake today. Regardless of percentage, it is hardly reasonable to think that an attacker can amass that much NEAR to perform such an attack.

Another way to think about it is to assume that the maximum number of validators is what we have today (58). In that case, even if chunk parts are distributed equally, corrupting 2/3 of the total number of validators requires about 52% of the total stake we have today, and if someone controls that much stake, they can easily stall the network. As a result, I don't think this change will have any material impact on the security of the protocol.

near-bulldozer bot pushed a commit to near/nearcore that referenced this pull request Sep 8, 2021
Chunk-only producers are an important stepping stone towards sharding in mainnet. See https://gov.near.org/t/block-and-chunk-producer-selection-algorithm-in-simple-nightshade/66 for more details. Also see near/NEPs#167 for the spec this work is based on.

This PR does most of the work towards landing this feature. Much of the work in this PR was updating tons of tests because they assumed that validators produce blocks/chunks in a cyclic order. That is no longer true because the randomness is computed on the fly at each height instead of when processing the proposals.

This PR is not yet suitable for merging to master; missing items are listed below:

- [ ] Nayduck failures (looks like some tests are failing -- http://nayduck.eastus.cloudapp.azure.com:3000/#/run/1452)
- [ ] Writing a new pytest to see this feature working end-to-end. This PR adds some new tests, and fixes a lot of old tests, so it probably works, but it's always nice to see an integration test.

List of (possible) Nayduck failures to be addressed:
```
expensive nearcore test_rejoin test::test_4_20_kill1_two_shards	
pytest sanity/one_val.py	
pytest sanity/rpc_state_changes.py	
pytest sanity/staking2.py	
pytest sanity/staking_repro1.py	
pytest sanity/state_sync2.py	
pytest sanity/sync_chunks_from_archival.py	
pytest stress/stress.py 3 3 3 0 staking transactions local_network packets_drop	
pytest stress/stress.py 3 3 3 0 staking transactions node_restart packets_drop	
pytest stress/stress.py 3 3 3 0 staking transactions node_restart wipe_data	
pytest sanity/gc_after_sync.py
pytest sanity/gc_sync_after_sync.py swap_nodes
```
@frol frol added the WG-protocol Protocol Standards Work Group should be accountable label Sep 5, 2022
@bowenwang1996
Collaborator

Already implemented. @mm-near @mzhangmzz could we merge this PR?

@frol frol added the S-approved A NEP that was approved by a working group. label Sep 29, 2022
@frol
Collaborator

frol commented Sep 29, 2022

I am adding the S-approved tag just to indicate that it was implicitly approved through the implementation. Yet, if there are any changes still needed to reflect the implementation, it would be great to fix them before merging.

@ori-near ori-near added the A-NEP A NEAR Enhancement Proposal (NEP). label Oct 13, 2022
@frol frol requested a review from a team as a code owner November 21, 2022 13:47
Collaborator

@frol frol left a comment

As NEPs moderator, I am approving this PR based on the previous discussion.

@frol frol merged commit 111c56f into near:master Nov 21, 2022
frol pushed a commit that referenced this pull request Jan 11, 2023
#167 is merged, but the final implementation we went with for chunk-only producers actually diverged from what was described there. This PR updates the block/chunk producers section so our documentation is up to date.
Labels
- A-NEP: A NEAR Enhancement Proposal (NEP).
- S-approved: A NEP that was approved by a working group.
- WG-protocol: Protocol Standards Work Group should be accountable.
Projects
Status: APPROVED NEPs
Development

Successfully merging this pull request may close these issues.

Design new block and chunk producer selection process
4 participants