Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial spec for upgradability #64

Merged
merged 6 commits into from Jun 8, 2020
Merged

Initial spec for upgradability #64

merged 6 commits into from Jun 8, 2020

Conversation

ilblackdragon
Copy link
Member

This is outcome of discussions:

And in short provides smooth upgrading that is defined by % of validators switching to new version and reaching consensus within an epoch on that.

@render
Copy link

render bot commented May 12, 2020

specs/SUMMARY.md Outdated Show resolved Hide resolved
@render
Copy link

render bot commented May 12, 2020

Your Render PR Server at https://nomicon-pr-64.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-bqtcomhj38f7norl6jbg.

@ilblackdragon
Copy link
Member Author

@abacabadabacaba suggested from Protocol Research call:

  1. Store latest version that validators are running in EpochManager
  2. Validators indicate their version by resending staking transaction with indicating version
  3. Agreement is achieved based on epoch, where validators achieve some super-majority on a version in proposal - and kicks in the epoch where these validators are active (e.g. if proposals are sent in epoch T, it will become active in T + 2).
  4. Version should be a number, because people should agree on what next version should be first before

I generally, like the (1) & (3).
I don't like (2), because it's both requires API change for Staking Action, and it requires additional step for validators to re-sign new transaction. @SkidanovAlex suggested that Staking / Validator Key should allow to sign staking actions (last time I proposed this, @evgenykuzyakov was against it and overall is complicated, because we then need to filter for only transactions that only have single action and more).

Overall I don't fully agree with (4): even though we expect agreement in what next version will be, if there are any kind of conflicting change that people want to introduce - this mechanism will be the final way to decide. Obviously that with hash of spec reported by binary doesn't mean binary follows that spec or that there are no issues or that there are no issues in the spec. It's more to be able to handle forks in decision making.

Let's review a simple example, there are group of validators who want to increase the minimum gas price, forked the code and pushed a new version. At the same time core dev team pushes new update of protocol with some bells & whistles.
If we use integers, both of these of these will have the same version. It's possible that core team will figure out in time and bump versions, but there is a chance of weird outcome (even if it's TestNet stalling). And this is not validators being malicious - they followed procedure from their side by publishing source code, announcing it, running testnets, etc.

Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should specify that the protocol version X must be backward compatible with at least protocol version X-1.

specs/ChainSpec/Upgradability.md Outdated Show resolved Hide resolved
specs/ChainSpec/Upgradability.md Show resolved Hide resolved
@abacabadabacaba
Copy link
Collaborator

My opinion:

  1. I'm still strongly in favor of consensus version being a plain number rather than a hash. Moreover, the only supported upgrade path should be from version i to version i+1. Consensus upgrade is very bug-prone, and having only one possible scenario to test should reduce the number of possible corner cases.
  2. If an upgrade that is only supported by some implementations gets activated on the mainnet, this will be a significant trouble for validators using other implementations. Thus, activating an upgrade that is not supported by implementations used by almost all validators should be avoided if possible.
  3. We want to have a way for validators to specify which consensus version they'd like to switch to. This shouldn't depend on them actually producing blocks (for example, a user who is going to become a validator in a future epoch should be able to do this). Having to resubmit a staking transaction is not ideal because it needs to be signed with an account key. I think that this shouldn't be a big problem in practice, because upgrades are not that frequent. In the future, we may implement a way to sign this kind of updates with a staking key instead of an account key.
  4. While we are at it, we may add a field for validators to specify which software they are running. This will have no effect on consensus, but we want to know what fraction of validators is running each implementation.
  5. Regarding switch condition: switching as soon as 80% of validators upgrade will do a very bad service to the remaining 20%, as they will need to upgrade in a hurry. I propose to switch only after at least 80% (or even 90%) of validators support the upgrade in each of the last 10 (or even 20) epochs.
  6. Also, even if a validator is running a new version of the software, they should have a configuration option to not propose an upgrade to a new consensus version.
  7. Regarding data structures: current version number should be the first field of the block header. This is because interpretation of any subsequent fields may change depending on the version. It should appear in the highest level part of the block header, that is, the part hash of which is the block hash. Also, version number should be the first field in approval messages. Version number of approvals must match version number of the block they appear in.
  8. Transactions should also be versioned. Version number of transactions doesn't need to match version number of blocks. When we introduce new transaction version, we should keep accepting the previous version for some time.

Let's review a simple example, there are group of validators who want to increase the minimum gas price, forked the code and pushed a new version. At the same time core dev team pushes new update of protocol with some bells & whistles.

It would be very irresponsible for the validators to do this without any testing and coordination with the rest of the participants. Imagine what happens if a group of companies running some of the Internet core infrastructure decide to change something in TCP/IP protocol. They fork the software their infrastructure is running and roll out a new version before telling anyone else. Would it work? I don't think so. In our case, upgrading the consensus would require agreement of a supermajority of the validators, and this includes an agreement to not do any conflicting upgrade in parallel, which means that such situation with two upgrades happening at the same time shouldn't be possible.

@lexfro
Copy link

lexfro commented May 18, 2020

Moreover, the only supported upgrade path should be from version i to version i+1.

Interesting. It changes my initial intuition from the "voting for some specific update" to the "voting for (some) update".. Which actually makes sense to me from that perspective (especially b/c there should be only one canonical chain and since validators are not voting for some specific binary(es)).

@bowenwang1996
Copy link
Collaborator

While we are at it, we may add a field for validators to specify which software they are running. This will have no effect on consensus, but we want to know what fraction of validators is running each implementation.

We can add it to telemetry. Forcing it as part of the protocol doesn't make a lot of sense.

@bowenwang1996
Copy link
Collaborator

Regarding switch condition: switching as soon as 80% of validators upgrade will do a very bad service to the remaining 20%, as they will need to upgrade in a hurry. I propose to switch only after at least 80% (or even 90%) of validators support the upgrade in each of the last 10 (or even 20) epochs.

I don't think it's really a problem. In practice I don't see upgrades happening in two consecutive epochs, but it also makes sense to be more cautious.

@ilblackdragon
Copy link
Member Author

  1. I'm still strongly in favor of consensus version being a plain number rather than a hash. Moreover, the only supported upgrade path should be from version i to version i+1. Consensus upgrade is very bug-prone, and having only one possible scenario to test should reduce the number of possible corner cases.

Yeah, that's fine.

  1. If an upgrade that is only supported by some implementations gets activated on the mainnet, this will be a significant trouble for validators using other implementations. Thus, activating an upgrade that is not supported by implementations used by almost all validators should be avoided if possible.

Well, that was the point of using this as a final coordination mechanism. Social consensus is great and should be used as much as possible, but you also want to make sure to have final stops.

TCP/IP is a bad example, because it's been 10+ years as they have been trying to update. We want to update every month the slowest.

  1. While we are at it, we may add a field for validators to specify which software they are running. This will have no effect on consensus, but we want to know what fraction of validators is running each implementation.

This is already in telemetry - https://explorer.betanet.near.org/nodes/validators

Regarding switch condition: switching as soon as 80% of validators upgrade will do a very bad service to the remaining 20%, as they will need to upgrade in a hurry

The condition is 80% of validators indicated a switch, epoch after next will activate it. So the rest of validators have epoch+some to switch. We can add a bit more timing, but I also don't want to give infinite amount of time here, because in case of security updates etc - we want this to be timely.

@bowenwang1996
Copy link
Collaborator

@ilblackdragon

security updates

We can discuss this separately but I feel like for critical vulnerabilities I am not entirely sure that using this kind of voting is the best idea.

@ilblackdragon
Copy link
Member Author

Quick recap from research meeting:

  • If we can, we should try to avoid adding new actions. At this point doing upgradability voting via smart contracts though is way too complicated.
  • We need a way to determine version for block/chunk when deserialized. It might be possible to determine this based on current height. But it might be hard to do given the deserialization calls.
  • Because recommended way to upgrade nodes is by starting a new node with new validator key and then re-issuing staking transaction with it. Hence we expect validators to already re-issue staking transaction.
  • It would be still great if staking transaction can be signed by signed key - but it will be complicated to implement at the moment, because verifier will need to have access to EpochManager.

@render
Copy link

render bot commented May 20, 2020

Your Render PR Server at https://nomicon-pr-64.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-bqtcomhj38f7norl6jbg.

@render
Copy link

render bot commented May 20, 2020

A deploy for your Render PR Server at https://nomicon-pr-64.onrender.com just failed.

View details on your dashboard at https://dashboard.render.com/static/srv-bqtcomhj38f7norl6jbg.

@render
Copy link

render bot commented May 20, 2020

A deploy for your Render PR Server at https://nomicon-pr-64.onrender.com just failed.

View details on your dashboard at https://dashboard.render.com/static/srv-bqtcomhj38f7norl6jbg.

@ilblackdragon
Copy link
Member Author

ilblackdragon commented May 20, 2020

Want to follow up and close the thread on upgradability spec.

Specifically, I went back and forth, and even though @abacabadabacaba idea of using proposals to indicate next version makes sense - it’s more complicated to implement, more complicated for validators and doesn’t really change that much the behavior.

It’s worth noting with more chunk producers (more shards) and hidden validators there indeed can be problem that if only block producers upgraded there is still large % of stake didn’t update yet.
But I think adding enough buffer should mitigate this issue and still at the end upgradability is about more on informing everyone on time about update and making smooth transition that doesn’t require state dump.

Given that, I kept the version for indicating about upcoming transition in block header, but now it aggregates total stake voting for the version and it checks K epochs before the current one to give everyone time.

See details here: https://github.com/nearprotocol/NEPs/blob/3748234118ea504453b4eabb7eeee17cd978e9ba/specs/ChainSpec/Upgradability.md#consensus

Additionally, I added example of how we can handle quite a few updates to data structures like this: https://github.com/nearprotocol/NEPs/blob/3748234118ea504453b4eabb7eeee17cd978e9ba/specs/ChainSpec/Upgradability.md#versioned-data-structures We will need to work more on this as we actually get real examples of protocol changes.

@render
Copy link

render bot commented May 20, 2020

A deploy for your Render PR Server at https://nomicon-pr-64.onrender.com just failed.

View details on your dashboard at https://dashboard.render.com/static/srv-bqtcomhj38f7norl6jbg.

specs/ChainSpec/Upgradability.md Outdated Show resolved Hide resolved
Co-authored-by: Bowen Wang <bowenwang1996@users.noreply.github.com>
@render
Copy link

render bot commented May 21, 2020

A deploy for your Render PR Server at https://nomicon-pr-64.onrender.com just failed.

View details on your dashboard at https://dashboard.render.com/static/srv-bqtcomhj38f7norl6jbg.

@bowenwang1996
Copy link
Collaborator

@ilblackdragon given the data structure you proposed

enum VersionedBlockHeader {
    BlockHeaderV1(BlockHeaderV1),
    /// Current version, where `BlockHeader` is used internally for all operations.
    BlockHeaderV2(BlockHeader),
}

when a node with the new version enters the network, how do the other nodes (with old version) process the header produced by the new version? It seems that there has to be some logic to produce the block in the old version with the new version number.

@ilblackdragon
Copy link
Member Author

Node running new binary will keep producing old block headers until the version transitions to the new one.
I'm trying to write this out near/nearcore#2701 here to nail down better how it should look like. Probably some traits will be used.

struct BlockHeaderInnerRest {
...
/// Latest version that current producing node binary is running on.
version: ProtocolVersion,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking for my understanding. This field represents the highest version number the node producing the block could support, but not the version of the block header itself, correct? The actual version of the block header would be determined based on the enum variant it deserializes to.

If that is the case I wonder if a different name should be used. Say proposed_version or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, I used latest_protocol_version in implementation here - near/nearcore#2701

@ilblackdragon
Copy link
Member Author

So there is a small alternation that makes it close to what @abacabadabacaba proposed, while allowing block producer to automatically do this. Instead of adding a field to the BlockHeader, can extend ValidatorStake to contain latest_protocol_version.

While by default Proposals from Stake transactions won't fill this field (until we change Stake action), we can extend validity of proposal if the chunk producer adds one for themself (one is important note to make sure they can't create large number of proposals).

Note my self: make sure our proposals code validation checks invalid for proposals.

@bowenwang1996
Copy link
Collaborator

bowenwang1996 commented May 25, 2020

@ilblackdragon when do chunk producers add it? Also this is not future-proof. What do we do when we have hidden validators?

@ilblackdragon
Copy link
Member Author

@bowenwang1996 Chunk producers add it when it's their turn to produce chunk and it's the first time they produce a chunk since their binary version changed.

Hidden validators don't participate in either voting scheme, so the latest proposal is strictly more inclusive than original one (as it includes chunk producers on top of only block producers in the original proposal).

@bowenwang1996
Copy link
Collaborator

so the latest proposal is strictly more inclusive than original one

Currently they are the same. If you are talking about future possibilities, you are assuming that every block producer is a chunk producer.

@ilblackdragon ilblackdragon merged commit 7ca48ed into master Jun 8, 2020
@ilblackdragon ilblackdragon deleted the upgradability branch June 8, 2020 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants