Updates Part I: node auto-update #32
Nov 12, 2020
I would separate this into two SMIPs, since there are actually two independent features here. The first deals with automatic node updates; the second, with protocol updates.
The node auto-update procedure is agnostic to the update contents: it consists of what you called "Phase I", but doesn't involve beacons or anything protocol-related. This is the mechanism that will also be used for emergency updates. The node auto-update should have a minimal grace period (during which the node operator can decide to veto an update) even for emergency updates. The reason is that veto power is intended, among other things, to mitigate an attack by an adversary who has access to the update signing key. If emergency updates sidestepped the grace period, such an adversary could simply claim that every update is an emergency update.
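A minimal sketch of the rule that an "emergency" label can shorten the grace period but never remove it. The names (`PendingUpdate`, `may_install`) and both durations are hypothetical; the 72-hour default echoes a number floated later in this thread, and the one-hour floor is purely illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values: the floor applies even to emergency updates.
MIN_GRACE = timedelta(hours=1)
DEFAULT_GRACE = timedelta(hours=72)


@dataclass
class PendingUpdate:
    version: str
    announced_at: datetime
    emergency: bool = False
    vetoed: bool = False  # set by the node operator during the grace period


def grace_period(update: PendingUpdate) -> timedelta:
    # The emergency flag only shortens the wait down to MIN_GRACE, never to zero,
    # so an attacker holding the signing key still cannot skip the veto window.
    return MIN_GRACE if update.emergency else DEFAULT_GRACE


def may_install(update: PendingUpdate, now: datetime) -> bool:
    """True once the grace period has elapsed and the operator has not vetoed."""
    if update.vetoed:
        return False
    return now >= update.announced_at + grace_period(update)
```

The key design point is that `grace_period` has no code path returning zero, regardless of what the signed update metadata claims.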
I think it makes sense to have a much shorter grace period during the initial phase of mainnet (when emergency updates are much more likely), but in version 1.0 it should be long enough to allow for a human response to an attack. (After 1.0, if a faster emergency update is needed, we'll have to rely on getting the word out to enough node operators, who will then override the grace period manually.)
The protocol update procedure describes the rules for when (and if) to switch to a new protocol version, based on consensus about the intent to update. This mechanism doesn't really care how the node is updated; it's relevant even if there are no automatic node updates at all. Essentially, the code for executing an updated protocol will always include the code for executing the current version of the protocol, plus a decision mechanism that determines when to switch versions. (This would be the case even if the decision mechanism did not check on-mesh consensus, e.g., "always switch to the new protocol version at layer 5000".)
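To illustrate the point that an updated binary carries both protocol versions plus a decision mechanism, here is a minimal sketch using the simplest rule mentioned above (switch at layer 5000). The `execute_v1`/`execute_v2` functions are hypothetical stand-ins for the real protocol logic; an on-mesh consensus check would replace the fixed comparison in `protocol_for_layer`.

```python
from typing import Callable

# Hypothetical activation layer, taken from the "switch at layer 5000" example.
ACTIVATION_LAYER = 5000


def execute_v1(layer: int) -> str:
    # Stand-in for the current protocol's rules.
    return f"v1 rules applied to layer {layer}"


def execute_v2(layer: int) -> str:
    # Stand-in for the new protocol's rules, shipped in the same binary.
    return f"v2 rules applied to layer {layer}"


def protocol_for_layer(layer: int) -> Callable[[int], str]:
    """Decision mechanism: choose which protocol version governs a layer."""
    return execute_v2 if layer >= ACTIVATION_LAYER else execute_v1


def process_layer(layer: int) -> str:
    return protocol_for_layer(layer)(layer)
```

Because the switch is a pure function of the layer, every node running this binary makes the same decision, however and whenever it was installed.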
Nov 12, 2020
Hi @tal-m, thanks for the feedback.
I generally agree with this. I think we can and probably should separate them as you propose. As written, they are not completely independent since, e.g., I included the "protocol version signature" message in "Phase I", which is the vote that "Phase II" relies on.
Also agree with your points on grace period and veto power.
Nov 24, 2020
Looks good overall. Some comments:
Nov 24, 2020
I thought of it more as a shortcut for passing a bunch of other CLI flags (e.g.,
Well, they're fundamentally different things. E.g., no P2P protocol negotiation should be necessary between two nodes if one of them installs a new version of go-spacemesh that doesn't touch the P2P code.
This is tricky; I don't think we should try to build this for 0.2. Downloading over HTTP is much easier.
This isn't a bad idea, especially if we expect that most users will be running smapp early on, and that smapp will be able to handle auto-updating go-spacemesh.
Nov 25, 2020
Most of these comments are minor, but I think the last point is very important.
Nov 25, 2020
I'm having some trouble understanding this. I guess I don't think about protocol updates as discrete "proposals" that can be adopted or not adopted independent of one another. In many cases, one proposal will depend on another and there may be a complex web of interdependencies. That's why I think it's better to think of a particular instantiation of the protocol as monolithic, and give it a unique, meaningless ID, like a hash or something.
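A minimal sketch of deriving a unique, meaningless ID for a monolithic protocol instantiation, assuming the instantiation can be serialized as a set of named components. The component names, the length-prefixed encoding, and truncating to 16 hex characters are all illustrative choices, not a spec.

```python
import hashlib


def _field(data: bytes) -> bytes:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    return len(data).to_bytes(8, "big") + data


def protocol_id(components: dict[str, bytes]) -> str:
    """Hash every component of one protocol instantiation into a single opaque ID.

    components maps hypothetical component names (e.g. "p2p", "consensus")
    to their serialized specs; any change to any component yields a new ID.
    """
    h = hashlib.sha256()
    for name in sorted(components):  # sort so the ID doesn't depend on dict order
        h.update(_field(name.encode()))
        h.update(_field(components[name]))
    return h.hexdigest()[:16]  # shortened for readability (assumption)
```

Since the ID covers the whole instantiation, interdependent changes never have to be adopted or rejected one by one; a node either runs that exact protocol or it doesn't.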
It makes upgrades much easier. We know that by a fixed point in time, all nodes running a particular old version will have reached their "end-of-support" date and shut down (unless the user modified the source code and recompiled). Zcash has been using this with great success; see:
We can debate the exact number, but my gut tells me around 72 hours. 24 feels too short because someone could, e.g., be on a long flight (or, you know, a meditation retreat ;) for that long and "not get the memo."
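A minimal sketch of an end-of-support halt. It assumes the deadline is baked into each release as a layer number (Zcash expresses its equivalent in block heights); `RELEASE_LAYER` and `SUPPORT_WINDOW_LAYERS` are hypothetical values, and a real node would run this check as each new layer arrives.

```python
import sys

# Hypothetical values fixed at build time for this release.
RELEASE_LAYER = 120_000
SUPPORT_WINDOW_LAYERS = 50_000
END_OF_SUPPORT_LAYER = RELEASE_LAYER + SUPPORT_WINDOW_LAYERS


def past_end_of_support(current_layer: int) -> bool:
    return current_layer >= END_OF_SUPPORT_LAYER


def on_new_layer(current_layer: int) -> None:
    if past_end_of_support(current_layer):
        # Refuse to keep running stale consensus code; the operator must upgrade
        # (or deliberately patch out this check and recompile).
        sys.exit(
            f"this release reached end-of-support at layer {END_OF_SUPPORT_LAYER}; "
            "please upgrade"
        )
```

Because the deadline is known when the release ships, developers can assume that no unmodified node older than one support window is still participating.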
I agree strongly with the case you made on the call, @noamnelke: we should strive to operate the testnet with identical parameters to mainnet wherever possible.
Agree. If you're explicitly trusting a particular developer to notify you of updates, and to provide you with signed updates (see #36), that's already "centralized." You can always choose to "track another updates channel" (to borrow Linux terminology).
Very good point. We could maybe key this on one of the existing beacons so that it happens securely and unpredictably, as long as the upgrade happens before the protocol activation layer height, which of course all nodes do need to agree on!
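A minimal sketch of keying each node's upgrade moment on a beacon value, assuming the beacon and node ID are available as fixed-length byte strings. Hashing them together gives a layer that nobody can predict before the beacon is revealed but that anyone can recompute afterwards, and the result always falls before the agreed activation layer. The function name and parameters are hypothetical.

```python
import hashlib


def upgrade_layer(beacon: bytes, node_id: bytes, start_layer: int, activation_layer: int) -> int:
    """Pick an unpredictable but verifiable upgrade layer in [start_layer, activation_layer)."""
    if activation_layer <= start_layer:
        raise ValueError("activation layer must come after the start layer")
    digest = hashlib.sha256(beacon + node_id).digest()
    window = activation_layer - start_layer
    # Map the first 8 bytes of the digest into the window (tiny modulo bias is fine here).
    return start_layer + int.from_bytes(digest[:8], "big") % window
```

The upper bound is the important constraint: whatever the beacon says, every node is upgraded before the activation layer that all nodes must agree on.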
Jan 19, 2022
I'm a bit confused by the large number of separate issues for closely coupled topics.
I've posted some of my thoughts (mainly related to Smapp, but not exclusively) here: #34 (comment) Please check it out.
Below are some thoughts specifically about updating the node itself.
I think this is one of the most important factors in how we deliver updates. Since we have signed apps, we already have a degree of centralization: only we can introduce a new version. So in this case I don't see a reason to gossip about updates over p2p.
Gradual update of nodes
If nodes notify the network of the version they run, I think we could try to make the update gradual by using the highest byte of
@lrettig, can you sum up the benefits of such a solution?
I think this is a good idea. But:
While we're the centralized source of updates, we can simply check the "patch" component of the semver and install patch releases much faster than other updates. We could have a 24-72 hour grace period for minor updates and some kind of "wait until layer NNN000" rule (i.e., the next layer number that ends in several zeroes) for major updates.
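A minimal sketch of this semver-based policy. It assumes plain MAJOR.MINOR.PATCH version strings (optionally prefixed with "v"), uses the upper end of the suggested 24-72 hour range, and reads "NNN000" as the next layer divisible by 1000; all of these are illustrative choices.

```python
import re

GRACE_HOURS_MINOR = 72  # upper end of the suggested 24-72 hour range (assumption)


def parse_semver(v: str) -> tuple[int, int, int]:
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {v!r}")
    major, minor, patch = (int(x) for x in m.groups())
    return major, minor, patch


def next_round_layer(current_layer: int, zeros: int = 3) -> int:
    """Smallest layer greater than current_layer whose number ends in `zeros` zeroes."""
    step = 10 ** zeros
    return (current_layer // step + 1) * step


def update_policy(current: str, new: str, current_layer: int) -> str:
    cur, nxt = parse_semver(current), parse_semver(new)
    if nxt <= cur:  # tuples compare component by component
        return "ignore: not newer"
    if nxt[0] != cur[0]:
        return f"major: install at layer {next_round_layer(current_layer)}"
    if nxt[1] != cur[1]:
        return f"minor: install after {GRACE_HOURS_MINOR}h grace period"
    return "patch: install now"
```

Example: moving from 1.2.3 to 2.0.0 while at layer 123456 yields "major: install at layer 124000", so all nodes switch at the same round-numbered layer.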
Auto-update flag and changing the mind
First of all, since the recommended setting is to turn on auto-updates, I propose to name such a flag as
Feb 4, 2022
It's not true that "only we can introduce a new version." We need to allow people to fork our code and offer competing versions. There's nothing enshrined or special about the software released by the Spacemesh team, other than the fact that we're releasing the first version.
This is a clever idea! I like that not all nodes auto-update at the same time. I think we can work around the vulnerability you describe by using the highest byte to pick an update time, and drop the notion of a queue or of checking how many other nodes have already updated. For example, nodes could auto-upgrade over a 24-hour period, with the exact time each node performs the update within that window determined by its ID.
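A minimal sketch of spreading auto-updates across a 24-hour window using only the node's own ID, with no queue and no observation of other nodes. It assumes "highest byte" means the first byte of the ID in big-endian order; the function name is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def scheduled_update_time(release_time: datetime, node_id: bytes) -> datetime:
    """Spread auto-updates over WINDOW using the node ID's highest (first) byte."""
    slot = node_id[0]  # 0..255, roughly uniform if IDs are derived from public keys
    return release_time + WINDOW * slot / 256
```

With 256 slots, each slot covers 5 minutes 37.5 seconds of the day, so roughly 1/256 of the network updates at any one time and each node's schedule is fixed by its ID alone.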
I answered this above:
All of this only applies to auto-updates. Node operators always have the option of manually installing updates without waiting for an auto-update or a grace period. In practice, in case of a critical error, we'd need to communicate directly with node operators and ask them to update immediately.
Defaults are important, and I think the default should always be not to auto-update (for governance reasons). We can recommend that users enable auto-update but I think it should be explicit opt-in.
Agree, we can add this to the API, it should be pretty straightforward.
--auto-update flag. When enabled, this tracks a Spacemesh team-managed beacon that announces when an update is available, automatically downloads and installs the update, and restarts the node.
--testnet flag that sets the right network ID and network/consensus params for the testnet, and turns auto-updates on by default. They can be turned off with --auto-update=false. This information will be printed in a warning message when the node first starts.
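A minimal sketch of the proposed flag semantics: auto-update is off unless the operator opts in explicitly, while --testnet flips only the default, so an explicit --auto-update=false still wins. These are the flags proposed in this thread, not a description of go-spacemesh's actual CLI; the argparse modeling and the warning text are assumptions.

```python
import argparse


def str_to_bool(s: str) -> bool:
    if s.lower() in ("true", "1", "yes"):
        return True
    if s.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {s!r}")


def parse(argv: list[str]) -> argparse.Namespace:
    p = argparse.ArgumentParser(prog="node")
    p.add_argument("--testnet", action="store_true")
    # nargs="?" accepts both "--auto-update" and "--auto-update=false";
    # default=None records "not specified" so --testnet can supply the default.
    p.add_argument("--auto-update", type=str_to_bool, nargs="?", const=True, default=None)
    args = p.parse_args(argv)
    if args.auto_update is None:
        # Explicit opt-in everywhere except testnet, where it defaults to on.
        args.auto_update = args.testnet
        if args.testnet:
            print("warning: auto-update is enabled by default on testnet; "
                  "disable it with --auto-update=false")
    return args
```

Keeping "unspecified" distinct from "false" is what lets the testnet default coexist with an explicit override.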