UIP-6: App Version Safeguard #4919
The smoke tests are failing, and I can replicate the failure locally:
which is good ol' #1884. It looks like writing to the storage on a fresh node rightly triggers initialization, which causes pd to error out.
Is there any way to write to non-verifiable storage without triggering this?
This prevents an edge case where PD would crash if starting before the very first genesis. Not touching storage in that case will prevent nodes running continuously from genesis from benefitting from the safeguard, but an upgrade has already happened on mainnet, and so we don't care about not having the safeguard in this case.
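The startup behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NodeState`, `SAFEGUARD_KEY`, and `check_and_update_app_version` are hypothetical names, and an in-memory map stands in for pd's non-verifiable storage.

```rust
use std::collections::BTreeMap;

/// Hypothetical key under which the running app version is recorded.
pub const SAFEGUARD_KEY: &str = "app_version_safeguard";

/// Stand-in for a node's local state (assumption: the real storage API differs).
pub struct NodeState {
    /// False on a fresh node that has not yet seen the very first genesis.
    pub initialized: bool,
    /// Stand-in for non-verifiable (non-consensus) storage.
    pub nonverifiable: BTreeMap<String, u64>,
}

/// Checks the stored app version against the running one at startup.
pub fn check_and_update_app_version(state: &mut NodeState, running: u64) -> Result<(), String> {
    // Before the very first genesis, do not touch storage at all.
    if !state.initialized {
        return Ok(());
    }
    match state.nonverifiable.get(SAFEGUARD_KEY).copied() {
        // No version recorded yet: start recording the one we are running.
        None => {
            state.nonverifiable.insert(SAFEGUARD_KEY.to_string(), running);
            Ok(())
        }
        // The recorded version matches the binary: nothing to do.
        Some(found) if found == running => Ok(()),
        // Mismatch: refuse to start against this state.
        Some(found) => Err(format!(
            "app version mismatch: expected {running}, found {found}"
        )),
    }
}
```

In this sketch a pre-genesis node returns early without writing anything, which is the trade-off the commit message describes.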
Simplifies logic significantly
Co-authored-by: @erwanor <erwanor@users.noreply.github.com>
To review, I prepared a local devnet based on current main (without the safeguard logic), ran it for a while, then switched the pd binary to one built from this branch and bounced the service. Then I submitted an upgrade proposal and confirmed it halted the chain. When I attempted
which shows an off-by-one in the "found" version; a simple fix is already pushed. Will continue with local testing and follow up with more feedback.
The `app_version_safeguard` logic extends the APP_VERSION logic with helper functions that translate an APP_VERSION to a software version. Previously, protocol version increments only required changing the APP_VERSION const (as well as writing a migration), but now developers must also update the version match; otherwise tooling will report an "unknown" version by default. Adding a simple test to confirm the lookup returns a known value.
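As a rough illustration of the lookup and its fallback, a sketch along these lines may help. The function names, the `APP_VERSION` value, and the version strings are all assumptions made for the example, not the values used in the codebase.

```rust
/// Illustrative value only; the real constant is defined elsewhere in the codebase.
pub const APP_VERSION: u64 = 8;

/// Hypothetical helper: translate an app version into a software version.
/// The mapping below is made up for the example; every APP_VERSION bump
/// needs a matching arm here.
pub fn app_version_to_software_version(app_version: u64) -> Option<&'static str> {
    match app_version {
        7 => Some("v0.79.x"),
        8 => Some("v0.80.x"),
        _ => None,
    }
}

/// Human-readable description, falling back to "unknown" when the match
/// above has not been updated for the given app version.
pub fn describe_app_version(app_version: u64) -> String {
    match app_version_to_software_version(app_version) {
        Some(version) => format!("app version {app_version} (penumbra {version})"),
        None => format!("app version {app_version} (unknown)"),
    }
}
```

A test in this style would assert that `app_version_to_software_version(APP_VERSION)` is `Some(..)`, so that a forgotten match arm fails CI instead of surfacing as "unknown" for operators.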
If an operator runs `pd migrate` and then runs it again, the error message should accurately describe the situation that the local state is already migrated, and state that pd is refusing to proceed as instructed. Notably this failure occurs even if the `--force` flag is provided to `pd migrate`, so I've updated the docstring on that flag accordingly.
Tested locally by drafting a no-op migration, bumping the APP_VERSION, and confirming behavior. Tacked on a few changes fussing with the err msgs reported, so that the tool is maximally informative to node operators. Very pleased with the behavior, thanks for all the careful thought put into this, @cronokirby!
Describe your changes
This implements UIP 6, creating an "app version safeguard" that tries to prevent the wrong version of PD from being started, or run, against existing state.
This code should be shippable right away as a non-breaking point release, which would immediately provide a safeguard against forgetting to upgrade to the next major version of the software before the next migration.
This should work because nodes running the point release will start writing the app version they have to non-consensus state. The current migration in 0.80 will refuse to run unless this app version key is empty or holds exactly the previous version, which prevents operators from forgetting to upgrade to the next version before migrating. Furthermore, the next migration should do the same, but with the next app version, so that it will not allow skipping the previous migration.
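The migration-side rule can be sketched as below. This is a simplified model under stated assumptions: `MigrationCheckError` and `check_can_migrate` are hypothetical names, the stored version is passed in as an `Option<u64>` rather than read from storage, and "previous version" is taken to mean the target version minus one.

```rust
/// Hypothetical error type for the pre-migration check.
#[derive(Debug, PartialEq, Eq)]
pub enum MigrationCheckError {
    /// The local state already records the target version.
    AlreadyMigrated { found: u64 },
    /// The local state records some other version, so a step was skipped
    /// or the wrong binary is in use.
    UnexpectedVersion { expected: u64, found: u64 },
}

/// Decide whether a migration to `target` may run, given the app version
/// recorded in non-consensus state (`None` if the key is empty).
pub fn check_can_migrate(stored: Option<u64>, target: u64) -> Result<(), MigrationCheckError> {
    // Assumption: the previous version is exactly one below the target.
    let previous = target.saturating_sub(1);
    match stored {
        // Key is empty: the node predates the safeguard, so allow the migration.
        None => Ok(()),
        // Re-running a completed migration is refused.
        Some(found) if found == target => Err(MigrationCheckError::AlreadyMigrated { found }),
        // Exactly the previous version: this is the expected upgrade path.
        Some(found) if found == previous => Ok(()),
        // Anything else means a migration was skipped or state is unexpected.
        Some(found) => Err(MigrationCheckError::UnexpectedVersion {
            expected: previous,
            found,
        }),
    }
}
```

Separating the "already migrated" case from the generic mismatch makes it possible to give operators a distinct message when `pd migrate` is run twice.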
For testing, I think we should:
`pd migrate` fails because of the newly added safeguard.
Issue ticket number and link
Closes #4793.
Implements penumbra-zone/UIPs#10.
Checklist before requesting a review
I have added guiding text to explain how a reviewer should test these changes.
If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason: