Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UIP-6: App Version Safeguard #4919

Merged
merged 10 commits into from
Nov 15, 2024
Merged

UIP-6: App Version Safeguard #4919

merged 10 commits into from
Nov 15, 2024

Conversation

cronokirby
Copy link
Contributor

Describe your changes

This implements UIP 6, creating an "app version safeguard", to try and prevent the wrong version of PD from being started against existing state, or running the wrong version of PD.

This code should be immediately implementable as a non-breaking point-release, which should immediately provide a safeguard against forgetting to upgrade to the next major version of the software before the next migration.
This should happen because nodes running the point release will start writing, to non-consensus state, the app version they have. The current migration in 0.80 will refuse to run unless this app version key is empty, or exactly the previous version, preventing forgetting to upgrade to the next version pre migration. Furthermore, the next migration should do the same, but with the next app version, so that it will not allow skipping the previous migration.

For testing, I think we should:

  • test that we can take an existing node, and then start it with the new code, stop it, and then start it again
  • test that trying to run pd migrate fails because of the newly added safeguard.

Issue ticket number and link

Closes #4793.

Implements penumbra-zone/UIPs#10.

Checklist before requesting a review

  • I have added guiding text to explain how a reviewer should test these changes.

  • If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason:

    This code is sensitive to potentially breaking consensus, but the testing plan above should confirm that it is safe for a point release. If it is not safe for a point release, the code should be amended so that it is.

@conorsch
Copy link
Contributor

The smoke tests are failing, and I can replicate the failure locally: pd is falling down immediately with

thread 'tokio-runtime-worker' panicked at crates/core/app/src/server/consensus.rs:71:26:
init_chain must succeed: database already initialized

which is good ol' #1884. It looks like writing to the storage on a fresh node rightly triggers initialization, which causes pd to error out.

@cronokirby
Copy link
Contributor Author

Is there any way to write to non verifiable storage without triggering this?

This prevents an edge case where PD would crash if starting before the
very first genesis. Not touching storage in that case will prevent nodes
running continuously from genesis from benefitting from the safeguard,
but an upgrade has already happened on mainnet, and so we don't care
about not having the safeguard in this case.
Simplifies logic significantly

Co-authored-by: @erwanor <erwanor@users.noreply.github.com>
Co-authored-by: @erwanor <erwanor@users.noreply.github.com>
@conorsch
Copy link
Contributor

To review, I prepared a local devnet based on current main (without the safeguard logic), ran it for a while, then switched the pd binary to one built from this branch and bounced the service. Then I submitted an upgrade proposal and confirmed it halted the chain.

When I attempted pd migrate via the pre-upgrade version, I encountered an error:

Error: failed to upgrade state

Caused by:
    app version mismatch:
      expected 7 (penumbra v0.79.x)
      found 8 (penumbra v0.79.x)
    this migration should be run with penumbra v0.80.x instead

which shows an off-by-one in the "found" version; simple fixed already pushed. Will continue with local testing and follow up with more feedback.

The `app_version_safeguard` logic extends the APP_VERSION logic with
helper functions that translate an APP_VERSION to a software version.
Previously, protocol versions increments only required changing the
APP_VERSION const (as well as writing a migration), but now developers
must also update the version match, otherwise tooling will report an
"unknown" version by default. Adding a simple test to confirm the lookup
returns a known value.
If an operator runs `pd migrate` and then runs it again, the error
message should accurately describe the situation that the local state is
already migrated, and state that pd is refusing to proceed as
instructed.

Notably this failure occurs even if the `--force` flag is provided to
`pd migrate`, so I've updated the docstring on that flag accordingly.
Copy link
Contributor

@conorsch conorsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally by drafting a no-op migration, bumping the APP_VERSION, and confirming behavior. Tacked on a few changes fussing with the err msgs reported, so that the tool is maximally informative to node operators. Very pleased with the behavior, thanks for all the careful thought put into this, @cronokirby!

@conorsch conorsch merged commit 9dd02b4 into main Nov 15, 2024
14 checks passed
@conorsch conorsch deleted the uip-6-app-version-safeguard branch November 15, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PD migrate should check the expected version of state before migrating
3 participants