Unable to restart lighthouse if altair fork is missed #2526

Closed
realbigsean opened this issue Aug 19, 2021 · 3 comments
Labels
v1.5.2 The release after v1.5.1

Comments

@realbigsean
Member

Description

If you have lighthouse running v1.4 until after the altair fork occurs and then try to upgrade to v1.5, lighthouse won't start up. This is because we attempt to deserialize the head of the chain by first deserializing the slot of the head block, and comparing it to the fork schedule. But when we saved the block to the database, we used the pre-fork serialization (even though the slot of the block is post-fork).
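
The mismatch can be illustrated with a minimal sketch. The `Fork` and `ForkSchedule` types, the `fork_at_slot` method, and the fork slot of 100 are all hypothetical stand-ins for illustration, not Lighthouse's actual types or values; the sketch also assumes the v1.4 node effectively had no Altair slot in its schedule.

```rust
/// Hypothetical, simplified stand-ins; these are not Lighthouse's real types.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Fork {
    Base,
    Altair,
}

/// A fork schedule that only knows about Altair (assumption for illustration).
struct ForkSchedule {
    /// `None` models a node that is unaware of the Altair fork.
    altair_fork_slot: Option<u64>,
}

impl ForkSchedule {
    /// Choose the block format purely from the slot, as the decoding path
    /// described above does.
    fn fork_at_slot(&self, slot: u64) -> Fork {
        match self.altair_fork_slot {
            Some(fork_slot) if slot >= fork_slot => Fork::Altair,
            _ => Fork::Base,
        }
    }
}
```

With an Altair-unaware schedule a block at slot 150 is written in the `Base` format, while a schedule with the fork at slot 100 expects `Altair` for the same slot, so the stored bytes and the decoder disagree.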

Version

1.5.0-rc.1

Present Behaviour

Aug 19 18:24:29.427 INFO Starting beacon chain                   method: resume, service: beacon
Aug 19 18:24:44.083 CRIT Failed to start beacon node             reason: Failed to build beacon chain: DB error when reading head block: SszDecodeError(OffsetIntoFixedPortion(220)). Ensure the data directory is not initialized for a different network. The --purge-db flag can be used to permanently delete the existing data directory.
Aug 19 18:24:44.083 INFO Internal shutdown received              reason: Failed to start beacon node
Aug 19 18:24:44.083 INFO Shutting down..                         reason: Failure("Failed to start beacon node")

Expected Behaviour

I think we should automatically try to re-sync from before the fork in this scenario.

Steps to resolve

Thinking we could do either or both of these:

  • If we fail to deserialize the head of the beacon chain at startup, we could attempt to deserialize it based on the prior fork, and if that's successful it would suggest we're in this scenario. So we could start up lighthouse and automatically start re-syncing from right before the fork.
  • We could add a flag --resync-from-fork altair to help recover from situations like this
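
The first option could look roughly like the sketch below. Everything in it is hypothetical: the toy encoding (8 bytes for a pre-fork block, 12 bytes for a post-fork one) only stands in for the real SSZ layouts, and `decode_as`, `load_head`, and `HeadLoad` are illustrative names rather than Lighthouse APIs.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Fork {
    Base,
    Altair,
}

/// Outcome of trying to load the head block at startup (hypothetical).
#[derive(Debug, PartialEq)]
enum HeadLoad {
    /// Decoded with the format the fork schedule expects.
    Ok { slot: u64 },
    /// Only decodable with the prior fork's format: the fork was missed.
    MissedFork { slot: u64 },
    /// Not decodable with either format.
    Corrupt,
}

/// Toy decoder: a little-endian slot, plus 4 extra bytes for Altair.
fn decode_as(fork: Fork, bytes: &[u8]) -> Option<u64> {
    let expected_len = match fork {
        Fork::Base => 8,
        Fork::Altair => 12,
    };
    if bytes.len() != expected_len {
        return None;
    }
    let mut slot = [0u8; 8];
    slot.copy_from_slice(&bytes[..8]);
    Some(u64::from_le_bytes(slot))
}

/// Try the expected format first, then fall back to the prior fork.
fn load_head(bytes: &[u8], expected: Fork) -> HeadLoad {
    if let Some(slot) = decode_as(expected, bytes) {
        return HeadLoad::Ok { slot };
    }
    let prior = match expected {
        Fork::Altair => Some(Fork::Base),
        Fork::Base => None,
    };
    match prior.and_then(|fork| decode_as(fork, bytes)) {
        Some(slot) => HeadLoad::MissedFork { slot },
        None => HeadLoad::Corrupt,
    }
}
```

A `MissedFork` result would be the signal to start re-syncing from just before the fork instead of aborting startup.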
@michaelsproul self-assigned this Aug 20, 2021
@michaelsproul
Member

I'm going to have a go at this. It already looks like it might be a bit painful: all of our decoding methods assume they can determine the structure from the slot, so it might be messy to backtrack from the head of the minority chain to the pre-fork block while loading + deleting along the way.

@paulhauner
Member

I'll flag this as v1.5.0 for the time being, just so it stays on our radar before we release. If we decide to push it to a later release, that's totally fine 🙂

@paulhauner added the v1.5.0 (For inclusion in v1.5.0 release) label Aug 20, 2021
@michaelsproul added the v1.5.1 (To be included in the v1.5.1 release) and v1.5.2 (The release after v1.5.1) labels and removed the v1.5.0 (For inclusion in v1.5.0 release) and v1.5.1 (To be included in the v1.5.1 release) labels Aug 23, 2021
bors bot pushed a commit that referenced this issue Aug 30, 2021
## Issue Addressed

Closes #2526

## Proposed Changes

If the head block fails to decode on start up, do two things:

1. Revert all blocks between the head and the most recent hard fork (to `fork_slot - 1`).
2. Reset fork choice so that it contains the new head, and all blocks back to the new head's finalized checkpoint.
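
As a rough illustration of step 1 only, the sketch below walks parent links back from the head and drops every block at or after the fork slot. It is not the code from the commit: `BlockSummary`, `revert_to_fork_boundary`, and the use of plain `u64` values as block roots are assumptions made to keep the example self-contained, and the fork choice reset in step 2 is not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical minimal view of a stored block: its slot and its parent.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlockSummary {
    slot: u64,
    parent_root: u64,
}

/// Delete blocks from `head_root` backwards until reaching a block whose slot
/// is before `fork_slot`, and return that block's root as the new head.
/// Returns `None` if a parent is missing from the store.
fn revert_to_fork_boundary(
    blocks: &mut HashMap<u64, BlockSummary>,
    head_root: u64,
    fork_slot: u64,
) -> Option<u64> {
    let mut root = head_root;
    loop {
        let block = *blocks.get(&root)?;
        if block.slot < fork_slot {
            // Latest block at or before `fork_slot - 1`: keep it as the head.
            return Some(root);
        }
        // Block was stored with the wrong fork's format: drop it.
        blocks.remove(&root);
        root = block.parent_root;
    }
}
```

Because slots can be skipped, the sketch stops at the most recent block with a slot below `fork_slot` rather than requiring a block at exactly `fork_slot - 1`.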

## Additional Info

I tweaked some of the beacon chain test harness stuff in order to make it generic enough to test with a non-zero slot clock on start-up. In the process I consolidated all the various `new_` methods into a single generic one which will hopefully serve all future uses 🤞
@paulhauner
Member

Resolved in #2529
