
Check justified checkpoint root in fork choice #2741

Closed · 2 comments · Labels: bug (Something isn't working)

paulhauner commented Oct 21, 2021

Description

Credits to @hwwhww for bringing this to our attention.

In fork choice, Lighthouse checks checkpoint epochs:

```rust
/// This is the equivalent to the `filter_block_tree` function in the eth2 spec:
///
/// https://github.com/ethereum/eth2.0-specs/blob/v0.10.0/specs/phase0/fork-choice.md#filter_block_tree
///
/// Any node that has a different finalized or justified epoch should not be viable for the
/// head.
fn node_is_viable_for_head(&self, node: &ProtoNode) -> bool {
    (node.justified_epoch == self.justified_epoch || self.justified_epoch == Epoch::new(0))
        && (node.finalized_epoch == self.finalized_epoch
            || self.finalized_epoch == Epoch::new(0))
}
```

However, the specification declares:

```python
correct_justified = (
    store.justified_checkpoint.epoch == GENESIS_EPOCH
    or head_state.current_justified_checkpoint == store.justified_checkpoint
)
correct_finalized = (
    store.finalized_checkpoint.epoch == GENESIS_EPOCH
    or head_state.finalized_checkpoint == store.finalized_checkpoint
)
```

The spec compares whole *checkpoints*, not just their epochs. So Lighthouse needs to be updated to check both the epoch and the root.
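For illustration, the corrected check might look something like this (sketch only; the `justified_root`/`finalized_root` fields don't exist yet, adding them is covered in the steps below):

```rust
// Sketch only: assumes `justified_root`/`finalized_root` have been added to
// both `ProtoNode` and `ProtoArray` (see "Steps to resolve" below).
fn node_is_viable_for_head(&self, node: &ProtoNode) -> bool {
    let correct_justified = self.justified_epoch == Epoch::new(0)
        || (node.justified_epoch == self.justified_epoch
            && node.justified_root == self.justified_root);
    let correct_finalized = self.finalized_epoch == Epoch::new(0)
        || (node.finalized_epoch == self.finalized_epoch
            && node.finalized_root == self.finalized_root);
    correct_justified && correct_finalized
}
```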

Steps to resolve

In order to support this, we need to add the root information to the `ProtoNode`, which currently stores only the epoch:

```rust
pub justified_epoch: Epoch,
```

... and likewise to the `ProtoArray`:

```rust
pub justified_epoch: Epoch,
```
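The augmented fields might look something like this (sketch only; the `justified_root` name and `Hash256` type are assumptions, and bundling the pair into a `Checkpoint` would be equivalent):

```rust
pub justified_epoch: Epoch,
pub justified_root: Hash256, // newly added
// ... and similarly a `finalized_root`, if we also strengthen the
// finalized-checkpoint comparison per the spec.
```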

This will require a database migration. I think it should be fairly simple to migrate the ProtoNode: the roots of all the justified ancestors should be in the ProtoArray, so we should be able to find each root by iterating backwards until we find the first ancestor where `ancestor.slot <= start_slot(node.justified_epoch)`.
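A sketch of that backfill, assuming the existing `ProtoNode` fields (`slot`, `root`, and a `parent` index into the node vector) and `Epoch::start_slot`; as the follow-up comment below explains, this returns `None` when the justified ancestor has already been pruned:

```rust
/// Sketch of the proposed `ProtoNode` migration step. Walks back through a
/// node's ancestors until reaching the first block at or before the start of
/// the node's justified epoch; that block's root is the justified root.
fn find_justified_root(
    nodes: &[ProtoNode],
    mut index: usize,
    slots_per_epoch: u64,
) -> Option<Hash256> {
    let target_slot = nodes[index].justified_epoch.start_slot(slots_per_epoch);
    loop {
        let node = &nodes[index];
        if node.slot <= target_slot {
            return Some(node.root);
        }
        // Bails out (`None`) if the ancestor has been pruned from the array.
        index = node.parent?;
    }
}
```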

For ProtoArray, I think we should be able to get the root for the justified checkpoint from the ForkChoiceStore, which is persisted alongside it. We'd need to do some more reading to ensure there's a 1:1 relationship between these values, though.

@paulhauner paulhauner added the bug Something isn't working label Oct 27, 2021
paulhauner commented:

I think @realbigsean is interested in taking this.

@realbigsean realbigsean self-assigned this Oct 27, 2021
paulhauner commented Oct 31, 2021

> I think it should be fairly simple to migrate the ProtoNode: the roots of all the justified ancestors should be in the ProtoArray, so we should be able to find each root by iterating backwards until we find the first ancestor where `ancestor.slot <= start_slot(node.justified_epoch)`.

I was wrong here, specifically with "the roots of all the justified ancestors should be in the ProtoArray". That's true for the head block, but not necessarily for earlier blocks (e.g., the finalized block). Earlier blocks may reference finalized (and since-pruned) blocks as their justified roots, meaning those referenced blocks are no longer in the ProtoArray.

With the "fairly simple" method unavailable, I can see two broad ways to implement this migration:

  1. Full migration: read BeaconStates from the store during the migration to fill in the justified roots.
  2. Partial migration: make the justified_root field an Option<Hash256>. All new blocks added to the proto-array get Some(root). When we do the check, only check the root if justified_root.is_some(). This means the migration happens gradually, and we're fully migrated once we've flushed out the proto-array with new blocks (probably a few epochs).

(1) is nice because it's one clean migration that's contained in database code and the actual fork choice code knows nothing of it. However, it means we need to do some database work on startup that might be a little slow. It also makes the assumption that we have the states for any block that's in ProtoArray, which I think is fine, but mo assumptions == mo problems.

(2) is nice because the migration code is blindingly simple. However, it means we need to add complexity to the actual fork choice code to handle the Option (see the sketch below). It also puts us in a weird "partially updated" state that might have side-effects.
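For illustration, the root part of the viability check under option (2) might look like this (sketch only, assuming `justified_root` becomes an `Option<Hash256>` on `ProtoNode`):

```rust
/// Sketch of the partial-migration viability check for the justified root.
fn justified_root_is_viable(&self, node: &ProtoNode) -> bool {
    match node.justified_root {
        // Legacy node written before the migration: we don't know its
        // justified root, so fall back to the epoch-only check.
        None => true,
        // Node added after the migration: require an exact match.
        Some(root) => root == self.justified_root,
    }
}
```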

I'm not quite sure which way to go right now, but I'm going to post this so @realbigsean can have a read whilst I think more 🤔

bors bot pushed a commit that referenced this issue Nov 8, 2021
## Issue Addressed

Resolves #2545

## Proposed Changes

Adds the long-overdue EF tests for fork choice. Although we had pretty good coverage via other implementations that closely followed our approach, it is nonetheless important for us to implement these tests too.

During testing I found that we were using a hard-coded `SAFE_SLOTS_TO_UPDATE_JUSTIFIED` value rather than one from the `ChainSpec`. This caused a failure during a minimal preset test. This doesn't represent a risk to mainnet or testnets, since the hard-coded value matched the mainnet preset.
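For illustration, the change amounts to something like this (sketch only; the `safe_slots_to_update_justified` field name on `ChainSpec` is an assumption):

```rust
// Sketch, not the actual diff. Previously the threshold was the hard-coded
// constant `SAFE_SLOTS_TO_UPDATE_JUSTIFIED`, which only matched the mainnet
// preset; reading it from the runtime `ChainSpec` makes the minimal preset
// behave correctly too.
fn should_update_justified_checkpoint(
    slots_since_epoch_start: u64,
    spec: &ChainSpec,
) -> bool {
    slots_since_epoch_start < spec.safe_slots_to_update_justified
}
```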

## Failing Cases

There is one failing case which is presently marked as `SkippedKnownFailure`:

```
case 4 ("new_finalized_slot_is_justified_checkpoint_ancestor") from /home/paul/development/lighthouse/testing/ef_tests/consensus-spec-tests/tests/minimal/phase0/fork_choice/on_block/pyspec_tests/new_finalized_slot_is_justified_checkpoint_ancestor failed with NotEqual:
head check failed: Got Head { slot: Slot(40), root: 0x9183dbaed4191a862bd307d476e687277fc08469fc38618699863333487703e7 } | Expected Head { slot: Slot(24), root: 0x105b49b51bf7103c182aa58860b039550a89c05a4675992e2af703bd02c84570 }
```

This failure is due to #2741. It's not a particularly high-priority issue at the moment, so we'll fix it after merging this PR.