Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require fork choice to run before proposal #2878

Merged
merged 1 commit into from
May 10, 2022

Conversation

michaelsproul
Copy link
Contributor

Presently the honest validator spec states that the proposer at slot should build their block upon what they think the head is during slot - 1:

A validator is expected to propose a SignedBeaconBlock at the beginning of any slot during which is_proposer(state, validator_index) returns True. To propose, the validator selects the BeaconBlock, parent, that in their view of the fork choice is the head of the chain during slot - 1.

The problem with this is that the attestations from slot - 1 are queued awaiting processing until the start of slot, and have the potential to cause a re-org away from what the head was in slot - 1.

The result is that a proposer's block may get immediately re-orged out and have no hope of becoming the head. Consider the following example:

A <-- B <---_-----_---- E
 \--------- C <-- D

There is a contest between B and C as to which block should be the head during slot C. For the sake of argument let's say that block C is the head during slot C, but that the queued attestations from slot C will tip fork choice in favour of block B. When the proposer of D makes their block following the current spec they build atop C, as it was the head during slot D - 1 = C. As soon as this block is applied to the fork choice, so are the attestations from slot C, so the head re-orgs back to B (assuming no proposer boost for D, more on that in a moment).

The role of proposer boost

If D is able to receive a proposer boost (due to being on time, and proposer boost being enabled) then it is more likely to succeed in becoming the head because its boost can counter the queued attestations. However if the weight of queued attestations exceeds the boost then the re-org will still occur immediately and D will have no hope of becoming the head.

Proposer boosting can also play a role in establishing the contest between B & C. The only way for C to be head during slot C in the scenario above is if it is receiving a proposer boost in excess of the voting weight on B (from slot B). In other words, without proposer boost there would be no reason for 0-weight C to be head, as B has strictly more weight behind it. However, it's possible that there could be a contest between two blocks over a longer timeframe: if we imagine that B and C are the heads of two longer competing chains sharing common ancestor A.

Therefore the flaw in the honest proposer logic is independent of the proposer boost feature.

Proposed fix

I think that proposers should run fork choice at the start of the slot they are due to propose in and build upon the head as it appears then.

In practical terms, this would require beacon chain clients to run get_head immediately before proposing a block. This likely represents <25ms of processing time at current validator counts.

@michaelsproul
Copy link
Contributor Author

I suspect that some clients are already running fork choice aggressively like this, but Lighthouse is not, and anyone following the spec to the letter will not be either.

Copy link
Contributor

@djrtwo djrtwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is an error

but we need to be careful with the fix.

The intention is "build on a block from a slot that is less than slot but for the fork choice run at start of slot". You want to avoid building on a block from slot which could potentially be your head if there were divergent shufflings and some weirdness.

@terencechain
Copy link
Contributor

I suspect that some clients are already running fork choice aggressively like this, but Lighthouse is not, and anyone following the spec to the letter will not be either.

Prysm implements the identical behavior to the spec today. Prysm will require some implementation changes to support this as we currently only process fork choice attestations twice per slot. I think I agree that update_head before proposing is the desired behavior, so I'm ok with this. Hopefully, the changes won't be as invasive to the rest of the clients

@ajsutton
Copy link
Contributor

ajsutton commented May 3, 2022

Teku doesn't currently run fork choice prior to proposing a block. However we do run fork choice (without apply pending attestations) when we import a new block if it isn't the child of our current head (if it is it automatically becomes the new head so we don't need to run fork choice). So a block that causes a reorg because it benefits from proposer boost would become the head in Teku as soon as we import it.

It's trivial for us to run get_head prior to proposing though - it was required in an earlier version of proposer boost we implemented so we still have a prepareForBlockProduction call into fork choice, it just doesn't do anything in the current implementation.

@g11tech
Copy link
Contributor

g11tech commented May 4, 2022

lodestar also doesn't run update_head before assembling block although its should be trivial to call update_head to process queued attestations, but there is still an issue in post merge scenario, that such a reorg of the head at start of the slot will lead to empty execution block generation.

Should the attestations of slot - 1 be processed in "copied"/"cloned" forkchoice to predict new head at slot at lets say last interval of the slot -1, so that the correct fcU can be issued to the execution engine to build block?
if there is still a head that is different from the predicted one, anyway the execution payload is going to be empty (or small)

@michaelsproul michaelsproul changed the title Clarify that fork choice runs before proposal Require fork choice to run before proposal May 5, 2022
@michaelsproul
Copy link
Contributor Author

@djrtwo: You want to avoid building on a block from slot which could potentially be your head if there were divergent shufflings and some weirdness.

Good point. This was implicit in the state transition validity before, but my new commit makes it explicit.

@ajsutton: So a block that causes a reorg because it benefits from proposer boost would become the head in Teku as soon as we import it.

I think this is a good strategy, however the re-orgs we're now seeing on mainnet are on proposer boosted nodes which initially see C (the boosted block from my example above) as head, and then re-org to B when the rest of the non-boosted nodes attest to it with more weight. I think Teku would see C as head due to the boost, but wouldn't re-run fork choice before proposing D, and might incorrectly attest to D (because it's the child of current head, C?).

Even before we ship this spec change the situation should improve as more users upgrade to proposer boost releases, because then re-orgs back to B that go against the boost will be less common.

@g11tech: there is still an issue in post merge scenario, that such a reorg of the head at start of the slot will lead to empty execution block generation.

I think your idea of running fork choice between the aggregation attestation deadline (8s) and the next slot is a good one. Maybe we should require fork choice to run 10s (5 * SLOTS_PER_EPOCH // 6) into slot - 1 while ingesting the attestations from slot - 1, and then again at the start of slot (which should hopefully be a quicker run with the same output). This is a more invasive spec change however, so it might be worth adding the two runs separately (the 10s and 12s runs, this PR being the 12s one).

@ajsutton
Copy link
Contributor

ajsutton commented May 5, 2022

I think Teku would see C as head due to the boost, but wouldn't re-run fork choice before proposing D, and might incorrectly attest to D (because it's the child of current head, C?).

Yep, I agree with this.

@potuz
Copy link
Contributor

potuz commented May 5, 2022

if it is it automatically becomes the new head so we don't need to run fork choice

This I think is a bug with or without this PR merged right? Teku should not attest to D in the above example even if it was itself that proposed it.

@ajsutton
Copy link
Contributor

ajsutton commented May 5, 2022

if it is it automatically becomes the new head so we don't need to run fork choice

This I think is a bug with or without this PR merged right? Teku should not attest to D in the above example even if it was itself that proposed it.

Fork choice works by walking forwards along the chain, selecting between competing child blocks. If we've run fork choice and found that X is our chain head, then we must not have any children of X or they would have been our head (no other choice and fork choice always follows to the tip of the chain). So if we import a block that is the child of our current head, it must be the only child of that block and thus fork choice would select it as the new chain head.

If we apply changes to weightings by processing attestations then we might reorg to a different chain, but until we do that we can safely update our chain head. Notably we ensure we have applied the new weightings from the previous slot attestations and run fork choice prior to creating attestations.

The important thing to note here is that attestations aren't for your chain head. They're for they block on your canonical chain at the slot the attestation is for. Similarly when creating a block the parent is the block on the canonical chain prior to the requested block slot. Normally the VC and BN have clocks in sync so they both work out to your chain head, but the attestation or block slot is specified in the REST API request and must be respected.

@potuz
Copy link
Contributor

potuz commented May 5, 2022

If we apply changes to weightings by processing attestations then we might reorg to a different chain, but until we do that we can safely update our chain head. Notably we ensure we have applied the new weightings from the previous slot attestations and run fork choice prior to creating attestations.

I think this is technically wrong: when you import D which continues the canonical chain and you have not yet applied weightings from attestations that appeared in the previous slot. You are saying "D is my new head for slot N+1, modulo what may have happened from attestations in slot N, I promise not to act on this D until I actually check that it deserved to be head in N+1".

Either way I see contradictory statements here: either Teku attests to D because it became head only by being the child of C, which is what I read in the previous message from Michael and its reply, or Teku may actually attest for something different than D then since you apply previous attestations before acting as in your last message.

I find this quite difficult even to implement: if you import D and you actually include the attestations that D included in your forkchoice, then you are counting whatever D included from the previous slot, which is biased for D's view of the chain, and not the attestations that the node gathered during that slot.

@ajsutton
Copy link
Contributor

ajsutton commented May 5, 2022

I think this is technically wrong: when you import D which continues the canonical chain and you have not yet applied weightings from attestations that appeared in the previous slot. You are saying "D is my new head for slot N+1, modulo what may have happened from attestations in slot N, I promise not to act on this D until I actually check that it deserved to be head in N+1".

Hmm, not really. I'm saying that based on my current world view (ie the attestations I've previously processed) C is my chain head. Then I import D and the attestations haven't changed, so D must be my chain head. I'm not running fork choice to select D, I just know that if C was my head and D is the first child then D would be the chain head based on the existing weightings I have.

Either way I see contradictory statements here: either Teku attests to D because it became head only by being the child of C, which is what I read in the previous message from Michael and its reply, or Teku may actually attest for something different than D then since you apply previous attestations before acting as in your last message.

In this situation there's a reorg required so every node is going to have to change their chain head when they process attestations from the prior slot. The question is just whether your changing from C to B or from D to B. That's fine, but in the normal case where there isn't a reorg Teku will already have the right head selected and has a head start in processing the various things that are triggered by updating the chain head.

I find this quite difficult even to implement: if you import D and you actually include the attestations that D included in your forkchoice, then you are counting whatever D included from the previous slot, which is biased for D's view of the chain, and not the attestations that the node gathered during that slot.

We don't run fork choice to make D the head because we don't need to. But D can only include attestations from prior slots anyway, all of which are now valid to consider so there's nothing wrong with considering them at this point. We can add additional information to get a better view of the network by also considering attestations we received via gossip but that weren't in a block but there's no guarantee that different nodes have the same view of those attestations anyway.

But I think we're quite off topic for this issue at this point. :)

@djrtwo djrtwo merged commit 242f286 into ethereum:dev May 10, 2022
bors bot pushed a commit to sigp/lighthouse that referenced this pull request May 20, 2022
## Issue Addressed

Upcoming spec change ethereum/consensus-specs#2878

## Proposed Changes

1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.

## Additional Info

### Block Proposal Accuracy

This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.

### Attestation Accuracy

This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.

### Why remove the call before the state advance?

If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).

### Performance

Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.

### Implementation

Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
bors bot pushed a commit to sigp/lighthouse that referenced this pull request May 20, 2022
## Issue Addressed

Upcoming spec change ethereum/consensus-specs#2878

## Proposed Changes

1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.

## Additional Info

### Block Proposal Accuracy

This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.

### Attestation Accuracy

This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.

### Why remove the call before the state advance?

If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).

### Performance

Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.

### Implementation

Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
bors bot pushed a commit to sigp/lighthouse that referenced this pull request May 20, 2022
## Issue Addressed

Upcoming spec change ethereum/consensus-specs#2878

## Proposed Changes

1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.

## Additional Info

### Block Proposal Accuracy

This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.

### Attestation Accuracy

This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.

### Why remove the call before the state advance?

If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).

### Performance

Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.

### Implementation

Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants