[EFM] Invalid Service Events shortly after Epoch Commit #5631

Open
Tracked by #5762 ...
AlexHentschel opened this issue Apr 5, 2024 · 1 comment

Problem description

Currently, the EpochStateMachine, which orchestrates the Epoch Happy Path and Fallback, has this behaviour:

  • As of the block that encounters an invalid Epoch ServiceEvent, we engage Epoch Fallback Mode [EFM] and no longer process any Epoch transitions. This creates subtle edge cases for future light clients and could potentially drive consensus into an irreconcilable state (I am not sure about the latter).
    • Scenario:
      • Imagine that Epoch N ends at view 1000.
      • Block from view 1001 (first block of Epoch N+1) seals a result that has an invalid Epoch Service Event
    • How the current implementation will behave:

In my opinion, the consensus protocol has formally reached an irreconcilable state at this point. I think our current implementation would probably just stop producing blocks. Reasoning:

  • Note that when initializing the FallbackStateMachine, we do not re-apply the epoch transition.
  • Going into view 1001, Alice thought that she was the leader, based on the leader assignment for Epoch N+1. However, after running the Epoch State Machine, the Epoch state is still in Epoch N in fallback mode. Most likely, Alice is not the leader for view 1001 in EFM of Epoch N.
  • I don't think our software will handle this edge case. Certainly it is a violation of HotStuff's formal safety requirement: once you commit to a leader selection for some range of views (here the views belonging to Epoch N+1), you cannot change it (slightly simplified). Conceptually, we commit to the leader selection once we commit Epoch N+1.
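
To make the conflict concrete, here is a minimal sketch in Go. All names (`LeaderSelection`, `LeaderFor`, `ConflictingLeader`) are hypothetical and not taken from the code base, and the round-robin assignment is an assumption that stands in for the real leader selection. It only illustrates that two leader selections covering the same view can name different leaders.

```go
package main

// LeaderSelection is a hypothetical, simplified stand-in for a committed
// leader assignment over the views FirstView..FinalView (inclusive).
type LeaderSelection struct {
	FirstView uint64
	FinalView uint64
	Leaders   []string // assumption: round-robin order, for illustration only
}

// LeaderFor returns the leader for the given view, or false if the view
// lies outside the range covered by this selection.
func (s LeaderSelection) LeaderFor(view uint64) (string, bool) {
	if view < s.FirstView || view > s.FinalView || len(s.Leaders) == 0 {
		return "", false
	}
	return s.Leaders[(view-s.FirstView)%uint64(len(s.Leaders))], true
}

// ConflictingLeader reports whether both selections cover the view
// but disagree on who the leader is.
func ConflictingLeader(a, b LeaderSelection, view uint64) bool {
	leaderA, okA := a.LeaderFor(view)
	leaderB, okB := b.LeaderFor(view)
	return okA && okB && leaderA != leaderB
}
```

With a selection for Epoch N+1 starting at view 1001 and a fallback selection of Epoch N that also covers view 1001, `ConflictingLeader` returns true whenever the two name different leaders for that view, which is the situation Alice ends up in.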

I think a similar aspect has previously come up for the EFM recovery. Specifically, the EFM recovery cannot change the modus operandi for view ranges that the FallbackStateMachine has already committed.

Suggestion of Problem Solution

  1. Once an Epoch is committed (happy path) on some fork, that Epoch will become active at the specified view, provided this fork is extended beyond the epoch boundary. In other words, the FallbackStateMachine will also enact Epoch transitions that have previously been committed by the happy path protocol.
  2. The aspect in which HappyPathStateMachine and FallbackStateMachine differ is the way they add new view ranges beyond those already committed.
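
A rough sketch of point 1 in Go, under simplifying assumptions: `EpochState` and `OnView` are hypothetical names, and the real state machines track far more than this. The point it shows is that the epoch transition depends only on whether the next epoch was committed on this fork, not on whether fallback mode has been triggered.

```go
package main

// EpochState is a hypothetical, heavily simplified per-fork epoch state.
type EpochState struct {
	CurrentEpoch      uint64
	CurrentFinalView  uint64 // last view of the current epoch
	NextCommitted     bool   // an EpochCommit for the next epoch is in this fork
	NextFinalView     uint64 // last view of the next epoch, if committed
	FallbackTriggered bool   // EFM has been entered on this fork
}

// OnView returns the state after processing a block with the given view.
// A previously committed epoch is enacted at the boundary regardless of
// FallbackTriggered; entering EFM does not undo the commitment.
func (s EpochState) OnView(view uint64) EpochState {
	if view > s.CurrentFinalView && s.NextCommitted {
		s.CurrentEpoch++
		s.CurrentFinalView = s.NextFinalView
		s.NextCommitted = false
	}
	return s
}
```

In the scenario from the problem description, a state with `CurrentFinalView: 1000`, `NextCommitted: true` and `FallbackTriggered: true` still moves to the next epoch when `OnView(1001)` is called, so the leader assignment Alice relied on stays in force.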

AlexHentschel commented Apr 11, 2024

Let us consider the following suggestion:

  1. Once a leader selection for a view range is committed, it can never be overwritten/changed
    • A leader selection view range is committed upon finalizing the EpochCommit event (not EpochSetup) on the happy path
    • A leader selection view range is committed upon entering EFM on the fallback path

and

  1. If an epoch extension is added, it's appended to the last committed leader selection view range.
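
The two rules can be sketched as an append-only list of view ranges. This is only an illustration in Go: `ViewRange`, `CommittedRanges` and the `Source` labels are hypothetical names, and the contiguity check is an assumption about what "appended to the last committed view range" means.

```go
package main

import "errors"

// ViewRange is a hypothetical record of one committed leader selection,
// covering the views First..Final (inclusive).
type ViewRange struct {
	First, Final uint64
	Source       string // illustrative label, e.g. "EpochCommit", "EFM", "Extension"
}

var (
	ErrInvalidRange  = errors.New("view range is inverted")
	ErrNotContiguous = errors.New("view range does not start right after the last committed view")
)

// CommittedRanges is an append-only list: entries are never overwritten,
// and a new entry must begin immediately after the last committed view.
type CommittedRanges struct {
	ranges []ViewRange
}

// Append adds a new range behind the last committed one, or returns an
// error if it would overlap or leave a gap.
func (c *CommittedRanges) Append(r ViewRange) error {
	if r.Final < r.First {
		return ErrInvalidRange
	}
	if n := len(c.ranges); n > 0 && r.First != c.ranges[n-1].Final+1 {
		return ErrNotContiguous
	}
	c.ranges = append(c.ranges, r)
	return nil
}

// Len returns the number of committed ranges.
func (c *CommittedRanges) Len() int { return len(c.ranges) }
```

Because there is no method to modify or remove an entry, a range that was committed on the happy path cannot be replaced later by the fallback path; an epoch extension can only be appended after it.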

Thoughts

There is a subtle detail in the proposed specification that we have to get right in order not to break consensus. To paraphrase, we are suggesting using finality as a decision criterion for whether or not the Epoch State Machine accepts a service event that leads to a changed leader selection for a future view range.

General Rule:

Finality cannot be used as an input for evolving the Protocol State. Generally, only information in the fork that is currently being extended can be used to determine the validity of a block. There are exceptions where using finality is safe, but those are generally very edge-casey.

Reasoning:

  • Finality is a determination that nodes make locally. Very explicitly: nodes may all know the fork A <- B <- C and receive the candidate D such that A <- B <- C <- D, yet they might still have different finality statuses for the blocks. Specifically, this is because nodes may observe alternative forks that are subsequently orphaned and are not observed by other nodes. Nevertheless, observing subsequently orphaned forks can (rarely) progress finality on the main fork.

    For example, knowing the fork A <- B <- C <- D, I may conclude that B is finalized. On the one hand, based on my world view, C is still unfinalized. On the other hand, some other node may know additional children of C that finalize C. So if we allow finality to influence which Protocol State transitions in block D are legal or illegal, that other node and I may disagree on whether D is a valid extension of the chain.

  • Consensus rules guarantee that finality is eventually consistent. In other words, if some node finalizes block B and if the network continues to produce valid blocks, all honest nodes will eventually conclude that B is finalized.
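
The disagreement described above can be reproduced with a toy model in Go. Everything here is hypothetical (`Fork`, `NodeView`, `ValidByFinality`, `ValidByFork`); it is a sketch of the argument, not of any real validity check.

```go
package main

// Fork is the chain of block IDs being extended, e.g. A <- B <- C.
type Fork []string

// NodeView is a hypothetical local world view: every node knows the same
// fork, but each node has its own opinion on which blocks are finalized.
type NodeView struct {
	Fork      Fork
	Finalized map[string]bool
}

// ValidByFinality is the unsafe kind of rule: it accepts a candidate only
// if the block carrying the service event is finalized in the local view.
func ValidByFinality(n NodeView, eventBlock string) bool {
	return n.Finalized[eventBlock]
}

// ValidByFork uses only information contained in the fork being extended:
// it accepts if the block carrying the service event is part of the fork.
func ValidByFork(n NodeView, eventBlock string) bool {
	for _, id := range n.Fork {
		if id == eventBlock {
			return true
		}
	}
	return false
}
```

Two nodes that share the fork A <- B <- C but differ on whether C is finalized get different answers from `ValidByFinality` and the same answer from `ValidByFork`, which is why only the latter kind of rule is safe for deciding block validity.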

Suggested change.

Above, I argued that this part of the suggested rule would break consensus:

❌ A leader selection view range is committed upon finalizing the EpochCommit event (not EpochSetup) in the happy path

Let's discuss how we can modify this rule so that it works:

  1. I think that, as a first step, we need a definition of "committing a leader selection view range" for a specific fork. I would suggest:
    • On the happy path, a leader selection view range is committed for one specific fork when an EpochCommit event is included in that fork.
    • On the unhappy path, a leader selection view range is committed when the EFM logic reaches the threshold view without a valid EpochCommit or EpochRecovery event.
  2. With this definition, we can have different committed leader selection view ranges in different forks. This is no problem, as long as such view ranges are sufficiently far into the future.
    • Though, by the time the consensus committee for the view range takes over, it must have already been finalized which committee is taking over. In other words, finality is not important when the Epoch State Machine writes the leader selection view range into the protocol state. Finality is important when this view range activates. Conceptually, it is the same mechanics as with protocol version upgrades: we need this safety buffer between writing data into the Protocol State and this taking effect in the network.
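
A small Go sketch of the per-fork definition, assuming hypothetical names (`ViewRange`, `ForkState`, `Child`): each block's Protocol State is derived from its parent's, so two forks branching from the same parent can carry different committed view ranges without affecting each other. The requirement that the matter is finalized before the range activates is not modelled here.

```go
package main

// ViewRange covers the views First..Final (inclusive).
type ViewRange struct{ First, Final uint64 }

// ForkState is a hypothetical snapshot of the committed leader selection
// view ranges as of one block on one specific fork.
type ForkState struct {
	Committed []ViewRange
}

// Child derives the state of a child block. Pass the range committed by a
// service event included in that block, or nil if there is none. The parent
// state is copied, never mutated, so sibling forks stay independent.
func (s ForkState) Child(committed *ViewRange) ForkState {
	next := ForkState{Committed: make([]ViewRange, len(s.Committed), len(s.Committed)+1)}
	copy(next.Committed, s.Committed)
	if committed != nil {
		next.Committed = append(next.Committed, *committed)
	}
	return next
}
```

Calling `Child` twice on the same parent with different ranges yields two fork states that disagree about the future range while the parent remains unchanged. Which of them takes effect is settled once one of the forks is finalized, which must happen before the range's first view.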
