Analysis of Eth1 data votes on Medalla #2018

Closed
benjaminion opened this issue Aug 15, 2020 · 9 comments

@benjaminion
Contributor

benjaminion commented Aug 15, 2020

I extracted the Eth1 data votes from the Medalla network at the end of each Eth1 data voting period (1024 slots). The analysis covers the 72 periods (73,728 slots, 2304 epochs, ~10 days) that had been fully finalised before The Apocalypse.
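The figures above follow from simple arithmetic, sketched here. The 1024-slot period and the 72 periods come from the text; `SLOTS_PER_EPOCH = 32` and `SECONDS_PER_SLOT = 12` are the usual Phase 0 values and are assumed to apply to Medalla.

```python
# Values quoted in the issue text.
SLOTS_PER_ETH1_VOTING_PERIOD = 1024
PERIODS_ANALYSED = 72

# Assumed Phase 0 constants (not stated in the issue itself).
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12

total_slots = PERIODS_ANALYSED * SLOTS_PER_ETH1_VOTING_PERIOD
total_epochs = total_slots // SLOTS_PER_EPOCH
total_days = total_slots * SECONDS_PER_SLOT / 86_400
period_hours = SLOTS_PER_ETH1_VOTING_PERIOD * SECONDS_PER_SLOT / 3_600

print(total_slots, total_epochs, round(total_days, 2), round(period_hours, 2))
# prints: 73728 2304 10.24 3.41
```

So a single voting period lasts a little under three and a half hours.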

Consensus on Eth1 block hashes

Currently, the beacon chain effectively aims to agree on Eth1 block hashes. It does this by a simple majority rule: within the 1024 slot period, once an Eth1 block hash has been voted for 513 or more times, it is agreed and the beacon state is updated. If no block hash achieves this threshold, then the beacon state is left unchanged.
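As a rough illustration of this majority rule (not the spec's `process_eth1_data()`), the sketch below finds the point in a period at which agreement is first reached. Votes are plain strings, `first_agreement_slot` is a hypothetical helper, and `None` stands for a slot with no block.

```python
from collections import Counter
from typing import Optional, Sequence

SLOTS_PER_ETH1_VOTING_PERIOD = 1024  # period length quoted above


def first_agreement_slot(votes: Sequence[Optional[str]]) -> Optional[int]:
    """Return the 1-based slot at which some value first holds a strict
    majority of the whole period, or None if no value ever does.

    `votes` has one entry per slot; None marks a slot without a block.
    """
    tally: Counter = Counter()
    for slot_number, vote in enumerate(votes, start=1):
        if vote is None:
            continue
        tally[vote] += 1
        # Same shape as the spec's test: count * 2 > slots in the period.
        if tally[vote] * 2 > SLOTS_PER_ETH1_VOTING_PERIOD:
            return slot_number
    return None


# 513 identical votes are enough; an even 512/512 split is not.
print(first_agreement_slot(["a"] * 513 + ["b"] * 511))  # prints: 513
print(first_agreement_slot(["a", "b"] * 512))           # prints: None
```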

The data from Medalla show that the chain failed to agree on an Eth1 block hash in 22% of Eth1 data voting periods.

Typically, voting was "all over the place", with between 3 and 140 different Eth1 block hashes receiving votes in each of the 72 Eth1 data voting periods. There were two runs of three consecutive periods (~10 hours per run) during which no agreement was reached.

Note that validator participation during this period was relatively low, being mostly between 70% and 80%. With higher validator participation it is likely that the threshold for agreement can be reached more often.

The consequence of failing to agree on the Eth1 block hash is that onboarding of new validators could be delayed.

Consensus on Eth1 deposit roots

In the current Phase 0 and Phase 1 specifications, state.eth1_data.block_hash is not used at all. Thus it is not strictly necessary to agree on it.

By contrast, the chain could very reliably have agreed on the Eth1 deposit root, which is actually used. The deposit root evidently changes less frequently than the block hash - not all Eth1 blocks contain deposits. There were no failures to reach a 50% + 1 majority on Eth1 deposit root.

Note that two votes with the same Eth1 block hash will have the same deposit root, but not necessarily vice versa. Consensus on block hash therefore implies consensus on deposit root, but the converse is not true.
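A toy example may make the asymmetry concrete. The sketch below uses an 8-slot period and made-up hashes purely for illustration; `Vote` and `has_majority` are hypothetical names, not spec types.

```python
from collections import Counter
from typing import List, NamedTuple


class Vote(NamedTuple):
    block_hash: str
    deposit_root: str


def has_majority(values: List[str], period_slots: int) -> bool:
    """True if the most common value has more than half of the period's slots."""
    if not values:
        return False
    _, top_count = Counter(values).most_common(1)[0]
    return top_count * 2 > period_slots


# Three different Eth1 blocks, none containing a new deposit,
# so all three share the same deposit root.
toy_period_slots = 8
votes = (
    [Vote("0xaa", "0xd1")] * 3
    + [Vote("0xbb", "0xd1")] * 3
    + [Vote("0xcc", "0xd1")] * 2
)

block_hash_agreed = has_majority([v.block_hash for v in votes], toy_period_slots)
deposit_root_agreed = has_majority([v.deposit_root for v in votes], toy_period_slots)
print(block_hash_agreed, deposit_root_agreed)  # prints: False True
```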

Swiftness of agreement

[Update] When agreement is reached, it is on average reached 71% of the way through the period (after 727 slots) for the Eth1 block hash, and could be reached 69% of the way through the period (after 707 slots) for the Eth1 deposit root. The earliest point at which agreement could be reached is after 513 slots.

Raising the agreement threshold

It has been suggested that we raise the threshold for agreement from 50% to 60%. With this threshold, and this dataset, the beacon chain fails to agree on the Eth1 block hash 27% of the time, and on the Eth1 deposit root 2.7% of the time.
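For reference, a small sketch of what a different threshold means in vote counts. The issue does not say whether a 60% threshold would be strict, so a strict comparison is assumed here, mirroring the current `count * 2 > ...` test; the helper names are hypothetical.

```python
def meets_threshold(count: int, period_slots: int, numerator: int, denominator: int) -> bool:
    """True if `count` is strictly more than numerator/denominator of the period.

    Integer arithmetic only, in the style of the spec's `count * 2 > slots` check.
    """
    return count * denominator > period_slots * numerator


def min_votes_needed(period_slots: int, numerator: int, denominator: int) -> int:
    """Smallest vote count that strictly exceeds numerator/denominator of the period."""
    return (period_slots * numerator) // denominator + 1


print(min_votes_needed(1024, 1, 2))  # prints: 513
print(min_votes_needed(1024, 3, 5))  # prints: 615
```

Under that assumption, a 60% threshold raises the bar from 513 to 615 matching votes per 1024-slot period.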

Proposal

Although it is not disastrous to fail to agree on the Eth1 data, it is inconvenient for new validators wishing to be onboarded swiftly. It is better to come to agreement more quickly than less quickly.

Since (a) it is easier to agree on the Eth1 deposit root, and (b) the Eth1 deposit root and block hash have different, independent purposes in the Eth2 protocol, I suggest coming to consensus about them separately. This should be fairly easy to implement by changing process_eth1_data() to the following:

Warning: I don't Python. This is probably broken. But you get the idea.

def process_eth1_data(state: BeaconState, body: BeaconBlockBody) -> None:
    state.eth1_data_votes.append(body.eth1_data)
    if [v.deposit_root for v in state.eth1_data_votes].count(body.eth1_data.deposit_root) * 2 > EPOCHS_PER_ETH1_VOTING_PERIOD * SLOTS_PER_EPOCH:
        state.eth1_data.deposit_root = body.eth1_data.deposit_root
        state.eth1_data.deposit_count = body.eth1_data.deposit_count
        if [v.block_hash for v in state.eth1_data_votes].count(body.eth1_data.block_hash) * 2 > EPOCHS_PER_ETH1_VOTING_PERIOD * SLOTS_PER_EPOCH:
            state.eth1_data.block_hash = body.eth1_data.block_hash

Summary

At the cost of a very small increase in code complexity we can significantly improve tracking of the Eth1 deposit contract data, and not unnecessarily hold up the onboarding of validators. There is no impact on tracking of Eth1 block hashes.

@ralexstokes
Member

@benjaminion this is awesome! in the summary spreadsheet what do the yellow and red cell shadings mean?

@ralexstokes
Member

i'm wondering if you mean to only consider block_hash consensus conditional on deposit_root consensus?

we could consider uncoupling them further:

def process_eth1_data(state: BeaconState, body: BeaconBlockBody) -> None:
    state.eth1_data_votes.append(body.eth1_data)
    if [v.deposit_root for v in state.eth1_data_votes].count(body.eth1_data.deposit_root) * 2 > EPOCHS_PER_ETH1_VOTING_PERIOD * SLOTS_PER_EPOCH:
        state.eth1_data.deposit_root = body.eth1_data.deposit_root
        state.eth1_data.deposit_count = body.eth1_data.deposit_count
    if [v.block_hash for v in state.eth1_data_votes].count(body.eth1_data.block_hash) * 2 > EPOCHS_PER_ETH1_VOTING_PERIOD * SLOTS_PER_EPOCH:
        state.eth1_data.block_hash = body.eth1_data.block_hash

@ralexstokes
Member

> In the current Phase 0 and Phase 1 specifications, state.eth1_data.block_hash is not used at all. Thus it is not strictly necessary to agree on it.

but it is very useful for any eth1-eth2 interop ahead of the eth1-eth2 merger.

even if this data is only available off-chain or only available in a trusted manner on eth1, it still unlocks parts of the interop design space we just won't be able to have otherwise. so we shouldn't just ignore it or treat as a lower priority part of the spec :)

obviously the timing of the eth1-eth2 merger would inform the decision here, but if we are having a hard time coming to eth1 consensus, i'd even support adding microincentives in a "phase 0.5" for "eth1 timeliness" on the eth2 side if it looks like the merger may take longer than expected.

@benjaminion
Contributor Author

> i'm wondering if you mean to only consider block_hash consensus conditional on deposit_root consensus?

Yes, that's a deliberate optimisation. Block hash consensus can happen only if deposit root consensus has happened. (To put it the other way, block hash consensus implies deposit root consensus.)
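To check this reasoning, here is a self-contained sketch that runs the nested and the uncoupled versions discussed above over the same toy votes. It assumes, as argued, that a given block hash always comes with the same deposit root. `Eth1Data` and `State` are simplified stand-ins for the spec types, and the 8-slot period is a toy value.

```python
from dataclasses import dataclass, field
from typing import Callable, List

PERIOD_SLOTS = 8  # toy value for illustration; Medalla used 1024


@dataclass
class Eth1Data:
    deposit_root: str
    deposit_count: int
    block_hash: str


@dataclass
class State:
    eth1_data: Eth1Data
    eth1_data_votes: List[Eth1Data] = field(default_factory=list)


def majority(values: List[str], target: str) -> bool:
    return values.count(target) * 2 > PERIOD_SLOTS


def process_nested(state: State, vote: Eth1Data) -> None:
    state.eth1_data_votes.append(vote)
    if majority([v.deposit_root for v in state.eth1_data_votes], vote.deposit_root):
        state.eth1_data.deposit_root = vote.deposit_root
        state.eth1_data.deposit_count = vote.deposit_count
        # Only checked once the deposit root has a majority.
        if majority([v.block_hash for v in state.eth1_data_votes], vote.block_hash):
            state.eth1_data.block_hash = vote.block_hash


def process_uncoupled(state: State, vote: Eth1Data) -> None:
    state.eth1_data_votes.append(vote)
    if majority([v.deposit_root for v in state.eth1_data_votes], vote.deposit_root):
        state.eth1_data.deposit_root = vote.deposit_root
        state.eth1_data.deposit_count = vote.deposit_count
    # Checked independently of the deposit root.
    if majority([v.block_hash for v in state.eth1_data_votes], vote.block_hash):
        state.eth1_data.block_hash = vote.block_hash


def run(process: Callable[[State, Eth1Data], None], votes: List[Eth1Data]) -> Eth1Data:
    state = State(Eth1Data("0xd0", 0, "0x00"))
    for vote in votes:
        process(state, vote)
    return state.eth1_data


a = Eth1Data("0xd1", 5, "0xaa")
b = Eth1Data("0xd1", 5, "0xbb")
hash_agreed = [a, b, a, a, b, a, a, b]  # 0xaa gets 5 of 8 votes
hash_split = [a, b, a, b, a, b, a, b]   # 4 votes each: no hash majority

assert run(process_nested, hash_agreed) == run(process_uncoupled, hash_agreed)
assert run(process_nested, hash_split) == run(process_uncoupled, hash_split)
```

On these inputs both versions end in the same state, so the nesting only saves a count when the deposit root has no majority yet.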

@benjaminion
Contributor Author

> > In the current Phase 0 and Phase 1 specifications, state.eth1_data.block_hash is not used at all. Thus it is not strictly necessary to agree on it.
>
> but it is very useful for any eth1-eth2 interop ahead of the eth1-eth2 merger.

Yes. My proposal above doesn't actually change the current Eth1 block hash voting mechanism at all. But the study does show that the Eth1 block hash voting mechanism isn't terribly effective. We may wish to consider more radical changes, such as incentivising consensus on Eth1 data (though that carries its own issues).

@benjaminion
Contributor Author

> in the summary spreadsheet what do the yellow and red cell shadings mean?

Sorry, I should have said. Red cells are where no consensus was reached for that Eth1 data voting period at the current 50% threshold. Yellow cells are those that would additionally fail if we increased the threshold to 60%.

@djrtwo
Contributor

djrtwo commented Aug 24, 2020

> There were no failures to reach a 50% + 1 majority on Eth1 deposit root.

This seems like implementation errors rather than spec errors. Any failure to agree on a block-hash/deposit-root combo after the first epoch (assuming max 1 epoch latency) is just a failure to follow the spec.

votes_to_consider should be 100% in agreement amongst implementations if we assume there are no forks in the eth1 chain 1000+ blocks deep [this is true in goerli 100% of the time].

Then if votes_to_consider is in agreement and I am looking at a handful of blocks, my vote is entirely deterministic regardless of client implementation and should quickly solidify as the chain during that voting period grows deeper.

I'd like to better understand why there is an issue in conformance before we go change the spec.

My guess is that there is an off-by-one error causing prysm and other clients to disagree on the specific block, so we end up relying entirely on prysm for the 51% of votes. If prysm is ~70% of the network, agreement wouldn't occur until, on average, around 70% of the way through the period.

The disagreement on block hash combined with agreement on deposit root also implies an off-by-a-tiny-amount error in votes_to_consider.
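The arithmetic behind this guess can be sketched quickly. It assumes every slot has a block and that a fixed fraction of proposers cast identical votes while the rest never match them. Both are simplifications, and `agreeing_fraction` is a hypothetical parameter. The expected number of slots needed to collect r matching votes when each slot matches with probability p is r / p.

```python
import math

PERIOD_SLOTS = 1024
VOTES_NEEDED = PERIOD_SLOTS // 2 + 1  # 513, the simple-majority threshold


def expected_agreement_slot(agreeing_fraction: float) -> int:
    """Expected slot (rounded up) at which VOTES_NEEDED matching votes exist."""
    return math.ceil(VOTES_NEEDED / agreeing_fraction)


slot = expected_agreement_slot(0.7)
print(slot, round(slot / PERIOD_SLOTS, 2))  # prints: 733 0.72
```

Under these assumptions, that is close to the observed average of 727 slots (71% of the period) reported in the opening post for the block hash.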

@dapplion
Collaborator

> My guess is that there is an off-by-one error causing prysm and other clients to disagree on the specific block, so we end up relying entirely on prysm for the 51% of votes. If prysm is ~70% of the network, agreement wouldn't occur until, on average, around 70% of the way through the period.

Relevant to this issue: prysmaticlabs/prysm#7200

@benjaminion
Contributor Author

Closing this as it is now ancient history, and it looks like we'll be getting rid of Eth1 data voting completely at some point.
