Accounting reform [isolated] #2176

Merged · 27 commits into dev · Feb 4, 2021

Conversation

@djrtwo (Contributor) commented Jan 5, 2021

Isolated incentive accounting reform proposal.
Pulled out of #2159


Copied from #2140:

Note: 331b0f6 changed the flag usages.

Currently, when we receive an attestation, we defer almost all processing (except basic verification) to the end of an epoch. We store a PendingAttestation object that contains the attestation data (not including the signature), and at the end of an epoch we run a computation over these objects to determine which epochs get justified and/or finalized, and what each participant's reward or penalty should be. This is quite complex and expends extra effort on re-serializing, hashing, storing extra data in the case that attestations overlap, etc.

The natural alternative is to do accounting in real time, but this has historically been difficult because attestations contain a random sample of validators, and Merkle trees are not friendly to random edits: adjusting all n records together requires 1/4 hashes per record, but adjusting a few dispersed records requires an entire log(n) hashes per record.

But it turns out that we can get the best of both worlds. If we make a data structure containing the current-epoch participation records of each validator, we can make the data structure itself be in shuffling order. That is, item k in the data structure would not literally be the k'th validator; instead, it would be the k'th validator in the shuffled list. Because attestation committees are directly taken from contiguous segments of the shuffled list, this allows us to efficiently adjust the records associated with each validator directly in attestation processing.
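As a rough sketch of the idea (all names and the committee layout below are illustrative, not spec code): because the participation records are kept in shuffling order, each committee's votes map onto one contiguous slice of the list.

```python
# Illustrative sketch only: names and layout are hypothetical, not the spec's.
# Participation records are kept in shuffling order, so the records for one
# committee occupy a contiguous slice of the list.

def committee_slice(active_count, committees_per_epoch, committee_index):
    """[start, end) range of shuffled positions covered by a committee."""
    start = active_count * committee_index // committees_per_epoch
    end = active_count * (committee_index + 1) // committees_per_epoch
    return start, end

def apply_attestation(participation, committee_index, committees_per_epoch,
                      aggregation_bits, flag):
    """OR a duty flag into the record of every validator that voted."""
    start, _ = committee_slice(len(participation), committees_per_epoch,
                               committee_index)
    for offset, voted in enumerate(aggregation_bits):
        if voted:
            participation[start + offset] |= flag  # contiguous writes only
```

For example, with 8 active validators and 4 committees, committee 1 covers shuffled positions 2 and 3, so processing an attestation for it touches only that slice of the underlying tree.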

This PR does exactly that: it implements the current_epoch_flags and previous_epoch_flags data structures, which keep track of "flags" (basically, a mini-bitfield per active validator) recording which "duties" a validator successfully completed in the current and in the previous epoch. The duties include:

* FLAG_TARGET: correct source and target
* FLAG_TIMELY: got included within about 5 slots
* FLAG_HEAD_AND_VERY_TIMELY: got the head correct and got included within 1 slot

Note that there is some change here from the phase 0 duties (source, target, head, inclusion delay). The key changes are (i) the introduction of "combination duties" (do A and B to get a reward), and (ii) the "coarsening" of the inclusion delay reward from the original 1/n rule to a "very timely / timely / not timely" triad. (i) is done to more heavily penalize selective performance, where a validator picks and chooses which duties to fulfill and which duties to ignore to save costs, and (ii) is a result of the inherent limitations of the flag structure, which means that every bit of info used to determine rewards must be conserved.

Additionally, this PR changes the base reward math somewhat: the "base reward" now becomes the maximum total expected reward that a validator can get per slot, and the old denominator of 5 is replaced with a system of per-duty numerators over a global denominator of 64. This system is more flexible.
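A minimal sketch of the numerator/denominator scheme (the weights below are made up for illustration; the real constants and base-reward formula live in the spec):

```python
# Sketch with hypothetical weights; the actual numerators are spec constants.
FLAG_DENOMINATOR = 64
DUTY_NUMERATORS = {            # illustrative values only
    "target": 32,
    "timely": 16,
    "head_and_very_timely": 16,
}

def duty_reward(base_reward, duty):
    # Each fulfilled duty pays a fixed fraction of the base reward.
    return base_reward * DUTY_NUMERATORS[duty] // FLAG_DENOMINATOR
```

Since the numerators in this sketch sum to the denominator, fulfilling every duty pays out exactly the base reward, matching the "maximum total expected reward per slot" framing; re-weighting a duty only means changing its numerator.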

@protolambda (Contributor) commented:

FWIW, here are the current similar optimizations in zrnt: https://github.com/protolambda/zrnt/blob/master/eth2/beacon/epoch.go

Essentially constructing a flat (list, 1 piece of info per validator) view of status data. This optimization goes all the way back to the eth2 workshop in Sydney ~April 2019, where lighthouse first shared their version of it.

There are several reasons why this current optimization helps:

  1. Only touch every attestation once. Bitfield access, and merging with committee data to determine (non-)participating indices, is more expensive than you might think, especially in languages like Python/JS/Java, where bitfields are not as low-level. Using the participation data for rewards calculation is easy after preparing this optimized view.
  2. Minimize tree-structure interactions. In the spec, ZRNT, Teku and Lodestar, the state data is generally tree-like, to make forking of the state cheap (thanks to data sharing). Access to the data is more expensive though. So by flattening it just before intensive usage (the epoch transition), you avoid a lot of overhead and get the best of both worlds.
  3. The flattened structure in the old optimization covers a lot of handy info: validator fields (epochs, slashed boolean), inclusion delay, 8 flags (source/target/head previous epoch participation, source/target/head current epoch participation, being unslashed, and being eligible). The trick with the flags is that you can combine them: AND conditions can be checked quickly and elegantly.
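The AND-condition trick can be sketched like this (bit positions and names are hypothetical, not zrnt's actual layout): each validator gets one small integer of flags, and AND-ing a mask checks several conditions in a single pass over flat data.

```python
# Hypothetical flat status bytes; bit positions are illustrative only.
PREV_TARGET = 1 << 0   # voted the correct target in the previous epoch
UNSLASHED = 1 << 6
ELIGIBLE = 1 << 7

statuses = [
    PREV_TARGET | UNSLASHED | ELIGIBLE,  # voted target, unslashed
    UNSLASHED | ELIGIBLE,                # eligible but missed target
    PREV_TARGET,                         # voted target but slashed
]

# "voted target AND unslashed", checked in one pass over the flat list
mask = PREV_TARGET | UNSLASHED
target_attesters = [i for i, s in enumerate(statuses) if s & mask == mask]
```

Here only validator 0 satisfies both conditions, so `target_attesters` is `[0]`.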

About the new changes:

  • Thanks to the shuffling order I'm not too worried about changes in the tree. Continuous updates might be more expensive, but epoch transitions, the real bottleneck, will benefit.
  • Having space for additional flags later on is definitely nice, but we could use some of that space in the bitvector to put the previous and current epoch flags in the same bitfield, as is the case with the old client-side optimizations. Rotating them would be more expensive w.r.t. merkleization though (previously not a concern, since not in the state anyway)
  • Note that the current draft uses a list of bitvectors, which results in a merkle tree with 32 bytes per bitvector. Bitvectors are not packed in SSZ merkleization, but basic integer types are. So if we change the Bitvector[8] to a uint8, we have 32 times fewer leaf nodes in the tree.
    • To change it to a uint8: we'd have to use less elegant bit operations on the integers. E.g. val_flags & TIMELY_TARGET_FLAG != 0, and define flags as FLAGNAME = 1 << iota. A helper function polishes it all away easily though. E.g. has_flag(bitvec, FLAGNAME)
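For example, a sketch of the proposed uint8 helpers (flag names and bit positions here are illustrative, not the spec's):

```python
# Sketch of the proposed uint8 flag helpers; names/positions are hypothetical.
TIMELY_TARGET_FLAG = 1 << 0
TIMELY_FLAG = 1 << 1
HEAD_AND_VERY_TIMELY_FLAG = 1 << 2

def has_flag(flags, flag):
    # True iff every bit of `flag` is set in `flags`.
    return flags & flag == flag

def add_flag(flags, flag):
    return flags | flag

flags = add_flag(add_flag(0, TIMELY_TARGET_FLAG), TIMELY_FLAG)
```

Here `has_flag(flags, TIMELY_TARGET_FLAG)` is `True` while `has_flag(flags, HEAD_AND_VERY_TIMELY_FLAG)` is `False`, and the underlying value is just the byte `0b011`.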

And last but not least: with the data in the state, I hope clients can serve it efficiently on the API. No need to do a whole epoch worth of processing anymore to determine detailed per-validator participation info, just read it directly from the state. Having such detailed voting info in the explorers would be great! cc @Buttaa

For clients adopting this PR:

  • A fast update would be to just read the participation data from the state and copy it over into your current flattened validator-status structure. When using a slower tree structure for the state, it may even be the best option.

@paulhauner (Contributor) left a comment

Overall, I like this change. Here are some benefits:

  • It makes the spec more extensible and clearer, IMO.
  • I'm pretty sure it will reduce disk usage and hashing times out-of-the-box.
    • I'm not certain of the scale of the improvements, whipping up a benchmark is not trivial.
  • It's simpler to reason about BeaconState database optimizations that use the participation bitfields instead of lists of attestations.
  • At a glance, I think the "coarsening" of rewards will ultimately make it easier to pack attestations into blocks.

... to a "very timely / timely / not timely" triad.

I assume this triad has been removed in a refactor? I believe we now only have timely/not-timely.

Note that the current draft uses a list of bitvectors, which results in a merkle tree with 32 bytes per bitvector. Bitvectors are not packed in SSZ merkleization, basic integer types are. So if we change the bitvector[8] to a uint8, we have 32 times less leaf nodes in the tree.

I agree with this and I'd mildly prefer to see it use List[uint8, VALIDATOR_REGISTRY_LIMIT] for cheaper hashing. Here are some numbers:

  • 131k validators
    • bitlist
      • tree hash size: ~262kb
      • hashes required: ~65k
    • u8
      • tree hash size: ~8kb
      • hashes required: ~2k

I'm not marking this as an explicit approval since I haven't gone into detail with the economics/math of the rewards. But I'm happy for this PR to become the canonical spec, pending resolutions to my questions and review from EF researchers.

previous_epoch_attestations=List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH](),
# empty in pre state, since the upgrade is performed just after an epoch boundary.
current_epoch_attestations=List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH](),
previous_epoch_participation=[Bitvector[PARTICIPATION_FLAGS_LENGTH]() for _ in range(len(pre.validators))],
A contributor commented:

Was it intentional to drop the history of attestations from the previous epoch?

@djrtwo (Author) replied:

The alternative is to write a function to convert pending attestations into flags. This has relatively high consensus complexity for a single transition epoch.

One alternative is to just give all validators +base_reward at the epoch transition to avoid the complexity of the logic here...

* Fixing tests

* Add `is_post_lightclient_patch` helper to determine the fork version condition
@ralexstokes (Member) left a comment

this generally looks great and i agree that this overall direction simplifies accounting so it is easier to maintain and extend moving forward, as well as making it easier to get right in the first place :) i left a comment about the rationale for PARTICIPATION_FLAGS_LENGTH and the following:

i'm confused how we are accomplishing the "batched updates" to the state tree which seems to be the reason we didn't do this in the first place... if i'm reading the PR correctly, then we retain the random access into the state via the use of get_attesting_indices in the updated process_attestation. but the entries under state.{previous, current}_epoch_participation are supposed to be in shuffling order so that each attestation touches a contiguous subset of that list at any time (thus amortizing the write overhead). it seems pretty straightforward to keep most of the machinery we have here and just map from "validator index" space to "shuffling index" space as needed, but the state will likely need to keep the epoch_participation fields as Vector[Bitvector[...]] so that we can find the right spot to insert the relevant updates as attestations will come in off the wire with no particular order.
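A minimal sketch of that "validator index" space to "shuffling index" space mapping (nothing here is spec code): invert the epoch's shuffling once, then use it to route each attestation's writes.

```python
# Hypothetical sketch: invert the shuffling once per epoch, then each
# update for a validator lands at its shuffled position.
def build_position_map(shuffling):
    """shuffling[pos] = validator index at shuffled position pos."""
    return {validator: pos for pos, validator in enumerate(shuffling)}

shuffling = [3, 0, 2, 1]
pos_of = build_position_map(shuffling)
# an update for validator 1 lands at participation[pos_of[1]], i.e. position 3
```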

@terencechain (Contributor) commented:

I’m a fan of this change. The compressed participation bits are much easier to reason about than a list of pending attestations in state, and they make the spec clearer. It also lets us reduce the major reward/penalty flags down to just 3, so some conditions can be checked elegantly in process_epoch.

I have the same question about using bitvector[8] versus uint8. It seems to me that changing to uint8 is worth it with a minor tradeoff for less elegant bit operations. I can do some benchmarks for this on Prysm to give us more data points.

@djrtwo (Author) left a comment

Tests and the switch to uint8 for flags were just merged from an independently reviewed PR

This looks good to me, but need someone else to click approve because I authored the PR 😅
@hwwhww @protolambda

@hwwhww (Contributor) left a comment

The spec LGTM. 👍
We can add some cosmetic changes and more tests with new PRs.

Co-authored-by: Hsiao-Wei Wang <hsiaowei.eth@gmail.com>
@djrtwo djrtwo merged commit 6f2c69e into dev Feb 4, 2021
@djrtwo djrtwo deleted the accounting-reform branch February 4, 2021 17:16
@@ -390,3 +390,11 @@ def wrapper(*args, spec: Spec, state: Any, **kw):
return None
return fn(*args, spec=spec, state=state, **kw)
return wrapper


def is_post_lightclient_patch(spec):
A contributor commented:

what are the chances of getting a clean light-client spec without the if's? :)

9 participants