sharding attestation policing across validators/nodes #63

Open
djrtwo opened this issue Jul 17, 2019 · 7 comments
Labels
rfc

Comments

@djrtwo djrtwo commented Jul 17, 2019

sharding attestation policing across validators/nodes

For the network to be well-policed wrt attestation slashings, we need to actively check for slashable attestations within the "weak subjectivity period". The weak subjectivity period is variable in eth2 due to the validator queuing mechanism, but it is on the order of ~8 months when 10M eth (300k validators) are participating.

This brings up the non-trivial challenge of efficiently storing attestations and related data, and searching through this space on demand when new (and potentially historic) attestations come in.

After some consideration of the load of Attestations and search data that needs to be stored, it seems prudent to shard this responsibility across nodes/validators by default rather than have each validator attempt to store and search the entire space. It is also assumed that large data consumers (block explorers, API providers, etc) will store and search the entire space.

This issue discusses both the efficient search algorithms and how to split up the search space.

Slashing search

There are two types of slashable attestations -- "double vote" and "surround vote". Double votes are relatively simple to find, whereas surround votes take a little more consideration.

The strategies discussed here attempt to minimize database reads for attestations when searching, relying instead upon relevant bitfields and epochs tracked per validator. If a slashing is detected, then a more intensive lookup of the particular attestation from the DB is warranted.

Double vote

Detecting a double vote is relatively simple and requires storing only N bits per validator, where N is the number of epochs in the weak subjectivity period. When a new attestation comes in for an epoch, then for each validator in the attestation, check whether the double_vote bit for that validator at that epoch is set to 1. If it is set to 1, a slashable message has been found; retrieve it from the DB. If it is set to 0, set it to 1.

This is a constant-time search and constant-time update per validator, so linear in the number of validators in an attestation.
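
A minimal sketch of that bitfield check (Python). The class name, the 54k-epoch constant, and the rolling-window handling via modulo are illustrative assumptions; eviction of epochs that fall out of the weak subjectivity period is omitted.

```python
# Illustrative sketch only: one N-bit field per validator, bit e == "voted in epoch e".
WEAK_SUBJECTIVITY_EPOCHS = 54_000  # ~8 months of epochs, per the estimate later in this thread

class DoubleVoteChecker:
    def __init__(self, num_validators: int):
        # arbitrary-precision ints double as N-bit bitfields
        self.bits = [0] * num_validators

    def process(self, target_epoch: int, attesting_indices: list[int]) -> list[int]:
        """Return validator indices that are candidates for a double vote at target_epoch."""
        suspects = []
        e = target_epoch % WEAK_SUBJECTIVITY_EPOCHS  # naive rolling window; stale bits not cleared here
        for v in attesting_indices:
            if self.bits[v] >> e & 1:
                suspects.append(v)      # bit already set: possible double vote, confirm via DB lookup
            else:
                self.bits[v] |= 1 << e  # first vote seen for this validator at this epoch
        return suspects
```

Note that, as discussed further down the thread, a bare bit per validator cannot tell an identical AttestationData apart from a genuinely conflicting one, so a hit here is only a trigger for the DB lookup, not proof of a slashable offense.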

Surround vote

Detecting a surround vote is a bit more complex and requires more data. An algorithm developed by @vbuterin requires 2*N*log(N) bits per validator. The search is constant time per validator, and the data update is linear in the worst case but likely sub-linear, near constant time, in practice.

The algorithm requires storing 2*N epoch values per validator. The basic principle is the following:

For each epoch E and for each validator, maintain the highest target epoch of an attestation whose source <= E. This creates an always-sorted list that can be updated by starting from the attestation's source position in the array and continuing to move right until the maximum is no longer being adjusted. When a new attestation is received, check it against that list for each validator in the attestation to find whether the new attestation is surrounded by any prior attestation. To check whether the new attestation surrounds an earlier attestation, maintain the opposite of the above-described array and perform the opposite checks.
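
A minimal per-validator sketch of those two arrays. Assumptions: epochs are already rebased into a [0, N) window, full epoch numbers are stored rather than the log(N)-bit offsets, and eviction of old epochs is omitted.

```python
N = 54_000  # epochs in the (approximate) weak subjectivity period

class SurroundChecker:
    """Per-validator structure; a real client would keep one of these per validator index."""

    def __init__(self):
        # max_targets[e] = highest target of any seen attestation with source <= e (non-decreasing in e)
        # min_targets[e] = lowest  target of any seen attestation with source >= e (non-decreasing in e)
        self.max_targets = [0] * N
        self.min_targets = [N] * N  # N acts as an "infinity" sentinel

    def is_slashable(self, source: int, target: int) -> bool:
        # an older attestation surrounds the new one: exists (s' < source, t' > target)
        if source > 0 and self.max_targets[source - 1] > target:
            return True
        # the new attestation surrounds an older one: exists (s' > source, t' < target)
        if source + 1 < N and self.min_targets[source + 1] < target:
            return True
        return False

    def insert(self, source: int, target: int) -> None:
        # walk right from source while the running maximum still needs adjusting
        for e in range(source, N):
            if self.max_targets[e] >= target:
                break
            self.max_targets[e] = target
        # symmetric walk left for the running minimum
        for e in range(source, -1, -1):
            if self.min_targets[e] <= target:
                break
            self.min_targets[e] = target
```

The update loops stop as soon as the stored value no longer changes, which is what makes the update near constant time in the common case even though the worst case is linear in N.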

Sharding responsibility

Although these algorithms are very efficient in search time, the data requirement for maintaining these lists can become quite large, on top of the already large requirement of storing historic attestations through the weak subjectivity period. Hence the need to shard responsibility for the attestation slashing search space across validators.

We recommend strong client defaults in this matter to (1) provide sufficient policing of the network in the case that the 2/3 honesty assumption fails wrt safety, while at the same time (2) not placing excessive resource requirements on any one validator.

In addition to consumer node policing, we fully expect large data users to police the full dataset, as they will already be storing it for other purposes. Fortunately, policing of attestations works as long as a minimum of one honest user sufficiently monitors the entire weak subjectivity period's worth of attestations.

There are multiple options for strong user defaults, the most appealing of which scales with the number of validators working off of a node. However, since most BN/VC architectures do not persistently store which validators are connected to a BN, this might require additional tracking that is not currently in place. To this end, a BN could catalogue the pubkeys for which validator attestation-creation requests have been made and scale responsibility that way. If the user default goes in this direction, it likely makes sense not to broadcast slashable attestations immediately when found, and instead to eventually package them into blocks proposed by one of the associated pubkeys connected to the BN, for the profitability of that validator.

An alternative is to give each node a slightly higher default epoch range that it is responsible for and disregard which validators might be connected. In this case, it likely makes sense to instead broadcast slashable messages immediately so they can be picked up by any validator and put into a block. Because the node does not persistently know whether validators are connected to it, a slashable message might otherwise be kept around locally for an indefinite period of time without ever serving value to the network.

@djrtwo djrtwo commented Jul 17, 2019

You can actually construct a much less data-intensive no-surround checker that has the potential for false positives. Essentially, use the mechanism proposed by Vitalik, but instead of doing it per validator, just do it across all attestations. When a historic attestation comes in, check it against this mechanism. If you find a surround, do a deeper search in the database to find whether the surrounded attestation has a collision in participating validators. If so, you have found a slashable surround. If not, it is a false positive.

This operates under two assumptions:

  1. It is expensive to construct valid attestations (each validator can only produce a max of 1 per epoch before they are slashable), so it is expensive to DoS
  2. Due to the way in which attestations are constructed against valid states, it is abnormal for surrounded attestations to occur naturally

Thus a valid, historic attestation that surrounds anything is already a red flag.
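
A sketch of how that pre-filter might be wired up, reusing the per-validator SurroundChecker sketched in the "Surround vote" section as a single shared instance for all attestations; find_surround_in_db is a hypothetical (expensive) DB helper, not an existing API.

```python
def police_attestation(shared_filter, source, target, attesting_indices, find_surround_in_db):
    """Return a confirmed slashable pairing, or None (either no hit or a false positive)."""
    confirmed = None
    if shared_filter.is_slashable(source, target):
        # the filter only says *some* attestation surrounds / is surrounded by this one;
        # confirm that the participating validator sets actually overlap before reporting
        confirmed = find_surround_in_db(source, target, attesting_indices)
    shared_filter.insert(source, target)
    return confirmed
```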


A note on the data requirements of the non-false-positive surround checker in the original post

For 8 months of epochs, this is ~1MB per validator (54k epochs * 2 * 8 bytes per epoch), which is something like 300GB of surround metadata for 300k validators. The 8 bytes per epoch can actually be truncated to just log(epochs_per_weak_subjectivity_period) ~= 2 bytes to reduce the load to 75GB of surround metadata, but this still seems on the high side for simple search metadata.
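
Reproducing that arithmetic (the figures above are order-of-magnitude, so the exact products come out slightly lower):

```python
epochs, validators = 54_000, 300_000

full   = epochs * 2 * 8 * validators   # two arrays, 8 bytes per stored epoch
packed = epochs * 2 * 2 * validators   # log2(54_000) < 16 bits, so ~2 bytes per entry

print(full / 1e9, packed / 1e9)        # ~259 GB and ~65 GB, i.e. the ~300GB / 75GB figures above
```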

@protolambda protolambda commented Jul 19, 2019

See https://github.com/protolambda/eth2-surround for more options/ideas for implementing surround matching.
With the last option, using Manhattan distances, you end up spending only 1 bit per validator per Manhattan index. So that would be (54000+3150)*300000 bits = ~2.14 GB.
Then add a simple index which tells which attestations have been seen to match a given Manhattan index (1 in the best case, ~2 on average with reasonable finality) to look up the actual attestation with.
Things could also be faster with a few layered indices (batches of validators instead of per-validator). So that would be approx. 2.2 - 2.5 GB for 8 months of matching data (~3.3% of the previous optimized 75GB case).
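
For reference, the bit count above works out as follows (rounded):

```python
bits = (54_000 + 3_150) * 300_000  # 1 bit per validator per Manhattan index
print(bits / 8 / 1e9, "GB")        # ~2.14 GB of matching data
```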

@shayzluf shayzluf commented Aug 5, 2019

@protolambda we still have to store the attestations themselves in order to create a slashing container, right? 54000 epochs * 2344 committees per epoch * 328 bytes ~= 41GB using aggregated attestations

@protolambda protolambda commented Aug 5, 2019

@shayzluf yes, that is the minimal requirement for eventual use. But 99% of the time, you likely won't use it. So ideally you put the old attestations into cheap (and possibly distributed/decentralized) storage, and only deal with the data that enables detecting the slashable events (the 2-3GB mentioned earlier) as efficiently as possible. (And this may come with some error rate; it's more than fine to look up an underlying attestation some small X% of the time if detection is not 100% strict.)

@shayzluf shayzluf commented Aug 25, 2019

@protolambda while we were implementing the recommended double-vote detection method using a bitlist, we hit an issue where aggregated attestations with overlapping indices triggered a false positive for a slashable offense. A single bit per validator doesn't seem like enough to prevent this. Do you have any suggestions for preventing false positives in this case?

For example: Say we have a setup with 2 nodes, and 3 validator sets A, B, C. If we receive one aggregated attestation of set A + B, and then receive an aggregated attestation for the same data but of B + C, the bitlist per validator method would report the second attestation as a slashable offense for all validators in set B.
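
A tiny self-contained illustration of that false positive, using a bare one-bit-per-(validator, epoch) record with no knowledge of the underlying AttestationData (indices are made up):

```python
seen = set()  # (validator_index, epoch) pairs whose "double vote" bit is set

def process(epoch, attesting_indices):
    flagged = [v for v in attesting_indices if (v, epoch) in seen]
    seen.update((v, epoch) for v in attesting_indices)
    return flagged

A, B, C = [0, 1, 2], [3, 4, 5], [6, 7, 8]
print(process(10, A + B))  # [] -- first aggregate, nothing flagged
print(process(10, B + C))  # [3, 4, 5] -- same AttestationData, yet set B is falsely flagged
```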

@djrtwo djrtwo commented Aug 26, 2019

So what we want to do is cheaply decide whether it is worth doing a more expensive DB lookup. In such a case, we can tolerate some false positives (as long as they aren't cheap to construct), but we don't want any false negatives.

Unfortunately, even without the above aggregation exploit, you could just send a full duplicate attestation (A in the database, and A comes in on the wire sometime in the future) and it would trigger the false positive. This is true as long as we don't keep any easily accessible data about the attestation. Adding the set of 32-byte hashes associated with the various AttestationData each epoch and mapping them to indices could help solve this, at the cost of 32 bytes times the diversity of AttestationData per epoch (assume a min of 1, max of 10 or so). In the case above (A+B and B+C), we would have the associated hash of the AttestationData for the original A+B and see that B isn't at fault for B+C.
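
A sketch of that refinement, tracking the AttestationData root(s) seen per validator per epoch rather than a bare bit; obtaining the 32-byte root (e.g. via the client's SSZ hash_tree_root) is assumed to happen elsewhere.

```python
from collections import defaultdict

class DoubleVoteCheckerWithRoots:
    def __init__(self):
        # (validator_index, epoch) -> set of AttestationData roots seen for that validator/epoch
        self.roots = defaultdict(set)

    def process(self, epoch: int, data_root: bytes, attesting_indices: list[int]) -> list[int]:
        suspects = []
        for v in attesting_indices:
            seen = self.roots[(v, epoch)]
            if seen and data_root not in seen:
                suspects.append(v)  # a *different* AttestationData in the same epoch: real double-vote candidate
            seen.add(data_root)
        return suspects
```

With this, the A+B then B+C aggregates above carry the same root and no longer flag set B.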

At the end of the day, this and similar methods don't fully solve the problem. There is a DoS issue prior to this check. Step 1 is to figure out which attesters are even a part of the attestation and whether the signature is valid. If a historic attestation comes in from the wire, the only way to find out this information is to pull up the historic BeaconState context and check the attestation against the historic shuffling (active validator set and seeds of randomness). This is an expensive DB read that could also involve expensive computation, depending upon how frequently historic states are stored in the DB.

This becomes a major DoS vector, as an attacker can even send invalid attestations to a policing node for free (they are not slashable), causing the node to do expensive historic lookups.

Still more thought to be done on this :/

Personal notes that might be useful

Goal:

  • shard responsibility across nodes such that the load on each does not exceed O(C) but that the entirety of the weak subjectivity period is protected

Load:

  • validating attestation (involves old state lookups)
  • checking attestation against other historic attestations

Exploits:

  • sending historic valid attestations already contained in the DB, still causing the node to do some sort of old state lookup
  • sending invalid historic attestations

Needs:

  • adding slot to AttestationData is requisite to easily shard responsibility without doing any DB lookups or validations

Questions:

  • can validation be cheap enough for sharded responsibility? Can nodes store a small enough set of BeaconStates to keep validation of historic attestations cheap?
  • once attestations are validated, can we cheaply (without "costless" false positives) detect collisions?
  • how small per validator can we make the policing load?

@protolambda protolambda commented Nov 1, 2019

New progress on surround vote matching: https://github.com/protolambda/eth2-surround#min-max-surround
The min-max tracking should be sufficient for slashings, and although the naive thing takes 64GB, there is a lot of repetitive data which I think allows for compression to the 1-2 GB range :)
