Fix timing attack #2101

Closed
wants to merge 9 commits into from

adiasg
Contributor

@adiasg adiasg commented Oct 13, 2020

The PR introduces a simple fix for the fork choice timing attack presented in this paper: https://arxiv.org/abs/2009.04987

By making the attestation production time unpredictable to the attacker & unique for each validator, we make it harder for an attacker to separately influence the fork choice of disjoint subsets of validators by sending well-timed messages to each set, such that these messages are not gossiped with the other subset of validators before attestations for the slot are produced.

@mcdee
Contributor

mcdee commented Oct 14, 2020

Has there been any consideration of the impact this could have on attestations going "too early" and missing the current block?

From local measurements, currently around 90% of blocks are received by my local validator before the 4 second mark. However, only around 60% of blocks are received before the 2 second mark. Obviously we'd expect a spread of validators across the 2-6 second range, but it does appear that this will reduce the % of validator clients that will use the block of the current slot in their attestation.

@adiasg
Contributor Author

adiasg commented Oct 14, 2020

Tuning the numbers according to real-world data was the motivation for converting the constants to configuration parameters 😃

Given your observations about block timings, it makes sense to change the attestation production time to 4 ± 1 sec from the start of the slot.

@mcdee Can you share the collected data so we can analyze this more?

@mcdee
Contributor

mcdee commented Oct 14, 2020

blockdelay.txt

Here's a dump of the last ~7.5K blocks against one of my validator clients, standard prometheus histogram.

@adiasg
Contributor Author

adiasg commented Oct 14, 2020

Thanks for sharing! Looks like ~75% blocks are seen within 3 seconds and ~85% are seen within 4 seconds of the slot start, so 4 ± 1 sec is a reasonable configuration.

@djrtwo djrtwo changed the base branch from dev to v1.0-candidate October 14, 2020 21:34
Contributor

@djrtwo djrtwo left a comment

pretty minor comments

Also, we use the magic number of 3 in aggregation broadcast. consider updating to the constant


A validator should create and broadcast the `attestation` to the associated attestation subnet when the earlier one of these two events occurs:
- the validator has received a valid block from the expected block proposer for the assigned `slot`, or
- `SECONDS_PER_SLOT/3 + slot_timing_entropy` seconds have elapsed since the start of the `slot` (using the `slot_timing_entropy` generated for this slot)
Contributor

Suggested change
- `SECONDS_PER_SLOT/3 + slot_timing_entropy` seconds have elapsed since the start of the `slot` (using the `slot_timing_entropy` generated for this slot)
- `SECONDS_PER_SLOT / 3 + slot_timing_entropy` seconds have elapsed since the start of the `slot` (using the `slot_timing_entropy` generated for this slot)

adiasg and others added 2 commits October 14, 2020 15:35
Co-authored-by: Danny Ryan <dannyjryan@gmail.com>
@@ -391,7 +393,13 @@ def get_block_signature(state: BeaconState, block: BeaconBlock, privkey: int) ->

A validator is expected to create, sign, and broadcast an attestation during each epoch. The `committee`, assigned `index`, and assigned `slot` for which the validator performs this role during an epoch are defined by `get_committee_assignment(state, epoch, validator_index)`.

A validator should create and broadcast the `attestation` to the associated attestation subnet when either (a) the validator has received a valid block from the expected block proposer for the assigned `slot` or (b) one-third of the `slot` has transpired (`SECONDS_PER_SLOT / 3` seconds after the start of `slot`) -- whichever comes _first_.
For each `slot`, a validator must generate a uniform random variable `slot_timing_entropy` between `(-SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR, SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR)` with millisecond resolution and using local entropy.
Contributor

Can the entropy be shared by multiple validators that are served under the same beacon node?

this will simplify the beacon node implementation

Contributor Author

Yes, the entropy can be shared by multiple validators that are served under the same beacon node.

Contributor

Do you mean the source of entropy can be shared or that randomly selected slot_timing_entropy value can be shared by all validators served by the same beacon node?

There are significant performance implications if every individual validator has to select the latest head and create its attestation at a different time. Currently, a Validator Client only needs to ask the beacon node to create at most one AttestationData per committee per slot, because all validators in that same committee can create an attestation from that AttestationData. And all validators can share the same selected head block.

With this change, if the value of slot_timing_entropy can't be shared, the number of validators a beacon node could support would be significantly reduced as it would need to update fork choice and create a new AttestationData for each individual validator.

Contributor Author

@adiasg adiasg Oct 15, 2020

This is the gist of this fix:

> By making the attestation production time unpredictable to the attacker & unique for each validator, ...

The attestation production time doesn't have to be unique for each and every validator. However, it is absolutely crucial that the attestation production time is unpredictable for anyone who does not control the validator and/or beacon node (for clients where the beacon node is the driver of validator duties). So, validators served by the same beacon node can have the same attestation production time, i.e., they can share the source of the entropy and the actual slot_timing_entropy value.
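As a rough illustration of the sharing described above, here is a minimal Python sketch that draws `slot_timing_entropy` once per slot from local entropy at millisecond resolution and reuses it for every validator attached to the same beacon node. The `BeaconNodeEntropy` class and `sample_slot_timing_entropy_ms` helper are hypothetical names, and `ATTESTATION_ENTROPY_DIVISOR = 12` is an assumption chosen to match the ± 1 sec window discussed earlier in this thread; none of this comes from the PR diff itself.

```python
import secrets

SECONDS_PER_SLOT = 12             # phase 0 mainnet slot length
ATTESTATION_ENTROPY_DIVISOR = 12  # assumption: yields the +/- 1 s window from the thread


def sample_slot_timing_entropy_ms() -> int:
    """Uniform offset in milliseconds, strictly inside (-bound_ms, +bound_ms)."""
    bound_ms = SECONDS_PER_SLOT * 1000 // ATTESTATION_ENTROPY_DIVISOR
    # randbelow(2 * bound_ms - 1) yields 0 .. 2*bound_ms - 2;
    # shifting by (bound_ms - 1) gives -(bound_ms - 1) .. +(bound_ms - 1).
    return secrets.randbelow(2 * bound_ms - 1) - (bound_ms - 1)


class BeaconNodeEntropy:
    """Hypothetical per-node cache: one entropy value per slot,
    shared by all validators served by this beacon node."""

    def __init__(self) -> None:
        self._by_slot: dict[int, int] = {}

    def for_slot(self, slot: int) -> int:
        # Draw lazily on first request, then return the same value for the slot.
        if slot not in self._by_slot:
            self._by_slot[slot] = sample_slot_timing_entropy_ms()
        return self._by_slot[slot]
```

The sketch uses `secrets` rather than `random` because the value has to be unpredictable to anyone outside the node. A real client would also prune old slots from the cache.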

@vbuterin
Contributor

I'd agree that this could make it harder for attacks, but I don't think it's a substitute for deeper changes (eg. my "the proposer has 1/4 slot weight" proposal) that provide liveness in the standard model (attacker chooses the latency of every message within the bounds [0, delta]).

The attack under this proposal (ie. this PR) would be: the attacker connects to every node (eg. by connecting to the network with a huge number of nodes and just waiting until they get included in the network and they make up 80%+ of all nodes in the network), and then splits the network 50/50 by broadcasting a set of attestations at exactly the time window when the slot_timing_entropy is right in the middle of its probability distribution. The attacker would have better connections to all the nodes than the nodes have to each other, so them accomplishing this is well within the realm of possibility.

@adiasg
Contributor Author

adiasg commented Oct 15, 2020

The goal of the PR is to provide some satisfactory mitigation of the attack in v1.0 of the spec, while having relatively low code impact and low risk of the proposed changes. In addition, this fix is definitely backwards-compatible.

Since the attack is feasible & has become well-known by now, it would be a bad move to go ahead with v1.0 without any fixes.

For each `slot`, a validator must generate a uniform random variable `slot_timing_entropy` between `(-SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR, SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR)` with millisecond resolution and using local entropy.

A validator must create and broadcast the `attestation` to the associated attestation subnet when the earlier one of the following two events occurs:
- The validator has received a valid block from the expected block proposer for the assigned `slot`. In this case, the validator must set a timer for `abs(slot_timing_entropy)`. The end of this timer will be the trigger for attestation production.
Contributor

what is the attack vector of sending the attestation on block receipt? that has some randomness built into it "naturally"?

Contributor Author

This is to mitigate the risk from an adversary who has faster connections to all validators than what the rest of the validators have between themselves. There are already some "Layer-0" projects in this space that provide this as a service (either currently, or will do in the near future), e.g., bloXroute and Marlin.

An attacker with this capability would be able to trigger attestation production at a predictable time of its choosing by always being the first one to inform validators about a new block. Hence the timing entropy is added, to make this attack vector infeasible.

Contributor

By the way, the time block_arrived + abs(slot_timing_entropy) should be capped at SECONDS_PER_SLOT / ATTESTATION_PRODUCTION_DIVISOR + slot_timing_entropy. Otherwise, in the worst case, we'd see the attestation being sent out at slot + 6s instead of slot + 5s being the maximum, which is very close to the aggregation cutoff time and increases the risk of losing rewards.
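The cap suggested here can be sketched as follows. This is a minimal Python illustration, not spec text: the function name `attestation_time` is hypothetical, times are seconds since the start of the slot, and `ATTESTATION_PRODUCTION_DIVISOR = 3` and `SECONDS_PER_SLOT = 12` are assumed from the `SECONDS_PER_SLOT / 3` wording in the diff.

```python
from typing import Optional

SECONDS_PER_SLOT = 12               # phase 0 mainnet slot length
ATTESTATION_PRODUCTION_DIVISOR = 3  # assumption: matches SECONDS_PER_SLOT / 3 in the diff


def attestation_time(slot_timing_entropy: float,
                     block_arrival: Optional[float]) -> float:
    """Seconds after slot start at which the attestation is produced.

    block_arrival is None when no valid block for the slot was seen.
    """
    # Latest allowed production time: the time-based trigger.
    deadline = SECONDS_PER_SLOT / ATTESTATION_PRODUCTION_DIVISOR + slot_timing_entropy
    if block_arrival is None or block_arrival >= deadline:
        return deadline
    # Block-based trigger: wait abs(entropy) after the block, but never past the deadline.
    return min(block_arrival + abs(slot_timing_entropy), deadline)
```

With an entropy of +1 s and a block arriving at 4.9 s, the uncapped timer would fire at 5.9 s, while the capped version fires at 5.0 s.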

@adiasg
Contributor Author

adiasg commented Oct 16, 2020

> The attack under this proposal (ie. this PR) would be: the attacker connects to every node (eg. by connecting to the network with a huge number of nodes and just waiting until they get included in the network and they make up 80%+ of all nodes in the network), and then splits the network 50/50 by broadcasting a set of attestations at exactly the time window when the slot_timing_entropy is right in the middle of its probability distribution. The attacker would have better connections to all the nodes than the nodes have to each other, so them accomplishing this is well within the realm of possibility.

Yes, this attack is possible even with the fix from this PR in place. However, the chance of success of the attack is substantially lower than before!

Let's label the time when slot_timing_entropy is right in the middle of its probability distribution as t_attack, and assume that the network is synchronous. Then,

  • half of the validators will have already produced attestations by t_attack. These validators are unaffected by the attack, and make attestations for a single chain.
  • the other half of the nodes will produce attestations sometime within (t_attack, t_attack + SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR).
    • The attacker will be sending two different messages to two distinct subsets of validators (say, subset A and B) such that the messages are conflicting. These messages are only useful to the attacker when they are not gossiped between validators in A and B.
    • Let min_network_latency be the minimum network latency between validators in A and B. Then, the attacker will only be able to influence the attestations of validators who are scheduled to produce messages between (t_attack, t_attack + min_network_latency). On average, this will be the fraction min_network_latency * ATTESTATION_ENTROPY_DIVISOR / SECONDS_PER_SLOT. With the current values for the constants, this comes out to be min_network_latency (the numerical value in seconds).
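The arithmetic in the last bullet can be checked with a short Python sketch. The function name `influenced_fraction` is hypothetical, and `ATTESTATION_ENTROPY_DIVISOR = 12` is an assumption consistent with the ± 1 sec window mentioned earlier in the thread. The result is read here as a fraction of the validators that have not yet attested at t_attack, under the same synchrony assumption as the argument above.

```python
def influenced_fraction(min_network_latency: float,
                        seconds_per_slot: float = 12.0,
                        attestation_entropy_divisor: float = 12.0) -> float:
    """Fraction of the not-yet-attested validators whose attestation falls in
    (t_attack, t_attack + min_network_latency), assuming uniform timing."""
    # Width of the remaining half of the entropy window, in seconds.
    window = seconds_per_slot / attestation_entropy_divisor
    # Equivalent to min_network_latency * DIVISOR / SECONDS_PER_SLOT, clamped to [0, 1].
    return min(1.0, max(0.0, min_network_latency) / window)
```

For example, a minimum latency of 0.1 s between the two subsets gives 0.1 with these constants, which matches the "numerical value in seconds" remark above.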

@nrryuya
Contributor

nrryuya commented Nov 9, 2020

I'm curious about how valuable this fix is (e.g., how much weaker the network model under which liveness is guaranteed becomes, and how much the fault tolerance changes) compared to the additional implementation complexity, the effect on the efficiency of attestation aggregation, and the risk of unknown side effects (for instance, this fix will affect the analysis of the incentive compatibility of the timing of attesting).

> These messages are only useful to the attacker when they are not gossiped between validators in A and B.

The attacker's attestations are useful even if some portion of validators receive the attacker's attestations from the other subset and switch the chain they vote for. The difference between the two target chains' scores at the end of the current slot is |(the validators in A who switched) - (the validators in B who switched)|. If this difference is smaller than the number of the attacker's attestations in the next slot, the attacker can re-create the tie. (Although in the original attack in the Ebb-and-Flow paper the attacker uses a few attestations per slot, the attacker can have many more attestations per slot, in proportion to its stake.)

> Let min_network_latency be the minimum network latency between validators in A and B

From the above observation, considering only the minimum network latency between the two subsets is not enough. We need to precisely analyze how many of the attacker's attestations are exchanged within the time window and how large the resulting difference in scores becomes.
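The tie condition from this comment can be written down as a small Python check. This is only a sketch of the inequality stated above, with a hypothetical function name and with validator counts standing in for score; it ignores balances and does not attempt the more precise analysis the comment calls for.

```python
def attacker_can_retie(switched_in_a: int,
                       switched_in_b: int,
                       attacker_attestations_next_slot: int) -> bool:
    """True if the score gap left by switching validators is smaller than
    what the attacker can vote in the next slot."""
    # Gap between the two chains after validators in A and B switch sides.
    score_gap = abs(switched_in_a - switched_in_b)
    return score_gap < attacker_attestations_next_slot
```

For instance, if 40 validators in A and 35 in B switch, a gap of 5 remains, so an attacker with more than 5 attestations in the next slot could balance the chains again under this simplified count.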

@djrtwo
Contributor

djrtwo commented Mar 10, 2021

closing this. likely to go another path

@djrtwo djrtwo closed this Mar 10, 2021
@adiasg adiasg added the scope:security General protocol security-related items label Mar 11, 2021
Labels
scope:fork-choice scope:security General protocol security-related items scope:v-guide Validator guide