A malicious host could cause a denial of service by manipulating tick() #99

achamayou · 2019-05-29T13:25:53Z

Suppose a service is composed of 3 nodes {n0 .. n2}, and all nodes are in sync (same term, same index) at the beginning.

The adversary controls a minority, {n0}. (Enclaves are not compromised.)

n0 is in the Follower state. The adversary may modify the code in the untrusted zone so that AdminMessage::tick messages are sent to the enclave much more frequently, and with elapsed_ms values large enough to trigger election timeouts.

If I understand correctly, the victim enclave will keep sending RequestVote messages to its peers. Because these messages are constructed by the enclave, the other peers will treat them as legitimate, and the honest leader will also transition to the Follower state.

The adversary also drops inbound messages to the victim enclave, so that it cannot transition to the Leader state, and hence no AppendEntries messages will be sent.

The malicious node keeps being the first to send a RequestVote message in each new term, so the cluster is effectively shut down.

The network is still partially synchronous and a majority is still alive, but liveness no longer holds.
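To make the scenario above concrete, here is a minimal toy model of a tick-driven follower. It is only a sketch under stated assumptions: the `Follower` class, its method names, and the 150 ms timeout are hypothetical and are not taken from the actual enclave code. The point it illustrates is that the enclave has no clock of its own, so whatever `elapsed_ms` the host reports is what drives the election timer.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Toy model of an enclave-side Raft follower driven by host ticks.

    All names and values here are hypothetical, for illustration only.
    """

    election_timeout_ms: int = 150  # assumed value, not from the real code
    term: int = 0
    elapsed_ms: int = 0
    vote_requests: list = field(default_factory=list)

    def tick(self, elapsed_ms: int) -> None:
        # The enclave cannot check this value: it trusts the host.
        self.elapsed_ms += elapsed_ms
        if self.elapsed_ms >= self.election_timeout_ms:
            self.start_election()

    def on_append_entries(self) -> None:
        # A heartbeat from the leader resets the election timer.
        self.elapsed_ms = 0

    def start_election(self) -> None:
        # Bump the term and record a (simulated) RequestVote broadcast.
        self.term += 1
        self.elapsed_ms = 0
        self.vote_requests.append(self.term)


def honest_host(node: Follower, rounds: int) -> None:
    """Reports real elapsed time and delivers the leader's heartbeats."""
    for _ in range(rounds):
        node.tick(10)
        node.on_append_entries()


def malicious_host(node: Follower, rounds: int) -> None:
    """Forges large elapsed times and drops all inbound messages."""
    for _ in range(rounds):
        node.tick(10_000)


honest = Follower()
honest_host(honest, 10)

victim = Follower()
malicious_host(victim, 5)

print(honest.term, victim.term, victim.vote_requests)
```

In this model the honest follower never leaves term 0, while the victim starts a new election on every forged tick. Each of those elections would carry a higher term to the peers, which is what forces the honest leader to step down.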

The most straightforward fix is to manage the random election timeout inside the enclave, to make sure it is never shorter than a lower bound.

eddyashton · 2020-05-19T14:55:57Z

We've noticed a similar issue in one of our tests. We suspend the leader node for some time, to force an election. The other nodes choose a new leader and happily make progress, but when the original leader is unsuspended it gets an unusually large tick (covering the entire span of its suspension), and this triggers an election. We explored some mitigations for this, but fundamentally it falls into the same category: Raft requires regular, accurate time updates from the host, and without these it is possible to trigger spurious elections.

The only fix is some form of trusted time within the enclave, perhaps from node-gossip channels or perhaps from spinning to spend time within the enclave, but we have no firm plan for this yet.

achamayou · 2021-04-19T10:19:06Z

We think implementing the PreVote extension to Raft (https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf, section 9.6, ticketed in #2577) will mitigate this problem without requiring an expensive busy-wait.
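For readers unfamiliar with PreVote, the idea can be sketched roughly as follows. This is an illustrative model based on the description in section 9.6 of the thesis, not the implementation tracked in #2577; the `Peer` class, `pre_vote_succeeds`, and the 150 ms baseline timeout are hypothetical names and values. A would-be candidate asks its peers whether they would vote for it before incrementing its term, and a peer refuses if it has heard from a leader recently or if the candidate's log is behind its own.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Toy model of a node answering pre-vote requests (hypothetical names)."""

    term: int
    last_log_term: int
    last_log_index: int
    ms_since_leader_contact: int
    baseline_timeout_ms: int = 150  # assumed value

    def grant_pre_vote(self, proposed_term: int, cand_last_log_term: int,
                       cand_last_log_index: int) -> bool:
        # Refuse while we still believe a leader is alive.
        if self.ms_since_leader_contact < self.baseline_timeout_ms:
            return False
        # Refuse a proposed term older than our own.
        if proposed_term < self.term:
            return False
        # Usual Raft check: candidate's log must be at least as up to date.
        return (cand_last_log_term, cand_last_log_index) >= (
            self.last_log_term, self.last_log_index)


def pre_vote_succeeds(current_term: int, last_log_term: int,
                      last_log_index: int, peers: list) -> bool:
    """True if a majority (counting the candidate itself) would vote for it.

    The candidate's own term is NOT incremented here; that only happens
    if this returns True and a real election is started.
    """
    proposed_term = current_term + 1
    grants = 1 + sum(
        p.grant_pre_vote(proposed_term, last_log_term, last_log_index)
        for p in peers)
    majority = (len(peers) + 1) // 2 + 1
    return grants >= majority


# Peers that heard from the leader 20 ms ago refuse the pre-vote.
healthy = [Peer(3, 3, 10, 20), Peer(3, 3, 10, 20)]
# Peers that have not heard from a leader for 500 ms grant it.
leaderless = [Peer(3, 3, 10, 500), Peer(3, 3, 10, 500)]

print(pre_vote_succeeds(3, 3, 10, healthy))
print(pre_vote_succeeds(3, 3, 10, leaderless))
```

Under these assumptions, a node whose host forges ticks can still time out as often as it likes, but as long as the honest peers are receiving heartbeats they refuse the pre-vote, so the victim's term never advances and the honest leader is not disturbed.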

achamayou added the bug label May 29, 2019

achamayou mentioned this issue May 29, 2019

[question] Raft implementation & Node crash recovery #86

Closed

achamayou added the liveness label May 31, 2019

achamayou added consensus 3.x labels May 23, 2022

achamayou assigned jumaffre May 23, 2022

achamayou added p2 labels Jul 25, 2022

shokouedamsr added this to the 3.x milestone Jul 25, 2022

achamayou unassigned jumaffre Aug 10, 2022

achamayou added 4.x and removed 3.x labels Oct 7, 2022

shokouedamsr removed the 4.x label Dec 5, 2022

shokouedamsr removed this from the 3.x milestone Dec 5, 2022

shokouedamsr removed p2 labels Dec 6, 2022
