A malicious host could cause a denial of service by manipulating tick() #99

Open
achamayou opened this issue May 29, 2019 · 2 comments

@achamayou
Member

Reported by @dantengsky in #86

Suppose a service composed of 3 nodes {n0 .. n2}, with all nodes in sync (same term, same index) at the beginning.

The adversary controls a minority, {n0}. (Enclaves are not compromised.)

n0 is in the Follower state. The adversary can modify the code in the untrusted zone so that AdminMessage::tick messages are sent to the enclave much more frequently, with elapsed_ms values large enough to trigger election timeouts.

If I understand correctly, the victim enclave will keep sending RequestVote messages to its peers. Because these messages are constructed by the enclave, the peers will treat them as legitimate, and the honest leader will step down to the Follower state.

The adversary also drops inbound messages to the victim enclave, so the victim enclave can never transition to the Leader state and therefore never sends AppendEntries messages.

Because the malicious node is always the first to send a RequestVote message for each new term, the cluster is effectively shut down.

The network is still partially synchronous and a majority is still alive, but liveness no longer holds.
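To make the scenario concrete, here is a minimal toy model of it. This is a sketch and not the project's code: `FollowerEnclave`, `run_host`, and the fixed `ELECTION_TIMEOUT_MS` value are hypothetical names and numbers chosen for illustration. It only models the fact that the enclave's sole clock is the host-supplied `elapsed_ms`, and that inbound messages (which would normally reset the timer) never arrive.

```python
from dataclasses import dataclass

ELECTION_TIMEOUT_MS = 150  # hypothetical fixed timeout, for illustration only


@dataclass
class FollowerEnclave:
    """Toy follower whose only notion of time is the host-supplied tick."""
    term: int = 1
    elapsed_ms: int = 0
    votes_requested: int = 0

    def tick(self, elapsed_ms: int) -> None:
        # The enclave has no way to verify elapsed_ms; it trusts the host.
        self.elapsed_ms += elapsed_ms
        if self.elapsed_ms >= ELECTION_TIMEOUT_MS:
            # Timeout: bump the term and (in a real node) broadcast RequestVote.
            self.term += 1
            self.votes_requested += 1
            self.elapsed_ms = 0


def run_host(enclave: FollowerEnclave, ticks: list[int]) -> FollowerEnclave:
    """Feed a sequence of host-chosen elapsed_ms values to the enclave."""
    for elapsed in ticks:
        enclave.tick(elapsed)
    return enclave


# Honest host: ten 10 ms ticks, below the timeout, so no election starts.
honest = run_host(FollowerEnclave(), [10] * 10)

# Malicious host: ten inflated ticks, each one alone exceeds the timeout.
malicious = run_host(FollowerEnclave(), [1000] * 10)
```

In this model the honest run stays in term 1, while the malicious run starts a new election on every single tick, which is what forces the honest leader to keep stepping down.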

The most straightforward fix for this is to run the random election timeout inside the enclave, so that it can never be shorter than a lower bound.

@eddyashton
Member

We've noticed a similar issue in one of our tests. We suspend the leader node for some time, to force an election. The other nodes choose a new leader and happily make progress, but when the original leader is unsuspended it gets an unusually large tick (covering the entire span of its suspension time), and this triggers an election. We explored some mitigations for this, but fundamentally it falls into the same category: Raft requires regular, accurate time updates from the host, and without these it is possible to trigger spurious elections.
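As an illustration of the kind of mitigation that can help with the suspension case (not necessarily one of those explored here), the enclave could cap how much time a single tick is allowed to contribute. The function name `clamp_elapsed` and the `MAX_TICK_MS` value below are assumptions made for this sketch.

```python
MAX_TICK_MS = 20  # hypothetical cap on the time credited to one tick


def clamp_elapsed(elapsed_ms: int, max_tick_ms: int = MAX_TICK_MS) -> int:
    """Limit the time a single host tick may add to the election timer."""
    # Negative values are treated as zero; oversized values are capped.
    return max(0, min(elapsed_ms, max_tick_ms))


# A node resumed after a 30 s suspension receives one huge tick,
# but only MAX_TICK_MS of it counts towards the election timeout.
after_suspension = clamp_elapsed(30_000)
normal_tick = clamp_elapsed(5)
```

This only addresses a single oversized tick. A malicious host can still send many ticks in quick succession, so it does not remove the need for trusted time or a protocol-level defence.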

The only fix is some form of trusted time within the enclave, perhaps from node-gossip channels or perhaps from spinning to spend time within the enclave, but we have no firm plan for this yet.

@achamayou
Member Author

achamayou commented Apr 19, 2021

We think implementing the PreVote extension to Raft (section 9.6 of https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf, ticketed in #2577) will mitigate this problem without requiring an expensive busy-wait.
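A rough sketch of the PreVote idea, under stated assumptions: before incrementing its term, a would-be candidate asks its peers whether they would vote for it, and only starts a real election if a majority (counting itself) says yes. The names `Peer`, `grants_pre_vote`, `may_start_election`, and `MIN_ELECTION_TIMEOUT_MS` are hypothetical. The check that a peer has not heard from a leader recently is an assumption of this sketch; it is often combined with PreVote but is not claimed to be how #2577 will be implemented.

```python
from dataclasses import dataclass

MIN_ELECTION_TIMEOUT_MS = 150  # hypothetical lower bound on the election timeout


@dataclass
class Peer:
    """Voter state, reduced to what a pre-vote decision needs."""
    term: int
    last_log_index: int
    last_log_term: int
    ms_since_leader_contact: int


def grants_pre_vote(peer: Peer, cand_term: int,
                    cand_last_index: int, cand_last_term: int) -> bool:
    # Assumption: a peer that recently heard from a leader refuses.
    if peer.ms_since_leader_contact < MIN_ELECTION_TIMEOUT_MS:
        return False
    # The proposed term must not be behind the peer's current term.
    if cand_term < peer.term:
        return False
    # The candidate's log must be at least as up to date as the peer's.
    return (cand_last_term, cand_last_index) >= (
        peer.last_log_term, peer.last_log_index)


def may_start_election(term: int, last_index: int, last_term: int,
                       peers: list[Peer]) -> bool:
    """True only if a majority, counting the candidate, would grant a vote."""
    proposed_term = term + 1  # proposed only; nobody's term changes yet
    grants = 1 + sum(
        grants_pre_vote(p, proposed_term, last_index, last_term)
        for p in peers)
    return grants > (len(peers) + 1) // 2


# n0's host inflates ticks, but n1 and n2 are in regular contact with the leader.
healthy_peers = [Peer(1, 5, 1, 10), Peer(1, 5, 1, 10)]
attack_succeeds = may_start_election(1, 5, 1, healthy_peers)

# Genuine leader failure: peers have heard nothing for a long time.
orphaned_peers = [Peer(1, 5, 1, 500), Peer(1, 5, 1, 500)]
real_election = may_start_election(1, 5, 1, orphaned_peers)
```

The relevant property is that a pre-vote request does not change any peer's term, so a node that times out spuriously cannot force the honest leader to step down.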

@shokouedamsr shokouedamsr added this to the 3.x milestone Jul 25, 2022
@achamayou achamayou added 4.x and removed 3.x labels Oct 7, 2022
@shokouedamsr shokouedamsr removed the 4.x label Dec 5, 2022
@shokouedamsr shokouedamsr removed this from the 3.x milestone Dec 5, 2022