DHT improvement ideas #6

jhiesey opened this Issue Sep 3, 2018 · 1 comment

Although I don't see any single magic solution to making a secure and performant DHT, there are a lot of good ideas out there beyond what is implemented in libp2p so far. Here are some, divided into security and performance sections.

The "really should try" sections are based on examining the js implementation; I haven't examined the go version to see if it's different.

Security

The most serious security concern with a DHT is the Sybil attack, where an attacker controls a large number of DHT nodes at once. There are a few things this lets the attacker accomplish:

  • Eclipse attack/adversarial routing: Malicious nodes can choose their ids such that they are on the lookup path to specific targets. This allows the attacker to give incorrect routing information to hide one or more nodes.
  • Churn attack: Malicious nodes disappear from the network en masse, either to make all data for a specific key disappear or just to cause general chaos as routing tables become mostly useless.

Preventing Sybil attacks is one of the major areas of DHT research in the last decade.

There are also other ways to attack a DHT; for example, by flooding nodes with put or provide messages that are incorrect. From talking to @whyrusleeping it sounds like verifying signatures is in place for put, but bad provides aren't verified. I haven't investigated solutions to this.

Ideas we're already using

  • The routing table prefers to keep the oldest nodes over recently added ones. This makes it hard for an attacker to quickly take over the network, but can be subverted if an attacker is sufficiently patient. This is part of the original Kademlia paper[1].

  • Put messages are signed where possible. This allows nodes to immediately reject messages without a valid signature. This doesn't apply to provide, however.

Ideas we really should try

  • Parallel queries should use disjoint paths. For both performance and redundancy, nodes do multiple lookups in parallel. The lists of nodes used by each of the parallel queries should be disjoint, so that each malicious node can't misdirect more than one of a set of parallel queries. This is one of the improvements in S/Kademlia[2]. See libp2p/go-libp2p-kad-dht#146
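
A minimal sketch of how disjoint paths could be maintained: seed peers are partitioned across the parallel queries, and a peer learned mid-lookup joins a path only if no other path has already used it. Peer ids are plain strings here, and `splitDisjoint`/`claimSet` are made-up helper names, not the go-libp2p-kad-dht API.

```go
package main

import "fmt"

// splitDisjoint partitions the seed peers for a lookup into d disjoint
// groups, one per parallel query path, as S/Kademlia suggests.
func splitDisjoint(seeds []string, d int) [][]string {
	paths := make([][]string, d)
	for i, p := range seeds {
		paths[i%d] = append(paths[i%d], p)
	}
	return paths
}

// claimSet records which path owns each peer, so the paths stay
// disjoint end to end even as new peers are discovered.
type claimSet map[string]int

// claim returns false if another path has already queried this peer.
func (c claimSet) claim(peer string, path int) bool {
	if owner, seen := c[peer]; seen && owner != path {
		return false
	}
	c[peer] = path
	return true
}

func main() {
	seeds := []string{"A", "B", "C", "D", "E", "F"}
	fmt.Println(splitDisjoint(seeds, 3)) // [[A D] [B E] [C F]]

	claims := claimSet{}
	claims.claim("X", 0)
	fmt.Println(claims.claim("X", 1)) // false: path 1 may not reuse X
}
```

The point of the claim check is that a single malicious node can sit on at most one of the d paths, so it can misdirect at most one of the parallel queries.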

Ideas to consider

  • Limit node ids. S/Kademlia proposes restricting node ids such that H(id) begins with a configurable number of zeros (a crypto puzzle/PoW)[2]. This ensures that creating a valid node id takes some computational work, providing minimal protection against Sybil attacks. S/Kademlia further proposes a mechanism for increasing the work requirement after an id is created; see the paper for details. Unfortunately, this provides no protection against stockpiling valid ids ahead of time for a coordinated attack.

  • Require constant computation on puzzles. SybilControl[6] proposes a system where each node solves puzzles that depend on values recently provided by neighbors. Nodes that fail to provide valid proof of work can be ignored. This ensures that an attacker must have computational resources continuously available to attack the network. At a high level, as time progresses each node generates a series of challenges and respective solutions. Each challenge is dependent on both that node's previous challenges and the challenges sent by its neighbors (e.g. routing table entries). This process ensures that every node's challenge is ultimately dependent on all other nodes' challenges after a long enough time. To verify along a complete lookup path, each node's solution is verified to be correct, and its respective challenge is verified to correctly depend on the challenge of its neighbor that is the previous node along the lookup path.

  • ID layering. Whanau[4] uses a clever approach to decide which keys a node is responsible for. Each node has a layer 0 id, which is sampled from the keys stored in the DHT. This ensures that these node ids are evenly distributed around the keys that are actually in use, assuming nodes are honest. If nodes are not honest, however, they may choose ids clustered around keys they wish to attack. To avoid this, each node creates an additional layer 1 id that is sampled from the distribution of layer 0 ids. This ensures that any malicious id clustering on layer 0 is mirrored by a cluster of honest nodes on layer 1. This process is repeated for additional layers; see the paper for details. The takeaway is that each node is responsible for a bunch of different parts of the keyspace, most of which are selected to counter malicious (or unlucky) clustering on the previous layer.
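
The static S/Kademlia puzzle from the first bullet can be sketched in a few lines. SHA-256 and the names `validID`/`mineID` are assumptions for illustration; the paper leaves the hash function and encoding open.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the leading zero bits of a digest.
func leadingZeroBits(h [32]byte) int {
	n := 0
	for _, b := range h {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// validID is the static puzzle: H(id) must start with at least c zero
// bits, so minting a valid id costs ~2^c hash attempts on average.
func validID(id []byte, c int) bool {
	return leadingZeroBits(sha256.Sum256(id)) >= c
}

// mineID brute-forces an id satisfying the puzzle: the work an honest
// joiner pays once, and an attacker pays per stockpiled id.
func mineID(c int) []byte {
	for i := 0; ; i++ {
		id := []byte(fmt.Sprintf("node-%d", i))
		if validID(id, c) {
			return id
		}
	}
}

func main() {
	id := mineID(12)             // ~2^12 expected hash attempts
	fmt.Println(validID(id, 12)) // true by construction
}
```

As the bullet notes, this raises the cost of creating ids but does nothing against ids mined ahead of time; the dynamic puzzle in the paper addresses that separately.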

Ideas we probably shouldn't use

There are a lot of ideas in the literature that aren't currently applicable to libp2p but are worth knowing about.

  • Trust based on a social graph. Whanau uses a social trust graph as input, with two major assumptions: (1) an attacker has limited ability to create edges between honest nodes and malicious nodes, since doing so requires social engineering; and (2) the honest part of the social graph is "fast mixing"[4]. Unfortunately, the fast mixing requirement apparently doesn't reliably hold for real networks[5]. Using social trust to prevent Sybil attacks is a relatively active area of DHT research. However, I don't see how libp2p would get such social graph data as input, so I suspect this won't be useful for us.

  • Massively greater replication, as proposed in Whanau[4] among many others. A DHT is much harder to attack, and much faster for lookups, if nodes have a large fraction of the data. For example, if the entire DHT stores n keys, then each node could store O(sqrt(n)) keys. On get, we can send O(sqrt(n)) parallel requests, giving O(1) expected round trip times per lookup. Unfortunately, each node needs to store O(sqrt(n)) values, which grows annoyingly fast.

  • Committee-based membership. In[8], a design is proposed where a committee of nodes is chosen to decide what nodes are allowed into the network (which may be a DHT or something else). Once the committee uses a Byzantine consensus algorithm to decide on an allowed membership list, this list is broadcast throughout the network. Unfortunately, this requires broadcasts when new nodes join, so it is probably not scalable for use in a DHT.

Performance

Ideas we really should try

  • On a successful get, we re-publish the value to nodes on the lookup path. Subsequent gets for hot keys will then find the result without traversing the full lookup path. This is part of the original Kademlia algorithm that doesn't appear to be implemented.

  • Sloppy provides. Analogous to re-publishing along the lookup path on get, provide doesn't need to go all the way to the "correct" node. If there are a lot of providers for a given key, new providers should be added to nodes on the lookup path. This is one of the fundamental innovations in Coral[3].
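
Both bullets share one mechanism: write the record to intermediate nodes on the lookup path instead of only at the "correct" node. A toy sketch of path caching on get, with in-memory nodes and a precomputed path (nothing like the real wire protocol; the `node` type and `getWithPathCaching` are invented for illustration):

```go
package main

import "fmt"

// node is a toy DHT node: an id and a local key/value store.
type node struct {
	id    string
	store map[string]string
}

// getWithPathCaching walks a precomputed lookup path, returns the value
// from the first node that has it, and re-publishes the value to the
// nodes earlier on the path, so later gets for a hot key stop sooner.
func getWithPathCaching(path []*node, key string) (string, bool) {
	for i, n := range path {
		if v, ok := n.store[key]; ok {
			for _, earlier := range path[:i] { // cache along the path
				earlier.store[key] = v
			}
			return v, true
		}
	}
	return "", false
}

func main() {
	a := &node{"A", map[string]string{}}
	b := &node{"B", map[string]string{}}
	c := &node{"C", map[string]string{"k": "v"}}

	v, _ := getWithPathCaching([]*node{a, b, c}, "k")
	fmt.Println(v) // v, found at C
	_, cached := a.store["k"]
	fmt.Println(cached) // true: A now serves the hot key too
}
```

Sloppy provides would be the same loop writing provider records instead of values, triggered when a key already has many providers.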

Ideas to consider

  • ID layering, like in Whanau[4]. See discussion in security section above.

  • Locality-based routing. Coral[3] builds multiple instances of the DHT at different scales, such that all nodes in a more local instance are within a maximum latency bound. In Coral, lookups run in parallel at all scales, so the expected lookup time improves without making the upper bound worse. I suspect a similar idea could be applied to Kademlia as well. Unfortunately, the algorithms to merge and split local instances as network conditions change are moderately complex.

  • Adjustable node degree vs. number of hops tradeoff. Given n nodes, Koorde[7] allows any option from degree 2 nodes with O(log n) lookup hops to degree O(log n) with O(log n/log log n) lookup hops. This flexibility could be nice but probably isn't a priority.

Ideas we probably shouldn't use

  • Vastly greater replication (as in Whanau) is probably not practical, but could vastly improve lookup time (see security section). The extreme case is the so-called "1 hop DHT," where on average one round trip time is needed per lookup. This requires relatively large storage on each node.

Other thoughts

It would be very helpful to be able to test on real, live networks. iptb is a nice start, but it doesn't realistically simulate network connectivity, latency, or churn.

Could we allow libp2p nodes to connect to a command-and-control server to allow running test queries, either on an opt-in or opt-out basis? This would give us far more insight into what's really happening.

References

[1] Petar Maymounkov and David Mazieres. "Kademlia: A Peer-to-peer Information System Based on the XOR Metric"

[2] Ingmar Baumgart and Sebastian Mies. "S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing"

[3] Michael J. Freedman and David Mazieres. "Sloppy hashing and self-organizing clusters"

[4] Chris Lesniewski-Laas and M. Frans Kaashoek. "Whanau: A Sybil-proof Distributed Hash Table"

[5] Xiang Zuo and Adriana Iamnitchi. "A Survey of Socially Aware Peer-to-Peer Systems"

[6] Frank Li, Prateek Mittal, Matthew Caesar, and Nikita Borisov. "SybilControl: Practical Sybil Defense with Computational Puzzles"

[7] M. Frans Kaashoek and David R. Karger. "Koorde: A simple degree-optimal distributed hash table"

[8] Diksha Gupta, Jared Saia, and Maxwell Young. "Proof of Work Without All the Work: Computationally Efficient Attack-Resistant Systems"

@raulk (Member) commented Sep 21, 2018

The routing table prefers to keep the oldest nodes over recently added ones. This makes it hard for an attacker to quickly take over the network, but can be subverted if an attacker is sufficiently patient. This is part of the original Kademlia paper[1].

Currently this isn't the case in go-libp2p. When we encounter a new node, we evict the least recently seen one from the bucket. The Kademlia paper recommends PINGing the eviction candidate and only replacing it if it's unresponsive, but we don't do that currently AFAIK.
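
For reference, the paper's rule can be sketched like this: move known peers to the most-recently-seen end, and when the bucket is full, ping the least-recently-seen peer and evict it only if it is unresponsive. This is a toy bucket with an injected `ping`, not go-libp2p's actual routing table code.

```go
package main

import "fmt"

// bucket is a toy k-bucket, ordered least- to most-recently seen.
type bucket struct {
	peers []string
	k     int
}

// update applies the Kademlia rule. ping is injected so the policy
// can be exercised without a network.
func (b *bucket) update(p string, ping func(string) bool) {
	for i, q := range b.peers {
		if q == p { // already known: move to most-recently-seen end
			b.peers = append(append(b.peers[:i:i], b.peers[i+1:]...), p)
			return
		}
	}
	if len(b.peers) < b.k {
		b.peers = append(b.peers, p)
		return
	}
	if oldest := b.peers[0]; !ping(oldest) {
		b.peers = append(b.peers[1:], p) // evict the dead oldest peer
	} // else: keep the long-lived peer and discard the newcomer
}

func main() {
	b := &bucket{k: 2}
	alive := func(string) bool { return true }
	b.update("A", alive)
	b.update("B", alive)
	b.update("C", alive) // full, A answers the ping: C is dropped
	fmt.Println(b.peers) // [A B]

	b.update("C", func(string) bool { return false }) // A is dead now
	fmt.Println(b.peers)                              // [B C]
}
```

The "keep the newcomer only if the oldest is dead" branch is exactly what makes quick takeovers hard: long-lived honest peers are never displaced while they still respond.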
