Dissociating routing table membership from connection state #283
There are some complications in removing the requirement that entries correspond to active connections. It's desirable to have routing tables persisted, but the entries are worthless without the associated peer addresses: it's not cost-effective to resume a saved routing table if the peers must be located by querying the network, and doing so may require bootstrapping all over again (largely defeating the purpose). Unfortunately, as far as I can tell, address expiries are controlled by the peer store, and the time for which a routing table entry may lie dormant is indeterminable (or at least should be, for maximum reliability). The routing table should be able to maintain an up-to-date set of addresses for a peer, but I don't think that's trivial with the existing abstractions (there would need to be racy periodic updates and propagation). Possibly by implementing routing table pinging/grooming (per a standard implementation), any entry still in the routing table when it is persisted could be relied upon to have unexpired addresses in the peer store.
https://github.com/libp2p/go-libp2p-kad-dht/tree/decouple-routing-table tracks my efforts to implement this.
This is a very complex issue, but I'd say it's evidence that we need to rethink the interfaces between the connection manager, peer store, and routing table. In general, peer store and routing table entries (for nodes that aren't client-only) should have the same lifetime, right? If we have a routing table entry, there is no reason to throw out its addresses, and if we have addresses for a node (that isn't client-only) we might as well have it in the routing table. This really doesn't fit the current structure, but what about combining the peer store and routing table into one data structure that we add any node we communicate with to? The eviction policy would then be to remove nodes that have an expired TTL (like the peerstore does now) AND are in a sufficiently full bucket. There's really no need to limit routing to at most K specific nodes per bucket, as in traditional Kademlia, as long as we aren't short on memory.
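For illustration, a minimal sketch of what that combined eviction policy could look like, using placeholder types rather than the real kbucket/peerstore APIs (all names below are assumptions):

```go
package rtsketch

import "time"

type entry struct {
	lastSeen time.Time // refreshed whenever we hear from the peer
	bucket   int       // index of the k-bucket the peer falls into
}

type table struct {
	ttl        time.Duration    // entry/address TTL, as the peerstore uses today
	bucketSize int              // k
	entries    map[string]entry // keyed by peer ID
	perBucket  map[int]int      // current occupancy per bucket
}

// evictable applies the policy described above: a node is only removed once
// its TTL has expired AND its bucket is sufficiently full.
func (t *table) evictable(id string, now time.Time) bool {
	e, ok := t.entries[id]
	if !ok {
		return false
	}
	expired := now.Sub(e.lastSeen) > t.ttl
	bucketFull := t.perBucket[e.bucket] >= t.bucketSize
	return expired && bucketFull
}
```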
Thanks @jhiesey. Getting the peerstore addresses and routing table to play nicely together is definitely a tricky part here. The other tricky thing is where to handle prioritization of peers in each bucket; the current solution is inadequate.
@bigs assigning this to you in case you wanna tackle it as discussed elsewhere! ;-)
@raulk i've had some time to dig into this today. i'm thinking of tackling this in stages. first, i'll take care of the routing table persistence. after that, it gets foggier. it's clear we need to persist dht peers and their addresses, but obviously we don't want to let that grow unbounded. perhaps in a first iteration, peers are kept in the peerstore with the "permanent" ttl until they're evicted from the routing table. it seems like we have an actual use for ping now, as a culling mechanism.
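a rough sketch of that ttl idea, with a stand-in interface instead of the real peerstore API (the names and constants below are assumptions):

```go
package rtsketch

import "time"

// stand-ins for peerstore TTL constants; the real peerstore exposes similar
// notions (a "permanent" TTL and shorter default TTLs).
const (
	permanentTTL = 100 * 365 * 24 * time.Hour
	defaultTTL   = time.Hour
)

// addrBook is a placeholder for the peerstore's address book.
type addrBook interface {
	UpdateAddrTTL(peerID string, ttl time.Duration)
}

type routingTable struct {
	ab addrBook
}

// onAdd pins the peer's addresses for as long as it stays in the table.
func (rt *routingTable) onAdd(p string) {
	rt.ab.UpdateAddrTTL(p, permanentTTL)
}

// onEvict lets the addresses age out normally once the peer is culled,
// e.g. after failing a ping check.
func (rt *routingTable) onEvict(p string) {
	rt.ab.UpdateAddrTTL(p, defaultTTL)
}
```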
I was reading through the wip PR for the kad-dht spec and the Kad paper, and it really drove home the importance of this issue. A full routing table in our DHT needs 5120 entries (256 buckets * k=20), but the practical connection limit is much lower than that. I don't know what people are running in production, but the default high water mark for the connmgr in go-ipfs seems to be 900 connections, or ~17% of a full kad table. In practice, it seems nobody ever has a full routing table anyway, but since we're pulling peers out of the table randomly, we can't really satisfy this property from the paper:
Since there's nothing stopping us from removing the only peer in a bucket if it temporarily disconnects. The quickest and dirtiest fix I can think of would be to still use connectedness as a heuristic for eviction, but move the check so that it only happens when deciding whether to evict a node from a full k-bucket. In other words, don't remove a peer from the table when it disconnects, but if a bucket is full, evict the least-recently-seen node that isn't currently connected. Overall though, I think connectedness is the wrong way to think about this. We care about whether we can get a message to the peer, not whether we're currently connected to it. But it might provide some short-term benefit while we work on a better eviction policy.
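A rough sketch of that eviction rule, using placeholder types rather than the actual kbucket structures:

```go
package rtsketch

import "time"

type peerEntry struct {
	id        string
	lastSeen  time.Time
	connected bool
}

// pickEviction returns the entry to drop from a full bucket: the
// least-recently-seen peer that is not currently connected. If every entry is
// connected, nothing is evicted and the new candidate is rejected instead.
// Crucially, a disconnect on its own never removes a peer from the table.
func pickEviction(bucket []peerEntry) (victim string, ok bool) {
	var oldest time.Time
	for _, e := range bucket {
		if e.connected {
			continue
		}
		if !ok || e.lastSeen.Before(oldest) {
			oldest, victim, ok = e.lastSeen, e.id, true
		}
	}
	return victim, ok
}
```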
Agreed, IPFS-Kad requiring peers to stay connected misses a CRITICAL property of the Kad design. Kad has an extreme bias (the only property it cares about) towards long-lived peers. The only eviction policy for a peer is that the peer fails an explicit health check. Even peers that fail to respond to queries are not immediately evicted, in order to be resistant to temporary connection hiccups; truly dead peers will quickly become eligible for the health check mechanism. As for the health check, the Kad spec is kinda weird: its initial description in section 2.2 is soon altered in section 4.1 for being problematic, specifically because "the algorithm as described would require a large number of network messages". This is likely why most Kad implementations have different solutions for health checks.
Except for IPFS-Kad, a property shared by all these different implementations is to only remove peers that have been proven dead. This misattribution that Disconnected = Dead, combined with the connection manager itself often having thrashing issues, is causing the routing table to be quite unstable.
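For illustration, here's roughly what a ping-based health check along those lines could look like (placeholder interfaces; the failure threshold is an arbitrary assumption):

```go
package rtsketch

import (
	"context"
	"time"
)

// maxFailures is an arbitrary tolerance: a peer must fail this many explicit
// health checks in a row before it is considered dead and evicted.
const maxFailures = 3

// pinger is a placeholder for the DHT's ping/health-check RPC.
type pinger interface {
	Ping(ctx context.Context, peerID string) error
}

type healthChecker struct {
	ping     pinger
	failures map[string]int
	evict    func(peerID string)
}

// check probes a questionable peer. Only repeated, explicit failures lead to
// eviction, so transient connection hiccups don't empty the routing table.
func (h *healthChecker) check(ctx context.Context, p string) {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	if err := h.ping.Ping(ctx, p); err != nil {
		h.failures[p]++
		if h.failures[p] >= maxFailures {
			h.evict(p)
			delete(h.failures, p)
		}
		return
	}
	h.failures[p] = 0 // responded; reset the failure counter
}
```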
Just stopping by to provide some production values: the current IPFS gateway deployments have
Thanks for those numbers @lanzafame, that's very interesting. It helps explain why the DHT works in practice, since there are bound to be some "infrastructural" nodes that have high connection limits and are generally well-connected. Also thanks for the writeup on the different health check strategies @Chris-R-111, that's very useful. I had been thinking along the lines of putting questionable nodes into a "testing" queue, which sounds like it's basically the BitTorrent approach. The sad thing is that the connection limits are asymmetric. Even if our gateway nodes have high connection limits, the other nodes in their routing tables may not. If a "regular node" is in the routing table of a gateway node and needs to close the connection because it's at its limit, the gateway node will then forget about the regular node completely and damage its own routing table as a result. If participating well in the DHT requires maintaining thousands of long-lived connections without interruption, it seems like that puts the bar too high for anyone not located in a data center 😞
@yusefnapora - The routing table should never be full. That would mean there are 20 * 2^256 nodes on the network. The current 900 peer limit means we could have 45 full buckets or 20 * 2^45 nodes on the network. However, that doesn't change the fact that the assumption that disconnected = dead causes some critical problems with the DHT.
@jacobheun We always mark a peer we lose connection with as missing, right? Irrespective of whether we/the other party is a client/server, and irrespective of whether the disconnection was caused by the connection manager, a transient network issue, or being actively closed by the other peer.
@aarshkshah1992 we should only have DHT servers in our routing table, following the other planned changes.
Considering the design discussions above and this great blog post by Arvid Norberg on RT maintenance, based on solving the same problem for libtorrent (which in turn is based on section V of this paper), I'm tempted to try out the following:
Approach:
Notes on address management
Why I like this approach
@aarshkshah1992 Thanks for doing this research. Indeed, these mechanisms have been necessary since day 0 in pure Kademlia because the underlying protocol is unreliable (UDP); therefore you rely on
Taking one step back, this problem/solution domain can be broken down into various abstract pieces that, together, make a coherent solution:
Going back to the concrete proposal, @aarshkshah1992:
To paint the background a little bit more, there are items of (S/)Kademlia we can take as prescriptive, and others that we can't. The algorithmic elements we are taking as prescriptive, but the items related to connectivity and transports we really can't. One example: merely using LQT as a heuristic is suboptimal, because it assumes a disconnected scenario (UDP). In our case, we might actually hold a connection with that peer, which is kept alive by TCP and the multiplexer, so even if we haven't queried that peer in a long time, we have high confidence that it's alive (albeit not healthy?).
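As a small illustration of that point, a liveness check could short-circuit on an existing connection instead of looking at query recency alone (placeholder types, not the real API):

```go
package rtsketch

import "time"

type peerState struct {
	hasLiveConn bool      // e.g. a TCP connection kept alive by the muxer
	lastQueried time.Time // when we last successfully queried the peer
}

// needsValidation decides whether it's worth spending a query to validate a
// peer. A peer with a live connection is assumed reachable even if we haven't
// queried it in a long time; otherwise we fall back to query recency.
func needsValidation(s peerState, staleAfter time.Duration) bool {
	if s.hasLiveConn {
		return false
	}
	return time.Since(s.lastQueried) > staleAfter
}
```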
@raulk Thanks for the great reply.
This is a great point. I agree that the problem has a "generic" scaffolding and lends itself very well to a pluggable-strategies solution, and this should indeed be the design we go ahead with.
Even the current DHT implementation & the original idea suffer from the same problem. We do not currently limit how many peers suggested by a single peer are inserted into a k-bucket, as long as we are able to connect to them. IMO, this falls under the larger scope of work of making the DHT resistant to eclipse/sybil attacks. We could impose a limit here, but this problem shouldn't guide which approach we go ahead with, as it is a shortcoming of both approaches.
Like I said, I am all in for going ahead with a pluggable event-based strategies solution. However, is there any specific reason we'd prefer the original idea over the one proposed here as the default implementation?
Sure, all the latter approach does is send it a |
We had some out-of-band discussion. Both solutions (Arvid's and ours) are fundamentally akin; they do not compete. They basically vary in two parameters: the frequency of validation/bootstrapping, and the peer selection function. We can make those aspects (and others) parameterisable. Separately, the points you raise about the peerstore and address management are all valid. If we stay disconnected from a peer for too long, two things can happen:
In both cases we’d need to find that peer. Kademlia favours older peers, so we'd rather validate that peer than find a new one. |
It's definitely not. This is a common misconception that I want to make sure doesn't get propagated. The long-term directions are:
Really, the long-term direction is likely packet-switched overlays over arbitrary network transports.
Copying over notes from a call with @aarshkshah1992 earlier today:
ping @Stebalien |
We split it off because we were planning to (or did?) create alternative DHTs. However, I don't have any strong objections to merging it back in if the separation is causing problems. Note, we're not the only users: https://godoc.org/github.com/libp2p/go-libp2p-kbucket?importers.
Makes total sense.
Great idea!
👍
👍
Currently, routing table entries are evicted if the persistent connection to the peer is lost. Due to the unpredictable nature of connectivity, this causes several problems. Without constant activity, the routing table becomes sparse over time, as peers disconnect and leave buckets partially full. This directly weakens a critical property of a functioning Kademlia-style network. Furthermore, widespread disconnects can empty the routing table entirely, and the implementation currently doesn't recover from a bad routing table state.
Edit (@jacobheun):
Design notes
Testing mechanics
Success Criteria