
Discuss: batch vs. on-demand connection pruning #19

Open
raulk opened this issue Aug 27, 2018 · 9 comments
Labels: kind/enhancement, need/community-input

@raulk
Member

raulk commented Aug 27, 2018

The BasicConnMgr terminates connections every time adding a new connection makes the connection count exceed the high watermark. This is done via the TrimOpenConns() method. The trimming heuristics are as follows:

  • At least 10 seconds have elapsed from the previous trim.
  • The algorithm terminates connections with lowest rating/value first.
  • Connections opened within a grace period (default: 30 seconds) are pardoned from eviction.
  • As many connections as needed are terminated to make connection count == low watermark (batch termination).
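
For illustration, here is a minimal, self-contained sketch of the batch behaviour described by the heuristics above. It is not the actual BasicConnMgr code; the types, field names, and the hard-coded 10-second constant are stand-ins.

```go
// Illustrative sketch of batch trimming; not the actual BasicConnMgr.
package batchtrim

import (
	"sort"
	"time"
)

type conn struct {
	value  int       // score assigned to the peer by protocols
	opened time.Time // used for the grace period
}

type mgr struct {
	lowWater, highWater int
	gracePeriod         time.Duration
	lastTrim            time.Time
	conns               []*conn
}

// maybeTrim runs after every new connection: if the count exceeds the
// high watermark (and the last trim was more than 10s ago), it closes
// lowest-value connections until the count is back at the low watermark,
// pardoning connections still inside the grace period.
func (m *mgr) maybeTrim(now time.Time) (closed []*conn) {
	if len(m.conns) <= m.highWater || now.Sub(m.lastTrim) < 10*time.Second {
		return nil
	}
	m.lastTrim = now

	// Lowest-value connections are evicted first.
	sort.Slice(m.conns, func(i, j int) bool { return m.conns[i].value < m.conns[j].value })

	kept := m.conns[:0]
	for _, c := range m.conns {
		inGrace := now.Sub(c.opened) < m.gracePeriod
		if len(m.conns)-len(closed) > m.lowWater && !inGrace {
			closed = append(closed, c) // a real manager would call c.Close() here
			continue
		}
		kept = append(kept, c)
	}
	m.conns = kept
	return closed
}
```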

Possible effects of the above behaviour

For the sake of illustration, let's assume:

{ LowWatermark, HighWatermark } := { 100, 300 }

Effects:

  • Aggressive changes in connectedness of a node, e.g. it may suddenly go down from 300 conns to 100 conns.
  • Resource wastage. We now have 200 empty slots for connections that are not being used.
  • Haphazardness. We don't have any guarantee when – or if – those 200 slots will be filled with peers that were healthier than the evicted ones.

An observer would see a sawtooth pattern in the connection count.

Alternative approaches

  1. Evict connections on demand, like the JS implementation does (see the sketch after this list): https://github.com/libp2p/js-libp2p-connection-manager/blob/master/src/index.js#L155
  2. (discuss)
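
For contrast with the batch sketch above, here is a rough sketch of what alternative 1 (on-demand eviction) could look like. This is illustrative only, not a translation of the JS code.

```go
// Illustrative sketch of on-demand eviction: when a new connection would
// exceed the limit, drop the single lowest-value existing connection.
package ondemand

import "sort"

type conn struct {
	value int // score assigned to the peer by protocols
}

type mgr struct {
	limit int
	conns []*conn
}

// add admits a new connection, evicting the lowest-value existing one if
// the limit would otherwise be exceeded. Returns the evicted conn, if any.
func (m *mgr) add(c *conn) (evicted *conn) {
	if len(m.conns) >= m.limit && len(m.conns) > 0 {
		sort.Slice(m.conns, func(i, j int) bool { return m.conns[i].value < m.conns[j].value })
		evicted = m.conns[0] // a real manager would call evicted.Close() here
		m.conns = m.conns[1:]
	}
	m.conns = append(m.conns, c)
	return evicted
}
```

(Re-sorting on every add is exactly the per-connection cost a heap-backed priority queue would avoid.)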
@whyrusleeping
Contributor

Nodes under heavy load will waste a lot of resources doing the thing that the JavaScript implementation does, especially as implemented in the JavaScript code (making an array of all the peers we're connected to and re-sorting it each time we get a new connection). (Another note: the JavaScript impl appears to be broken; it tries to limit peers per protocol, but if it breaks that limit, it goes and disconnects a random peer.)

The 'hard limit' approach isn't great, as it tends to behave poorly for use cases such as performing a DHT query, or fetching some data over bitswap. It leads to us sitting all the time at the high end of how many connections we want to have, meaning every single new connect causes a disconnect (and things like DHT queries tend to cause many connects). Ideally, we would have some target number, and the connection manager would exert pressure to always move towards that number (this is the direction I was hoping to go with an 'upgrade' to the basic conn manager). Something like: while the number of connections we have is greater than our target, remove the least-valued streams periodically at a rate proportional to how far away from the target our current count is. Meaning that if our connection count gets really high, we should be pruning connections really fast, but if it's less than 5% higher than our target, do things more slowly.
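
To make the "pressure proportional to distance from target" idea concrete, a toy sketch (the 10%-of-excess-per-tick figure is arbitrary, purely for illustration):

```go
// Sketch of pressure-based pruning: the further above the target we are,
// the more connections get pruned per tick. Purely illustrative.
package pressure

// pruneBudget returns how many connections to close on this tick.
// At 5% over target it prunes slowly; at 2x the target it prunes fast,
// and the rate decays as the count approaches the target.
func pruneBudget(current, target int) int {
	if current <= target {
		return 0
	}
	excess := current - target
	// Close 10% of the excess per tick, rounding up so progress is
	// always made. The fraction is an arbitrary illustration.
	return (excess + 9) / 10
}
```

A periodic ticker would then close pruneBudget(current, target) of the least-valued connections each interval.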

btw, we should probably add in a priority queue for all peer values, should improve perf on both js and go.

> Resource wastage. We now have 200 empty slots for connections that are not being used.

I wouldn't call it wastage; you should set your low watermark to the number of connections you are okay with having persistently open. If you are lamenting the fact that you don't have 300 open connections, set your low watermark to 300.

> Haphazardness. We don't have any guarantee when – or if – those 200 slots will be filled with peers that were healthier than the evicted ones.

Yeah, I'd love to have some other metrics, like bandwidth and latency (and maybe even libp2p version :P), to help weight the decisions. But that is (1) tricky to measure accurately and (2) could lead to weird network partitions where peers with worse connections are segmented off into their own graph.

@whyrusleeping
Contributor

Also, the JavaScript implementation's lack of a grace period is worrying. If all existing peers are valued and a new peer joins at the limit, the new peer will not have any value, and if I'm reading this code correctly, it will be dropped immediately.

@daviddias
Member

Inviting @pgte, @jacobheun and @vasco-santos to join this thread.

@Stebalien
Member

Killing off connections one-by-one as we add new connections can lead to a bunch of really short-lived connections. By trimming down to a low watermark, we give ourselves some time to establish new connections and figure out which ones will likely be useful before we need to start trimming connections again. Basically, we actually want a zig-zag pattern.

As @whyrusleeping noted, this also saves CPU cycles (which is why most GCs operate this way).

Personally, I'd think of the low watermark as the target and the high watermark as the "time to go back to the target" mark. Really, as @whyrusleeping said, we should probably have an explicit "target":

  1. LowWater - aggressively bootstrap.
  2. Target - normal operation.
  3. HighWater - aggressively trim to Target.

A better connection manager would also "politely" trim old connections when above Target.
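
To make that concrete, a sketch of how the three tiers could be expressed. The names (Limits, Decide, the Action constants) are hypothetical, not an existing API.

```go
// Hypothetical three-tier policy: below LowWater we bootstrap, between
// Target and HighWater we trim politely, above HighWater we trim hard.
package tiers

type Limits struct {
	LowWater  int // below this: aggressively open new connections
	Target    int // steady-state connection count we aim for
	HighWater int // above this: aggressively trim back down to Target
}

type Action int

const (
	Bootstrap  Action = iota // dial more peers
	Steady                   // nothing to do
	PoliteTrim               // slowly shed old, low-value connections
	HardTrim                 // trim down to Target now
)

// Decide maps the current connection count onto one of the actions.
func (l Limits) Decide(current int) Action {
	switch {
	case current < l.LowWater:
		return Bootstrap
	case current > l.HighWater:
		return HardTrim
	case current > l.Target:
		return PoliteTrim
	default:
		return Steady
	}
}
```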

@raulk
Member Author

raulk commented Aug 29, 2018

Thanks for your feedback! I haven't concerned myself with the JS implementation, only with its behaviour. (Not proposing to use the same data structures or algorithms)

What I'm trying to postulate is that abrupt actions affecting connectivity can be harmful and can lead to undesirable scenarios and patterns. A steady flux can lead to more stable networks (conjecture).

> As @whyrusleeping noted, this also saves CPU cycles (which is why most GCs operate this way).

I regard our problem definition as different than GC. In GC you are cleaning up stuff that's used by nobody, so you let it accumulate until those resources are needed.

In our case – IIUC – those connections are always in use. So it becomes a problem of optimising vs. cleaning. I suspect the system should push towards full resource utilisation, yet continuously optimising for high-value connections. In fact, I'm inclined to think that, in our case, max. # of connections is only a proxy for available bandwidth and sockets (as well as the memory and CPU that handling those require), which are the true resources to optimise for.

More generally, IMO there's no point in reserving unused capacity in this case, unless you have some protocols that require elasticity/stretch (see below).

On a related note, the conn manager's decisions can affect overall network topology, e.g. terminating a seemingly low-value connection could lead to a network partition on the global level. But this is a different chapter.

> btw, we should probably add in a priority queue for all peer values, should improve perf on both js and go.

+1. A heap-backed priority queue would help in both scenarios (batch or flux).
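
For the record, a minimal sketch of such a heap using Go's container/heap, keyed on peer value so the next eviction candidate is always at the root (peer IDs are simplified to plain strings here):

```go
// Min-heap of peer values: the lowest-value peer sits at the root, so
// picking the next eviction candidate is O(log n) instead of a full sort.
package peerheap

import "container/heap"

type peerValue struct {
	id    string // peer ID, simplified to a plain string for the sketch
	value int
}

type minHeap []peerValue

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].value < h[j].value }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(peerValue)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

// newHeap builds the heap from an initial set of scored peers.
func newHeap(peers []peerValue) *minHeap {
	h := minHeap(peers)
	heap.Init(&h)
	return &h
}

// nextEviction pops the current lowest-value peer.
func nextEviction(h *minHeap) peerValue {
	return heap.Pop(h).(peerValue)
}
```

In practice each entry would also carry its heap index so heap.Fix can be called whenever a peer's value changes.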

> @whyrusleeping: The 'hard limit' approach isn't great, as it tends to behave poorly for use cases such as performing a DHT query, or fetching some data over bitswap. [...]

Completely agree, and I like where you're heading. It introduces the concept of elasticity. But rather than reserving spare capacity, I'd like to introduce a mechanism whereby protocols can acquire temporary boosted allowances from the conn manager, for a limited time window.
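
Something along these lines is what I have in mind; everything here (Stretch, Limit, the single global boost) is hypothetical and deliberately simplified.

```go
// Hypothetical "elasticity": a protocol borrows extra connection headroom
// from the conn manager for a bounded time window, after which the limit
// snaps back to its base value.
package stretch

import (
	"sync"
	"time"
)

type Manager struct {
	mu        sync.Mutex
	baseLimit int
	boost     int       // extra slots currently granted
	expiry    time.Time // when the boost lapses
}

func NewManager(baseLimit int) *Manager {
	return &Manager{baseLimit: baseLimit}
}

// Stretch grants `extra` additional connection slots for duration d.
// A real implementation would track grants per protocol and allow renewal.
func (m *Manager) Stretch(extra int, d time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.boost = extra
	m.expiry = time.Now().Add(d)
}

// Limit returns the effective high watermark at this moment.
func (m *Manager) Limit() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	if time.Now().Before(m.expiry) {
		return m.baseLimit + m.boost
	}
	return m.baseLimit
}
```

A DHT query could then, say, call Stretch(100, time.Minute) before fanning out, without permanently raising the limits.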

Another property we could think about is plasticity: how the protocols themselves can shape the behaviour of the connection manager. After all, constraints applicable to beacon nodes, signalling servers or rendezvous servers in a network (requiring lots of short-lived connections) will differ from those that make sense for an IPFS node.

Ultimately, we're trying to optimise limited resources for more efficiency. If we're allowed to run 300 connections, we want those to be highly scored (although we want to leave room for newcomers to the network).

> Killing off connections one-by-one as we add new connections can lead to a bunch of really short-lived connections.

On some level, this is what the grace period is there for, correct? Allowing sufficient time for a protocol to calculate the weight/usefulness/score/value of a given connection, after its establishment.


Altogether I like the concepts of backpressure, polite trimming / disconnection notices, and the elasticity around a target, vs. hard limits.

There's definitely more research and thinking that needs to go into the overall design – lots of good ideas pouring in 💯

@Stebalien
Member

For context, there are two reasons we keep so many connections open:

  1. It makes DHT queries faster (in theory) as we'll already be connected to many of the peers we need to query.
  2. It makes bitswap work as our current content discovery system doesn't work very well in practice.

We can improve (1) by:

  1. Improving NAT traversal/relaying/address-discovery.
  2. Switching to QUIC (fewer round trips).
  3. Improving protocol negotiation (working on it).
  4. Introducing connection suspension/resumption (e.g., using QUIC's 0-RTT) and keeping a bunch of "suspended" connections.

Unfortunately, improving the situation around (2) is a hard problem.

In theory, we should be able to get away with ~160 connections (conservatively), fewer for lighter clients. Really, the only connections we need to keep (other than those actively in use by, e.g., bitswap), are the DHT routing connections.


Note: Much of what I'm saying (except for the fact that keeping tons of connections kills routers) is untested. It should be correct (if my assumptions hold), but we should also try different approaches to see if they work better.

> What I'm trying to postulate is that abrupt actions affecting connectivity can be harmful and can lead to undesirable scenarios and patterns. A steady flux can lead to more stable networks (conjecture).

The idea is that we keep way more connections than we really need and only trim them when we have way too many connections.

> I regard our problem definition as different than GC. In GC you are cleaning up stuff that's used by nobody, so you let it accumulate until those resources are needed.

In theory, that should be the case here as well. If all goes well, we should only end up trimming connections that we aren't actively using. Unfortunately, the current "weight" system doesn't really capture this. Ideally, we'd have some way to mark connections as "in-use" and then never kill them.

Note: We don't just close these connections up-front because establishing a connection can be expensive. We keep them around in case we need them in the future.
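
A sketch of what that "mark as in-use" primitive could look like; the Protector type, its methods, and the string peer IDs are all made up for illustration.

```go
// Sketch of marking peers as "in-use" so the trimmer skips their
// connections. Hypothetical and simplified.
package inuse

import "sync"

type Protector struct {
	mu        sync.Mutex
	protected map[string]map[string]struct{} // peer ID -> set of tags
}

func NewProtector() *Protector {
	return &Protector{protected: make(map[string]map[string]struct{})}
}

// Protect marks a peer as in-use under a tag (e.g. "bitswap-session").
func (p *Protector) Protect(peer, tag string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.protected[peer] == nil {
		p.protected[peer] = make(map[string]struct{})
	}
	p.protected[peer][tag] = struct{}{}
}

// Unprotect removes a tag; the peer becomes trimmable again once no tags
// remain. Returns true if the peer is still protected by other tags.
func (p *Protector) Unprotect(peer, tag string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.protected[peer], tag)
	if len(p.protected[peer]) == 0 {
		delete(p.protected, peer)
		return false
	}
	return true
}

// Trimmable reports whether the trimmer may close this peer's connections.
func (p *Protector) Trimmable(peer string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	_, ok := p.protected[peer]
	return !ok
}
```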

> In our case – IIUC – those connections are always in use. So it becomes a problem of optimising vs. cleaning. I suspect the system should push towards full resource utilisation, yet continuously optimising for high-value connections. In fact, I'm inclined to think that, in our case, max. # of connections is only a proxy for available bandwidth and sockets (as well as the memory and CPU that handling those require), which are the true resources to optimise for.

Unfortunately, routers suck. Take a look at ipfs/kubo#3320 for some context. Basically, keeping a bunch of connections open can make routers crash, ISPs send RSTs, etc.

> More generally, IMO there's no point in reserving unused capacity in this case, unless you have some protocols that require elasticity/stretch (see below).

We definitely want to reserve "unused" bandwidth/memory/cpu capacity in most cases. Otherwise, we can't run on laptops, mobile, etc. As a matter of fact, users tend to complain about bandwidth overhead when running on cloud servers because bandwidth is expensive.

> On a related note, the conn manager's decisions can affect overall network topology, e.g. terminating a seemingly low-value connection could lead to a network partition on the global level. But this is a different chapter.

If we use it correctly, the probability of this should be vanishingly small thanks to the DHT. The DHT forms an overlay network that's designed to avoid network partitions, and we tag all peers in this overlay network with a non-zero weight.

> On some level, this is what the grace period is there for, correct? Allowing sufficient time for a protocol to calculate the weight/usefulness/score/value of a given connection, after its establishment.

Yes. Unfortunately, it isn't always sufficient and tends to work best for connections we initiate (easier to determine that they're useful). This is exacerbated by the fact that bitswap doesn't currently give the connection manager enough information about which peers likely need us (need to get better about telling the connection manager about peers with outstanding requests).

Beyond this, batch killing should (untested) lead to longer, more useful connection lifetimes. For example, let's say we're serving some really popular content:

  • Without batch killing, we'll get new peers, keep them for 30s, and then immediately kill them because we have too many connections. While 30s is usually enough for us to determine if we want the connection, we want all the connections in this case so that doesn't help much.
  • With batch killing, we'll get new peers until we have way too many, kill a ton of connections, and then we'll hopefully be able to service the remaining connections for a slightly more useful period of time. This also leads to a bunch of churn but allows us to keep the connections open a bit longer.

Note: Currently, our batch-killing mechanism is actually kind of broken. When batch killing, we should be telling peers to back off. Unfortunately, we just disconnect and then the peers immediately reconnect. Fixing this would probably drastically improve the behavior of the connection manager.

This is obviously sub-optimal and we could improve the situation by:

  • Introducing connection "locking" (i.e., telling the connection manager to leave the connection open until we unlock it).
  • Introducing a polite "go away" protocol. That is, some way to tell all of our peers that we're under heavy load and we'd like them to disconnect unless they actively need us.
  • Introducing a less polite "backoff" protocol to tell peers to go away and not talk to us for some time.
  • Giving the connection manager better information.
  • Putting backpressure on accepting new connections.
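
As a very rough illustration of the "polite go away + backoff" idea above: the message format and flow below are invented for the sketch, nothing like this exists on the wire today.

```go
// Toy sketch of a disconnect notice with a suggested back-off, sent
// before the connection is closed during a batch trim.
package goaway

import "time"

type DisconnectNotice struct {
	Reason  string        // e.g. "over connection limit"
	Backoff time.Duration // how long the peer should wait before redialing
}

// notifyThenClose is what a batch trim could do per evicted connection:
// tell the peer to back off, then close. send and closeConn stand in for
// the real stream write and connection close.
func notifyThenClose(send func(DisconnectNotice) error, closeConn func() error) error {
	n := DisconnectNotice{Reason: "over connection limit", Backoff: 5 * time.Minute}
	if err := send(n); err != nil {
		// Best effort: close anyway if the notice cannot be delivered.
		return closeConn()
	}
	return closeConn()
}
```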

@Stebalien
Member

By the way, thanks for discussing this in such detail. Quite a bit of this hasn't really been thought through thoroughly, and almost none of the thinking that has happened has been recorded anywhere.

@whyrusleeping
Contributor

Relevant:

Having a 'disconnect', 'back-off', or 'go-away' type protocol will help out dramatically. Also, a softer 'suspend' protocol, where we keep the connection but neither side uses it, should help out in many low-resource scenarios (I imagine this helping a lot for mobile and laptop use). A proper disconnect protocol should also help us avoid any sort of partitions during a batch disconnect: if a peer receives a disconnect and that disconnection would be detrimental to their topology, they could maybe send back a 'please let me stay connected' type request, which might give them a reset grace period, allowing them time to fill up their connection table.

I do like the idea of 'stretch', giving protocols the ability to temporarily expand the limits for certain operations.

Also worth noting: I initially wrote this code while the network was on fire and I was supposed to be on vacation. The room for improvement here is uncontested :P

@anacrolix

My on-demand pruning algorithm.
