
Update on libp2p work #4029

Open
whyrusleeping opened this issue Jul 3, 2017 · 4 comments

Comments

@whyrusleeping
Member

The ipfs network is growing pretty quickly, and we need to move fast to avoid issues caused by ipfs not closing any of its connections. I did some thinking on this over the weekend and came up with this issue.

Connection Closing

Currently, ipfs does not close any of the connections it makes; as the network
grows larger, this is becoming a very big problem. We need to implement a way
for ipfs nodes (and more generally, any application using libp2p) to manage the
number of connections they maintain. To decide whether or not to close a given
connection, we need some information about how valuable it is. For example,
peers that are in the DHT's routing table, or ones that are very frequent
bitswap partners, should be prioritized over connections made just for an
infrequent wantlist update or dht query. However, we don't want to go around
closing all but a very few 'valuable' connections; it is preferable to keep as
many connections open as resource constraints allow. Initiating a new
connection is rather expensive latency-wise, and maintaining a larger number of
connections keeps the effectiveness of bitswap wantlist broadcasts high.

To achieve this, we need a system that keeps track of connections and sorts
out which ones should stay open and which should be closed. This system should
accept hints from upper-level callers about the connections: the DHT code
should be able to signal that a given connection is in its routing table,
bitswap should be able to mark favored partners, pubsub should be able to hold
connections to the peers in its swarms, and so on.

My proposal is to add this functionality to the 'host' abstraction. A method
TagConn(conn, tag, val) should be added that accepts a connection, a tag to
add to the connection, and an importance value for that tag. Also needed is an
UntagConn(conn, tag) method. A worker routine would periodically scan through
the connections, check whether we have more than our limit, and close the
connections that have the least assigned value.

Lite Connections

Since it is advantageous to hold open more connections (even if they are
infrequently used), I propose we add the concept of a 'lite connection' that
doesn't count against the connection limit in the same way normal connections
do. These would be low-cost connections such as a relayed connection through
another peer, a standard connection to a peer on the local area network, or
even a BLE connection to some other nearby node. The key here is that these
connections are cheaper to maintain. They should not be used for any 'high
bandwidth' applications; if high bandwidth is needed, a new 'heavy' connection
should be created (so as not to abuse relays). Ideal uses would be bitswap
announcements, small numbers of dht queries, pubsub, or pings.
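One way to classify lite versus heavy connections is by multiaddr: relayed circuits and local-network addresses are cheap. A rough sketch (the function name and the string-prefix heuristic are illustrative only; real code would inspect parsed multiaddr components):

```go
package main

import (
	"fmt"
	"strings"
)

// isLiteAddr is a hypothetical heuristic for the 'lite connection' idea:
// relayed circuits and local-area-network addresses are cheap to maintain
// and would not count against the normal connection limit.
func isLiteAddr(addr string) bool {
	// Relayed connection through another peer.
	if strings.Contains(addr, "/p2p-circuit") {
		return true
	}
	// Local-area network (private/loopback IPv4 ranges; simplified check).
	for _, prefix := range []string{"/ip4/10.", "/ip4/192.168.", "/ip4/127."} {
		if strings.HasPrefix(addr, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isLiteAddr("/p2p-circuit/ipfs/QmRelayTarget")) // → true
	fmt.Println(isLiteAddr("/ip4/192.168.1.7/tcp/4001"))       // → true
	fmt.Println(isLiteAddr("/ip4/104.131.131.82/tcp/4001"))    // → false
}
```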

Usage of lite connections

The aim is to maintain fewer 'heavy' connections, and to close out connections
periodically as they become less useful to us. For this, nodes should maintain
a set of connections to peers that are willing to relay connections for them,
and should relay multiple connections for other peers. In this way you can be
'connected' to many unique peers per physical 'heavy' connection.

Things to be done

  • Relay needs to be finished up and integrated https://github.com/libp2p/go-libp2p-circuit
  • ipfs swarm connect needs a more detailed 'verbose' mode to aid in connection debugging (#3746)
  • ipfs dht print-table (name TBD) command to print dht routing tables
  • ipfs swarm gc to manually trigger a cleanup of connections
  • ipfs swarm limits to view and change swarm limits, perhaps select preset 'modes'
  • ipfs swarm peers -v should show which peer opened each stream (remote vs local)
  • connection tagging functionality needs to be added to 'host'
  • peers listing should show connection tagging information
  • 'host' needs configurable routine to close out connections periodically
  • NewStream should add hints as context values to select things like 'Dont Dial' or 'Prefer Lite Stream'
  • Bitswap needs to periodically remove peers from its 'partners' list after inactivity
  • Think about stream management for pubsub
@whyrusleeping
Member Author

A short update: we've been making some really good progress on libp2p lately and I want to recognize some of the hard work that's been done.

@vyzo pushed and got the initial circuit relay code merged into master. You can now relay libp2p connections through other peers! This allows easier (though still manual) NAT traversal, as well as interesting future work on using fewer connections. This gets us a bit closer to the 'lite connections' ideas above.

@Stebalien and I debugged and fixed a particularly nasty stream multiplexing issue that was being triggered by bitswap. Given some combination of disconnects, reconnects, and timeouts, bitswap would write endlessly to a stream that the other side had neglected to close. This has mostly been resolved, but some deeper work is being done to (hopefully) eliminate this class of errors entirely. ref #3651

Finally, @magik6k has fixed a bug in the dialing limiter that significantly reduces the number of open file descriptors used when dialing out to peers. With this patch applied, users should notice an improvement in system load and a drop in 'too many open files' errors. This helps mitigate some of the urgency around connection closing, but doesn't address that underlying problem.

@whyrusleeping
Member Author

Another exciting update:

We have implemented and shipped connection closing! This new feature will ship
in the upcoming 0.4.12 release, but you can try it out now in the 0.4.12
release candidate. The connection manager will be enabled by default, even for
nodes with config files from older repos. Users should notice a reduction in
memory, cpu, and bandwidth usage, as well as a significant reduction in the
occurrence of 'too many open files' errors.

So, what's next?

Now that we have the tools to defeat NAT, it's time to optimize how we create
and accept connections. In order to do this, I think it's useful to categorize
the different situations that a node on the network might be in.

Nodes with public IP addresses

Nodes that are easily dialable have no need for any sort of NAT traversal, and
should have very little, if any, ambiguity about which addresses will work for
connecting to them. For these nodes, we should do the following:

  • Disable NAT traversal utilities
  • Announce only 'known' ip addresses
  • Set higher connection limits
  • Set up netscan prevention 'local dial' blocks
  • Disable TCP reuseport
  • Advertise relay addresses (for javascript nodes to connect)

Nodes behind a traversable NAT

These nodes are generally on home internet connections, which tend to be
slower, and many consumer routers do not cope well with large numbers of
connections (and neither do the roommates of ipfs users). For these nodes, we
should:

  • Advertise external NAT mapping
  • Advertise relay addresses (to minimize number of connections)
  • Set lower connection limits

Undialable nodes

These nodes, due to a restrictive NAT or firewall, cannot be connected to
externally. For these, we should make sure that we don't encourage other nodes
to waste resources trying to connect to them.

  • Only advertise relay addresses
  • Run the dht in 'client' mode
  • Disable upnp (it crashes some routers :/ )
  • Disable TCP reuseport
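The three categories above amount to per-situation configuration profiles. A sketch of how a node might select one (the struct, field names, and numeric limits are all illustrative, not shipped ipfs config options):

```go
package main

import "fmt"

// swarmProfile is a hypothetical bundle of the per-situation settings
// described in the three categories above.
type swarmProfile struct {
	ConnLimit     int  // upper bound on open connections
	NATTraversal  bool // run upnp/NAT-PMP port mapping
	TCPReuseport  bool
	DHTClientOnly bool // run the dht in 'client' mode
	RelayAddrs    bool // advertise relay addresses
}

// profileFor picks settings based on the node's reachability.
// The limits are placeholders, chosen only to show the ordering:
// public > nat-traversable > undialable.
func profileFor(reachability string) swarmProfile {
	switch reachability {
	case "public":
		// Dialable directly: no NAT helpers, higher limits,
		// still advertise relay addrs for javascript nodes.
		return swarmProfile{ConnLimit: 900, RelayAddrs: true}
	case "nat-traversable":
		// Home connection: map the NAT, keep limits low.
		return swarmProfile{ConnLimit: 300, NATTraversal: true, TCPReuseport: true, RelayAddrs: true}
	default: // undialable
		// Only reachable via relay; don't burden the DHT.
		return swarmProfile{ConnLimit: 100, DHTClientOnly: true, RelayAddrs: true}
	}
}

func main() {
	fmt.Printf("%+v\n", profileFor("undialable"))
}
```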

@Stebalien
Member

> Disable TCP reuseport

Why?

> Nodes behind a traversable NAT

Eventually, we may want to make these nodes DHT clients as well. DHTs with many ephemeral nodes but not a lot of data are really inefficient.

@whyrusleeping
Member Author

@Stebalien we disable TCP reuseport in some situations simply because it tends to cause issues unexpectedly, and if we don't need it then it's probably best not to have it.

> Eventually, we may want to make these nodes DHT clients as well. DHTs with many ephemeral nodes but not a lot of data are really inefficient.

Yeah, I would be open to making that the default for them too.
