
Update on libp2p work #4029

Open
whyrusleeping opened this issue Jul 3, 2017 · 4 comments

Comments

@whyrusleeping
Member

The ipfs network is growing pretty quickly, and we need to move fast to avoid issues caused by ipfs not closing any of its connections. I did some thinking on this over the weekend and came up with this issue.

Connection Closing

Currently, ipfs does not close any of the connections it makes; as the network
grows larger, this is becoming a very big problem. We need to implement a way
for ipfs nodes (and more generally, any application using libp2p) to manage the
number of connections they maintain. To decide whether or not to close a given
connection, we need some information about how valuable it is. For example,
peers that are in the DHT's routing table, or ones that are very frequent
bitswap partners, should be prioritized over connections made just for an
infrequent wantlist update or dht query. However, we don't want to go around
closing all but a very few 'valuable' connections; it is preferable to keep as
many connections open as resource constraints allow. Initiating a new
connection is rather expensive latency-wise, and maintaining a larger number of
connections keeps the effectiveness of bitswap wantlist broadcasts high.

To achieve this, we need a system that keeps track of connections and sorts
out which ones should stay open and which should be closed. This system should
accept hints from upper-level callers about the connections: the DHT code
should be able to signal that a given connection is in its routing table,
bitswap should be able to mark favored partners, pubsub should be able to hold
connections to the peers in its swarms, and so on.

My proposal is to add this functionality to the 'host' abstraction. A method
TagConn(conn, tag, val) should be added that accepts a connection, a tag to
add to the connection, and an importance value for that tag. Also needed is an
UntagConn(conn, tag) method. A worker routine would periodically scan through
the connections, check whether we have more than our limit, and close the
connections that have the least assigned value.

Lite Connections

Since it is advantageous to hold open more connections (even if they are
infrequently used), I propose we add the concept of a 'lite connection' that
doesn't count against the connection limit in the same way normal connections
do. These would be low-cost connections such as a relayed connection through
another peer, a standard connection to a peer on the local area network, or
even a BLE connection to some other nearby node. The key here is that these
connections are cheaper to maintain. They should not be used for any 'high
bandwidth' applications; if high bandwidth is needed, a new 'heavy' connection
should be created (so as not to abuse relays). Ideal uses would be bitswap
announcements, small numbers of dht queries, pubsub, or pings.
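One way to classify lite versus heavy connections is by multiaddr: relayed circuits and local-network addresses are cheap. A rough sketch (the function name and the string-prefix heuristic are illustrative only; real code would inspect parsed multiaddr components):

```go
package main

import (
	"fmt"
	"strings"
)

// isLiteAddr is a hypothetical heuristic for the 'lite connection' idea:
// relayed circuits and local-area-network addresses are cheap to maintain
// and would not count against the normal connection limit.
func isLiteAddr(addr string) bool {
	// Relayed connection through another peer.
	if strings.Contains(addr, "/p2p-circuit") {
		return true
	}
	// Local-area network (private/loopback IPv4 ranges; simplified check).
	for _, prefix := range []string{"/ip4/10.", "/ip4/192.168.", "/ip4/127."} {
		if strings.HasPrefix(addr, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isLiteAddr("/p2p-circuit/ipfs/QmRelayTarget")) // → true
	fmt.Println(isLiteAddr("/ip4/192.168.1.7/tcp/4001"))       // → true
	fmt.Println(isLiteAddr("/ip4/104.131.131.82/tcp/4001"))    // → false
}
```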

Usage of lite connections

The aim is to maintain fewer 'heavy' connections, and to close out connections
periodically as they become less useful to us. For this, nodes should maintain
a set of connections to peers that are willing to relay connections for them,
and should relay multiple connections for other peers. In this way you can be
'connected' to many unique peers per physical 'heavy' connection.

Things to be done

  • Relay needs to be finished up and integrated https://github.com/libp2p/go-libp2p-circuit
  • ipfs swarm connect needs a more detailed 'verbose' mode to aid in connection debugging (#3746)
  • ipfs dht print-table (name TBD) command to print dht routing tables
  • ipfs swarm gc to manually trigger a cleanup of connections
  • ipfs swarm limits to view and change swarm limits, perhaps select preset 'modes'
  • ipfs swarm peers -v should show which peer opened each stream (remote vs local)
  • connection tagging functionality needs to be added to 'host'
  • peers listing should show connection tagging information
  • 'host' needs configurable routine to close out connections periodically
  • NewStream should add hints as context values to select things like 'Dont Dial' or 'Prefer Lite Stream'
  • Bitswap needs to periodically remove peers from its 'partners' list after inactivity
  • Think about stream management for pubsub
@whyrusleeping
Member Author

A short update: we've been making some really good progress on libp2p lately and I want to recognize some of the hard work that's been done.

@vyzo pushed and got the initial circuit relay code merged into master. You can now relay libp2p connections through other peers! This allows easier (though still manual) NAT traversal, as well as interesting future work on using fewer connections. This gets us a bit closer to the 'lite connections' ideas above.

@Stebalien and I debugged and fixed a particularly nasty stream multiplexing issue that was being triggered by bitswap. Given some combination of disconnects, reconnects, and timeouts, bitswap would write endlessly to a stream that the other side had neglected to close. This has mostly been resolved, but some deeper work is being done to (hopefully) eliminate this class of errors entirely. ref #3651

Finally, @magik6k has fixed a bug in the dialing limiter that significantly reduces the number of open file descriptors used when dialing out to peers. With this patch applied, users should notice an improvement in system load and a drop in 'too many open files' errors. This helps mitigate some of the urgency around connection closing, but doesn't address that underlying problem.

@whyrusleeping
Member Author

Another exciting update:

We have implemented and shipped connection closing! This new feature will ship
in the upcoming 0.4.12 release, but you can try it out now in the 0.4.12
release candidate. The connection manager will be enabled by default, even for
nodes with config files from older repos. Users should notice a reduction in
memory, cpu, and bandwidth usage, as well as a significant reduction in the
occurrence of 'too many open files' errors.

So, what's next?

Now that we have the tools to defeat NAT, it's time to optimize how we create
and accept connections. In order to do this, I think it's useful to categorize
the different situations that a node on the network might be in.

Nodes with public IP addresses

Nodes that are easily dialable have no need for any sort of NAT traversal, and
should have very little, if any, ambiguity about which addresses will work for
connecting to them. For these nodes, we should do the following:

  • Disable NAT traversal utilities
  • Announce only 'known' ip addresses
  • Set higher connection limits
  • Set up netscan prevention 'local dial' blocks
  • Disable TCP reuseport
  • Advertise relay addresses (for javascript nodes to connect)

Nodes behind a traversable NAT

These nodes are generally on home internet connections, which tend to be
slower, and many consumer routers do not cope well with large numbers of
connections (and neither do the roommates of ipfs users). For these nodes, we
should:

  • Advertise external NAT mapping
  • Advertise relay addresses (to minimize number of connections)
  • Set lower connection limits

Undialable nodes

These nodes, due to a restrictive NAT or firewall, cannot be connected to
externally. For these, we should make sure that we don't encourage other nodes
to waste resources trying to connect to them.

  • Only advertise relay addresses
  • Run the dht in 'client' mode
  • Disable upnp (it crashes some routers :/ )
  • Disable TCP reuseport
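The three categories above amount to per-situation configuration profiles. A sketch of how a node might select one (the struct, field names, and numeric limits are all illustrative, not shipped ipfs config options):

```go
package main

import "fmt"

// swarmProfile is a hypothetical bundle of the per-situation settings
// described in the three categories above.
type swarmProfile struct {
	ConnLimit     int  // upper bound on open connections
	NATTraversal  bool // run upnp/NAT-PMP port mapping
	TCPReuseport  bool
	DHTClientOnly bool // run the dht in 'client' mode
	RelayAddrs    bool // advertise relay addresses
}

// profileFor picks settings based on the node's reachability.
// The limits are placeholders, chosen only to show the ordering:
// public > nat-traversable > undialable.
func profileFor(reachability string) swarmProfile {
	switch reachability {
	case "public":
		// Dialable directly: no NAT helpers, higher limits,
		// still advertise relay addrs for javascript nodes.
		return swarmProfile{ConnLimit: 900, RelayAddrs: true}
	case "nat-traversable":
		// Home connection: map the NAT, keep limits low.
		return swarmProfile{ConnLimit: 300, NATTraversal: true, TCPReuseport: true, RelayAddrs: true}
	default: // undialable
		// Only reachable via relay; don't burden the DHT.
		return swarmProfile{ConnLimit: 100, DHTClientOnly: true, RelayAddrs: true}
	}
}

func main() {
	fmt.Printf("%+v\n", profileFor("undialable"))
}
```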

@Stebalien
Member

> Disable TCP reuseport

Why?

> Nodes behind a traversable NAT

Eventually, we may want to make these nodes DHT clients as well. DHTs with many ephemeral nodes but not a lot of data are really inefficient.

@whyrusleeping
Member Author

@Stebalien we disable TCP reuseport in some situations simply because it tends to cause issues unexpectedly, and if we don't need it then it's probably best not to have it.

> Eventually, we may want to make these nodes DHT clients as well. DHTs with many ephemeral nodes but not a lot of data are really inefficient.

Yeah, I would be open to making that the default for them too.
