HighWater ConnMgr setting not respected #5248

Closed
interfect opened this issue Jul 18, 2018 · 2 comments
Labels
status/duplicate This issue or pull request already exists

Comments

@interfect

Version information:

go-ipfs version: 0.4.16-
Repo version: 7
System version: amd64/linux
Golang version: go1.10.2

Type:

Bug

Description:

IPFS is opening thousands of TCP connections, and overloading my ISP's router's tiny brain. This gets me periodically booted from the Internet (as the router seems to think taking its DSL uplink down and then up again is a strategy to deal with connection overload), and I get approximately nil IPFS throughput between me and the node I am trying to send files to.

I changed the default ConnMgr config, with its dangerously high 900 (!) HighWater mark, to this:

```json
"ConnMgr": {
  "GracePeriod": "5s",
  "HighWater": 50,
  "LowWater": 10,
  "Type": "basic"
},
```
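For reference, the same override can be applied without hand-editing the config file, via the `ipfs config` CLI (a sketch; exact flag behavior may differ across go-ipfs versions, and the daemon must be restarted for it to take effect):

```shell
# Replace the whole Swarm.ConnMgr block in one shot:
ipfs config --json Swarm.ConnMgr \
  '{"Type":"basic","LowWater":10,"HighWater":50,"GracePeriod":"5s"}'

# Or adjust a single field:
ipfs config --json Swarm.ConnMgr.HighWater 50
```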

Even with this, IPFS reports ~100 or more peers after a few minutes, and OpenWRT (between me and the ISP router) counts 2000-3000 connections, most of which have only 2 or 3 packets sent over them. The system isn't reachable from the Internet, so IPFS must be initiating all of these connections itself.

- Why is the ConnMgr tolerating 100 or more peers when I asked it to disconnect peers over 50? Did I set the GracePeriod too low, and does it really need to be higher?
- When there are more than 50 peers already, why does the node persist in trying to open new connections?
- Similarly, can a hard cap on connections, or a minimum time between connection attempts, be set to prevent IPFS from taking the whole network offline?
- Why do I see so many more connections at the router than IPFS reports? Is the ConnMgr maybe only tracking successfully set-up connections, and not counting half-open connections that the router still needs to think about?
- Is there some way to cap the outgoing or half-open connections at the OS level, maybe with Linux containers?

This is related to #4950, and the long but inconclusive #3320.

@Stebalien (Member)

> When there are more than 50 peers already, why does the node persist in trying to open new connections?

The DHT. We aren't intentionally connecting to a bunch of nodes, just trying to query the DHT and connecting to those nodes in the process.

> Similarly, can a hard cap on connections, or a minimum time between connection attempts, be set to prevent IPFS from taking the whole network offline?

We could add this but it wouldn't be a true hard cap and could degrade performance significantly (i.e., may make ipfs not work).

> Why is the ConnMgr tolerating 100 or more peers when I asked it to disconnect peers over 50? Did I set the GracePeriod too low, and does it really need to be higher?

That shouldn't be happening, but it may be that you just have a lot of "young" connections. Also note: we don't start tracking connections until they've been fully established (i.e., we've finished all of our security handshakes).

> Why do I see so many more connections at the router than IPFS reports? Is the ConnMgr maybe only tracking successfully set-up connections, and not counting half-open connections that the router still needs to think about?

Yes, see above. However, I'd guess that the router is also lying. Try running `ss -nat '( dport = :4001 or sport = :4001 )' | wc -l`.

> Is there some way to cap the outgoing or half-open connections at the OS level, maybe with Linux containers?

You may be able to do it with a container, network namespace, or a firewall rule. However, that'll likely cause IPFS to stop working correctly as we won't be able to query the DHT.
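As a rough sketch of the firewall-rule option: on Linux, the iptables `connlimit` match can refuse new outbound dials to the default swarm port once a threshold is exceeded. The port and the 200-connection cap are assumptions, many peers listen on non-default ports, and (as noted above) this may break DHT queries:

```shell
# Reject new outbound TCP connections to the default swarm port (4001)
# once more than 200 such connections exist, counted across all
# destinations (--connlimit-mask 0 groups every address into one bucket).
iptables -A OUTPUT -p tcp --syn --dport 4001 \
  -m connlimit --connlimit-above 200 --connlimit-mask 0 -j REJECT
```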


One solution is a relay-only mode: #4197. We're also working on adding support for QUIC (a UDP-based protocol) to get crappy routers off our back (when possible).

@Stebalien (Member)

Closing as a duplicate of #4718.

@Stebalien Stebalien added the status/duplicate This issue or pull request already exists label Dec 12, 2018