Goroutine count, Node Performance, and correlation with peer count #5420

Open
bonedaddy opened this issue Sep 2, 2018 · 8 comments
Labels: kind/bug A bug in existing code (including security flaws) · topic/perf Performance

Comments

@bonedaddy
Contributor

Version information:

go-ipfs version: 0.4.17-
Repo version: 7
System version: amd64/linux
Golang version: go1.10.3

Note that I'm also running an IPFS Cluster daemon on the same node which is connected to one other peer:

ipfs-cluster-service version 0.5.0

Type:

Possible goroutine leak or other bug

Description:

The number of goroutines running on my IPFS nodes appears to be highly correlated with the number of peers my node is connected to. In the past I would normally have between 600 and 1,000 peers, and I noticed overall "slow" performance from my node. I also noticed that my nodes constantly had a significantly high number of goroutines running (10K+).

I suspected the poor performance was due to slow DHT querying with the large number of peers, so I lowered my peer count range (200 -> 500). Subsequently, my node's performance was significantly better than before, and the goroutine count was lower. Today, while checking my monitoring system, I noticed an extremely interesting pattern: the peer count on one of my nodes dropped sharply, and as the peer count dropped, so did the number of goroutines running on that node.

I'm unsure what the underlying issue would be, but it appears that the more peers you are connected to, the more goroutines you are running, which on the surface looks like what was causing the poor performance of my nodes.
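
For anyone who wants to reproduce the correlation, here's a minimal Go sketch (assumptions: the daemon's HTTP API is on the default 127.0.0.1:5001, the swarm/peers endpoint accepts a plain POST, and the pprof handlers are mounted on the API server as described in the debug guide) that periodically samples the swarm peer count and the daemon's goroutine count:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Assumes the go-ipfs HTTP API is listening on the default address.
const api = "http://127.0.0.1:5001"

// peerCount asks the daemon how many peers it is currently connected to.
func peerCount() (int, error) {
	resp, err := http.Post(api+"/api/v0/swarm/peers", "", nil)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var out struct{ Peers []json.RawMessage }
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	return len(out.Peers), nil
}

// goroutineCount reads the total from the daemon's pprof goroutine dump,
// whose first line looks like "goroutine profile: total N".
func goroutineCount() (int, error) {
	resp, err := http.Get(api + "/debug/pprof/goroutine?debug=1")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var n int
	_, err = fmt.Fscanf(resp.Body, "goroutine profile: total %d", &n)
	return n, err
}

func main() {
	// Print one sample every 30 seconds; graph the two columns to see
	// whether peer count and goroutine count track each other.
	for range time.Tick(30 * time.Second) {
		p, _ := peerCount()
		g, _ := goroutineCount()
		fmt.Printf("%s peers=%d goroutines=%d\n",
			time.Now().Format(time.RFC3339), p, g)
	}
}
```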

System Specs:

CPU: E5-2680 v2 - 12 cores
Memory: 16GB DDR3
Disk: 750GB 10K RPM
Disk Format: ext4
IPFS Repo: BadgerDS
@hsanjuan
Contributor

hsanjuan commented Sep 4, 2018

@postables my first impression is that this is perfectly normal. The more peers you're connected to, the more communication streams need to be handled, and each of them needs a goroutine (or several).
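
To illustrate the point, a generic Go sketch (not libp2p's actual code) of the one-goroutine-per-connection pattern a network stack like this uses:

```go
package main

import (
	"bufio"
	"log"
	"net"
)

// Generic illustration only: every accepted connection gets (at least)
// one goroutine that blocks reading from it, so the number of live
// goroutines grows linearly with the number of connected peers.
func serve(l net.Listener) {
	for {
		conn, err := l.Accept()
		if err != nil {
			return
		}
		go handle(conn) // one reader goroutine per peer connection
	}
}

func handle(conn net.Conn) {
	defer conn.Close()
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		// Process one message; in libp2p each muxed stream would
		// typically get its own goroutine on top of this as well.
		log.Printf("%s: %s", conn.RemoteAddr(), scanner.Text())
	}
}

func main() {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	serve(l) // blocks, spawning a goroutine per incoming connection
}
```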

Now perhaps you can elaborate on "poor performance". Is go-ipfs causing abnormal load on your 12 cores? Is the disk IO super slow? Or is it DHT query/resolving?

@bonedaddy
Contributor Author

bonedaddy commented Sep 4, 2018

@hsanjuan Ah okay, never mind about that then. The disk IO isn't what's slow; it's the DHT querying/resolving. From everything I can tell, our disk IO is fine and, if anything, under-loaded.

@Stebalien
Member

DHT querying/resolving shouldn't slow down when you have more connections. That's really interesting.

Stebalien added the kind/bug label on Sep 5, 2018
@Stebalien
Member

How about the CPU usage? If it's high, could you try dumping a CPU profile (https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md#beginning)?
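
For reference, the equivalent of the debug guide's curl command as a small Go sketch (assuming the default API address); the resulting file can be inspected with `go tool pprof`:

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

// Capture a 30-second CPU profile from the go-ipfs daemon's pprof
// endpoint (assumes the API is on the default 127.0.0.1:5001), then
// inspect it with `go tool pprof ipfs.cpuprof`.
func main() {
	resp, err := http.Get("http://127.0.0.1:5001/debug/pprof/profile?seconds=30")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("ipfs.cpuprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote ipfs.cpuprof")
}
```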

@bonedaddy
Contributor Author

Interesting, why shouldn't it be slower? Would it be because, as hsanjuan pointed out, there are more communication streams, allowing data to be fetched from multiple sources faster? As for CPU usage, it's quite low in general; I'm actually surprised at how low it is.

I don't have much of an explanation for why my performance increased when I decreased my connection count range. The only other change I made within the same time frame was that any time my nodes "add" files to IPFS, they now do so without pinning, and the pinning occurs in a separate process after the file has been successfully uploaded.

So perhaps the root issue was the concurrent add/pin behavior, and I'm just mixing up which change resulted in the higher performance?

@Stebalien
Member

> Interesting, why shouldn't it be slower? Would it be because, as hsanjuan pointed out, there are more communication streams, allowing data to be fetched from multiple sources faster? As for CPU usage, it's quite low in general; I'm actually surprised at how low it is.

DHT lookups should be faster because we're already likely to be connected to DHT servers with the information we need. However, bitswap may be slower, as it currently asks all connected peers for the objects we're looking for.
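
To make the scaling difference concrete, an illustrative sketch (not the actual go-bitswap code) comparing a broadcast want against a session-scoped want:

```go
package main

import "fmt"

type peerID string

// Illustrative only: a broadcast want is sent to every connected peer,
// so message count per lookup grows with the connection count.
func broadcastWant(connected []peerID, block string) int {
	return len(connected) // one WANT message per connected peer
}

// A session-style want only goes to peers that have already supplied
// related blocks, so the cost is bounded by the session, not the swarm.
func sessionWant(sessionPeers []peerID, block string) int {
	return len(sessionPeers)
}

func main() {
	swarm := make([]peerID, 500) // e.g. 500 connected peers
	session := swarm[:5]         // a handful of peers that proved useful

	fmt.Println("broadcast messages:", broadcastWant(swarm, "QmSomeBlock"))
	fmt.Println("session messages:  ", sessionWant(session, "QmSomeBlock"))
}
```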

But this is interesting. We should look into this more to make sure we aren't making any incorrect assumptions.

@bonedaddy
Contributor Author

bonedaddy commented Sep 5, 2018

Ah okay, perhaps it was bitswap then. Agreed! It's been very interesting and kind of fun to see what tricks work to increase node performance.

I should hopefully have time around the weekend to test this out. Are there any particular metrics that would help resolve this? I've got Zabbix, Grafana, and Prometheus at my disposal for metric collection, and could possibly, at least temporarily, implement another tool if it would help get additional information that isn't available otherwise.
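
For reference, go-ipfs also exposes Prometheus-format metrics on the API port (assuming defaults, http://127.0.0.1:5001/debug/metrics/prometheus), so the standard go_goroutines gauge can be scraped directly by Prometheus/Grafana. A quick Go sketch to check that gauge:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// Sketch: scrape the daemon's Prometheus endpoint (assumes the default
// API address) and print the go_goroutines gauge. A real setup would
// just point Prometheus at this URL and graph the series in Grafana.
func main() {
	resp, err := http.Get("http://127.0.0.1:5001/debug/metrics/prometheus")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "go_goroutines ") {
			fmt.Println(line) // e.g. "go_goroutines 10423"
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```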

@Stebalien
Member

Update: We've reduced bitswap chattiness through better use of sessions. This should, in general, reduce the impact of having many connections. However:

  1. The number of goroutines is still linear in the number of peers. Unfortunately, this is unavoidable, as we need a goroutine listening on every stream/connection to our peers.
  2. Bitswap is still pretty chatty.
