Goroutine count, Node Performance, and correlation with peer count #5420
@postables my first impression is that this is perfectly normal. The more peers you're connected to, the more communication streams need to be handled, and each of them runs in a goroutine (or several). Now perhaps you can elaborate on "poor performance". Is go-ipfs causing abnormal load on your 12 cores? Is the disk IO super slow? Or is it DHT querying/resolving?
@hsanjuan Ah okay, never mind about that then. The disk IO isn't what's slow; it's the DHT querying/resolving. From everything I can tell, our disk IO is fine and, if anything, under-loaded.
DHT querying/resolving shouldn't slow down when you have more connections. That's really interesting.
How about the CPU usage? If it's high, could you try dumping a CPU profile (https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md#beginning)? |
Interesting, why shouldn't it be slower? Would that be because, as hsan pointed out, more communication streams allow data to be fetched from multiple sources faster? As for CPU usage, it's quite low in general; I'm actually surprised at how low it is. I don't have much of an explanation for why my performance increased when I decreased my connection count range. The only other change I made in the same time frame was that any time my nodes add files to IPFS, they now do so without pinning, and the pinning occurs in a separate process after the file has been successfully uploaded. So perhaps the root issue was the concurrent add/pin issue, and I'm just mixing up which change resulted in the higher performance?
DHT lookups should be faster because we're already likely to be connected to DHT servers with the information we need. However, bitswap may be slower, as bitswap currently asks all connected peers for the objects we're looking for. But this is interesting. We should look into this more to make sure we aren't making any incorrect assumptions.
Ah okay, perhaps it was bitswap then. Agreed! It's been very interesting, and kind of fun, to see which tricks work to increase node performance. I should have time around the weekend to test this out. Are there any particular metrics that would help resolve this? I've got Zabbix, Grafana, and Prometheus at my disposal for metric collection, and could, at least temporarily, implement another tool if it would help gather additional information not otherwise available.
Update: We've reduced bitswap chattiness through better use of sessions. This should, in general, reduce the impact of having many connections. However:
Version information:
Note that I'm also running an IPFS Cluster daemon on the same node which is connected to one other peer:
Type:
Possible goroutine leak or other bug
Description:
The number of goroutines running on my IPFS nodes appears to be highly correlated with the number of peers each node is connected to. In the past I would normally have between 600 -> 1000 peers, and I noticed overall "slow" performance from my node. I also noticed that my nodes always had a significantly high number of goroutines running constantly (10K+).
I suspected the poor performance was due to slow DHT querying with the large number of peers, so I lowered my peer count range (200 -> 500). Subsequently my node's performance was significantly better than before, and my goroutine count was lower. Today, while checking my monitoring system, I noticed an extremely interesting pattern: the peer count on one of my nodes dropped sharply, and as the peer count dropped, so did the number of goroutines running on that node:
I'm unsure what the underlying issue would be, but it appears that the more peers you are connected to, the more goroutines you are running, which on the surface looks to be what was causing the poor performance for my nodes.
System Specs: