Runaway connections/resources #6297
Comments
Possibly related: #6237
I'm having a similar issue on 0.4.19.
I'm seeing the same connection runaways. Attaching some profiles from a machine that had roughly 12k peers in its swarm. A machine will be totally fine until it starts taking on tons of connections, as if it's been discovered by a rogue swarm (too dramatic?). Once that happens, it's just a matter of time before it goes under with an OOM. Version: 5fd5d44 textile-profile-ip-172-31-14-164.us-east-2.compute.internal-2019-05-06T02_28_04+0000.tar.gz
Aside from the random connection spike being really bizarre (we should investigate it), we should probably start looking at hard limits on connections. We've avoided doing this before now because it gets really messy when reasoning through DoS protection, but I think having some true maximum above the high water mark makes good sense (a rough sketch follows below). Some open questions include: […]
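For illustration only, a minimal sketch of such a hard cap, implemented as a libp2p network notifiee rather than anything that exists in go-ipfs today; the `maxConns` constant, the `hardLimiter` type, and the go-libp2p-core import path are all assumptions:

```go
// Hypothetical sketch: enforce a true maximum above the connection
// manager's high water mark by closing connections accepted past the cap.
// maxConns and hardLimiter are illustrative names, not go-ipfs APIs.
package main

import (
	"github.com/libp2p/go-libp2p-core/network"
)

// maxConns is an assumed hard ceiling, set above the high water mark.
const maxConns = 16000

type hardLimiter struct {
	network.NoopNotifiee // no-op implementations for the rest of the Notifiee interface
}

// Connected fires for every new connection; past the cap, drop it on the
// spot rather than waiting for the connection manager's periodic trim.
func (hl *hardLimiter) Connected(net network.Network, conn network.Conn) {
	if len(net.Conns()) > maxConns {
		conn.Close()
	}
}

// Registering it (sketch): host.Network().Notify(&hardLimiter{})
```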
I've argued before and I'm going to argue again that we should have hard internal resource constraints. It's really bad design if a daemon will just keep swallowing system memory until the OOM killer takes it down; it should listen to the OS's signals and/or have its own internal resource management, particularly for something as resource-hungry as IPFS. Also note that this event takes place on timescales where the connection manager really should have kicked in: with a grace period of 2 seconds it should have started killing connections much earlier, so something freaky is going on there.
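To make that suggestion concrete, a minimal, hypothetical sketch of such internal resource management: poll the process's own heap and shed connections well before the OOM killer fires. `softLimitBytes` and `trimConnections` are assumptions, not existing go-ipfs APIs:

```go
// Hypothetical sketch: watch the process's own heap usage and shed
// connections before the kernel OOM killer does it the hard way.
package main

import (
	"runtime"
	"time"
)

// softLimitBytes is an assumed soft ceiling, e.g. 8 GiB, well under the
// 12 GB memlimit mentioned in the report below.
const softLimitBytes = 8 << 30

func watchMemory(trimConnections func()) {
	var m runtime.MemStats
	for range time.Tick(10 * time.Second) {
		runtime.ReadMemStats(&m)
		// Note: HeapAlloc only covers the Go heap; kernel-side socket
		// buffers are invisible here (see the kernel-memory discussion below).
		if m.HeapAlloc > softLimitBytes {
			trimConnections()
		}
	}
}
```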
Have you been looking at the kernel memory through […]?
The memory graph above is strictly the IPFS daemon's systemd slice. I'm not sure whether that includes kernel allocations, but it seems that it shouldn't (though perhaps I'm wrong). Regardless, we shouldn't be seeing this many sockets.
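For anyone who wants to check, kernel-side socket memory on Linux sits outside the Go heap and can be read from `/proc/net/sockstat` and the slab lines of `/proc/meminfo`. A throwaway sketch (standard procfs paths; whether they are charged to the slice depends on your cgroup setup):

```go
// Throwaway sketch: dump the kernel's socket accounting, which sits outside
// the Go heap and may or may not be charged to the systemd slice.
package main

import (
	"fmt"
	"os"
)

func main() {
	for _, p := range []string{
		"/proc/net/sockstat", // per-protocol socket counts and TCP memory (in pages)
		"/proc/meminfo",      // check the Slab / SReclaimable lines
	} {
		b, err := os.ReadFile(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("--- %s ---\n%s", p, b)
	}
}
```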
This is definitely #6237. I'm still working on a fix.
@dokterbob Out of curiosity, what machine specs are you running that allow you to connect to around 14400 peers at once?
Version information:

Type: bug

Description:

Running a high-load node (the ipfs-search.com crawler), we've been seeing sudden, seemingly exponential growth in connection counts, paired with equal increases in memory usage (after which our memlimit kills `go-ipfs` at 12 GB(!), which is the sudden recovery in the graph). Notably, we're running with the connection manager configured for a maximum of `14400` connections.

Full config:
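For context, the relevant `Swarm.ConnMgr` section of a go-ipfs config with a 14400 high water mark and the 2-second grace period mentioned above would look roughly like the snippet below; the `LowWater` value here is an assumption for illustration, not the reporter's actual setting:

```json
{
  "Swarm": {
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 12000,
      "HighWater": 14400,
      "GracePeriod": "2s"
    }
  }
}
```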