
unexpected high cpu load and memory consumption #7263

Closed
RubenKelevra opened this issue May 1, 2020 · 6 comments
Labels
effort/hours Estimated to take one or several hours exp/expert Having worked on the specific codebase is important kind/bug A bug in existing code (including security flaws) P0 Critical: Tackled by core team ASAP topic/libp2p Topic libp2p

@RubenKelevra
Contributor

Version information:

I was running 0.5.0-rc4 from commit 116999a

Repo version: 9
System version: amd64/linux
Golang version: go1.14.2

Description:

While the server was handling only single-digit Mbit/s of traffic, IPFS put the CPU under substantial load and used nearly half of the RAM.

ipfs       13567  6.1  6.2 4970396 1017004 ?     Ssl  Apr21 884:15 /usr/local/bin/ipfs-cluster-service daemon
ipfs     2403637  223 43.9 16769580 7202860 ?    Ssl  Apr26 16229:31 /usr/bin/ipfs daemon --enable-pubsub-experiment --enable-namesys-pubsub

The VM has 16 GB of memory and 4 cores of an Intel Xeon Gold 6140 CPU @ 2.30 GHz.

The debug files are attached:
ipfs_debug.tar.gz

@RubenKelevra RubenKelevra added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels May 1, 2020
@Stebalien
Member

It looks like you have ~243 QUIC connections. Is that correct? (ipfs swarm peers | grep -c quic).

cc @marten-seemann I'm seeing a lot of timers firing and a lot of timers being held by QUIC connections.

@Stebalien Stebalien added exp/expert Having worked on the specific codebase is important effort/hours Estimated to take one or several hours P0 Critical: Tackled by core team ASAP topic/libp2p Topic libp2p and removed need/triage Needs initial labeling and prioritization labels May 2, 2020
@Stebalien Stebalien added this to the go-ipfs 0.6 milestone May 2, 2020
@marten-seemann
Member

@Stebalien Looking at ipfs.heap, it looks like the QUIC sessions are holding 41 MB of timers in total. That doesn't seem right for just 243 connections.

It seems like the session doesn't reset or stop the timer when it is closed (I just created quic-go/quic-go#2515), so maybe we're leaking timers there? Although the timer is set to a maximum of 30s (the idle timeout), so leaked timers should be garbage collected after that time frame.

@Stebalien
Member

The timer is getting created with a timeout of "max int".

@marten-seemann
Member

I'm not sure that's the problem, since the timer gets reset as soon as the first packet is sent or received. In any case, stopping the timer when the QUIC connection is closed seems like the right thing to do.

@RubenKelevra
Contributor Author

RubenKelevra commented May 2, 2020

It looks like you have ~243 QUIC connections. Is that correct? (ipfs swarm peers | grep -c quic).

That could be right; the server has enough capacity to handle many connections, so the connection limiter is set essentially wide open. The intent is to avoid my server cutting connections and instead let the other side handle the disconnects (if that makes sense).

These are the settings I use:

loki_config.txt
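For reference, the connection limiter being described lives under Swarm.ConnMgr in the go-ipfs config. A "wide open" setup looks roughly like this; the water-mark values are illustrative, not taken from the attached loki_config.txt:

```json
{
  "Swarm": {
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 2000,
      "HighWater": 3000,
      "GracePeriod": "20s"
    }
  }
}
```

With high water marks like these, the daemon rarely trims connections itself, so the peer count (and anything scaling with it, like QUIC session timers) grows until remote peers disconnect.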

@Stebalien
Member

This has since been fixed.
