
requesting node metrics from the API in a short interval causes a high CPU load #7528

Open
RubenKelevra opened this issue Jul 8, 2020 · 3 comments
Labels
effort/days: Estimated to take multiple days, but less than a week
exp/expert: Having worked on the specific codebase is important
kind/enhancement: A net-new feature or improvement to an existing feature
P3: Low: Not priority right now
status/ready: Ready to be worked
topic/perf: Performance

Comments

@RubenKelevra
Contributor

RubenKelevra commented Jul 8, 2020

Version information:

go-ipfs version: 0.7.0-dev
Repo version: 10
System version: amd64/linux
Golang version: go1.14.4

master@7ce1d751f

Description:

I'm running ipfs on a new server with SSD storage. I'm writing a lot of individual files to the node with ipfs add --chunker 'buzhash' --cid-version 1 --hash 'blake2b-256', copying them to the right location in the MFS, and unpinning them again (since ipfs files write doesn't support setting a non-standard chunker).
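Per file, the workflow looks roughly like this (just a sketch; the file name and MFS target path are placeholders):

$ CID=$(ipfs add -Q --chunker 'buzhash' --cid-version 1 --hash 'blake2b-256' example.bin)
$ ipfs files cp /ipfs/$CID /my-folder/example.bin   # copy into the MFS at the desired location
$ ipfs pin rm $CID                                  # drop the direct pin; the MFS reference keeps the blocks from being GC'd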

Afterwards, the MFS folder CID is pinned via ipfs-cluster, which runs on the same node.

ipfs-cluster shows that all cluster pins that are part of the pinset are pinned locally.

Another remote server also has all pins of the cluster set pinned; two other servers are still catching up, so they are receiving blocks from the local node.

The low bandwidth use while the node should be sending a fairly large folder to two other nodes brought a possible issue to my attention: the outgoing network speed was shown as around 4 MBit/s, which is extremely slow for a server that is basically doing nothing else.

The CPU usage (around 200%) is extremely high for that amount of network traffic, so I thought the node might still be publishing CIDs, and went to sleep.

System specs:
4 dedicated cores for the VM from an AMD EPYC 7702P 64-Core Processor; 16 GB of memory.

There are no background tasks running, just ipfs and ipfs-cluster, and ipfs-cluster uses almost no CPU at all.

I tried changing the DHT type to dhtclient, but this made no difference. Restarting the service also made no difference; the CPU usage just jumps back up to around 200%.

Attached is the debug data (I forgot to collect the last ones), plus the binary, since it's built from master. If I read the CPU profile correctly, a lot of CPU time is spent in go-ds-badger and go-ipfs-blockstore and in functions called by them (see the flame graph). The debug data was collected a few minutes after a restart of the IPFS daemon, while ipfs-cluster-service was turned off.

debug.tar.gz

Here are some performance numbers collected on the system; they show basically no change in load even though there is only very low network traffic.

[16 screenshots of system performance graphs taken on 2020-07-08 (Screenshot_20200708_165323 through Screenshot_20200708_174457)]

Config

DisableBandwidthMetrics and DisableNatPortMap are true; EnableAutoRelay and EnableRelayHop are false. I use the server profile, and Routing.Type is dhtclient. I use badgerds; StorageGCWatermark is 90 and StorageMax is 280GB.
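Roughly, that corresponds to the following settings (a sketch of the equivalent ipfs config commands, not a dump of my actual config):

$ ipfs init --profile server,badgerds
$ ipfs config Routing.Type dhtclient
$ ipfs config --json Swarm.DisableBandwidthMetrics true
$ ipfs config --json Swarm.DisableNatPortMap true
$ ipfs config --json Swarm.EnableAutoRelay false
$ ipfs config --json Swarm.EnableRelayHop false
$ ipfs config Datastore.StorageMax 280GB
$ ipfs config --json Datastore.StorageGCWatermark 90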

$ ipfs repo stat
NumObjects: 610148
RepoSize:   98410186788
StorageMax: 280000000000
RepoPath:   /var/lib/ipfs
Version:    fs-repo@10

I use the systemd-hardening.service file from the repo, but changed the ExecStart to

/usr/bin/ipfs daemon --enable-gc --enable-pubsub-experiment --enable-namesys-pubsub

@RubenKelevra added the kind/bug (A bug in existing code, including security flaws) and need/triage (Needs initial labeling and prioritization) labels on Jul 8, 2020
@RubenKelevra
Contributor Author

Okay, I found the reason:

Netdata is polling the object count, the repo size, and the peers from the IPFS node via the API. IPFS doesn't seem to cache these values and update them only when they change (a write-through cache strategy).
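The API calls such a poller makes look roughly like this (a sketch against the HTTP API on the default port; I haven't checked exactly which endpoints the Netdata plugin hits):

$ curl -s -X POST http://127.0.0.1:5001/api/v0/repo/stat    # NumObjects requires enumerating the blockstore
$ curl -s -X POST http://127.0.0.1:5001/api/v0/swarm/peers  # returns the full peer list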

Since Netdata polls these metrics quite often, this causes the issue. As a temporary workaround, the Netdata IPFS plugin can be configured with a longer data collection interval...

So this turns into an improvement request: polling the API for these metrics shouldn't cause a large CPU load.

@RubenKelevra changed the title from "go-ds-badger/go-ipfs-blockstore is using a lot of CPU" to "requesting node metrics from the API in a short interval causes a high CPU load" on Jul 8, 2020
@Stebalien
Member

The repo size is memoized, the number of objects is not. Try polling ipfs repo stat --size-only.
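For example (a sketch; the HTTP API should accept the same option as a size-only query argument):

$ watch -n 10 'ipfs repo stat --size-only'
$ curl -s -X POST 'http://127.0.0.1:5001/api/v0/repo/stat?size-only=true'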

@Stebalien added the exp/expert, effort/days, kind/enhancement, P3, and topic/perf labels and removed the kind/bug and need/triage labels on Jul 13, 2020
@thienpow

thienpow commented Jul 15, 2020

Polling for the peer count by requesting the full list of peers also doesn't make sense:

const peerInfos = await ipfs.swarm.peers({ timeout: 2500 })
return peerInfos.length
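The CLI has the same problem; as far as I can tell there is no count-only option, so you end up with the equivalent of:

$ ipfs swarm peers | wc -l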
