Requesting node metrics from the API at short intervals causes a high CPU load #7528
Labels:
effort/days: Estimated to take multiple days, but less than a week
exp/expert: Having worked on the specific codebase is important
kind/enhancement: A net-new feature or improvement to an existing feature
P3: Low, not priority right now
status/ready: Ready to be worked
topic/perf: Performance
Version information:
Description:
I'm running ipfs on a new server with SSD storage. I'm writing a lot of individual files with
ipfs add --chunker 'buzhash' --cid-version 1 --hash 'blake2b-256'
to the node, copying them to the right location in the MFS, and unpinning them again (since ipfs files write doesn't support setting a non-standard chunker). Afterwards, the MFS folder CID is pinned on ipfs-cluster, which runs on the same node.
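The per-file workflow looks roughly like this (a sketch; file.bin and the MFS folder /myfolder are placeholder names):

# add with the non-default chunker/CID settings; -Q prints only the final CID
cid=$(ipfs add -Q --chunker 'buzhash' --cid-version 1 --hash 'blake2b-256' file.bin)
# copy the file into place in the MFS
ipfs files cp "/ipfs/$cid" /myfolder/file.bin
# drop the pin created by ipfs add; the MFS reference keeps the blocks alive
ipfs pin rm "$cid"
# the folder CID that afterwards gets pinned on the cluster
ipfs files stat --hash /myfolder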
ipfs-cluster shows that all pins in the cluster pinset are pinned locally.
Another remote server also has all pins of the cluster pinset pinned; two other servers are still catching up, so they are receiving blocks from the local node.
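For reference, this per-peer pin state can be checked with ipfs-cluster-ctl (output format varies by version):

# lists every cluster pin and its state on each peer (PINNED, PINNING, ...)
ipfs-cluster-ctl status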
The low bandwidth use while the node should be sending a somewhat large folder to two other nodes brought a possible issue to my attention: the outgoing network speed was shown as around 4 Mbit/s, which is extremely slow for a server doing basically nothing else.
The CPU usage (around 200%) is extremely high for that amount of network traffic, so I thought it might still be publishing CIDs, and went to sleep.
System specs:
4 dedicated cores for the VM from an AMD EPYC 7702P 64-Core Processor; 16 GB of memory.
There are no background tasks running, just ipfs and ipfs-cluster; ipfs-cluster uses almost no CPU resources at all.
I tried changing the DHT type to dhtclient, but this resulted in no change. Restarting the service also made no difference; the CPU usage just jumps back up to around 200%. Attached are the debug data (I forgot to collect the last ones) and the binary, since it's built from master. If I read the CPU profile correctly, most of the CPU time is spent in go-ds-badger and go-ipfs-blockstore and in functions called by them (flame graph). The debug data was collected a few minutes after a restart of the IPFS daemon, while the ipfs-cluster-service was turned off.
debug.tar.gz
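For reference, this kind of debug data can be collected from the daemon's pprof endpoints on the API port (a sketch, assuming the default API address 127.0.0.1:5001):

# 30-second CPU profile (what the flame graph above is rendered from)
curl -s 'http://127.0.0.1:5001/debug/pprof/profile?seconds=30' -o ipfs.cpuprof
# heap profile and goroutine stacks
curl -s 'http://127.0.0.1:5001/debug/pprof/heap' -o ipfs.heap
curl -s 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2' -o ipfs.stacks
# inspect the CPU profile locally
go tool pprof -http=:8080 ipfs.cpuprof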
Here are some performance numbers collected on the system, which show essentially no difference in load even though there is only very low network traffic.
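Given the issue title, the load appears tied to metrics requests hitting the HTTP API at short intervals; a minimal way to reproduce that polling pattern (a sketch, assuming the default API address, with repo/stat standing in for whichever metrics endpoint the monitoring actually calls):

# poll a stats endpoint once per second and watch the daemon's CPU usage
while true; do
  curl -s -X POST 'http://127.0.0.1:5001/api/v0/repo/stat' > /dev/null
  sleep 1
done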
Config
DisableBandwidthMetrics and DisableNatPortMap are true; EnableAutoRelay and EnableRelayHop are false. I use the server profile, and Routing.Type is dhtclient. I use badgerds; StorageGCWatermark is 90 and StorageMax is 280GB. I use the systemd-hardening.service file from the repo, but changed the ExecStart to
/usr/bin/ipfs daemon --enable-gc --enable-pubsub-experiment --enable-namesys-pubsub
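For reproduction, the equivalent settings can be applied to a fresh node with ipfs config (a sketch; the key paths assume the go-ipfs layout that keeps the relay and NAT options under Swarm):

ipfs init --profile server,badgerds
ipfs config --json Swarm.DisableBandwidthMetrics true
ipfs config --json Swarm.DisableNatPortMap true
ipfs config --json Swarm.EnableAutoRelay false
ipfs config --json Swarm.EnableRelayHop false
ipfs config Routing.Type dhtclient
ipfs config --json Datastore.StorageGCWatermark 90
ipfs config Datastore.StorageMax 280GB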