Processes() is extremely CPU intensive #842

xxxserxxx opened this issue Feb 19, 2020 · 16 comments


xxxserxxx commented Feb 19, 2020

Describe the bug
Processes() is significantly more resource intensive than even forking ps with exec.Command and parsing the results.

To Reproduce
Download the attached Go source code (rename it to perf.go first, because GitHub restricts attachment extensions) and run it on the command line with:

go run ./perf.go 500

The argument is the number of iterations; you may need to start with fewer if it runs too long.
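
For context, perf.go compares roughly these two approaches (a trimmed sketch rather than the attached file; the ps flags and the gopsutil accessors here are just illustrative):

// perfsketch.go -- a minimal sketch of the two approaches compared in perf.go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"log"
	"os/exec"

	"github.com/shirou/gopsutil/process"
)

// viaFork spawns ps once and counts the lines of its output.
func viaFork() (int, error) {
	out, err := exec.Command("ps", "-e", "-o", "pid,pcpu,comm").Output()
	if err != nil {
		return 0, err
	}
	n := 0
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		n++ // real code would split and convert the fields here
	}
	return n, sc.Err()
}

// viaGopsutil asks the library for every process plus a couple of fields;
// on Linux each accessor may read /proc/<pid>/... again.
func viaGopsutil() (int, error) {
	procs, err := process.Processes()
	if err != nil {
		return 0, err
	}
	for _, p := range procs {
		p.Name()
		p.CPUPercent()
	}
	return len(procs), nil
}

func main() {
	for name, fn := range map[string]func() (int, error){"fork": viaFork, "gopsutil": viaGopsutil} {
		n, err := fn()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s saw %d processes\n", name, n)
	}
}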

Expected behavior
I'd expect the native code to be more resource efficient than spawning a child ps process. On my machine, I consistently see between 2x and 4x the CPU use, and much slower results:

➜  gopsutil go run perf.go 500
Forking ##########################################
....X....X....X....X....X....X....X....X....X....
Average CPU: 802
Time: 6.463249387s
GoPsUtil #########################################
....X....X....X....X....X....X....X....X....X....
Average CPU: 5563
Time: 13.694322005s

Environment (please complete the following information):

  • Linux: [paste contents of /etc/os-release and the result of uname -a]
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
HOME_URL="https://www.archlinux.org/"
➜  gopsutil uname -a
Linux glamdring 5.5.4-arch1-1 #1 SMP PREEMPT Sat, 15 Feb 2020 00:36:29 +0000 x86_64 GNU/Linux

(trimmed some attributes out)

github.com/shirou/gopsutil v2.20.1+incompatible

Edit

I noticed this while doing some development work on gotop; running both branches for about an hour, it didn't stray far from this:

[screenshot: 2020-02-19-145553_458x236_scrot]

@AtakanColak (Contributor)

@xxxserxxx could you also try go-sysinfo? I've been wondering about it but haven't had time to try it myself atm.

@xxxserxxx (Author)

Yeah! Thanks; this gives me some impetus to build out proper benchmarks for devices.

@AtakanColak (Contributor)

No problem, please feel free to share the results; it would be very helpful to me, and to anyone else needing a process data library in Go.


Lomanic commented Mar 3, 2020

My guess is that ps is more efficient than gopsutil because it knows beforehand which information to gather and pulls it from /proc/$PID/status in a single file read, while gopsutil first creates a Process variable (which also checks for process existence, since this type can be created for an arbitrary PID number; could you maybe benchmark before this change, #821?) and then re-reads the /proc/$PID/status file for each new piece of information requested (see how many times fillFromStatusWithContext is called).

In fact, we already have the following issues that aim to optimize this: #286, #413.
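
To make the difference concrete, this is roughly the pattern ps follows (a sketch I wrote for illustration, not gopsutil's or ps's actual code): several fields come out of one read of /proc/$PID/status, instead of one file read per accessor.

// One read of /proc/<pid>/status yields Name, State, VmRSS, etc. in one pass.
package main

import (
	"fmt"
	"os"
	"strings"
)

func statusFields(pid int, keys ...string) (map[string]string, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return nil, err
	}
	want := make(map[string]bool, len(keys))
	for _, k := range keys {
		want[k] = true
	}
	out := make(map[string]string)
	for _, line := range strings.Split(string(data), "\n") {
		k, v, ok := strings.Cut(line, ":")
		if ok && want[k] {
			out[k] = strings.TrimSpace(v)
		}
	}
	return out, nil
}

func main() {
	// Pull three fields from a single read, using our own PID as an example.
	fields, err := statusFields(os.Getpid(), "Name", "State", "VmRSS")
	if err != nil {
		panic(err)
	}
	fmt.Println(fields)
}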


xxxserxxx commented Mar 3, 2020

I wrote a benchmark, and am attaching it here; again, GitHub allows only a restricted set of suffixes, so rename the file back to .go first.

The attached file is a proper benchmark containing all three implementations: forking, gopsutil, and go-sysinfo. Each function performs similar operations, getting some data about the processes. To keep the work from being optimized out, each benchmark counts errors in the return values.
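
For anyone who doesn't want to download the attachment, the gopsutil benchmark is roughly this shape (a trimmed sketch of the approach, not the attached file verbatim; the other two benchmarks follow the same pattern):

package perf

import (
	"testing"

	"github.com/shirou/gopsutil/process"
)

// BenchmarkGoPSUtil walks every process once per iteration; counting (and
// logging) errors keeps the compiler from optimizing the work away.
func BenchmarkGoPSUtil(b *testing.B) {
	errs := 0
	for i := 0; i < b.N; i++ {
		procs, err := process.Processes()
		if err != nil {
			errs++
			continue
		}
		for _, p := range procs {
			if _, err := p.Name(); err != nil {
				errs++
			}
			if _, err := p.CPUPercent(); err != nil {
				errs++
			}
		}
	}
	b.Logf("GoPsUtil errors: %d", errs)
}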

➜  ps go test -benchtime=20s -bench  .
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkForking-8          2437           9663078 ns/op
--- BENCH: BenchmarkForking-8
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
BenchmarkGoPSUtil-8         1162          21010525 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 2
    perf_test.go:32: GoPsUtil errors: 11
BenchmarkGoSysInfo-8        2671           8590463 ns/op
--- BENCH: BenchmarkGoSysInfo-8
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 1
    perf_test.go:39: GoSysInfo errors: 27
PASS
ok      ser1.net/perf   74.941s

Of note is that the forking version is the only one that doesn't generate errors when processing data. From what I saw of the errors, they appear to be related to processes disappearing while the libraries are gathering information about them. go-sysinfo consistently generated more errors than gopsutil, across all runs.

I'm seeing vaguely conflicting results. According to the benchmark, go-sysinfo is, indeed, faster than forking. However, in an application the CPU use of go-sysinfo is double that of forking. I said "vaguely" because this is easily explained: forking is more IO-bound (I'm guessing), which takes longer but with less CPU impact than a pure syscall interface. However, given that the application isn't in a tight loop (it loops once per second), I wouldn't expect to notice the difference, and top reports higher CPU use for go-sysinfo when both are running side-by-side.

Edit

How about I go ahead and attach that benchmark, huh?

perf_test.go.txt


xxxserxxx commented Mar 3, 2020

I meant that for @AtakanColak. Also, @Lomanic -- what did you mean, "benchmark before this change"? Do you want me to fix the version at the commit prior to #821 and benchmark that?

The commit immediately prior to the merge was v2.20.1, which made things easy. I pinned the version in go.mod to that and re-ran the benchmark. It's not perfect, because running them side-by-side in the same test would be more effort, but here are the results (the go.mod pin is sketched after them):

##### v2.20.1
➜  ps go test -benchtime=20s -bench  .
go: downloading github.com/shirou/gopsutil v2.20.1+incompatible
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkForking-8          2430           9777497 ns/op
--- BENCH: BenchmarkForking-8
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
BenchmarkGoPSUtil-8         1186          21953451 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 1
    perf_test.go:32: GoPsUtil errors: 16
BenchmarkGoSysInfo-8        2703           8751875 ns/op
--- BENCH: BenchmarkGoSysInfo-8
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 28
PASS
ok      ser1.net/perf   77.424s
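
For reference, the pin is just the gopsutil require line in go.mod, swapped one at a time between runs (only that line shown; the rest of the file is unchanged):

// pin for the v2.20.1 run
require github.com/shirou/gopsutil v2.20.1+incompatible

// pin for the v2.20.2 run
require github.com/shirou/gopsutil v2.20.2+incompatible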

The difference is within an easy margin of error, so I ran it again with just the gopsutil benchmark, switching the go.mod version in between:

##### v2.20.1
➜  ps go test -benchtime=20s -bench=BenchmarkGoPSUtil  .
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkGoPSUtil-8         1164          20295012 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 7
PASS
ok      ser1.net/perf   25.732s
##### v2.20.2
➜  ps go test -benchtime=20s -bench=BenchmarkGoPSUtil  .
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkGoPSUtil-8         1131          21108605 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 13
PASS
ok      ser1.net/perf   26.044s

@xxxserxxx (Author)

Another hiccup with go-sysinfo: it isn't making syscalls; it's looking directly in /proc. This means it's unusable for (e.g.) gotop, because unless it's running as root it can't get information about many of the processes running on the system. ps can, so forking that and parsing the results can provide a system overview to non-root users that go-sysinfo can't.

Now I'm wondering how gopsutil gets its info 🤔


AtakanColak commented Mar 4, 2020

@xxxserxxx First of all, thank you for your beautiful work. It has also given me a few ideas. However, do let me know if I'm wrong: when you execute an exec.Cmd, it doesn't fork in the C sense, it spawns another process. Unless you have been monitoring the CPU/memory usage of that process as well, simply observing the CPU usage of gotop will yield inaccurate results; the work is handed off to the ps process while gotop only does the parsing, and the parsing is extra work on top of that.
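
If you want to fold the child's cost back in, the standard library already exposes the finished child's CPU time; something like this (a sketch, the ps flags are just illustrative):

package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("ps", "-e", "-o", "pid,pcpu,comm")
	if _, err := cmd.Output(); err != nil {
		log.Fatal(err)
	}
	// After the child exits, ProcessState carries the CPU time it consumed,
	// which observing only the parent process misses.
	st := cmd.ProcessState
	fmt.Printf("ps used %v user + %v system CPU time\n", st.UserTime(), st.SystemTime())
}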

To my understanding gopsutil also looks at /proc. However, I haven't come across a case where I couldn't get information about processes even though I wasn't running as root. Perhaps I simply didn't realize it.

EDIT: I have taken a look into how ps works, and it supposedly also reads from /proc. I don't understand the difference now :(

@xxxserxxx (Author)

You're right: I'm not tracking the ps process CPU usage at all. It enters and exits far too quickly, and I'm not able to monitor total average CPU usage with enough granularity to notice whether it's overall higher or lower either way.

Yeah, I don't know about ps; unless it's running in some sort of limited elevated privilege mode, I can't think of why it works. It may be because it's pulling limited, permitted data from /proc -- just general CPU and memory utilization -- whereas the libs are trying to gather more restricted information like environment. The former is not (necessarily) sensitive; accessing environment variables of running processes is. But I'm totally guessing.

I do know that some errors are because processes are disappearing between the time the library is pulling a list of procs and reading data on the procs. So the handles are literally disappearing, like if you read a directory glob and something deleted files before you processed each file. Again, I haven't dug into internals, but this could be mitigated by recognizing that the handle has disappeared and just swallowing it rather than returning errors on it. I believe go-sysinfo is particularly susceptible to this because the list and stat pulls are entirely different calls by the client using the API, rather than a single call that pulls and returns everything.
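
What I mean by swallowing it, roughly (my own sketch against /proc, not code from either library):

package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strconv"
)

// readAllStatuses lists PIDs first, then reads each one's status; any process
// that exits in between simply vanishes from the result instead of erroring.
func readAllStatuses() (map[int][]byte, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	out := make(map[int][]byte)
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a PID directory
		}
		data, err := os.ReadFile(filepath.Join("/proc", e.Name(), "status"))
		if errors.Is(err, fs.ErrNotExist) {
			continue // process exited between the list and the read: swallow it
		}
		if err != nil {
			return nil, err
		}
		out[pid] = data
	}
	return out, nil
}

func main() {
	statuses, err := readAllStatuses()
	if err != nil {
		panic(err)
	}
	fmt.Println("processes seen:", len(statuses))
}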

That said, I'm less worried about the disappearing handles. That can be easily recognized and handled in client code. I'm more concerned about the permission errors I see. Also, I only know the permission issue is happening with go-sysinfo; I haven't looked for or observed that error in gopsutil yet.


Lomanic commented Mar 6, 2020

what did you mean, "benchmark before this change?" Do you want me to fix the version at the commit prior to #821 and benchmark that?

Exactly, just out of curiosity, I was not expecting a spectacular difference. Sorry for this noise.

I believe go-sysinfo is particularly susceptible to this because the list and stat pulls are entirely different calls by the client using the API, rather than a single call that pulls and returns everything.

That's also the case for gopsutil, see my previous comment.

ps, go-sysinfo and gopsutil all get their data about processes from procfs on Linux. There are no special syscalls involved, so they should report the same errors. Maybe gopsutil and go-sysinfo don't use exactly the same files to get the same data, i.e. files with different permissions.

About the core issue reported (high CPU usage compared to ps), see my previous message. Determining how go-sysinfo is able to get this data more efficiently than gopsutil would need some profiling to check what the two libraries do differently; this benchmark provides an interesting data point, thanks for this.
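
A starting point for that profiling, using your existing benchmark and standard Go tooling, would be something like:

go test -benchtime=20s -bench=BenchmarkGoPSUtil -cpuprofile=cpu.out .
go tool pprof -top cpu.out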

@xxxserxxx (Author)

I've been reading back through the thread, and I just realized that it doesn't matter whether we're counting the ps process or not. The benchmarks are measuring the number of information-gathering calls that can be made per run; that's an absolute number. Also, a persistent 2% average CPU use, vs a 0.5s CPU use with a possible once-per-second spike because of ps (it's hard to see because it's transient and never lives long enough to show up in process lists), is still a net loss.

I don't think we can explain away the performance difference as off-loading onto a separate child process, which (a) isn't persistent, and (b) is irrelevant to the benchmark.


ssoroka commented Sep 22, 2020

We are seeing a major performance regression on Linux between 2.20.4 and 2.20.5, with CPU in a test case with about 280 processes jumping from 14% in 2.20.4 to 195% in 2.20.5. It looks like some cache code was removed, which may be the culprit.


Lomanic commented Sep 23, 2020

We are seeing a major performance regression on Linux between 2.20.4 and 2.20.5, with CPU in a test case with about 280 processes jumping from 14% in 2.20.4 to 195% in 2.20.5. It looks like some cache code was removed, which may be the culprit.

We don't cache BootTime anymore, to fix #837. We now retrieve this value for each new process instantiation as a consequence of this call to CreateTime() (which calls BootTime()), so that we can later compare the value in a future call to IsRunning(). #945 may bring some improvements by getting uptime through the sysinfo syscall.
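
For the curious, reading uptime via the sysinfo syscall from Go looks roughly like this (a Linux-only sketch using golang.org/x/sys/unix, not the actual #945 change):

package main

import (
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	var si unix.Sysinfo_t
	if err := unix.Sysinfo(&si); err != nil {
		panic(err)
	}
	uptime := time.Duration(si.Uptime) * time.Second
	// Boot time is then just "now minus uptime", with no procfs read involved.
	fmt.Println("uptime:", uptime, "boot:", time.Now().Add(-uptime))
}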

@xxxserxxx (Author)

Here's an update from nearly a year later: I updated to v3 to see if anything changed, and it appears performance is about the same:

10642» go test -benchtime=20s -bench . 
goos: linux
goarch: amd64
pkg: ser1.net/perf
cpu: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
BenchmarkForking-8          1028          23357252 ns/op
--- BENCH: BenchmarkForking-8
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
BenchmarkGoPSUtil-8           40         556564316 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 0
BenchmarkGoSysInfo-8         964          25360231 ns/op
--- BENCH: BenchmarkGoSysInfo-8
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 7
PASS
ok      ser1.net/perf   76.207s
10643» grep shirou go.mod
        github.com/shirou/gopsutil/v3 v3.21.8


cforce commented Jan 24, 2024

A significant performance issue has been observed in the 'gopsutil' library on Linux when comparing versions 2.20.4 and 2.20.5. In version 2.20.4, a particular test case with approximately 280 processes exhibited a CPU usage of 14%. However, in version 2.20.5, the CPU usage skyrocketed to 195%. This performance degradation can be attributed to the removal of BootTime caching in order to address a specific scenario where a device initially reports an incorrect date (e.g., epoch or a more recent date set by a tool like 'fake_hwclock') and later corrects it using NTP. This results in 'gopsutil' reporting incorrect system uptime.
The removal of this cache has caused the OpenTelemetry Collector (otel collector), which relies on 'gopsutil', to repeatedly access BootTime for every process and host metric with a high frequency, leading to increased resource usage. To mitigate this issue, one potential solution is to reintroduce BootTime caching in the collector's code. This change is expected to greatly improve performance since hostmetric continually calls 'gopsutil' in an "endless loop."

You can find further details and discussions regarding this issue in the following link:

#1283 (comment)
It is suggested that 'gopsutil' may need to support caching for such high-demand use cases, and the synchronization of NTP and BootTime should occur at defined intervals or be handled differently to address this specific scenario effectively.
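
A caller-side cache along those lines could be as small as this (a sketch; host.BootTime() is the gopsutil call, while the wrapper and its names are mine, not collector code):

package main

import (
	"fmt"
	"sync"

	"github.com/shirou/gopsutil/v3/host"
)

var (
	bootOnce   sync.Once
	bootCached uint64
	bootErr    error
)

// cachedBootTime queries gopsutil once and reuses the value afterwards,
// trading the NTP-correction edge case for far fewer procfs reads.
func cachedBootTime() (uint64, error) {
	bootOnce.Do(func() {
		bootCached, bootErr = host.BootTime()
	})
	return bootCached, bootErr
}

func main() {
	bt, err := cachedBootTime()
	if err != nil {
		panic(err)
	}
	fmt.Println("boot time (unix):", bt)
}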


shirou commented Jan 26, 2024

@cforce
For now, please use the latest version, which is v3.23.12. v2 is no longer maintained.

And #1579 introduces the host.EnableBootTimeCache() function to cache boot time. As you commented, caching BootTime can cause problems, so this cache is disabled by default, and it is up to the user to turn it on. This feature will be released at the end of this month.
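
Once released, usage would look something like this (sketched from the description above; see #1579 for the actual API and signature):

package main

import (
	"fmt"

	"github.com/shirou/gopsutil/v3/host"
)

func main() {
	// Opt in to the boot-time cache; it stays off unless the caller enables it.
	host.EnableBootTimeCache(true)

	bt, err := host.BootTime()
	if err != nil {
		panic(err)
	}
	fmt.Println("boot time (unix):", bt)
}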
