Streaming algorithm for MemoryMapsWithContext on linux #1477
Conversation
(branch updated from e52ed95 to ed8ad0a)
I wrote a simple benchmark:
```go
func BenchmarkMemoryMapsGroupedTrue(b *testing.B) {
	p := testGetProcess()
	ctx := context.Background()
	for i := 0; i < b.N; i++ {
		p.MemoryMapsWithContext(ctx, true)
	}
}

func BenchmarkMemoryMapsGroupedFalse(b *testing.B) {
	p := testGetProcess()
	ctx := context.Background()
	for i := 0; i < b.N; i++ {
		p.MemoryMapsWithContext(ctx, false)
	}
}
```
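As an aside, to rerun these benchmarks against a specific process (for example one with a very large smaps file, as in the second set of numbers further down), the target PID could be taken from an environment variable. The helper below is only a hypothetical sketch: `benchGetProcess` and `GOPSUTIL_BENCH_PID` are made up here, and the code is written as if it sat next to `testGetProcess` in the process package's tests (it additionally needs the `os` and `strconv` imports).

```go
// Hypothetical helper: pick the benchmark target from GOPSUTIL_BENCH_PID
// (a made-up environment variable) instead of the default test process.
func benchGetProcess(b *testing.B) *Process {
	b.Helper()
	pidStr := os.Getenv("GOPSUTIL_BENCH_PID")
	if pidStr == "" {
		b.Skip("GOPSUTIL_BENCH_PID not set")
	}
	pid, err := strconv.ParseInt(pidStr, 10, 32)
	if err != nil {
		b.Fatalf("invalid GOPSUTIL_BENCH_PID %q: %v", pidStr, err)
	}
	p, err := NewProcess(int32(pid))
	if err != nil {
		b.Fatal(err)
	}
	return p
}
```

The benchmarks would then start with `p := benchGetProcess(b)` instead of `p := testGetProcess()`.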
Then I ran it with `go test -bench BenchmarkMemoryMaps -benchmem` and got the following results:
```
goos: linux
goarch: amd64
pkg: github.com/shirou/gopsutil/v3/process
cpu: AMD Ryzen 7 5800HS with Radeon Graphics
```

- current

```
BenchmarkMemoryMapsGroupedTrue-4     6064    168223 ns/op     6090 B/op    64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4     685   1725689 ns/op   632609 B/op  3955 allocs/op
```

- This PR

```
BenchmarkMemoryMapsGroupedTrue-4     6742    159646 ns/op     7993 B/op    80 allocs/op
BenchmarkMemoryMapsGroupedFalse-4     906   1339468 ns/op   313716 B/op  5813 allocs/op
```
When I pointed the benchmark at a process whose smaps file has 13478 lines (the largest in my current environment), I got these results:
- current

```
BenchmarkMemoryMapsGroupedTrue-4      861   1620421 ns/op     6099 B/op     64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4      68  14987541 ns/op  4359101 B/op  27601 allocs/op
```

- This PR

```
BenchmarkMemoryMapsGroupedTrue-4      642   2090199 ns/op     8008 B/op     80 allocs/op
BenchmarkMemoryMapsGroupedFalse-4      66  21847487 ns/op  2194337 B/op  41144 allocs/op
```
The results show that the amount of memory used has been reduced by about half, but the number of allocations and the total duration have increased. Since the objective of this PR is to reduce total memory usage, that objective has been achieved; the trade-offs are a higher allocation count and a longer execution time.
A more detailed analysis of the allocations is still needed, but are these results what you expected? I have not been able to properly track down the cause, but is it possible to reduce the number of allocations further?
In any case, I currently consider these increases in execution time and allocation count acceptable, so I am thinking of merging the PR.
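For what it's worth, the allocation count in this kind of line-by-line parsing usually comes from per-line string copies (`scanner.Text()`), per-line field slices, and `[]byte`-to-string conversions before `strconv` calls. Reusing the scanner's buffer and parsing directly from the byte slice avoids most of them. The following is only a minimal, hypothetical illustration of those techniques; `sumRSS` is a made-up function, not code from this PR or from gopsutil.

```go
package sketch

import (
	"bufio"
	"bytes"
	"io"
)

// sumRSS streams a smaps-style input and sums the Rss values (in kB) while
// keeping per-line allocations low. Purely illustrative.
func sumRSS(r io.Reader) (uint64, error) {
	scanner := bufio.NewScanner(r)
	// Hand the scanner a reusable buffer so it does not grow its own.
	scanner.Buffer(make([]byte, 0, 64*1024), 64*1024)

	var total uint64
	for scanner.Scan() {
		line := scanner.Bytes() // Bytes() reuses the buffer; Text() would copy
		if !bytes.HasPrefix(line, []byte("Rss:")) {
			continue
		}
		// Parse the digits in place instead of strconv.ParseUint(string(...)),
		// which would allocate a new string for every matching line.
		var kb uint64
		for _, c := range bytes.TrimSpace(line[len("Rss:"):]) {
			if c < '0' || c > '9' {
				break
			}
			kb = kb*10 + uint64(c-'0')
		}
		total += kb
	}
	return total, scanner.Err()
}
```

Whether any of this applies to `MemoryMapsWithContext` depends on where its allocations actually come from; a run with `-memprofile` would show that.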
I'm not a huge fan of the CPU usage increase; let me take a look at whether I can improve things. Thanks for taking a look.
(branch updated from 085f6ba to a188c99)
(branch updated from a188c99 to 6dbc2f7)
With the latest changes I made, I have the following benchmark result:
I'm still working on understanding how to improve this, especially the amount of allocs/op.
The current algorithm behind `MemoryMapsWithContext` reads the whole file ahead of time, but the smaps file can be quite large in some cases. This PR switches the function to a streaming algorithm that should use less memory. It also extracts the `getBlock` function out of the function body to make the function easier to read and understand.
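To illustrate the general shape of such a streaming approach: the sketch below is hypothetical and uses my own naming (`streamSmaps` and `handleBlock` are made up), not the PR's actual implementation, which parses each block into gopsutil's `MemoryMapsStat`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// streamSmaps reads /proc/<pid>/smaps line by line and hands each mapping's
// block of lines to handleBlock as soon as the block is complete, instead of
// loading the whole file into memory first.
// NOTE: the fields slice is reused between calls; the callback must not retain it.
func streamSmaps(pid int32, handleBlock func(header string, fields []string) error) error {
	f, err := os.Open(fmt.Sprintf("/proc/%d/smaps", pid))
	if err != nil {
		return err
	}
	defer f.Close()

	var (
		header string
		fields []string
	)
	flush := func() error {
		if header == "" {
			return nil
		}
		err := handleBlock(header, fields)
		header, fields = "", fields[:0] // reuse the slice for the next block
		return err
	}

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		first, _, _ := strings.Cut(line, " ")
		if strings.HasSuffix(first, ":") {
			// Per-mapping field line such as "Rss:          12 kB".
			fields = append(fields, line)
			continue
		}
		// An address-range header like "7f2c... r-xp ... /usr/lib/libc.so.6"
		// starts a new mapping, so flush the previous block first.
		if err := flush(); err != nil {
			return err
		}
		header = line
	}
	if err := flush(); err != nil {
		return err
	}
	return scanner.Err()
}

func main() {
	// Example: print how many field lines each mapping of the current process has.
	err := streamSmaps(int32(os.Getpid()), func(header string, fields []string) error {
		fmt.Printf("%s -> %d fields\n", strings.Fields(header)[0], len(fields))
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

The PR's `getBlock` helper presumably plays the role of `handleBlock` here, turning one mapping's field lines into the stat struct; the sketch only shows the flushing structure that keeps at most one block in memory at a time.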