Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming algorithm for MemoryMapsWithContext on linux #1477

Closed

Conversation

paulcacheux
Copy link

@paulcacheux paulcacheux commented May 31, 2023

The current algorithm behind MemoryMapsWithContext reads the whole file ahead of time, but the smaps file can be quite huge in some cases. This PR switches the function to a streaming algorithm that should use less memory.

This PR also extracts the getBlock function out of the function body to make reading/understanding the function easier.

@paulcacheux paulcacheux marked this pull request as ready for review May 31, 2023 15:35
Copy link
Owner

@shirou shirou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a simple Benchmark.

func BenchmarkMemoryMapsGroupedTrue(b *testing.B) {
	p := testGetProcess()
	ctx := context.Background()
	for i := 0; i < b.N; i++ {
		p.MemoryMapsWithContext(ctx, true)
	}
}
func BenchmarkMemoryMapsGroupedFalse(b *testing.B) {
	p := testGetProcess()
	ctx := context.Background()
	for i := 0; i < b.N; i++ {
		p.MemoryMapsWithContext(ctx, false)
	}
}

Then, run with go test -bench BenchmarkMemoryMaps -benchmem, I got those.

goos: linux
goarch: amd64
pkg: github.com/shirou/gopsutil/v3/process
cpu: AMD Ryzen 7 5800HS with Radeon Graphics
- current
BenchmarkMemoryMapsGroupedTrue-4            6064            168223 ns/op            6090 B/op         64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4            685           1725689 ns/op          632609 B/op       3955 allocs/op
- This PR
BenchmarkMemoryMapsGroupedTrue-4            6742            159646 ns/op            7993 B/op         80 allocs/op
BenchmarkMemoryMapsGroupedFalse-4            906           1339468 ns/op          313716 B/op       5813 allocs/op

When I specify a process which has 13478 lines smap file(This is my largest in my current environment), I got those.

- current
BenchmarkMemoryMapsGroupedTrue-4             861           1620421 ns/op            6099 B/op         64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4             68          14987541 ns/op         4359101 B/op      27601 allocs/op

- This PR
BenchmarkMemoryMapsGroupedTrue-4             642           2090199 ns/op            8008 B/op         80 allocs/op
BenchmarkMemoryMapsGroupedFalse-4             66          21847487 ns/op         2194337 B/op      41144 allocs/op

The results show that the memory size used has been reduced by about half, but the number of memory allocations and total duration has increased.

Since the objective of this PR is to reduce the total number of memory allocations, the objective has been achieved. However, the number of memory allocations has increased, and the total execution time has increased.

More detailed analysis of memory allocation is needed, but are these results as you expect?
I have not been able to properly track down the cause, but is it possible to reduce the number of memory allocations more?

Anyway, I currently consider these increases in execution time and memory allocation to be acceptable, so I am thinking of merging the PRs.

@paulcacheux
Copy link
Author

I'm not a huge fan of the CPU usage increase, let me take a look if I can improve things, thanks for taking a look

@paulcacheux paulcacheux force-pushed the paulcacheux/streaming-smaps branch 2 times, most recently from 085f6ba to a188c99 Compare June 12, 2023 19:18
@paulcacheux
Copy link
Author

paulcacheux commented Jun 26, 2023

With the latest changes I made I have the following benchmark result:

small smaps (the one from the test binary)
before this PR:
BenchmarkMemoryMapsGroupedTrue-4    	   18906	     64160 ns/op	    6056 B/op	      64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	    2958	    402972 ns/op	  464957 B/op	    2978 allocs/op

with this PR:
BenchmarkMemoryMapsGroupedTrue-4    	   19773	     62090 ns/op	    2600 B/op	      58 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	    3195	    370271 ns/op	  128443 B/op	    2852 allocs/op
big smaps (~7MiB)
before this PR:
BenchmarkMemoryMapsGroupedTrue-4    	  186759	      6107 ns/op	    4720 B/op	      44 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	     100	  42532282 ns/op	17828483 B/op	      40 allocs/op

with this PR:
BenchmarkMemoryMapsGroupedTrue-4    	  153559	      7562 ns/op	    2784 B/op	      58 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	      28	  42078254 ns/op	21347260 B/op	  409934 allocs/op

I'm still working on understand how to improve this, especially the amount of allocs/op

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants