Streaming algorithm for `MemoryMapsWithContext` on linux #1477

paulcacheux · 2023-05-31T15:18:31Z

The current implementation of `MemoryMapsWithContext` reads the whole smaps file into memory up front, but that file can be very large in some cases. This PR switches the function to a streaming algorithm that should use less memory.

This PR also extracts the getBlock function out of the function body to make reading/understanding the function easier.

shirou

I wrote a simple benchmark.

func BenchmarkMemoryMapsGroupedTrue(b *testing.B) {
	p := testGetProcess()
	ctx := context.Background()
	for i := 0; i < b.N; i++ {
		p.MemoryMapsWithContext(ctx, true)
	}
}
func BenchmarkMemoryMapsGroupedFalse(b *testing.B) {
	p := testGetProcess()
	ctx := context.Background()
	for i := 0; i < b.N; i++ {
		p.MemoryMapsWithContext(ctx, false)
	}
}

Then, running it with `go test -bench BenchmarkMemoryMaps -benchmem`, I got the following results.

goos: linux
goarch: amd64
pkg: github.com/shirou/gopsutil/v3/process
cpu: AMD Ryzen 7 5800HS with Radeon Graphics
- current
BenchmarkMemoryMapsGroupedTrue-4            6064            168223 ns/op            6090 B/op         64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4            685           1725689 ns/op          632609 B/op       3955 allocs/op
- This PR
BenchmarkMemoryMapsGroupedTrue-4            6742            159646 ns/op            7993 B/op         80 allocs/op
BenchmarkMemoryMapsGroupedFalse-4            906           1339468 ns/op          313716 B/op       5813 allocs/op

When I target a process whose smaps file has 13478 lines (the largest in my current environment), I get the following.

- current
BenchmarkMemoryMapsGroupedTrue-4             861           1620421 ns/op            6099 B/op         64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4             68          14987541 ns/op         4359101 B/op      27601 allocs/op

- This PR
BenchmarkMemoryMapsGroupedTrue-4             642           2090199 ns/op            8008 B/op         80 allocs/op
BenchmarkMemoryMapsGroupedFalse-4             66          21847487 ns/op         2194337 B/op      41144 allocs/op

The results show that the memory used per operation is roughly halved in the ungrouped case, but the number of memory allocations has increased, and for the large smaps file the total duration has increased as well.
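
To put numbers on this, the GroupedFalse rows for the 13478-line smaps file can be converted to relative changes. This is only a small arithmetic sketch; `percentChange` is a hypothetical helper and the inputs are copied from the rows above. It works out to roughly -50% for B/op, +49% for allocs/op, and +46% for ns/op.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
// A negative result means the value went down.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Values from the GroupedFalse rows of the 13478-line smaps run:
	// "current" first, "This PR" second.
	fmt.Printf("B/op:      %+.1f%%\n", percentChange(4359101, 2194337))
	fmt.Printf("allocs/op: %+.1f%%\n", percentChange(27601, 41144))
	fmt.Printf("ns/op:     %+.1f%%\n", percentChange(14987541, 21847487))
}
```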

Since the objective of this PR is to reduce the amount of memory used, that objective has been achieved. However, the number of memory allocations and the total execution time have both increased.

A more detailed analysis of the memory allocations is needed, but are these results what you expected?
I have not been able to track down the cause properly, but is it possible to reduce the number of memory allocations further?

Anyway, I currently consider these increases in execution time and allocation count acceptable, so I am thinking of merging this PR.

paulcacheux · 2023-06-05T13:19:07Z

I'm not a huge fan of the CPU usage increase. Let me see if I can improve things. Thanks for taking a look.

paulcacheux · 2023-06-26T13:41:25Z

With the latest changes I made, I get the following benchmark results:

small smaps (the one from the test binary)
before this PR:
BenchmarkMemoryMapsGroupedTrue-4    	   18906	     64160 ns/op	    6056 B/op	      64 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	    2958	    402972 ns/op	  464957 B/op	    2978 allocs/op

with this PR:
BenchmarkMemoryMapsGroupedTrue-4    	   19773	     62090 ns/op	    2600 B/op	      58 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	    3195	    370271 ns/op	  128443 B/op	    2852 allocs/op

big smaps (~7MiB)
before this PR:
BenchmarkMemoryMapsGroupedTrue-4    	  186759	      6107 ns/op	    4720 B/op	      44 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	     100	  42532282 ns/op	17828483 B/op	      40 allocs/op

with this PR:
BenchmarkMemoryMapsGroupedTrue-4    	  153559	      7562 ns/op	    2784 B/op	      58 allocs/op
BenchmarkMemoryMapsGroupedFalse-4   	      28	  42078254 ns/op	21347260 B/op	  409934 allocs/op

I'm still working on understanding how to improve this, especially the number of allocs/op.

github-actions bot added the package:process label May 31, 2023

paulcacheux force-pushed the paulcacheux/streaming-smaps branch from e52ed95 to ed8ad0a Compare May 31, 2023 15:23

paulcacheux marked this pull request as ready for review May 31, 2023 15:35

shirou reviewed Jun 4, 2023

paulcacheux force-pushed the paulcacheux/streaming-smaps branch 2 times, most recently from 085f6ba to a188c99 Compare June 12, 2023 19:18

paulcacheux added 5 commits June 26, 2023 14:10

add benchmark

d000822

extract getBlock from MemoryMapsWithContext

1c0ca02

use scanner instead of reading the whole file ahead of file

81e9c11

improve allocation pattern

a00bb68

rework reading algorithm

6dbc2f7

paulcacheux force-pushed the paulcacheux/streaming-smaps branch from a188c99 to 6dbc2f7 Compare June 26, 2023 12:11

paulcacheux added 3 commits June 26, 2023 14:14

compute fields only if needed

547c54e

use strings.Cut

ca87d80

backward compatible strings.Cut

a3b0f42
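
The last two commit titles refer to `strings.Cut`, which was added to the Go standard library in Go 1.18. A backward compatible replacement could be sketched as below; the helper name `cut` is hypothetical and this is not necessarily the code in commit a3b0f42.

```go
package main

import (
	"fmt"
	"strings"
)

// cut (hypothetical helper) mirrors strings.Cut using only strings.Index,
// so it also builds with toolchains older than Go 1.18.
// It splits s around the first occurrence of sep.
func cut(s, sep string) (before, after string, found bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

func main() {
	// Split an smaps-style "Key: value" line into its two halves.
	key, value, ok := cut("Rss:                 432 kB", ":")
	fmt.Printf("%q %q %v\n", key, strings.TrimSpace(value), ok)
}
```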

paulcacheux closed this Mar 15, 2024
