runtime: scavenger not freeing all possible memory on darwin #47656

Open
josharian opened this issue Aug 11, 2021 · 0 comments

This is effectively a follow-up to #29844. I am attempting to reduce our memory usage on iOS, where we are severely memory-constrained.

On darwin, sysUnused calls madvise(v, n, _MADV_FREE_REUSABLE). This marks the pages as reclaimable by the OS. However, unexpectedly, it does not mark all the pages as reclaimable. I do not understand why, but here's a way to reproduce it.
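
For reference, sysUnused on darwin currently boils down to a single madvise call; roughly (a paraphrase of runtime/mem_darwin.go, not the exact source):

func sysUnused(v unsafe.Pointer, n uintptr) {
	// MADV_FREE_REUSABLE is like MADV_FREE, except that it also updates
	// the task's accounting information (such as RSS/footprint).
	madvise(v, n, _MADV_FREE_REUSABLE)
}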

The following program makes and then frees a single large byte slice. It pauses three times: before the allocation, after the allocation, and after the allocation has been freed.

package main

import (
	"runtime/debug"
	"time"
)

var b []byte

func main() {
	// Call time.Sleep and debug.FreeOSMemory once up front,
	// so that all basic runtime structures get set up
	// and all relevant pages get dirtied.
	time.Sleep(time.Millisecond)
	debug.FreeOSMemory()
	println("start")
	time.Sleep(5 * time.Second)

	b = make([]byte, 4_000_000)
	for i := range b {
		b[i] = 1
	}
	println("allocated")
	time.Sleep(5 * time.Second)

	b = nil
	debug.FreeOSMemory()
	time.Sleep(3 * time.Second) // wait for the scavenger's effects to be visible
	println("freed")
	time.Sleep(3 * time.Hour)
}

Running this on macOS, at each pause point I use footprint to measure the app's overall footprint and vmmap to get the per-region details. Concretely, I run go build -o jjj x.go && GODEBUG=allocfreetrace=1 ./jjj to build and run it, and footprint jjj && vmmap -pages -interleaved -submap jjj to measure it.
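
(As an aside, not something I used for the numbers below: the runtime's own accounting can be printed at each pause point as a cross-check. A minimal sketch, assuming an extra import "runtime" in the program above:

// reportMemStats prints the runtime's view of heap memory returned to the
// OS (HeapReleased) versus heap memory sitting idle (HeapIdle).
func reportMemStats(label string) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	println(label, "HeapReleased:", ms.HeapReleased, "HeapIdle:", ms.HeapIdle)
}

Comparing HeapReleased with the OS-side numbers would show whether the runtime believes it returned memory that the OS still counts as dirty.)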

Before the alloc, for the Go heap, footprint reports:

  Dirty      Clean  Reclaimable    Regions    Category
    ---        ---          ---        ---    ---
1168 KB        0 B          0 B         38    untagged ("VM_ALLOCATE")

After the alloc:

  Dirty      Clean  Reclaimable    Regions    Category
    ---        ---          ---        ---    ---
5344 KB        0 B          0 B         39    untagged ("VM_ALLOCATE")

After the free:

  Dirty      Clean  Reclaimable    Regions    Category
    ---        ---          ---        ---    ---
4192 KB        0 B      1344 KB         40    untagged ("VM_ALLOCATE")

Note that 4192 KB - 1344 KB = 2848 KB, which is considerably higher than the 1168 KB we began with.

(The exact numbers vary slightly from run to run.)

We can get a glimpse into the details of the accounting using vmmap (with flags listed above). For the Go heap, before the alloc:

REGION TYPE                    START - END         [   VSIZE    RSDNT    DIRTY     SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
VM_ALLOCATE               14000000000-14000400000  [  256       40       40        0   ] rw-/rwx SM=ZER  
VM_ALLOCATE               14000400000-14004000000  [ 3840        0        0        0   ] ---/rwx SM=NUL  

After the alloc:

REGION TYPE                    START - END         [   VSIZE    RSDNT    DIRTY     SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
VM_ALLOCATE               14000000000-14000400000  [  256      200      200        0   ] rw-/rwx SM=ZER  
VM_ALLOCATE               14000400000-14000800000  [  256       85       85        0   ] rw-/rwx SM=PRV  
VM_ALLOCATE               14000800000-14004000000  [ 3584        0        0        0   ] ---/rwx SM=NUL  

After the free:

REGION TYPE                    START - END         [   VSIZE    RSDNT    DIRTY     SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
VM_ALLOCATE               14000000000-14000400000  [  256      202      202        0   ] rw-/rwx SM=ZER  
VM_ALLOCATE               14000400000-14000800000  [  256       85        1        0   ] rw-/rwx SM=PRV  
VM_ALLOCATE               14000800000-14004000000  [ 3584        0        0        0   ] ---/rwx SM=NUL  

This lines up with what tracealloc said:

tracealloc(0x14000180000, 0x3d2000, uint8)

and then

tracefree(0x14000180000, 0x3d2000)

The large byte slice spans the 14000000000-14000400000 and 14000400000-14000800000 regions: per the tracealloc line, it starts at 0x14000180000 and ends at 0x14000180000 + 0x3d2000 = 0x14000552000, past the 0x14000400000 boundary. However, the free appears to have marked only the pages in the 14000400000-14000800000 region as reclaimable: that region's dirty count drops from 85 pages to 1, and 84 pages × 16KB = 1344KB, exactly what footprint reported as reclaimable. The pages in the 14000000000-14000400000 region are still marked as dirty.

As an experiment, I changed sysUnused to also call mprotect(v, n, _PROT_NONE) then mprotect(v, n, _PROT_READ|_PROT_WRITE). See tailscale@38ab03e.
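
A rough sketch of that experimental sysUnused (the real diff, including whatever plumbing the mprotect wrapper needs on darwin, is in the commit linked above):

func sysUnused(v unsafe.Pointer, n uintptr) {
	// As before: mark the pages as reclaimable and update accounting.
	madvise(v, n, _MADV_FREE_REUSABLE)

	// Experiment: drop and then restore protections on the range; see the
	// footprint/vmmap numbers below for the effect.
	mprotect(v, n, _PROT_NONE)
	mprotect(v, n, _PROT_READ|_PROT_WRITE)
}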

Running again with this change, the unreclaimable space reported by footprint mostly disappears. At the three pause points:

  Dirty      Clean  Reclaimable    Regions    Category
    ---        ---          ---        ---    ---
1168 KB        0 B          0 B         37    untagged ("VM_ALLOCATE")
  Dirty      Clean  Reclaimable    Regions    Category
    ---        ---          ---        ---    ---
5328 KB        0 B          0 B         38    untagged ("VM_ALLOCATE")
  Dirty      Clean  Reclaimable    Regions    Category
    ---        ---          ---        ---    ---
1584 KB        0 B          0 B         39    untagged ("VM_ALLOCATE")

We're not back down to 1168KB (I wish I knew why), but it's considerably better than 2848KB. vmmap shows more or less the same pattern as before:

REGION TYPE                    START - END         [   VSIZE    RSDNT    DIRTY     SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
VM_ALLOCATE               14000000000-14000400000  [  256       40       40        0   ] rw-/rwx SM=ZER  
VM_ALLOCATE               14000400000-14004000000  [ 3840        0        0        0   ] ---/rwx SM=NUL  
REGION TYPE                    START - END         [   VSIZE    RSDNT    DIRTY     SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
VM_ALLOCATE               14000000000-14000400000  [  256      188      188       12   ] rw-/rwx SM=ZER  
VM_ALLOCATE               14000400000-14000800000  [  256       85       85        0   ] rw-/rwx SM=PRV  
VM_ALLOCATE               14000800000-14004000000  [ 3584        0        0        0   ] ---/rwx SM=NUL  
REGION TYPE                    START - END         [   VSIZE    RSDNT    DIRTY     SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
VM_ALLOCATE               14000000000-14000400000  [  256      197      197        5   ] rw-/rwx SM=ZER  
VM_ALLOCATE               14000400000-14000800000  [  256       85        1        0   ] rw-/rwx SM=PRV  
VM_ALLOCATE               14000800000-14004000000  [ 3584        0        0        0   ] ---/rwx SM=NUL  

(Note that if you add the dirty and swapped pages together in the mprotect run, they match the madvise run's dirty page counts exactly: for example, after the free, 197 + 5 = 202 and 1 + 0 = 1, matching the 202 and 1 dirty pages in the madvise run.)

I don't know how to interpret all of this. But it looks a bit like madvise(v, n, _MADV_FREE_REUSABLE) isn't sufficient to fully return memory to the OS, perhaps because of something having to do with allocation regions.

I'm out of ideas for what/how to investigate from here, but I'm happy to follow up on suggestions.

cc @mknyszek @bradfitz @randall77

@dmitshur added this to the Backlog milestone Aug 12, 2021