
runtime: reclaim memory used by huge array that is no longer referenced #14045

Closed
ibukanov opened this Issue Jan 20, 2016 · 28 comments


ibukanov commented Jan 20, 2016

Consider the following program that I run with Go 1.5.2 on 64-bit Fedora Linux:

package main
import "fmt"
func main() {
    a := make([]byte, 185 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}

It allocates a 185MB byte array and then forces the OS to commit memory to it by touching all of its pages. This program runs OK and prints the expected x even if I restrict the available virtual memory per process to 200MB using ulimit:

~/s> ulimit -S -v 204800
~/s> go run test.go
x

Now consider its modification like:

package main
import "fmt"
func main() {
    a := make([]byte, 85 * 1024 * 1024)
    a = nil
    a = make([]byte, 150 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}

It first allocates 85MB, then clears the reference to the slice, and then allocates 150MB. This time, under the same 200MB ulimit, it fails:

~/s> go run test.go
fatal error: runtime: out of memory

The same failure happens even with an explicit GC call after a = nil:

package main
import "fmt"
import "runtime"
func main() {
    a := make([]byte, 85 * 1024 * 1024)
    a = nil
    runtime.GC()
    a = make([]byte, 150 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}

Is it just a runtime bug? If not, how can I force the runtime to release a large allocation?

@ibukanov ibukanov changed the title huge arrays allocatuion issue with Linux under ulimit -v huge arrays allocation issue with Linux under ulimit -v Jan 20, 2016

@ianlancetaylor ianlancetaylor changed the title huge arrays allocation issue with Linux under ulimit -v runtime: reclaim memory used by huge array that is no longer referenced Jan 21, 2016

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jan 21, 2016

ianlancetaylor commented Jan 21, 2016

This is sort of a dup of #5049.

randall77 commented Jan 21, 2016

This seems like a different bug to me.

Say the heap ends in 10 free pages. We try to allocate something 20 pages in size. Currently the code allocates 20 pages from the system to satisfy this request, because the request won't fit in 10 pages. We really only should ask for 10 pages, as they will (probably) be coalesced with the 10 free pages at the end of the heap, resulting in 20 contiguous free pages.

The "probably" in there depends on what the OS decides to give us. If it isn't coalesceable, then asking for only 10 pages doesn't help. In fact it hurts.

Maybe only if we would otherwise throw an out-of-memory error should we try and do this.
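The page arithmetic in that policy can be sketched as follows (a hypothetical illustration, not the runtime's actual code; pagesToRequest is an invented name):

```go
package main

import "fmt"

// pagesToRequest sketches the policy described above: with freeAtEnd
// free pages at the end of the heap and a request for need pages, ask
// the OS only for the shortfall and rely on the newly mapped pages
// coalescing with the free tail into one contiguous run.
func pagesToRequest(need, freeAtEnd int) int {
	if need <= freeAtEnd {
		return 0 // the request fits in the existing free pages
	}
	return need - freeAtEnd
}

func main() {
	// The example from the comment: 10 free pages at the end of the
	// heap, a 20-page request. Ask the OS for 10 pages, not 20.
	fmt.Println(pagesToRequest(20, 10))
}
```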

ibukanov commented Jan 21, 2016

I thought that this was caused by imprecise GC or something. In any case, why can't Go release memory back to the system when it knows that there is a big chunk of unused pages and a new allocation cannot be satisfied? Is it because, by design, the heap never shrinks?

randall77 commented Jan 21, 2016

No, it is just that we wait some time before releasing pages back to the system, to amortize the cost of allocating them in the first place. What if we add another 85MB allocation to the end of your example?

ibukanov commented Jan 21, 2016

@randall77 My comment was about releasing memory back to the OS when a new allocation fails, not in general. Also, how long does Go wait before releasing memory back to the system? If I modify the program to wait a minute before allocating again, it still fails:

~/s> cat test.go 
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    a := make([]byte, 85 * 1024 * 1024)
    a = nil
    runtime.GC()
    time.Sleep(time.Minute)
    a = make([]byte, 150 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}
~/s> ulimit -S -v 204800
~/s> go run test.go 
fatal error: runtime: out of memory

runtime stack:
randall77 commented Jan 21, 2016

I see: if an allocation fails, we should run a GC, give memory back to the OS, and then try again. That seems reasonable.

I think we currently wait 5 minutes (after being freed by GC) before giving unused memory back to the OS.

@bradfitz bradfitz modified the milestones: Go1.7, Unplanned Jan 21, 2016

ibukanov commented Jan 21, 2016

I modified the program to wait 20 minutes, not one, after GC before trying to allocate again. It still reported out of memory.

randall77 commented Jan 22, 2016

That's because you're ulimiting the virtual address space, not the resident set size. When we give memory back to the OS, that just reduces the RSS, not the virtual memory use.

The runtime will need some sort of fix to reuse the virtual address space for the second allocation.

Virtual address size limits are in any case probably not what you want. Virtual address space is essentially free on a 64-bit system.

Also, you've allocated an 85MB chunk and then freed it again. There's no guarantee that the runtime (during runtime.GC or time.Sleep) doesn't allocate something small just after that 85MB chunk. In that case, there's no way to reuse the 85MB of virtual address space for the 150MB allocation.
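The VmSize/VmRSS distinction can be seen directly on Linux by reading /proc/self/status (a sketch; procStatus is an invented helper, and the output is Linux-specific):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime/debug"
	"strings"
)

// procStatus returns the value of a key like "VmRSS" from
// /proc/self/status, or "unavailable" off Linux.
func procStatus(key string) string {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "unavailable"
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), key+":") {
			return strings.TrimSpace(strings.TrimPrefix(sc.Text(), key+":"))
		}
	}
	return "unavailable"
}

func main() {
	a := make([]byte, 85*1024*1024)
	for i := 0; i < len(a); i += 4096 {
		a[i] = 'x'
	}
	a = nil
	debug.FreeOSMemory()

	// After the release, VmRSS drops back down, but VmSize (what
	// ulimit -v constrains) stays high: the runtime releases physical
	// pages without unmapping the address range.
	fmt.Println("VmSize:", procStatus("VmSize"))
	fmt.Println("VmRSS: ", procStatus("VmRSS"))
}
```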

ibukanov commented Jan 23, 2016

@randall77 Right, restricting just the total amount of physical memory the process can use, not VM space, leads to the example working with timeouts slightly above 5 minutes. So this bug is really about releasing the memory to the OS on out-of-memory without waiting for 5 minutes.

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016

rsc commented May 18, 2016

Does debug.FreeOSMemory help?
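For reference, that suggestion amounts to a variant of the original reproducer along these lines (a sketch; debug.FreeOSMemory forces a garbage collection and then returns as much memory as possible to the OS immediately):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	a := make([]byte, 85*1024*1024)
	for i := 0; i < len(a); i += 4096 {
		a[i] = 'x'
	}
	a = nil

	// Force a GC and return freed memory to the OS right away, without
	// the scavenger's usual multi-minute delay.
	debug.FreeOSMemory()

	a = make([]byte, 150*1024*1024)
	for i := 0; i < len(a); i += 4096 {
		a[i] = 'x'
	}
	fmt.Printf("%c\n", a[0])
}
```

Note that, per the preceding comment, this only lowers the RSS; under ulimit -v the virtual address space stays reserved.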

@quentinmit quentinmit added the NeedsFix label Oct 11, 2016

aclements commented Nov 8, 2016

I found that reproducing this is slightly more complicated now. To grow the RSS, it's necessary to touch the first allocation (otherwise the pages don't get faulted in):

package main

import (
    "fmt"
    "time"
)

func main() {
    a := make([]byte, 85*1024*1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    a = nil
    //debug.FreeOSMemory()
    time.Sleep(time.Second)
    a = make([]byte, 150*1024*1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
    time.Sleep(time.Minute)
}

With this code, the process' RSS grows to ~244 MB.

If you comment out the sleep (which allocates) between the two big allocations, we will in fact reuse the whole 85 MB region for the 150 MB region, so the RSS grows to just over 150 MB. I suspect that didn't happen in 1.5.2 when this was originally reported because we didn't trigger the GC aggressively enough before satisfying the second big allocation.

Uncommenting the FreeOSMemory does keep the RSS down, since this GCs the 85MB allocation and immediately scavenges those pages.

However, fixing this is not as simple as just scavenging if we get ENOMEM. For one, there are actually about a dozen places in the runtime where we check for out of memory (most are where we allocate off-heap runtime structures). We could (and possibly should) consolidate these, but then the problem becomes that the scavenger has locking requirements, while out-of-memory is detected and handled at a very low level, when we're holding various different locks and when the heap structures may be in an inconsistent state. This means we'd have to thread the failure further up the stack than we currently do in many places, until we reach a point where we can reliably drop locks, call the scavenger, and retry the operation.

aclements commented Nov 8, 2016

An alternative approach would be to get out in front of the problem. I haven't convinced myself if this is a good idea or not yet, but when sweeping frees a large object (for some value of large, probably at least 1 MB and maybe much more), it would be quite easy to release that memory back to the OS immediately. That would actively decouple the physical and virtual memory around large objects so the virtual address space fragmentation caused by large objects didn't cause over-retention of physical memory.

The cost would be releasing and later re-faulting that memory, which would be amortized over the cost of allocating and populating a large object. Faulting on my laptop takes about 70–200µs / MB, and releasing memory takes about 4–30µs / MB (the lower bound is for large pages, the upper bound for regular pages). In contrast, allocating a large object from already faulted memory and fully populating it takes about 450–500µs / MB. Hence, in the worst case, this would increase the cost of allocating large objects by ~46%, but if large pages are available the worst case is more like 16%. Costs in practice would probably be lower.

This solution would also be likely to reduce the running RSS of applications with large heaps, since it effectively makes the scavenger much more aggressive for large objects. This alone would be worth some cost.

Benchmark source

package main

import (
    "fmt"
    "log"
    "runtime"
    "runtime/debug"
    "syscall"
    "time"
)

func main() {
    const bytes = 64 << 20 // Use 1 MB for small pages, 32 MB+ for large pages
    const toMB = bytes / (1 << 20)

    debug.SetGCPercent(-1)

    for i := 0; i < 10; i++ {
        b, err := syscall.Mmap(-1, 0, bytes, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS)
        if err != nil {
            log.Fatal(err)
        }

        start := time.Now()
        for i := 0; i < len(b); i += 4096 {
            b[i] = 'x'
        }
        fault := time.Since(start)

        start = time.Now()
        syscall.Madvise(b, syscall.MADV_DONTNEED)
        release := time.Since(start)

        start = time.Now()
        for i := 0; i < len(b); i += 4096 {
            b[i] = 'x'
        }
        refault := time.Since(start)

        fmt.Println(fault/toMB, "/ MB fault,", release/toMB, "/ MB release,", refault/toMB, "/ MB refault")

        syscall.Munmap(b)
    }

    for i := 0; i < 10; i++ {
        runtime.GC()
        start := time.Now()
        b := make([]byte, bytes)
        for i := 0; i < len(b); i++ {
            b[i] = 'x'
        }
        alloc := time.Since(start)
        fmt.Println(alloc/toMB, "/ MB alloc")
    }
}
aclements commented Jun 9, 2017

An alternative approach would be to get out in front of the problem.

Yet another way to get out in front of the problem would be to eagerly scavenge when we grow the heap or reuse freed memory. This would be less aggressive than my previous sweeping idea, so it would be less likely to free memory that's about to get reused. I think we're in a good position to do this now thanks to @RLH's free span treap, since that should let us quickly find a right-sized chunk of unused memory we can free.
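The idea can be sketched in miniature as follows (a toy illustration, not runtime code; a sorted slice stands in for the free-span treap, and scavengeForGrowth is an invented name):

```go
package main

import (
	"fmt"
	"sort"
)

// span is a toy stand-in for the runtime's free spans; the real
// implementation tracks them in a treap for fast lookup.
type span struct {
	pages     int
	scavenged bool
}

// scavengeForGrowth sketches eager scavenging on heap growth: when the
// heap must grow by grow pages, release at least that many pages from
// unscavenged free spans, preferring the largest, so RSS stays flat.
func scavengeForGrowth(free []span, grow int) int {
	sort.Slice(free, func(i, j int) bool { return free[i].pages > free[j].pages })
	released := 0
	for i := range free {
		if released >= grow {
			break
		}
		if !free[i].scavenged {
			free[i].scavenged = true // madvise(MADV_DONTNEED) in the real runtime
			released += free[i].pages
		}
	}
	return released
}

func main() {
	free := []span{{pages: 8}, {pages: 32}, {pages: 4, scavenged: true}}
	// Growing by 20 pages releases the 32-page span.
	fmt.Println(scavengeForGrowth(free, 20))
}
```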

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/139719 mentions this issue: runtime: scavenge large spans before heap growth

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/139737 mentions this issue: runtime: remove npreleased in favor of boolean

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/140197 mentions this issue: runtime: mark a freshly sysAlloc'd span scavenged

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/139298 mentions this issue: runtime: separate scavenged spans

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/138918 mentions this issue: runtime: add physical memory scavenging test

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/139837 mentions this issue: runtime: use only treaps for tracking spans

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/140198 mentions this issue: runtime: mark a span as sysUsed after trimming

gopherbot commented Oct 8, 2018

Change https://golang.org/cl/139297 mentions this issue: runtime: de-duplicate span scavenging

gopherbot pushed a commit that referenced this issue Oct 17, 2018

runtime: de-duplicate span scavenging
Currently, span scavenging was done nearly identically in two different
locations. This change deduplicates that into one shared routine.

For #14045.

Change-Id: I15006b2c9af0e70b7a9eae9abb4168d3adca3860
Reviewed-on: https://go-review.googlesource.com/c/139297
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Oct 17, 2018

runtime: use only treaps for tracking spans
Currently, mheap tracks spans in both mSpanLists and mTreaps, but
mSpanLists, while they tend to be smaller, complicate the
implementation. Here we simplify the implementation by removing
free and busy from mheap and renaming freelarge -> free and busylarge
-> busy.

This change also slightly changes the reclamation policy. Previously,
for allocations under 1MB we would attempt to find a small span of the
right size. Now, we just try to find any number of spans totaling the
right size. This may increase heap fragmentation, but that will be dealt
with using virtual memory tricks in follow-up CLs.

For #14045.

Garbage-heavy benchmarks show very little change, except what appears
to be a decrease in STW times and peak RSS.

name                      old STW-ns/GC       new STW-ns/GC       delta
Garbage/benchmem-MB=64-8           263k ±64%           217k ±24%  -17.66%  (p=0.028 n=25+23)

name                      old STW-ns/op       new STW-ns/op       delta
Garbage/benchmem-MB=64-8          9.39k ±65%          7.80k ±24%  -16.88%  (p=0.037 n=25+23)

name                      old peak-RSS-bytes  new peak-RSS-bytes  delta
Garbage/benchmem-MB=64-8           281M ± 0%           249M ± 4%  -11.40%  (p=0.000 n=19+18)

https://perf.golang.org/search?q=upload:20181005.1

Go1 benchmarks perform roughly the same, the most notable regression
being the JSON encode/decode benchmark with worsens by ~2%.

name                     old time/op    new time/op    delta
BinaryTree17-8              3.02s ± 2%     2.99s ± 2%  -1.18%  (p=0.000 n=25+24)
Fannkuch11-8                3.05s ± 1%     3.02s ± 2%  -1.20%  (p=0.000 n=25+25)
FmtFprintfEmpty-8          43.6ns ± 5%    43.4ns ± 3%    ~     (p=0.528 n=25+25)
FmtFprintfString-8         74.9ns ± 3%    73.4ns ± 1%  -2.03%  (p=0.001 n=25+24)
FmtFprintfInt-8            79.3ns ± 3%    77.9ns ± 1%  -1.73%  (p=0.003 n=25+25)
FmtFprintfIntInt-8          119ns ± 6%     116ns ± 0%  -2.68%  (p=0.000 n=25+18)
FmtFprintfPrefixedInt-8     134ns ± 4%     132ns ± 1%  -1.52%  (p=0.004 n=25+25)
FmtFprintfFloat-8           240ns ± 1%     241ns ± 1%    ~     (p=0.403 n=24+23)
FmtManyArgs-8               543ns ± 1%     537ns ± 1%  -1.00%  (p=0.000 n=25+25)
GobDecode-8                6.88ms ± 1%    6.92ms ± 4%    ~     (p=0.088 n=24+22)
GobEncode-8                5.92ms ± 1%    5.93ms ± 1%    ~     (p=0.898 n=25+24)
Gzip-8                      267ms ± 2%     266ms ± 2%    ~     (p=0.213 n=25+24)
Gunzip-8                   35.4ms ± 1%    35.6ms ± 1%  +0.70%  (p=0.000 n=25+25)
HTTPClientServer-8          104µs ± 2%     104µs ± 2%    ~     (p=0.686 n=25+25)
JSONEncode-8               9.67ms ± 1%    9.80ms ± 4%  +1.32%  (p=0.000 n=25+25)
JSONDecode-8               47.7ms ± 1%    48.8ms ± 5%  +2.33%  (p=0.000 n=25+25)
Mandelbrot200-8            4.87ms ± 1%    4.91ms ± 1%  +0.79%  (p=0.000 n=25+25)
GoParse-8                  3.59ms ± 4%    3.55ms ± 1%    ~     (p=0.199 n=25+24)
RegexpMatchEasy0_32-8      90.3ns ± 1%    89.9ns ± 1%  -0.47%  (p=0.000 n=25+21)
RegexpMatchEasy0_1K-8       204ns ± 1%     204ns ± 1%    ~     (p=0.914 n=25+24)
RegexpMatchEasy1_32-8      84.9ns ± 0%    84.6ns ± 1%  -0.36%  (p=0.000 n=24+25)
RegexpMatchEasy1_1K-8       350ns ± 1%     348ns ± 3%  -0.59%  (p=0.007 n=25+25)
RegexpMatchMedium_32-8      122ns ± 1%     121ns ± 0%  -1.08%  (p=0.000 n=25+18)
RegexpMatchMedium_1K-8     36.1µs ± 1%    34.6µs ± 1%  -4.02%  (p=0.000 n=25+25)
RegexpMatchHard_32-8       1.69µs ± 2%    1.65µs ± 1%  -2.38%  (p=0.000 n=25+25)
RegexpMatchHard_1K-8       50.8µs ± 1%    49.4µs ± 1%  -2.69%  (p=0.000 n=25+24)
Revcomp-8                   453ms ± 2%     449ms ± 3%  -0.74%  (p=0.022 n=25+24)
Template-8                 63.2ms ± 2%    63.4ms ± 1%    ~     (p=0.127 n=25+24)
TimeParse-8                 313ns ± 1%     315ns ± 3%    ~     (p=0.924 n=24+25)
TimeFormat-8                294ns ± 1%     292ns ± 2%  -0.65%  (p=0.004 n=23+24)
[Geo mean]                 49.9µs         49.6µs       -0.65%

name                     old speed      new speed      delta
GobDecode-8               112MB/s ± 1%   110MB/s ± 4%  -1.00%  (p=0.036 n=24+24)
GobEncode-8               130MB/s ± 1%   129MB/s ± 1%    ~     (p=0.894 n=25+24)
Gzip-8                   72.7MB/s ± 2%  73.0MB/s ± 2%    ~     (p=0.208 n=25+24)
Gunzip-8                  549MB/s ± 1%   545MB/s ± 1%  -0.70%  (p=0.000 n=25+25)
JSONEncode-8              201MB/s ± 1%   198MB/s ± 3%  -1.29%  (p=0.000 n=25+25)
JSONDecode-8             40.7MB/s ± 1%  39.8MB/s ± 5%  -2.23%  (p=0.000 n=25+25)
GoParse-8                16.2MB/s ± 4%  16.3MB/s ± 1%    ~     (p=0.211 n=25+24)
RegexpMatchEasy0_32-8     354MB/s ± 1%   356MB/s ± 1%  +0.47%  (p=0.000 n=25+21)
RegexpMatchEasy0_1K-8    5.00GB/s ± 0%  4.99GB/s ± 1%    ~     (p=0.588 n=24+24)
RegexpMatchEasy1_32-8     377MB/s ± 1%   378MB/s ± 1%  +0.39%  (p=0.000 n=25+25)
RegexpMatchEasy1_1K-8    2.92GB/s ± 1%  2.94GB/s ± 3%  +0.65%  (p=0.008 n=25+25)
RegexpMatchMedium_32-8   8.14MB/s ± 1%  8.22MB/s ± 1%  +0.98%  (p=0.000 n=25+24)
RegexpMatchMedium_1K-8   28.4MB/s ± 1%  29.6MB/s ± 1%  +4.19%  (p=0.000 n=25+25)
RegexpMatchHard_32-8     18.9MB/s ± 2%  19.4MB/s ± 1%  +2.43%  (p=0.000 n=25+25)
RegexpMatchHard_1K-8     20.2MB/s ± 1%  20.7MB/s ± 1%  +2.76%  (p=0.000 n=25+24)
Revcomp-8                 561MB/s ± 2%   566MB/s ± 3%  +0.75%  (p=0.021 n=25+24)
Template-8               30.7MB/s ± 2%  30.6MB/s ± 1%    ~     (p=0.131 n=25+24)
[Geo mean]                120MB/s        121MB/s       +0.48%

https://perf.golang.org/search?q=upload:20181004.6

Change-Id: I97f9fee34577961a116a8ddd445c6272253f0f95
Reviewed-on: https://go-review.googlesource.com/c/139837
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot commented Oct 25, 2018

Change https://golang.org/cl/144717 mentions this issue: runtime: add predecessor method to treap

gopherbot commented Oct 25, 2018

Change https://golang.org/cl/144718 mentions this issue: runtime: add successor method to treap

gopherbot pushed a commit that referenced this issue Oct 30, 2018

runtime: add predecessor method to treap
This change adds a method for computing a treap node's predecessor
to the treap, which will simplify the implementation of algorithms
used for heap growth scavenging.

For #14045.

Change-Id: Id203e4bd246db3504f2f0c5163ec36f4579167df
Reviewed-on: https://go-review.googlesource.com/c/144717
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Oct 30, 2018

runtime: add successor method to treap
This change adds a method for computing a treap node's successor
to the treap, which will simplify the implementation of algorithms
used for heap growth scavenging.

For #14045.

Change-Id: If2af3f2707dbcbef5fb6e42cb2712061f9da5129
Reviewed-on: https://go-review.googlesource.com/c/144718
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Oct 30, 2018

runtime: separate scavenged spans
This change adds a new treap to mheap which contains scavenged (i.e.
its physical pages were returned to the OS) spans.

As of this change, spans may no longer be partially scavenged.

For #14045.

Change-Id: I0d428a255c6d3f710b9214b378f841b997df0993
Reviewed-on: https://go-review.googlesource.com/c/139298
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Oct 30, 2018

runtime: remove npreleased in favor of boolean
This change removes npreleased from mspan since spans may now either be
scavenged or not scavenged; how many of its pages were actually scavenged
doesn't matter. It saves some space in mspan overhead too, as the boolean
fits into what would otherwise be struct padding.

For #14045.

Change-Id: I63f25a4d98658f5fe21c6a466fc38c59bfc5d0f5
Reviewed-on: https://go-review.googlesource.com/c/139737
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Oct 30, 2018

runtime: sysUsed spans after trimming
Currently, we mark a whole span as sysUsed before trimming, but this
unnecessarily tells the OS that the trimmed section of the span is in
use when it may already have been scavenged. Overall, this just makes
invocations of sysUsed a little more fine-grained.

It does come with the caveat that now heap_released needs to be managed
a little more carefully in allocSpanLocked. In this case, we choose to
(like before this change) negate any effect the span has on
heap_released before trimming, then add it back if the trimmed part is
scavengable.

For #14045.

Change-Id: Ifa384d989611398bfad3ca39d3bb595a5962a3ea
Reviewed-on: https://go-review.googlesource.com/c/140198
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

@gopherbot gopherbot closed this in c803ffc Oct 30, 2018

gopherbot pushed a commit that referenced this issue Oct 30, 2018

runtime: add physical memory scavenging test
This change introduces a test to malloc_test which checks for overuse
of physical memory in the large object treap. Due to fragmentation,
there may be many pages of physical memory that are sitting unused in
large-object space.

For #14045.

Change-Id: I3722468f45063b11246dde6301c7ad02ae34be55
Reviewed-on: https://go-review.googlesource.com/c/138918
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot commented Oct 30, 2018

Change https://golang.org/cl/145999 mentions this issue: runtime: scavenge an mspan's heapBits

gopherbot commented Jan 4, 2019

Change https://golang.org/cl/156417 mentions this issue: runtime: scavenge before heap growth, not after

gopherbot commented Jan 14, 2019

Change https://golang.org/cl/157837 mentions this issue: runtime: don't coalesce scavenged spans with unscavenged spans

gopherbot commented Jan 16, 2019

Change https://golang.org/cl/158078 mentions this issue: runtime: don't coalesce scavenged spans with unscavenged spans

gopherbot pushed a commit that referenced this issue Jan 17, 2019

runtime: don't coalesce scavenged spans with unscavenged spans
As a result of changes earlier in Go 1.12, the scavenger became much
more aggressive. In particular, when scavenged and unscavenged spans
coalesced, they would always become scavenged. This resulted in most
spans becoming scavenged over time. While this is good for keeping the
RSS of the program low, it also causes many more undue page faults and
many more calls to madvise.

For most applications, the impact of this was negligible. But for
applications that repeatedly grow and shrink the heap by large amounts,
the overhead can be significant. The overhead was especially obvious on
older versions of Linux where MADV_FREE isn't available and
MADV_DONTNEED must be used.

This change makes it so that scavenged spans will never coalesce with
unscavenged spans. This results in fewer page faults overall. Aside
from this, the expected impact of this change is more heap growths on
average, as span allocations will be less likely to be fulfilled. To
mitigate this slightly, this change also coalesces spans eagerly after
scavenging, to at least ensure that all scavenged spans and all
unscavenged spans are coalesced with each other.

Also, this change adds additional logic in the case where two adjacent
spans cannot coalesce. In this case, on platforms where the physical
page size is larger than the runtime's page size, we realign the
boundary between the two adjacent spans to a physical page boundary. The
advantage of this approach is that "unscavengable" spans, that is, spans
which cannot be scavenged because they don't cover at least a single
physical page are grown to a size where they have a higher likelihood of
being discovered by the runtime's scavenging mechanisms when they border
a scavenged span. This helps prevent the runtime from accruing pockets
of "unscavengable" memory in between scavenged spans, preventing them
from coalescing.

We specifically choose to apply this logic to all spans because it
simplifies the code, even though it isn't strictly necessary. The
expectation is that this change will result in a slight loss in
performance on platforms where the physical page size is larger than the
runtime page size.

Update #14045.

Change-Id: I64fd43eac1d6de6f51d7a2ecb72670f10bb12589
Reviewed-on: https://go-review.googlesource.com/c/158078
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot commented Jan 29, 2019

Change https://golang.org/cl/159500 mentions this issue: runtime: scavenge memory upon allocating scavenged memory

gopherbot pushed a commit that referenced this issue Jan 31, 2019

runtime: scavenge memory upon allocating from scavenged memory
Because scavenged and unscavenged spans no longer coalesce, memory that
is freed no longer has a high likelihood of being re-scavenged. As a
result, if an application is allocating at a fast rate, it may work fast
enough to undo all the scavenging work performed by the runtime's
current scavenging mechanisms. This behavior is exacerbated by the
global best-fit allocation policy the runtime uses, since scavenged
spans are just as likely to be chosen as unscavenged spans on average.

To remedy that, we treat each allocation of scavenged space as a heap
growth, and scavenge other memory to make up for the allocation.

This change makes performance of the runtime slightly worse, as now
we're scavenging more often during allocation. The regression is
particularly obvious with the garbage benchmark (3%) but most of the Go1
benchmarks are within the margin of noise. A follow-up change should
help.

Garbage: https://perf.golang.org/search?q=upload:20190131.3
Go1:     https://perf.golang.org/search?q=upload:20190131.2

Updates #14045.

Change-Id: I44a7e6586eca33b5f97b6d40418db53a8a7ae715
Reviewed-on: https://go-review.googlesource.com/c/159500
Reviewed-by: Austin Clements <austin@google.com>
