
runtime: long pauses STW (sweep termination) on massive block allocation #31222

Open
un000 opened this issue Apr 3, 2019 · 3 comments
@un000 un000 commented Apr 3, 2019

What version of Go are you using (go version)?

~ ᐅ go version
go version go1.12.1 darwin/amd64

What operating system and processor architecture are you using (go env)?

go env Output
~ ᐅ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/un0/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/un0/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.12.1/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.12.1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/bf/sm9v_xtn4tj6xz_rmmwp5tzm0000gn/T/go-build942797956=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

We have a large entity cache inside the application, which is loaded in a background goroutine.
When the application allocates the slice for that cache in the background, latency increases inside all the response handlers.

Here is a reproducer:

package main

import (
	"fmt"
	"math/rand"
	"os"
	"runtime/trace"
	"time"
)

type Rates struct {
	a, b, c, d float64
}

var maxWait time.Duration

func main() {
	const size = 100e6

	out, err := os.OpenFile("prof", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, os.ModePerm)
	if err != nil {
		panic(err)
	}
	defer func() {
		_ = out.Sync()
		_ = out.Close()
	}()
	err = trace.Start(out)
	if err != nil {
		panic(err)
	}

	// simulate some work
	go loop()

	time.Sleep(3 * time.Second)
	// make a huge allocation in parallel
	go func() {
		_ = make([]Rates, size)		// << huge alloc
	}()
	time.Sleep(3 * time.Second)

	trace.Stop()

	fmt.Println("maxWait =", maxWait.String())
}

func loop() {
	for {
		now := time.Now()
		r := make([]Rates, 0)
		for i := 0; i < 100; i++ {
			r = append(r, Rates{
				rand.Float64(), rand.Float64(), rand.Float64(), rand.Float64(),
			})
		}
		d := time.Since(now)
		if maxWait < d {
			maxWait = d
		}
	}
}

[attached image: execution trace showing the pauses]
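
The resulting trace file can be opened with the standard trace viewer (the file name prof matches the reproducer above):

~ ᐅ go tool trace prof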

What did you expect to see?

The worker goroutine (loop in the example) should not pause for more than 1-5 ms.

What did you see instead?

Pauses of up to 43 ms.

@un000 un000 changed the title runtime GC: long pauses STW (sweep termination) on massive allocation runtime GC: long pauses STW (sweep termination) on massive block allocation Apr 3, 2019
@agnivade agnivade changed the title runtime GC: long pauses STW (sweep termination) on massive block allocation runtime: long pauses STW (sweep termination) on massive block allocation Apr 3, 2019

@mknyszek mknyszek commented Oct 29, 2019

To be clear, the reproducer is making a 3.2 GiB allocation, which is going to take a while for the OS to fulfill.
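
(For reference: each Rates value is four float64 fields, i.e. 32 bytes, so make([]Rates, 100e6) requests 100e6 × 32 B = 3.2×10⁹ bytes in one contiguous block.)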

However, I dug into this a bit and I'm seeing ~1 s pauses, and sometimes only ~50 ms pauses.

The 50ms pauses represent a lower bound for how fast the OS can give us back pages. The ~1s pauses actually come from the fact that we're zeroing the entire 3.2 GiB allocation, even though it just came from the OS.

We have an optimization which avoids re-zeroing fresh pages, but unfortunately this optimization is conservative. For example, if a needzero span at the end of the heap is contiguous with the new 3.2 GiB allocation, then those two are going to coalesce, and the 3.2 GiB free space will be marked needzero, when really only the first N KiB of it actually needs to be zeroed.

One way we could fix this is by making needzero a number, and if we can guarantee that only the first N KiB need zeroing, then we only zero that. If it's some weird swiss cheese of non-zeroed memory then we just assume the whole thing needs zeroing, rather than trying to keep track of which parts need zeroing.
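
A minimal sketch of that idea, assuming a simplified span structure (the names and layout here are hypothetical, not the runtime's actual mspan API):

package main

import "fmt"

// span models a run of pages from the page allocator. Instead of the
// runtime's boolean needzero flag, needZeroBytes records how much of the
// span's prefix may contain stale data.
type span struct {
	size          uintptr // total bytes in the span
	needZeroBytes uintptr // only the first needZeroBytes may be non-zero
}

// coalesce merges a dirty span with fresh, OS-zeroed memory that directly
// follows it. With a boolean flag the whole result would be marked
// needzero; with a byte count, only the dirty prefix keeps that cost.
func coalesce(dirty, fresh span) span {
	return span{
		size:          dirty.size + fresh.size,
		needZeroBytes: dirty.needZeroBytes, // the fresh tail stays "known zero"
	}
}

func main() {
	dirty := span{size: 64 << 10, needZeroBytes: 64 << 10} // 64 KiB reused span
	fresh := span{size: 3 << 30}                           // ~3 GiB straight from the OS
	s := coalesce(dirty, fresh)
	fmt.Printf("span of %d bytes, only %d need zeroing\n", s.size, s.needZeroBytes)
}

Under this scheme, an allocator serving the coalesced span would clear only the 64 KiB prefix instead of the whole ~3 GiB.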

I'm not sure if this fix is worth it though, since future allocations would still have to go back and zero this memory anyway, causing the 1s delay to return.

@aclements aclements commented Oct 29, 2019

For 1.15, one of my planned follow-up tasks on non-cooperative preemption is to make memclrNoHeapPointers preemptible. For large pointer-free allocations, this should make the long zeroing operation preemptible.
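
A rough sketch of what preemptible zeroing could look like, assuming an illustrative chunk size and using runtime.Gosched as a stand-in for a real runtime preemption point (the actual change would live inside the runtime, around memclrNoHeapPointers):

package main

import "runtime"

// zeroChunk is an illustrative bound on how many bytes get cleared
// between preemption points; a real value would be tuned.
const zeroChunk = 256 << 10 // 256 KiB

// preemptibleZero clears mem in bounded chunks so that a goroutine
// zeroing a multi-gigabyte allocation cannot hold up a stop-the-world
// for the entire duration of the clear.
func preemptibleZero(mem []byte) {
	for len(mem) > 0 {
		n := zeroChunk
		if n > len(mem) {
			n = len(mem)
		}
		chunk := mem[:n]
		for i := range chunk { // the compiler lowers this loop to a memclr
			chunk[i] = 0
		}
		mem = mem[n:]
		runtime.Gosched() // yield between chunks
	}
}

func main() {
	preemptibleZero(make([]byte, 8<<20)) // 8 MiB demo buffer
}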
