Skip to content

net/http/pprof: Negative allocation counts produced by /debug/pprof/allocs?seconds=1 #49171

@felixge

Description

@felixge

What version of Go are you using (go version)?

$ go version
go version go1.17.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes, but not previous releases, e.g. go1.16.9.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/felix.geisendoerfer/Library/Caches/go-build"
GOENV="/Users/felix.geisendoerfer/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS="-count=1"
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/felix.geisendoerfer/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/felix.geisendoerfer/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.2/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.2/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.2"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/mg/xwzs985x35789zk2l14nk8k80000gn/T/go-build895031316=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Continuous allocation profiling of a production application using delta profiles similar to running /debug/pprof/allocs?seconds=60 in a loop.

Below is the smallest reproduction of the issue that I've managed to create so far. It executes an APM instrumentation workload and continuously checks the delta allocation profile for negative values.

main.go
package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"time"

	"strings"

	"net/http"
	"net/http/httptest"
	"net/http/pprof"

	"github.com/google/pprof/profile"
	"google.golang.org/grpc/metadata"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/ext"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
	log.Printf("starting")
	go memProfile()
	go workload()
	time.Sleep(10 * time.Second)
	log.Printf("unable to reproduce issue, shutting down")
}

func workload() {
	prevCtx := context.Background()
	for {
		_, prevCtx = startSpanFromContext(prevCtx, "foo", "bar", "op", tracer.AnalyticsRate(100))
	}
}

func memProfile() {
	h := pprof.Handler("allocs")

	for {
		recorder := httptest.NewRecorder()
		request, err := http.NewRequest("GET", "/?seconds=1", nil)
		if err != nil {
			panic(err)
		}
		h.ServeHTTP(recorder, request)

		profData := recorder.Body.Bytes()
		prof, err := profile.ParseData(profData)
		if err != nil {
			panic(err)
		}

		for si, s := range prof.Sample {
			for i, v := range s.Value {
				if v < 0 && prof.SampleType[i].Type == "alloc_objects" {
					st := prof.SampleType[i]
					log.Printf("negative delta value for: %s/%s: %d\n", st.Type, st.Unit, v)
					printDetails(prof, si, s)

					log.Printf("writing profile to alloc.pprof")
					ioutil.WriteFile("alloc.pprof", profData, 0666)
					os.Exit(1)
				}
			}
		}
	}
}

func printDetails(prof *profile.Profile, negI int, negSample *profile.Sample) {
	printSample(negI, negSample)

	negativeKey := sampleKey(negSample)
	for si, otherSample := range prof.Sample {
		if otherSample == negSample {
			continue
		}

		otherKey := sampleKey(otherSample)
		if otherKey == negativeKey {
			fmt.Printf("\nfound another sample with the same addrs:\n\n")
			printSample(si, otherSample)
		}
	}
}

func printSample(idx int, s *profile.Sample) {
	fmt.Printf("sample %d values=%v labels=%#v numLabels=%#v\n", idx, s.Value, s.Label, s.NumLabel)
	for _, loc := range s.Location {
		for i, line := range loc.Line {
			fmt.Printf("loc %d addr %d line %d %s() %s:%d\n", loc.ID, loc.Address, i, line.Function.Name, line.Function.Filename, line.Line)
		}
	}
}

func sampleKey(s *profile.Sample) string {
	var addrs []string
	for _, loc := range s.Location {
		addrs = append(addrs, fmt.Sprintf("%d", loc.Address))
	}
	return strings.Join(addrs, ",")
}

func startSpanFromContext(
	ctx context.Context, method, operation, service string, opts ...tracer.StartSpanOption,
) (ddtrace.Span, context.Context) {
	opts = append(opts,
		tracer.ServiceName(service),
		tracer.ResourceName(method),
		tracer.Tag("method", method),
		tracer.SpanType(ext.AppTypeRPC),
	)
	metadata.FromIncomingContext(ctx) // nil is ok
	return tracer.StartSpanFromContext(ctx, operation, opts...)
}

type MDCarrier metadata.MD

var _ tracer.TextMapWriter = (*MDCarrier)(nil)
var _ tracer.TextMapReader = (*MDCarrier)(nil)

// Get will return the first entry in the metadata at the given key.
func (mdc MDCarrier) Get(key string) string {
	if m := mdc[key]; len(m) > 0 {
		return m[0]
	}
	return ""
}

// Set will add the given value to the values found at key. Key will be lowercased to match
// the metadata implementation.
func (mdc MDCarrier) Set(key, val string) {
	k := strings.ToLower(key) // as per google.golang.org/grpc/metadata/metadata.go
	mdc[k] = append(mdc[k], val)
}

// ForeachKey will iterate over all key/value pairs in the metadata.
func (mdc MDCarrier) ForeachKey(handler func(key, val string) error) error {
	for k, vs := range mdc {
		for _, v := range vs {
			if err := handler(k, v); err != nil {
				return err
			}
		}
	}
	return nil
}

Sorry about not being able to make it smaller than this, but so far I wasn't able to trigger the problem with simpler workloads.

What did you expect to see?

alloc_objects should never be negative. So the example program should exit after 10s printing unable to reproduce issue, shutting down.

What did you see instead?

The example program detects negative alloc_objects and prints debugging output like shown below.

2021/10/27 01:39:02 starting
2021/10/27 01:39:08 negative delta value for: alloc_objects/count: -10923
sample 0 values=[-10923 -524312 0 0] labels=map[string][]string(nil) numLabels=map[string][]int64{"bytes":[]int64{48}}
loc 1 addr 19853706 line 0 main.startSpanFromContext() /Users/felix.geisendoerfer/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.33.0/ddtrace/tracer/option.go:483
loc 2 addr 19850287 line 0 main.workload() /Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/main.go:34

found another sample with the same addrs:

sample 7 values=[10923 524312 0 0] labels=map[string][]string(nil) numLabels=map[string][]int64{"bytes":[]int64{48}}
loc 10 addr 19853706 line 0 main.startSpanFromContext() /Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/main.go:483
loc 2 addr 19850287 line 0 main.workload() /Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/main.go:34
exit status 1

Additionally go tool pprof -http=:6060 alloc.pprof throws JS errors when opening an alloc flamegraph for the profile.

image

A few things to note:

  • The produced profile contains two samples with the same addrs (19853706, 19850287), but the filename/line symbolization for 19853706 is different between the samples (option.go:483 vs main.go:483).
  • Neither filename/line symbolization for 19853706 appears to be correct.
  • The negative samples seem to be a result of the internal profile.Merge() call that can't match some stack traces with the same addrs from p0 and p1 because they got symbolized differently (i.e. the origin of this problem is unrelated to delta profiles, but delta profiles are where this is having a very bad impact, so I'm using it as the main example).
  • Different runs of the test program will produce different results, sometimes it will pass. I suspect this is due to the random sampling of the memory profiler.
  • Setting runtime.MemProfileRate = 1 on top of main() seems to make the test program always pass.
  • Building with -gcflags='-l' doesn't fix the issue. pprof also reports samples that claim that inlining is still happening, see example below (note how loc 17 has two lines). However -m doesn't output any inlining taking place 🤔
2021/10/27 01:50:55 starting
2021/10/27 01:50:59 negative delta value for: alloc_objects/count: -21846
sample 3 values=[-21846 -1048624 -21846 -1048624] labels=map[string][]string(nil) numLabels=map[string][]int64{"bytes":[]int64{48}}
loc 16 addr 17354981 line 0 context.WithValue() /usr/local/Cellar/go/1.17.2/libexec/src/context/context.go:533
loc 17 addr 19848324 line 0 gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.ContextWithSpan() /Users/felix.geisendoerfer/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.33.0/ddtrace/tracer/context.go:21
loc 17 addr 19848324 line 1 gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.StartSpanFromContext() /Users/felix.geisendoerfer/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.33.0/ddtrace/tracer/context.go:50
loc 18 addr 19855951 line 0 main.startSpanFromContext() /Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/main.go:111
loc 19 addr 19853062 line 0 main.workload() /Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/main.go:34

found another sample with the same addrs:

sample 11 values=[21846 1048624 21846 1048624] labels=map[string][]string(nil) numLabels=map[string][]int64{"bytes":[]int64{48}}
loc 16 addr 17354981 line 0 context.WithValue() /usr/local/Cellar/go/1.17.2/libexec/src/context/context.go:533
loc 37 addr 19848324 line 0 gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.ContextWithSpan() /Users/felix.geisendoerfer/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.33.0/ddtrace/tracer/context.go:21
loc 37 addr 19848324 line 1 gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.StartSpanFromContext() /Users/felix.geisendoerfer/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.33.0/ddtrace/tracer/option.go:50
loc 18 addr 19855951 line 0 main.startSpanFromContext() /Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/main.go:111
loc 19 addr 19853062 line 0 main.workload() /Users/felix.geisendoerfer/go/src/github.com/felixge/dump/pprof-symbol-bug/main.go:34

Theories

My best guess is that this is somehow related to go1.17 being able to inline functions containing closures.

Most samples with negative alloc values that I've seen involve functions that call functions containing closures. The issue also doesn't seem to be reproducible in go1.16.

However, the issue can still be reproduced in go 1.17 when disabling inlining with -l, so I'm not sure ...

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions