Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: sweep increased allocation count #31006

Closed
chathaway-codes opened this issue Mar 22, 2019 · 18 comments
Closed

runtime: sweep increased allocation count #31006

chathaway-codes opened this issue Mar 22, 2019 · 18 comments
Milestone

Comments

@chathaway-codes
Copy link

@chathaway-codes chathaway-codes commented Mar 22, 2019

What version of Go are you using (go version)?

We've encountered this issue with go versions:

  • go version go1.10.4 linux/amd64
  • go version go1.11.5 linux/amd64
  • go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

We've seen it on a number of boxes running x86_64 processors and various operating systems. To help debug this, we've reproduced it in a Docker container. Information below is from the Docker container.

go env Output
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/root/go"
GORACE=""
GOROOT="/usr/lib/go-1.10"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.10/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build162370947=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We (YottaDB) have written a Go wrapper for our NoSQL database. The codebase for the database is used in production in many places, the Go wrapper is currently declared "field test grade". We have a test which exercises various methods in our wrapper, and it seems to consistently produce an error message if GOGC is set low enough.

What did you expect to see?

A successful run

What did you see instead?

A crash. The crash was rare, until we found some other bug reports which suggested setting GOGC low to reproduce it.

root@4c74966bf3f4:~/go/src/random_walk# ./random_walk -test.bench=. &> out.txt
root@4c74966bf3f4:~/go/src/random_walk# head out.txt
goos: linux
goarch: amd64
pkg: random_walk
BenchmarkRW1threads10depth5tp-4       	   20000	     89962 ns/op
BenchmarkRW10threads10depth5tp-4      	   20000	    100239 ns/op
runtime: nelems=8 nalloc=4 previous allocCount=3 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x543fe8, 0x20)

We've also seen errors like this:

goos: linux
goarch: amd64
pkg: random_walk
BenchmarkRW1threads10depth5tp-4       	   20000	     77309 ns/op
BenchmarkRW10threads10depth5tp-4      	   20000	     68258 ns/op
BenchmarkRW100threads10depth5tp-4     	runtime: pointer 0xc42023316c to unallocated span idx=0x119 span.base()=0xc420232000 span.limit=0xc420233f80 span.state=3
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw(0x547467, 0x3e)

We've stashed the test environment in a Docker container which can be pulled with:

docker pull charlescodes/bug-containers:golang_alloc_sweep_bug
docker run -ti charlescodes/bug-containers:golang_alloc_sweep_bug bash
root@a677cf6e1ae5:/data# cd
root@a677cf6e1ae5:~# cd go/src/random_walk/
root@a677cf6e1ae5:~/go/src/random_walk# head out.txt
goos: linux
goarch: amd64
pkg: random_walk
BenchmarkRW1threads10depth5tp-4       	runtime: nelems=8 nalloc=8 previous allocCount=7 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x543fe8, 0x20)
	/usr/lib/go-1.10/src/runtime/panic.go:616 +0x81
runtime.(*mspan).sweep(0x7f26a90ac7d0, 0xc4200ed000, 0x41c400)

To rerun the test in that Docker container:

source $(pkg-config --variable=prefix yottadb)/ydb_env_set
wget https://gitlab.com/YottaDB/DB/YDBTest/raw/master/go/inref/random_walk_test.go
export GOGC=2
go test -bench=.

The source code for the test program which produces this crash is available at https://gitlab.com/YottaDB/DB/YDBTest/blob/master/go/inref/random_walk_test.go. The YDBGo wrapper (https://gitlab.com/YottaDB/Lang/YDBGo) has CI setup which includes running go test with race, and we manually verified that the test script has no race conditions.

With GODEBUG=gccheckmark=1 set, this doesn't seem to reproduce as often. I don't see anything extra in the output of a crash though.

Thanks in advance for the help!

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Mar 22, 2019

This error is commonly caused by invalid use of unsafe or cgo. Do you use unsafe or cgo?

@bradfitz bradfitz changed the title sweep increased allocation count runtime: sweep increased allocation count Mar 22, 2019
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Mar 22, 2019

To answer my own question: yes. A fair bit. e.g. https://gitlab.com/YottaDB/Lang/YDBGo/blob/master/buffer_t.go#L51

Have you audited it for correctness?

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Mar 22, 2019

Please try again with GODEBUG=cgocheck=2 per https://golang.org/cmd/cgo/#hdr-Passing_pointers

@chathaway-codes
Copy link
Author

@chathaway-codes chathaway-codes commented Mar 22, 2019

Yes, we use cgo and unsafe a fair amount. We've audit'd it for correctness as part of our review process, and have a number of tests which pass without issue. Some of those tests are rather long living, and do a great deal of intensive operations against our API.

With your suggestion to use the more strict cgocheck, we get the same result.

root@5c4b6162eba5:~/go/src/random_walk# export GODEBUG=cgocheck=2
root@5c4b6162eba5:~/go/src/random_walk# go test -bench=. &> out.txt
root@5c4b6162eba5:~/go/src/random_walk# head out.txt
goos: linux
goarch: amd64
pkg: random_walk
BenchmarkRW1threads10depth5tp-4       	runtime: nelems=128 nalloc=26 previous allocCount=25 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x543fc8, 0x20)
	/usr/lib/go-1.10/src/runtime/panic.go:616 +0x81
runtime.(*mspan).sweep(0x7f61d98aed10, 0xc420000000, 0x547e00)

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Mar 22, 2019

@bcmills bcmills added this to the Go1.13 milestone May 14, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@pauldotknopf
Copy link

@pauldotknopf pauldotknopf commented Dec 19, 2019

What was the outcome of this issue? I see it is "waiting more info", but it has also been added to some milestones.

Is there a legitimate bug in the runtime that needs fixing?

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Dec 19, 2019

Paul, probably not. Not without a repro. Or several independent reports. This is usually caused by incorrect unsafe code. There's new validation functionality in Go 1.14 to help users find such code though.

@pauldotknopf
Copy link

@pauldotknopf pauldotknopf commented Dec 19, 2019

Great. Are those additional checks in 1.14 currently in master? Any docs yet?

@chathaway-codes
Copy link
Author

@chathaway-codes chathaway-codes commented Dec 19, 2019

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Dec 19, 2019

@pauldotknopf, https://tip.golang.org/doc/go1.14#compiler ...

This release adds -d=checkptr as a compile-time option for adding instrumentation to check that Go code is following unsafe.Pointer safety rules dynamically. This option is enabled by default with the -race ....

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Dec 19, 2019

@chathaway-codes, I'll close this for now until there's something actionable for us to look into. If YottaDB folk have a repro we'd love to look.

@bradfitz bradfitz closed this Dec 19, 2019
@pauldotknopf
Copy link

@pauldotknopf pauldotknopf commented Dec 19, 2019

Quick update.

I was experiencing this error with 1.9. The error is very rare (only once so far, over the course of many many hours), but I could get the error to happen immediately with these env vars.

export GODEBUG=gccheckmark=1
export GOGC=2

When I updated to 1.13, when using the above env vars, I couldn't get the error to reproduce.

@ksbhaskar
Copy link

@ksbhaskar ksbhaskar commented Dec 19, 2019

It's not clear whether we actually solved a problem, or just worked around one, but after running across #23576 we made sure that we were forwarding signals only when our code was at safe points. Owing to YottaDB's database engine that runs in the address space of a process and optimistic concurrency control (https://docs.yottadb.com/MultiLangProgGuide/MultiLangProgGuide.html#transaction-processing), runtime stacks with ACID transaction logic have Go calling C calling Go (https://docs.yottadb.com/MultiLangProgGuide/goprogram.html#go-tpe and https://docs.yottadb.com/MultiLangProgGuide/goprogram.html#go-tpst). In any case, the YottaDB Go wrapper is stable and considered production grade. So, it is appropriate to close this Issue. Thank you.

@TUTUBIG
Copy link

@TUTUBIG TUTUBIG commented Sep 4, 2020

i have got this problem too

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 4, 2020

@TUTUBIG This issue is closed. If you still have this problem with Go 1.14 or 1.15, please open a new issue with instructions for how to reproduce the problem. Thanks.

@childeYin
Copy link

@childeYin childeYin commented Oct 27, 2020

go env 1.15 , but I hvve this problem, and can't find way to reproduce this problem,

runtime: nelems=28 nalloc=2 previous allocCount=1 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x10ac828, 0x20)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.(*mspan).sweep(0x7f0a34c7e2e8, 0x7f0a34c7e200, 0x7f0a30175400)
/usr/local/go/src/runtime/mgcsweep.go:328 +0x8c6
runtime.(*mcentral).uncacheSpan(0x19eaef8, 0x7f0a34c7e2e8)
/usr/local/go/src/runtime/mcentral.go:197 +0x79
runtime.(*mcache).releaseAll(0x7f0a34da78b8)
/usr/local/go/src/runtime/mcache.go:158 +0x6b
runtime.(*mcache).prepareForSweep(0x7f0a34da78b8)
/usr/local/go/src/runtime/mcache.go:185 +0x46
runtime.gcMarkTermination.func4.1(0xc00004e000)
/usr/local/go/src/runtime/mgc.go:1728 +0x2f
runtime.forEachP(0x10d6280)
/usr/local/go/src/runtime/proc.go:1359 +0x119
runtime.gcMarkTermination.func4()
/usr/local/go/src/runtime/mgc.go:1727 +0x2d
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 27, 2020

@childeYin This issue is closed. If you still have this problem with Go 1.14 or 1.15, please open a new issue with instructions for how to reproduce the problem. Thanks.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 27, 2020

Even if you can't reproduce the problem, it's still a different problem. It just happens to have the same symptoms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.