New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: fatal error: sweep increased allocation count, go1.9.x #22781

Closed
pciet opened this Issue Nov 17, 2017 · 40 comments

Comments

Projects
None yet
@pciet
Copy link
Contributor

pciet commented Nov 17, 2017

What version of Go are you using (go version)?

go version go1.9.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Ubuntu Server 16.04.3 LTS with PostgreSQL 9.5.10 and OpenSSH_7.2p2 selected during install.

Intel(R) Core(TM) i5-6400 CPU

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/pciet/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Load testing by acting as 40 clients automatically without delays between actions playing my chess game implemented as a Go HTTP server with a postgres database, which should partially simulate a load of hundreds of players. Crash within 30 seconds as the number of active clients ramps up. The clients are making HTTP calls from a separate 2015 MacBook Pro on the local network.

What did you expect to see?

The load test should run indefinitely.

What did you see instead?

Crash with the signature "fatal error: sweep increased allocation count" reproducible within 30 seconds. The crash does not happen with a client count of eight or less. The crash does not happen with go1.6.4, go1.7.6, or go1.8.4, but does with go1.9, go1.9.1, and go1.9.2.

A race condition is warned with go build -race set, but the memory indicated should not have been shared between goroutines.

My investigation is detailed here, with stack traces (with and without GODEBUG=gccheckmark=1) and the race warnings: pciet/wichess#19

My email to golang-nuts: https://groups.google.com/forum/#!topic/golang-nuts/7lcongdGOMM

I'll try to put together a case with less code and without the database.

@ALTree

This comment has been minimized.

Copy link
Member

ALTree commented Nov 17, 2017

A race condition is warned with go build -race set, but the memory indicated should not have been shared between goroutines.

If the code triggers a race condition warning, then there's a race condition in the code and the program is not valid. Can you explain why do you believe that the reported race is not actually a race? Are you suggesting there's a bug in the race detector?

@ALTree ALTree changed the title fatal error: sweep increased allocation count, go1.9.x runtime: fatal error: sweep increased allocation count, go1.9.x Nov 17, 2017

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Nov 17, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Nov 18, 2017

If the code triggers a race condition warning, then there's a race condition in the code and the program is not valid.

Nevertheless, Go tries not to crash just because there's a data race. (Go doesn't makes as many promises as Java in the face of a data race, but it does try not to crash.)

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 18, 2017

Read at 0x00c4201c4ef0 by goroutine 125:
  github.com/pciet/wichess/wichessing.Board.MovesFromPoint()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_moves.go:216 +0x538
...
Previous write at 0x00c4201c4ef0 by goroutine 49:
  [failed to restore the stack]
214	for movetype, unfilteredpaths := range TruncatedAbsPathsForKind(the.Piece.Kind, the.AbsPoint, the.Piece.Orientation) {
215		for path, _ := range b.ActualPaths(the, movetype, unfilteredpaths) {
216			for _, pt := range path.Points {
...
384 func (b Board) ActualPaths(the Point, movetype PathType, unfilteredpaths AbsPathSet) AbsPathSet {
...
388	actualPaths := make(AbsPathSet)
...
398			filteredPath := AbsPath{
399				Points: make([]AbsPoint, 0, len(path.Points)),
400			}
...
437				filteredPath.Points = append(filteredPath.Points, point)
438			}
439			if len(filteredPath.Points) > 0 {
440				actualPaths[filteredPath.Copy()] = struct{}{}
441			}

These path datatypes represent the moves a chess piece can make on the board. The function TruncatedAbsPathsForKind translates predefined move patterns relative from zero for a specific piece into a set of AbsPath made of AbsPoint (absolute path, absolute point) that fits in the range 0-7 (truncated means the edge of the board was detected before the RelPath could be completed).

Those path generating functions are called multiple times in these Board methods:

// Moves, Check, Checkmate.
func (b Board) Moves(active Orientation, previousFrom AbsPoint, previousTo AbsPoint) (map[AbsPoint]AbsPointSet, bool, bool) {
...
func (b Board) Checkmate(turn Orientation, previousFrom AbsPoint, previousTo AbsPoint) bool {
...
func (b Board) Check(turn Orientation, previousFrom AbsPoint, previousTo AbsPoint) bool {
...
func (b Board) Draw(turn Orientation, previousFrom AbsPoint, previousTo AbsPoint) bool {
...
func (b Board) AfterMove(from AbsPoint, to AbsPoint, turn Orientation, previousFrom AbsPoint, previousTo AbsPoint) Board {
...
// Diff, Taken Pieces
func (b Board) Move(from AbsPoint, to AbsPoint, turn Orientation, previousFrom AbsPoint, previousTo AbsPoint) (PointSet, map[AbsPoint]*Piece) {

The application code (such as what happens when a player requests what moves are currently available for all pieces via a HTTP GET request) calls into these methods after reading the latest representation of the board from the database without sharing these data structures - each request is one goroutine, and the only shared memory is the postgres database and game state diffs sent to a websocket goroutine for each player currently on the game page. This shared memory for sending diffs is not pointed to in these race conditions.

==================
WARNING: DATA RACE
Read at 0x00c4205df498 by goroutine 49:
  github.com/pciet/wichess/wichessing.Board.ActualPaths()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_moves.go:395 +0xde3
  github.com/pciet/wichess/wichessing.Board.MovesFromPoint()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_moves.go:215 +0x9bd
  github.com/pciet/wichess/wichessing.Board.Move()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_move.go:126 +0x12b1
  github.com/pciet/wichess/wichessing.Board.AfterMove()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_move.go:10 +0x18a
  github.com/pciet/wichess/wichessing.Board.Check()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_check.go:47 +0x2ae
  github.com/pciet/wichess/wichessing.Board.Moves()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_moves.go:50 +0x5ca
  main.game.moves()
      /home/pciet/go/src/github.com/pciet/wichess/game_moving.go:266 +0x862
  main.movesHandler()
      /home/pciet/go/src/github.com/pciet/wichess/web_moves.go:55 +0x4d0
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:1918 +0x51
  net/http.(*ServeMux).ServeHTTP()
      /usr/local/go/src/net/http/server.go:2254 +0xa2
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2619 +0xbc
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1801 +0x83b

Previous write at 0x00c4205df498 by goroutine 125:
  github.com/pciet/wichess/wichessing.TruncatedAbsPathsForKind()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/path.go:19 +0x3d2
  github.com/pciet/wichess/wichessing.Board.MovesFromPoint()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_moves.go:214 +0x4a4
  github.com/pciet/wichess/wichessing.Board.AllNaiveMovesFor()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_moves.go:191 +0x1f1
  github.com/pciet/wichess/wichessing.Board.Check()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_check.go:39 +0xc4
  github.com/pciet/wichess/wichessing.Board.Checkmate()
      /home/pciet/go/src/github.com/pciet/wichess/wichessing/board_check.go:10 +0xb2
  main.listeningToGame.func2()
      /home/pciet/go/src/github.com/pciet/wichess/game_listening.go:114 +0x31b

Goroutine 49 (running) created at:
  net/http.(*Server).Serve()
      /usr/local/go/src/net/http/server.go:2720 +0x37c
  net/http.(*Server).ListenAndServe()
      /usr/local/go/src/net/http/server.go:2636 +0xc7
  net/http.ListenAndServe()
      /usr/local/go/src/net/http/server.go:2882 +0xfe
  main.main()
      /home/pciet/go/src/github.com/pciet/wichess/main.go:58 +0x5e6

Goroutine 125 (running) created at:
  main.listeningToGame()
      /home/pciet/go/src/github.com/pciet/wichess/game_listening.go:100 +0x7ec
  main.moveNotificationWebsocketHandler()
      /home/pciet/go/src/github.com/pciet/wichess/web_moven.go:74 +0x5e1
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:1918 +0x51
  net/http.(*ServeMux).ServeHTTP()
      /usr/local/go/src/net/http/server.go:2254 +0xa2
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2619 +0xbc
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1801 +0x83b
path.go:
18		for path, _ := range paths {
19			availablepath := AbsPath{
20				Points: make([]AbsPoint, 0, len(*path)),
21			}
...
82 type AbsPath struct {
83 	Points    []AbsPoint
84 	Truncated bool
85 }
board_moves.go:
394		for path, _ := range unfilteredpaths {
395			if path.Truncated && the.MustEnd {
396				continue
397			}

The race is that an AbsPath was written into memory, then Truncated is read.

In this case the writing goroutine is one of those listening goroutines - it has put together a representation of all moves possible for the board state because it is determining if a checkmate state exists and would have returned if so (the goroutine received a notification that a move was made, or has just started and is checking initial conditions).

The reading goroutine is a player requesting what all moves are for the current game state, likely because they just moved and need the next move set.

There is no shared memory here besides the base representation of the moves for each piece, which is used to create new AbsPaths and is not referenced outside of TruncatedAbsPathsForKind. A Board instance is created from each database read and operated on in the reading goroutine - there are more database reads than necessary and more garbage memory created than necessary, but I'm getting it to work first before sharing memory via caching. The database interactions are mutex locked at the game level.

If the code triggers a race condition warning, then there's a race condition in the code and the program is not valid. Can you explain why do you believe that the reported race is not actually a race? Are you suggesting there's a bug in the race detector?

I've asked in the golang-nuts thread for a review of the TruncatedAbsPathsForKind function as it seems like a common source of the data races. But since these paths are created only for the calling goroutine I don't understand how different goroutines race on them. The crash indicates a pointer problem, which I've thought could be incorrect garbage collection or some sort of memory clobbering that tricks the race detector. Most likely I'm doing something wrong of course.

Nevertheless, Go tries not to crash just because there's a data race. (Go doesn't makes as many promises as Java in the face of a data race, but it does try not to crash.)

The server does not crash but does indicate a data race if I simulate four games or less (on a four core processor) but does crash within 30 seconds if the number of games increases. My test client is making moves and requesting available moves as fast as possible (~5 milliseconds per request) unsynchronized except for the interlock of playing the game.

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 18, 2017

Crash on go1.9.2 darwin/amd64 testing localhost (macOS 10.12.6) with client count 40:

http: panic serving [::1]:62998: runtime error: invalid memory address or nil pointer dereference
goroutine 771 [running]:
net/http.(*conn).serve.func1(0xc420088f00)
	/usr/local/go/src/net/http/server.go:1697 +0xd0
panic(0x13bbe60, 0x164a3e0)
	/usr/local/go/src/runtime/panic.go:491 +0x283
github.com/pciet/wichess/wichessing.Board.ActualPaths(0xc4203b8de0, 0x0, 0xc4203b8e10, 0x1, 0xc4203b8e40, 0x2, 0xc4203b8e70, 0x3, 0xc4203b8ea0, 0x4, ...)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/wichessing/board_moves.go:402 +0x209
github.com/pciet/wichess/wichessing.Board.MovesFromPoint(0xc4203b8de0, 0x0, 0xc4203b8e10, 0x1, 0xc4203b8e40, 0x2, 0xc4203b8e70, 0x3, 0xc4203b8ea0, 0x4, ...)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/wichessing/board_moves.go:215 +0x708
github.com/pciet/wichess/wichessing.Board.AllNaiveMovesFor(0xc4203b8de0, 0x0, 0xc4203b8e10, 0x1, 0xc4203b8e40, 0x2, 0xc4203b8e70, 0x3, 0xc4203b8ea0, 0x4, ...)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/wichessing/board_moves.go:191 +0x18c
github.com/pciet/wichess/wichessing.Board.Check(0xc4203b8de0, 0x0, 0xc4203b8e10, 0x1, 0xc4203b8e40, 0x2, 0xc4203b8e70, 0x3, 0xc4203b8ea0, 0x4, ...)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/wichessing/board_check.go:39 +0xb6
github.com/pciet/wichess/wichessing.Board.Moves(0xc420253000, 0x0, 0xc420253038, 0x1, 0xc420253070, 0x2, 0xc4202530a8, 0x3, 0xc4202530e0, 0x4, ...)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/wichessing/board_moves.go:50 +0x4f8
main.game.moves(0x7, 0x20, 0x1, 0xc420534cf0, 0x8, 0x0, 0x30b42b90, 0xed1a28276, 0xc42010a420, 0x3c75eac3, ...)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/game_moving.go:266 +0x719
main.movesHandler(0x161d4a0, 0xc420165420, 0xc420150e00)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/web_moves.go:55 +0x31c
net/http.HandlerFunc.ServeHTTP(0x1448810, 0x161d4a0, 0xc420165420, 0xc420150e00)
	/usr/local/go/src/net/http/server.go:1918 +0x44
net/http.(*ServeMux).ServeHTTP(0x16580c0, 0x161d4a0, 0xc420165420, 0xc420150e00)
	/usr/local/go/src/net/http/server.go:2254 +0x130
net/http.serverHandler.ServeHTTP(0xc4200751e0, 0x161d4a0, 0xc420165420, 0xc420150e00)
	/usr/local/go/src/net/http/server.go:2619 +0xb4
net/http.(*conn).serve(0xc420088f00, 0x161dae0, 0xc42007d600)
	/usr/local/go/src/net/http/server.go:1801 +0x71d
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2720 +0x288
runtime: nelems=256 nalloc=9 previous allocCount=8 nfreed=65535
fatal error: sweep increased allocation count

Full trace: https://gist.github.com/pciet/37e676ab8cbeac40bf3c4b0daed7dca4

No crash for two games (four clients).

Crashes on three games (six clients): https://gist.github.com/pciet/8c2b29c07fd548254b1bc3b02984a0ba

Crashes on four games (eight clients):

runtime: nelems=256 nalloc=126 previous allocCount=125 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x143954b, 0x20)
	/usr/local/go/src/runtime/panic.go:605 +0x95 fp=0x700008a95cf0 sp=0x700008a95cd0 pc=0x102be55
runtime.(*mspan).sweep(0x1a09690, 0x1a09601, 0x0)
	/usr/local/go/src/runtime/mgcsweep.go:298 +0x911 fp=0x700008a95dc0 sp=0x700008a95cf0 pc=0x1021551
runtime.(*mcentral).cacheSpan(0x1661a90, 0xc420022560)
	/usr/local/go/src/runtime/mcentral.go:58 +0x2cd fp=0x700008a95e08 sp=0x700008a95dc0 pc=0x101671d
runtime.(*mcache).refill(0x18766c8, 0xc4203ed906, 0xc4204399f0)
	/usr/local/go/src/runtime/mcache.go:123 +0xa4 fp=0x700008a95e38 sp=0x700008a95e08 pc=0x10162e4
runtime.(*mcache).nextFree.func1()
	/usr/local/go/src/runtime/malloc.go:557 +0x32 fp=0x700008a95e60 sp=0x700008a95e38 pc=0x1053592
runtime.systemstack(0xc4202ca7b8)
	/usr/local/go/src/runtime/asm_amd64.s:344 +0x79 fp=0x700008a95e68 sp=0x700008a95e60 pc=0x10563a9
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1135 fp=0x700008a95e70 sp=0x700008a95e68 pc=0x1030650

goroutine 187 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:298 fp=0xc4205e3118 sp=0xc4205e3110 pc=0x1056320
runtime.(*mcache).nextFree(0x18766c8, 0x6, 0xc4205e3200, 0x122c0b0, 0x1617b20)
	/usr/local/go/src/runtime/malloc.go:556 +0xa9 fp=0xc4205e3170 sp=0xc4205e3118 pc=0x1011fd9
runtime.mallocgc(0x20, 0x13a8580, 0x1, 0x0)
	/usr/local/go/src/runtime/malloc.go:711 +0x6f0 fp=0xc4205e3218 sp=0xc4205e3170 pc=0x10128a0
runtime.newobject(0x13a8580, 0xc420576840)
	/usr/local/go/src/runtime/malloc.go:840 +0x38 fp=0xc4205e3248 sp=0xc4205e3218 pc=0x1012be8
github.com/lib/pq.(*conn).recv1(0xc42009ac80, 0xc4205e3331, 0xc420576840)
	/Users/aftermatt/Code/src/github.com/lib/pq/conn.go:793 +0x31 fp=0xc4205e3278 sp=0xc4205e3248 pc=0x122c3c1
github.com/lib/pq.(*conn).prepareTo(0xc42009ac80, 0xc4200ba080, 0x73, 0x0, 0x0, 0x0, 0x0, 0x1398900)
	/Users/aftermatt/Code/src/github.com/lib/pq/conn.go:566 +0x4bf fp=0xc4205e33e0 sp=0xc4205e3278 pc=0x122ad0f
github.com/lib/pq.(*conn).Exec(0xc42009ac80, 0xc4200ba080, 0x73, 0xc4201d40e0, 0x7, 0x7, 0x0, 0x0, 0x0, 0x0)
	/Users/aftermatt/Code/src/github.com/lib/pq/conn.go:660 +0x13b fp=0xc4205e3438 sp=0xc4205e33e0 pc=0x122babb
database/sql.ctxDriverExec(0x161db20, 0xc42007a1d8, 0x1a09220, 0xc42009ac80, 0xc4200ba080, 0x73, 0xc420270120, 0x7, 0x7, 0x102a729, ...)
	/usr/local/go/src/database/sql/ctxutil.go:43 +0x19c fp=0xc4205e34c0 sp=0xc4205e3438 pc=0x10f200c
database/sql.(*DB).execDC.func2()
	/usr/local/go/src/database/sql/sql.go:1253 +0x99 fp=0xc4205e3558 sp=0xc4205e34c0 pc=0x1100c39
database/sql.withLock(0x161b1e0, 0xc420134070, 0xc4205e36a8)
	/usr/local/go/src/database/sql/sql.go:2867 +0x65 fp=0xc4205e3580 sp=0xc4205e3558 pc=0x11004f5
database/sql.(*DB).execDC(0xc420089400, 0x161db20, 0xc42007a1d8, 0xc420134070, 0xc4205e3788, 0xc4200ba080, 0x73, 0xc420138580, 0x7, 0x8, ...)
	/usr/local/go/src/database/sql/sql.go:1252 +0x4e4 fp=0xc4205e3718 sp=0xc4205e3580 pc=0x10f83e4
database/sql.(*DB).exec(0xc420089400, 0x161db20, 0xc42007a1d8, 0xc4200ba080, 0x73, 0xc420138580, 0x7, 0x8, 0x1, 0xc4200ba080, ...)
	/usr/local/go/src/database/sql/sql.go:1238 +0x13e fp=0xc4205e37a8 sp=0xc4205e3718 pc=0x10f7eae
database/sql.(*DB).ExecContext(0xc420089400, 0x161db20, 0xc42007a1d8, 0xc4200ba080, 0x73, 0xc420138580, 0x7, 0x8, 0xc4200ba080, 0xc420256090, ...)
	/usr/local/go/src/database/sql/sql.go:1216 +0xe1 fp=0xc4205e3848 sp=0xc4205e37a8 pc=0x10f7ab1
database/sql.(*DB).Exec(0xc420089400, 0xc4200ba080, 0x73, 0xc420138580, 0x7, 0x8, 0x1, 0xc4205e3988, 0x14, 0x0)
	/usr/local/go/src/database/sql/sql.go:1230 +0x85 fp=0xc4205e38b8 sp=0xc4205e3848 pc=0x10f7d25
main.DB.updateGame(0xc420089400, 0x3, 0xc4205bd3b0, 0xc420285658, 0x7, 0x35, 0x25, 0x0)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/game.go:203 +0xd2f fp=0xc4205e3c20 sp=0xc4205e38b8 pc=0x133b27f
main.game.move(0x3, 0x24, 0x1, 0xc420285658, 0x7, 0x0, 0x38500bd0, 0xed1a2866b, 0xc42010bc80, 0x13730fdca, ...)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/game_moving.go:88 +0x952 fp=0xc4205e5c80 sp=0xc4205e3c20 pc=0x1341782
main.moveRequestHandler(0x161d4a0, 0xc42015a380, 0xc42012ca00)
	/Users/aftermatt/Code/src/github.com/pciet/wichess/web_move.go:104 +0xbf5 fp=0xc4205e7cb0 sp=0xc4205e5c80 pc=0x13501b5
net/http.HandlerFunc.ServeHTTP(0x1448808, 0x161d4a0, 0xc42015a380, 0xc42012ca00)
	/usr/local/go/src/net/http/server.go:1918 +0x44 fp=0xc4205e7cd8 sp=0xc4205e7cb0 pc=0x12bf724
net/http.(*ServeMux).ServeHTTP(0x16580c0, 0x161d4a0, 0xc42015a380, 0xc42012ca00)
	/usr/local/go/src/net/http/server.go:2254 +0x130 fp=0xc4205e7d18 sp=0xc4205e7cd8 pc=0x12c0ee0
net/http.serverHandler.ServeHTTP(0xc420075110, 0x161d4a0, 0xc42015a380, 0xc42012ca00)
	/usr/local/go/src/net/http/server.go:2619 +0xb4 fp=0xc4205e7d48 sp=0xc4205e7d18 pc=0x12c2454
net/http.(*conn).serve(0xc4201d20a0, 0x161dae0, 0xc420058980)
	/usr/local/go/src/net/http/server.go:1801 +0x71d fp=0xc4205e7fc8 sp=0xc4205e7d48 pc=0x12be5fd
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc4205e7fd0 sp=0xc4205e7fc8 pc=0x1058f11
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2720 +0x288

Full trace: https://gist.github.com/pciet/8189ffbb66bc46025186ceebccf2f6d6

This is a two core machine.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Nov 19, 2017

The race detector has very very few false positives. It is telling you that there is no synchronization between the code that creates the AbsPath value and the code that reads the Truncated field from it. You should look carefully at your locking.

@RLH

This comment has been minimized.

Copy link
Contributor

RLH commented Nov 19, 2017

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 19, 2017

Here's a simplified program that removes the application code and triggers the race and crash readily. The package wichessing does not contain the keyword 'go' and is an exact duplicate of the code in the original repository - the computation should have no concurrency. The only shared memory are global variables that are read and not written. My understanding is that there are no unsafe constructs.

https://github.com/pciet/debuggo22781

package main

import (
	"github.com/pciet/debuggo22781/wichessing"
)

const clients = 6

func main() {
	for i := 0; i < clients; i++ {
		go simulateClient()
	}
	select {} // block forever
}

func simulateClient() {
	for {
		board := wichessing.Board{}
		for i := 0; i < 64; i++ {
			board[i] = wichessing.Point{
				AbsPoint: wichessing.AbsPoint{wichessing.FileFromIndex(uint8(i)), wichessing.RankFromIndex(uint8(i))},
			}
		}
		setChessStart(&board)
		_, _, _ = board.Moves(wichessing.White, wichessing.AbsPoint{0, 8}, wichessing.AbsPoint{0, 8})
	}
}

func setChessStart(b *wichessing.Board) {
	b[0].Piece = &wichessing.Piece{
		Kind:        wichessing.Rook,
		Orientation: wichessing.White,
		Base:        wichessing.Rook,
	}
	b[1].Piece = &wichessing.Piece{
		Kind:        wichessing.Knight,
		Orientation: wichessing.White,
		Base:        wichessing.Knight,
	}
	b[2].Piece = &wichessing.Piece{
		Kind:        wichessing.Bishop,
		Orientation: wichessing.White,
		Base:        wichessing.Bishop,
	}
	b[3].Piece = &wichessing.Piece{
		Kind:        wichessing.Queen,
		Orientation: wichessing.White,
		Base:        wichessing.Queen,
	}
	b[4].Piece = &wichessing.Piece{
		Kind:        wichessing.King,
		Orientation: wichessing.White,
		Base:        wichessing.King,
	}
	b[5].Piece = &wichessing.Piece{
		Kind:        wichessing.Bishop,
		Orientation: wichessing.White,
		Base:        wichessing.Bishop,
	}
	b[6].Piece = &wichessing.Piece{
		Kind:        wichessing.Knight,
		Orientation: wichessing.White,
		Base:        wichessing.Knight,
	}
	b[7].Piece = &wichessing.Piece{
		Kind:        wichessing.Rook,
		Orientation: wichessing.White,
		Base:        wichessing.Rook,
	}
	for i := 0; i < 8; i++ {
		b[8+i].Piece = &wichessing.Piece{
			Kind:        wichessing.Pawn,
			Orientation: wichessing.White,
			Base:        wichessing.Pawn,
		}
	}
	b[63].Piece = &wichessing.Piece{
		Kind:        wichessing.Rook,
		Orientation: wichessing.Black,
		Base:        wichessing.Rook,
	}
	b[62].Piece = &wichessing.Piece{
		Kind:        wichessing.Knight,
		Orientation: wichessing.Black,
		Base:        wichessing.Knight,
	}
	b[61].Piece = &wichessing.Piece{
		Kind:        wichessing.Bishop,
		Orientation: wichessing.Black,
		Base:        wichessing.Bishop,
	}
	b[60].Piece = &wichessing.Piece{
		Kind:        wichessing.King,
		Orientation: wichessing.Black,
		Base:        wichessing.King,
	}
	b[59].Piece = &wichessing.Piece{
		Kind:        wichessing.Queen,
		Orientation: wichessing.Black,
		Base:        wichessing.Queen,
	}
	b[58].Piece = &wichessing.Piece{
		Kind:        wichessing.Bishop,
		Orientation: wichessing.Black,
		Base:        wichessing.Bishop,
	}
	b[57].Piece = &wichessing.Piece{
		Kind:        wichessing.Knight,
		Orientation: wichessing.Black,
		Base:        wichessing.Knight,
	}
	b[56].Piece = &wichessing.Piece{
		Kind:        wichessing.Rook,
		Orientation: wichessing.Black,
		Base:        wichessing.Rook,
	}
	for i := 0; i < 8; i++ {
		b[63-16+i].Piece = &wichessing.Piece{
			Kind:        wichessing.Pawn,
			Orientation: wichessing.Black,
			Base:        wichessing.Pawn,
		}
	}
	for i := 0; i < 64; i++ {
		if b[i].Piece == nil {
			continue
		}
		p := b[i].Piece.SetKindFlags()
		b[i].Piece = &p
	}
}

The race and crash are easy to trigger on my machine, although the crash signature is different.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1 pc=0x10f9884]

Races and crash trace here: https://gist.github.com/pciet/9c2251324184adbe3fdd25e6ded9160b

@RLH

This comment has been minimized.

Copy link
Contributor

RLH commented Nov 20, 2017

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 20, 2017

I am making the claim that there is no fixable race in this program as each goroutine is independent by only reading from shared memory (no shared writes).

The shared reads are in the first two functions of wichessing/path.go where written memory is created independently for each calling goroutine (this is also where the race is indicated). The shared memory is read for the for loop computation and otherwise unused (len, addition, then an addition or subtraction). The written memory is not shared between goroutines.

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Nov 20, 2017

Normally when somebody blames the race detector, we're all very skeptical (see @ianlancetaylor's comment earlier), and I admit (from looking at this a few minutes) that the race detector isn't super helpful here.

Notes: (all on Linux)

-- there is only one "go" statement in the code.
-- the shared memory is set up in inits in paths.go
-- no reported race with Go 1.8
-- races reported (with missing stacks) with Go1.9 and master.

The missing stacks are usually lost:

Previous write at 0x00c4200140a7 by goroutine 9:
  [failed to restore the stack]

When they're not lost, they point to:

Previous write at 0x00c420480a78 by goroutine 8:
  github.com/pciet/debuggo22781/wichessing.TruncatedAbsPathsForKind()
      /home/bradfitz/src/github.com/pciet/debuggo22781/wichessing/path.go:19 +0x3d2
  github.com/pciet/debuggo22781/wichessing.Board.MovesFromPoint()
      /home/bradfitz/src/github.com/pciet/debuggo22781/wichessing/board_moves.go:214 +0x4a4
  github.com/pciet/debuggo22781/wichessing.Board.Move()
      /home/bradfitz/src/github.com/pciet/debuggo22781/wichessing/board_move.go:126 +0x12b1
  github.com/pciet/debuggo22781/wichessing.Board.AfterMove()
      /home/bradfitz/src/github.com/pciet/debuggo22781/wichessing/board_move.go:10 +0x18a
  github.com/pciet/debuggo22781/wichessing.Board.Check()
      /home/bradfitz/src/github.com/pciet/debuggo22781/wichessing/board_check.go:47 +0x2ae
  github.com/pciet/debuggo22781/wichessing.Board.Checkmate()
      /home/bradfitz/src/github.com/pciet/debuggo22781/wichessing/board_check.go:10 +0xb2
  github.com/pciet/debuggo22781/wichessing.Board.Moves()
      /home/bradfitz/src/github.com/pciet/debuggo22781/wichessing/board_moves.go:12 +0xd3
  main.simulateClient()
      /home/bradfitz/src/github.com/pciet/debuggo22781/main.go:28 +0xc9

Where line 19 is:

// This function will truncate moves that leave the board and convert the defined relative points to actual board points but does \
not remove points with other pieces or other similar constraints.                                                                  
func TruncatedAbsPathsForKind(the Kind, from AbsPoint, with Orientation) AbsPathSetMap {
        absmap := make(AbsPathSetMap)
        for movetype, paths := range RelPathMapForKind(the) {
                availablepaths := make(AbsPathSet)
                for path, _ := range paths {
                        availablepath := AbsPath{   // <---- Line 19
                                Points: make([]AbsPoint, 0, len(*path)),
                        }

I highly suspect the problem is this line:

         availablepath.Points = append(availablepath.Points, AbsPoint{File: uint8(absfile), Rank: uint8(absrank)})

Appends on shared memory are always a red flag in concurrent code.

I'm busy now for a few hours, but I can test that theory later unless others beat me to it.

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Nov 20, 2017

Appends on shared memory are always a red flag in concurrent code.

(although I'm not convinced that memory is shared yet, either.)

The append was just my next area for investigation.

@ALTree

This comment has been minimized.

Copy link
Member

ALTree commented Nov 20, 2017

no reported race with Go 1.8

I also noticed this, so I started bisecting.

The first commit that triggers a race report on the reproducer above is ec091b6 (runtime: add mapassign_fast*).

It's possible that the race is real, but it's triggered by bad code generation and not by an error in the Go code itself. Next step: investigate changes in code generation introduced in the reproducer by that commit.

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 20, 2017

Here's what's happening in that trace (reversed):

main.simulateClient()

wichessing.Board.Moves()

The main goroutine put together the initial chess board state and then asks for all available moves for both players on that board. Each of the goroutines created is working on an independent board.

wichessing.Board.Checkmate()

Before determining moves it looks for a checkmate state.

wichessing.Board.Check()

If the king is not in check then its not a checkmate.

			// board_check.go
46			// some cases cause reactions that remove the King, such as an enemy detonator move and friendly guard adjacent
47			_, has := b.AfterMove(orig, *pt, orientation, previousFrom, previousTo).KingLocationFor(turn)
48			if has == false {
49				return true
50			}

The loop this is in is looking at every move the opponent can make and returning true if the king can be taken directly or indirectly (check). This AfterMove call is used to look for indirect takes.

wichessing.Board.AfterMove()

What does the board look like after a move is executed? The caller of AfterMove iterates over this looking for the king location.

wichessing.Board.Move()

Execute the move for the board returned by AfterMove. The call in Move is verifying the requested move is legal. Everything computed in here is discarded:

	// board_move.go
126	if b.MovesFromPoint(fromPoint, previousFrom, previousTo).Has(to) == false {
127		return PointSet{}, pieceset
128	}

wichessing.Board.MovesFromPoint()

What moves are available from this square on the board?

wichessing.TruncatedAbsPathsForKind()

What actual paths are there for this kind of piece at this square on the board?

Appends on shared memory are always a red flag in concurrent code.

The written memory in TruncatedAbsPathsForKind is discarded after two returns in this case. I do not believe those global variables make it past the reads in this function - the returned memory (written serially by a number of appends) is isolated in this call stack.

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 20, 2017

This smaller program crashes but does not race:

package main

import (
	"github.com/pciet/debuggo22781/wichessing"
)

const clients = 6

func main() {
	for i := 0; i < clients; i++ {
		go func() {
			for {
				// in wichessing/path.go
				_ = wichessing.TruncatedAbsPathsForKind(wichessing.Pawn, wichessing.AbsPoint{0, 1}, wichessing.White)
			}
		}()
	}
	select {} // block forever
}

https://gist.github.com/pciet/8ad292c81bdae9046bf5484797b333f2

TruncatedAbsPathsForKind only calls RelPathMapForKind, but looping over RelPathMapForKind calls instead doesn't trigger a race or crash.

type AbsPathSet map[*AbsPath]struct{}

type AbsPath struct {
	Points    []AbsPoint
	Truncated bool
}

// Absolute Point represents a specific point on the board.
type AbsPoint struct {
	File uint8
	Rank uint8
}

// The PathType is used for pieces with varying moves; the pawn is the base chess example with different first move, take moves, and regular moves.
type PathType int

const (
	First PathType = iota
	Move
	Take
	RallyMove
)

type AbsPathSetMap map[PathType]AbsPathSet

type RelPathSetMap map[PathType]RelPathSet

// All relative paths for a piece, used to calculate actual paths for a board state.
type RelPathSet map[*RelPath]struct{}

type RelPath []RelPoint

type RelPoint struct {
	XOffset int8
	YOffset int8
}

13 // This function will truncate moves that leave the board and convert the defined relative points to actual board points but does not remove points with other pieces or other similar constraints.
14 func TruncatedAbsPathsForKind(the Kind, from AbsPoint, with Orientation) AbsPathSetMap {
15	absmap := make(AbsPathSetMap)
16	for movetype, paths := range RelPathMapForKind(the) {
17		availablepaths := make(AbsPathSet)
18		for path, _ := range paths {
19			availablepath := AbsPath{
20				Points: make([]AbsPoint, 0, len(*path)),
21			}
22			truncated := false
23			for _, point := range *path {
24				absfile := int8(from.File) + point.XOffset
25				if (absfile > 7) || (absfile < 0) {
26					truncated = true
27					break
28				}
29				var absrank int8
30				if with == White {
31					absrank = int8(from.Rank) + point.YOffset
32				} else {
33					absrank = int8(from.Rank) - point.YOffset
34				}
35				if (absrank > 7) || (absrank < 0) {
36					truncated = true
37					break
38				}
39				availablepath.Points = append(availablepath.Points, AbsPoint{File: uint8(absfile), Rank: uint8(absrank)})
40			}
41			availablepath.Truncated = truncated
42			if len(availablepath.Points) != 0 {
43				availablepaths[&availablepath] = struct{}{}
44			}
45		}
46		if len(availablepaths) != 0 {
47			absmap[movetype] = availablepaths
48		}
49	}
50	return absmap
51 }
@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 21, 2017

clients = 1 and clients = 2 got my fan going but didn't crash. clients = 3 crashed after a few minutes (2-core MacBook Pro): https://gist.github.com/pciet/8ce5bf53455a22d09a06932ba4dc4f10

@ALTree

This comment has been minimized.

Copy link
Member

ALTree commented Nov 21, 2017

I got a slightly different crash message (always on 1.9.2):

runtime: pointer 0xc42023dfe0 to unused region of span idx=0x11e span.base()=0xc42023c000 span.limit=0xc42023df80 span.state=1
runtime: found in object at *(0xc420249cc0+0x8)
object=0xc420249cc0 k=0x6210124 s.base()=0xc420248000 s.limit=0xc420249fe0 s.spanclass=12 s.elemsize=80 s.state=_MSpanInUse
 *(object+0) = 0x6d
 *(object+8) = 0xc42023dfe0 <==
 *(object+16) = 0x0
 *(object+24) = 0x0
 *(object+32) = 0x0
 *(object+40) = 0x0
 *(object+48) = 0x0
 *(object+56) = 0x0
 *(object+64) = 0x0
 *(object+72) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

There is no unsafe or cgo in the code, and no race reports (apparently), so there's definitely something fishy going on here.

@ALTree

This comment has been minimized.

Copy link
Member

ALTree commented Nov 21, 2017

Here's a small autocontained reproducer:

package main

type Path struct {
	P []int
}

type ZMap map[*Path]struct{}

func F() {
	zm := make(ZMap)
	for i := 0; i < 10; i++ {
		pp := Path{make([]int, 0, 8)}
		zm[&pp] = struct{}{}
	}
}

func main() {
	for i := 0; i < 16; i++ {
		go func() {
			for true {
				F()
			}
		}()
	}
	select {}
}

It crashes fairly quickly (usually within one minute) when running GOMAXPROCS=4 go run test.go on two different machines I tried (both with go1.9.2, but I also see crashes on tip).

@chlunde

This comment has been minimized.

Copy link
Contributor

chlunde commented Nov 21, 2017

Is the map key freed prematurely when the only reference is within mapassign_fast64?

  • If I disable mapassign_fast64 it does not crash.
  • If I add runtime.KeepAlive after assigning to the map, it does not crash.
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Nov 21, 2017

@chlunde Thanks. That makes this seem related to #21297.

CC @randall77

@chlunde

This comment has been minimized.

Copy link
Contributor

chlunde commented Nov 21, 2017

With 1.9 runtime.KeepAlive still crashes, but not on tip

@randall77

This comment has been minimized.

Copy link
Contributor

randall77 commented Nov 21, 2017

It does certainly seem similar to #21297. I have a core dump on a failure, and there is a map with a pointer key, where the referent has been GCd.
But #21297 has already been fixed. We always write the keys in question with the correct write barrier code.
My current suspicion is that it's the method signature of mapassign_fast64. The key is uint64 at that point, and if mapassign_fast64 is suspended mid-operation, maybe the key lives only there and as such can be GCd prematurely.

Maybe we need to separate the pointer and non-pointer versions of mapassign_fast64 and friends.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Nov 21, 2017

Looking at the generated code, the key pointer seems to be on the stack across the call to mapassign_fast64.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Nov 21, 2017

Ah, but according to the liveness information &pp is not live at the call to mapassign_fast64.

@aclements aclements modified the milestones: Go1.10, Go1.9.3 Nov 21, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Nov 21, 2017

@pciet, can you file a separate bug for the race detector false positive, if that remains an issue?

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 21, 2017

With https://go-review.googlesource.com/c/go/+/79018 patched onto the latest in the repo my application does not crash or race under the test load that triggered the crash initially. I'll run it overnight to be sure.

Thanks for the help, I'm looking forward to the next release and will stick with 1.8.5 until then.

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Nov 21, 2017

Confirmed that without the patch (go version devel +1e3f563) the race and crash happen readily. I'm ok to close this issue if the patch is accepted.

@randall77

This comment has been minimized.

Copy link
Contributor

randall77 commented Nov 22, 2017

Reopening for 1.9.3.

@oiooj

This comment has been minimized.

Copy link
Member

oiooj commented Jan 11, 2018

What version of Go are you using (go version)?

go version go1.9.2 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/data/golang"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build169412578=/tmp/go-build"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you see instead?

runtime: nelems=23 nalloc=3 previous allocCount=2 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x8ff942, 0x20)
        /usr/local/go/src/runtime/panic.go:605 +0x95
runtime.(*mspan).sweep(0x7fd0402fb5d8, 0xc420000f00, 0xc420000f00)
        /usr/local/go/src/runtime/mgcsweep.go:298 +0x911
runtime.sweepone(0x436c56)
        /usr/local/go/src/runtime/mgcsweep.go:113 +0x122
runtime.gosweepone.func1()
        /usr/local/go/src/runtime/mgcsweep.go:137 +0x2b
runtime.systemstack(0xc42001e600)
        /usr/local/go/src/runtime/asm_amd64.s:344 +0x79
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1135

goroutine 3 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:298 fp=0xc42002cf70 sp=0xc42002cf68 pc=0x45b7a0
runtime.gosweepone(0x0)
        /usr/local/go/src/runtime/mgcsweep.go:136 +0x4a fp=0xc42002cfa0 sp=0xc42002cf70 pc=0x422b8a
runtime.bgsweep(0xc42005c000)
        /usr/local/go/src/runtime/mgcsweep.go:55 +0xbd fp=0xc42002cfd8 sp=0xc42002cfa0 pc=0x4226ad
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc42002cfe0 sp=0xc42002cfd8 pc=0x45e3b1
created by runtime.gcenable
        /usr/local/go/src/runtime/mgc.go:216 +0x58

goroutine 1 [select (no cases), 2835 minutes]:
github.com/lodastack/agent/command.runStart(0xc4201ce0e0)
        /data/golang/src/github.com/lodastack/agent/command/start.go:58 +0x4e9
github.com/oiooj/cli.Command.Run(0x8e5395, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8f5684, 0xf, 0x8f878a, ...)
        /data/golang/src/github.com/oiooj/cli/command.go:118 +0x572
github.com/oiooj/cli.(*App).Run(0xc4201cc000, 0xc4200100c0, 0x4, 0x4, 0x0, 0x0)
        /data/golang/src/github.com/oiooj/cli/app.go:154 +0x55e
main.main()
        /data/golang/src/github.com/lodastack/agent/cmd/agent/main.go:66 +0x31d

goroutine 5 [sleep]:
time.Sleep(0x2540be400)
        /usr/local/go/src/runtime/time.go:65 +0x130
github.com/lodastack/agent/agent/sysinfo.init.0.func1()
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/collector.go:23 +0x36
created by github.com/lodastack/agent/agent/sysinfo.init.0
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/collector.go:15 +0x35

goroutine 6 [syscall, 2835 minutes]:
os/signal.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:131 +0xa6
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.0
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 9 [sleep]:
time.Sleep(0xb2d05e00)
        /usr/local/go/src/runtime/time.go:65 +0x130
github.com/lodastack/log.(*FileBackend).flushDaemon(0xc4201de160)
        /data/golang/src/github.com/lodastack/log/file.go:73 +0x32
created by github.com/lodastack/log.NewFileBackend
        /data/golang/src/github.com/lodastack/log/file.go:155 +0x4e7

goroutine 10 [chan receive]:
github.com/lodastack/log.(*FileBackend).monitorFiles(0xc4201de160)
        /data/golang/src/github.com/lodastack/log/file.go:79 +0x3ca
created by github.com/lodastack/log.NewFileBackend
        /data/golang/src/github.com/lodastack/log/file.go:156 +0x50c

goroutine 11 [chan receive]:
github.com/lodastack/agent/agent/outputs/nsq.(*NSQ).Write(0xc4201a20c0, 0xc42033a000)
        /data/golang/src/github.com/lodastack/agent/agent/outputs/nsq/nsq.go:37 +0x87
github.com/lodastack/agent/agent/outputs.(*Output).Start(0xc42000e0e0)
        /data/golang/src/github.com/lodastack/agent/agent/outputs/output.go:121 +0x10b
created by github.com/lodastack/agent/agent/agent.(*Agent).Start
        /data/golang/src/github.com/lodastack/agent/agent/agent/main.go:35 +0x8d

goroutine 12 [sleep, 15 minutes]:
time.Sleep(0x34630b8a000)
        /usr/local/go/src/runtime/time.go:65 +0x130
github.com/lodastack/agent/agent/scheduler.autoEnablePlugin()
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/plugin.go:94 +0xcc
github.com/lodastack/agent/agent/scheduler.Start()
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:52 +0x153
created by github.com/lodastack/agent/agent/agent.(*Agent).Start
        /data/golang/src/github.com/lodastack/agent/agent/agent/main.go:36 +0xa5

goroutine 19 [sleep]:
time.Sleep(0xdf8475800)
        /usr/local/go/src/runtime/time.go:65 +0x130
github.com/lodastack/agent/agent/scheduler.Start.func1()
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:48 +0x5c
created by github.com/lodastack/agent/agent/scheduler.Start
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:41 +0x14e

goroutine 13 [IO wait, 2835 minutes]:
internal/poll.runtime_pollWait(0x7fd048049df0, 0x72, 0xffffffffffffffff)
        /usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420180498, 0x72, 0xc42037ac00, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc420180498, 0xffffffffffffff00, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc420180480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e2
net.(*netFD).accept(0xc420180480, 0xc400000020, 0xc42037ade8, 0x414678)
        /usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*UnixListener).accept(0xc42007db30, 0x4f505a, 0x8a5640, 0xc420392090)
        /usr/local/go/src/net/unixsock_posix.go:162 +0x32
net.(*UnixListener).Accept(0xc42007db30, 0xc420014038, 0x852da0, 0xd70960, 0x8d4a00)
        /usr/local/go/src/net/unixsock.go:241 +0x49
net/http.(*Server).Serve(0xc420362000, 0xd83620, 0xc42007db30, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2695 +0x1b2
net/http.Serve(0xd83620, 0xc42007db30, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2323 +0x73
github.com/lodastack/agent/agent/httpd.(*Service).serve(0xc420180380, 0xd83620, 0xc42007db30)
        /data/golang/src/github.com/lodastack/agent/agent/httpd/http.go:385 +0x54
github.com/lodastack/agent/agent/httpd.(*Service).serveUnixSocket(0xc420180380)
        /data/golang/src/github.com/lodastack/agent/agent/httpd/http.go:378 +0x3d
created by github.com/lodastack/agent/agent/httpd.(*Service).Start
        /data/golang/src/github.com/lodastack/agent/agent/httpd/http.go:362 +0x7a6

goroutine 14 [IO wait, 2116 minutes]:
internal/poll.runtime_pollWait(0x7fd048049f70, 0x72, 0xffffffffffffffff)
        /usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420180418, 0x72, 0xc420376c00, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc420180418, 0xffffffffffffff00, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc420180400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:335 +0x1e2
net.(*netFD).accept(0xc420180400, 0xc420376df0, 0x403787, 0xc42047ce40)
        /usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc42000e0e8, 0x6a8bb8, 0x45ab10, 0xc420376e38)
        /usr/local/go/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).Accept(0xc42000e0e8, 0x9116b0, 0xc42047cdc0, 0xd84160, 0xc42019c0c0)
        /usr/local/go/src/net/tcpsock.go:247 +0x49
net/http.(*Server).Serve(0xc420380000, 0xd835e0, 0xc42000e0e8, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2695 +0x1b2
net/http.Serve(0xd835e0, 0xc42000e0e8, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2323 +0x73
github.com/lodastack/agent/agent/httpd.(*Service).serve(0xc420180380, 0xd835e0, 0xc42000e0e8)
        /data/golang/src/github.com/lodastack/agent/agent/httpd/http.go:385 +0x54
github.com/lodastack/agent/agent/httpd.(*Service).serveTCP(0xc420180380)
        /data/golang/src/github.com/lodastack/agent/agent/httpd/http.go:373 +0x3c
created by github.com/lodastack/agent/agent/httpd.(*Service).Start
        /data/golang/src/github.com/lodastack/agent/agent/httpd/http.go:366 +0x4f4

goroutine 15 [chan receive, 3 minutes]:
github.com/lodastack/agent/agent/agent.(*Agent).Report(0xc4200e3700)
        /data/golang/src/github.com/lodastack/agent/agent/agent/report.go:32 +0xcf
created by github.com/lodastack/agent/agent/agent.(*Agent).Start
        /data/golang/src/github.com/lodastack/agent/agent/agent/main.go:40 +0xfc

goroutine 16 [chan receive, 2835 minutes]:
github.com/lodastack/agent/command.Notify()
        /data/golang/src/github.com/lodastack/agent/command/start.go:75 +0xe0
created by github.com/lodastack/agent/command.runStart
        /data/golang/src/github.com/lodastack/agent/command/start.go:56 +0x4e4

goroutine 34 [select, 2835 minutes, locked to thread]:
runtime.gopark(0x911d00, 0x0, 0x8e6268, 0x6, 0x18, 0x1)
        /usr/local/go/src/runtime/proc.go:287 +0x12c
runtime.selectgo(0xc42003ff50, 0xc4200761e0)
        /usr/local/go/src/runtime/select.go:395 +0x1149
runtime.ensureSigM.func1()
        /usr/local/go/src/runtime/signal_unix.go:511 +0x220
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2337 +0x1

goroutine 39 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017b8f0)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 38 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017b890)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 37 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017b830)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 40 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017b950)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 41 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017b9b0)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 42 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017ba10)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 43 [IO wait]:
internal/poll.runtime_pollWait(0x7fd048049bb0, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4201d8018, 0x72, 0xffffffffffffff00, 0xd7fa20, 0xd7b348)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc4201d8018, 0xc420652000, 0x30, 0x30)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc4201d8000, 0xc420652060, 0x30, 0x30, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18a
net.(*netFD).Read(0xc4201d8000, 0xc420652060, 0x30, 0x30, 0x28, 0xc4205b60d8, 0x1)
        /usr/local/go/src/net/fd_unix.go:202 +0x52
net.(*conn).Read(0xc4203a0008, 0xc420652060, 0x30, 0x30, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:176 +0x6d
io.ReadAtLeast(0xd7d860, 0xc4203a0008, 0xc420652060, 0x30, 0x30, 0x30, 0x8347e0, 0x8c7c01, 0xc420652060)
        /usr/local/go/src/io/io.go:309 +0x86
io.ReadFull(0xd7d860, 0xc4203a0008, 0xc420652060, 0x30, 0x30, 0x30, 0x0, 0x0)
        /usr/local/go/src/io/io.go:327 +0x58
encoding/binary.Read(0xd7d860, 0xc4203a0008, 0xd88080, 0x12fce70, 0x85e080, 0xc420652000, 0x0, 0x0)
        /usr/local/go/src/encoding/binary/binary.go:245 +0xf99
github.com/lodastack/agent/agent/sysinfo.getTime(0x8eecc5, 0xc, 0x4, 0x0, 0x0, 0x0)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/time.go:194 +0x267
github.com/lodastack/agent/agent/sysinfo.Query(0x8eecc5, 0xc, 0x4, 0xc420528780, 0xc420041f60, 0xc42040b0e0)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/time.go:139 +0x5a
github.com/lodastack/agent/agent/sysinfo.TimeMetrics(0x2, 0xc420041fc4, 0x2)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/time.go:120 +0x6b
github.com/lodastack/agent/agent/sysinfo.Collector.Run(0x8e4011, 0x4, 0x12c, 0xc4203d6050, 0x1, 0x1)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/collector.go:37 +0x101
github.com/lodastack/agent/agent/sysinfo.(*Collector).Run(0xc42017ba40)
        <autogenerated>:1 +0x4b
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017ba70)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:58 +0x10c
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 44 [semacquire]:
sync.runtime_Semacquire(0xc42058402c)
        /usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc420584020)
        /usr/local/go/src/sync/waitgroup.go:131 +0x72
github.com/lodastack/agent/agent/sysinfo.PcapMetric(0x2, 0xc420466fc4, 0x2)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/pcap.go:120 +0x4d7
github.com/lodastack/agent/agent/sysinfo.Collector.Run(0x8e38db, 0x3, 0x12c, 0xc4203d6060, 0x1, 0x1)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/collector.go:37 +0x101
github.com/lodastack/agent/agent/sysinfo.(*Collector).Run(0xc42017baa0)
        <autogenerated>:1 +0x4b
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017bad0)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:58 +0x10c
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 45 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017bb00)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 46 [select]:
github.com/lodastack/agent/agent/scheduler.(*Scheduler).run(0xc42017bb30)
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/struct.go:56 +0x1c4
created by github.com/lodastack/agent/agent/scheduler.updateSys
        /data/golang/src/github.com/lodastack/agent/agent/scheduler/scheduler.go:166 +0x3b2

goroutine 649679 [select]:
net/http.(*persistConn).writeLoop(0xc4201c2000)
        /usr/local/go/src/net/http/transport.go:1759 +0x165
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1187 +0xa53

goroutine 649678 [IO wait]:
internal/poll.runtime_pollWait(0x7fd048049eb0, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420632698, 0x72, 0xffffffffffffff00, 0xd7fa20, 0xd7b348)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc420632698, 0xc4208b2000, 0x2000, 0x2000)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420632680, 0xc4208b2000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18a
net.(*netFD).Read(0xc420632680, 0xc4208b2000, 0x2000, 0x2000, 0xc420184380, 0x7fd049eaf458, 0x0)
        /usr/local/go/src/net/fd_unix.go:202 +0x52
net.(*conn).Read(0xc42039a010, 0xc4208b2000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:176 +0x6d
crypto/tls.(*block).readFromUntil(0xc4208142d0, 0x7fd042bc2088, 0xc42039a010, 0x5, 0xc42039a010, 0xc4208886c0)
        /usr/local/go/src/crypto/tls/conn.go:488 +0x95
crypto/tls.(*Conn).readRecord(0xc420031180, 0x911e17, 0xc4200312a0, 0x430c3b)
        /usr/local/go/src/crypto/tls/conn.go:590 +0xe0
crypto/tls.(*Conn).Read(0xc420031180, 0xc4204a1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:1134 +0x110
net/http.(*persistConn).Read(0xc4201c2000, 0xc4204a1000, 0x1000, 0x1000, 0xc4201b6780, 0xc420076658, 0x4584e0)
        /usr/local/go/src/net/http/transport.go:1391 +0x140
bufio.(*Reader).fill(0xc42005e780)
        /usr/local/go/src/bufio/bufio.go:97 +0x11a
bufio.(*Reader).Peek(0xc42005e780, 0x1, 0x0, 0x0, 0x0, 0xc420635da0, 0x0)
        /usr/local/go/src/bufio/bufio.go:129 +0x3a
net/http.(*persistConn).readLoop(0xc4201c2000)
        /usr/local/go/src/net/http/transport.go:1539 +0x185
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1186 +0xa2e

goroutine 650188 [syscall, locked to thread]:
github.com/google/gopacket/pcap._Cfunc_pcap_close(0x7fd03c000b20)
        github.com/google/gopacket/pcap/_obj/_cgo_gotypes.go:221 +0x41
github.com/google/gopacket/pcap.(*Handle).Close.func1(0x7fd03c000b20)
        /data/golang/src/github.com/google/gopacket/pcap/pcap.go:468 +0x60
github.com/google/gopacket/pcap.(*Handle).Close(0xc420576000)
        /data/golang/src/github.com/google/gopacket/pcap/pcap.go:468 +0xc7
github.com/lodastack/agent/agent/sysinfo.PcapMetric.func1(0xc420584020, 0xc420566000, 0xc420584014, 0x4, 0xc420584090, 0xb)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/pcap.go:107 +0x5e3
created by github.com/lodastack/agent/agent/sysinfo.PcapMetric
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/pcap.go:82 +0x436

goroutine 650189 [syscall, locked to thread]:
github.com/google/gopacket/pcap._Cfunc_pcap_close(0x7fd0000009d0)
        github.com/google/gopacket/pcap/_obj/_cgo_gotypes.go:221 +0x41
github.com/google/gopacket/pcap.(*Handle).Close.func1(0x7fd0000009d0)
        /data/golang/src/github.com/google/gopacket/pcap/pcap.go:468 +0x60
github.com/google/gopacket/pcap.(*Handle).Close(0xc420172410)
        /data/golang/src/github.com/google/gopacket/pcap/pcap.go:468 +0xc7
github.com/lodastack/agent/agent/sysinfo.PcapMetric.func1(0xc420584020, 0xc420566000, 0xc420584018, 0x4, 0xc4205840f0, 0xe)
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/pcap.go:107 +0x5e3
created by github.com/lodastack/agent/agent/sysinfo.PcapMetric
        /data/golang/src/github.com/lodastack/agent/agent/sysinfo/pcap.go:82 +0x436

not use unsafe and cgo

@davecheney

This comment has been minimized.

Copy link
Contributor

davecheney commented Jan 11, 2018

This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector .

@oiooj

This comment has been minimized.

Copy link
Member

oiooj commented Jan 11, 2018

@davecheney I run go build -race myapp and nothing happened. What need I do?

@davecheney

This comment has been minimized.

Copy link
Contributor

davecheney commented Jan 11, 2018

@pciet

This comment has been minimized.

Copy link
Contributor

pciet commented Jan 11, 2018

@oiooj if this is the same issue then it isn't present in 1.8.x nor in the 1.10 beta (download here https://groups.google.com/forum/#!topic/golang-dev/SCPpL9PqVbs).

If you see the crash with 1.10 beta or 1.8.5 then please open a new issue.

@andybons

This comment has been minimized.

Copy link
Member

andybons commented Jan 18, 2018

CL 88635 OK for Go 1.9.3.

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Jan 19, 2018

Change https://golang.org/cl/88635 mentions this issue: [release-branch.go1.9] cmd/compile: fix mapassign_fast* routines for pointer keys

gopherbot pushed a commit that referenced this issue Jan 22, 2018

[release-branch.go1.9] cmd/compile: fix mapassign_fast* routines for …
…pointer keys

The signature of the mapassign_fast* routines need to distinguish
the pointerness of their key argument.  If the affected routines
suspend part way through, the object pointed to by the key might
get garbage collected because the key is typed as a uint{32,64}.

This is not a problem for mapaccess or mapdelete because the key
in those situations do not live beyond the call involved.  If the
object referenced by the key is garbage collected prematurely, the
code still works fine.  Even if that object is subsequently reallocated,
it can't be written to the map in time to affect the lookup/delete.

Fixes #22781

Change-Id: I0bbbc5e9883d5ce702faf4e655348be1191ee439
Reviewed-on: https://go-review.googlesource.com/79018
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-on: https://go-review.googlesource.com/88635
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@andybons

This comment has been minimized.

Copy link
Member

andybons commented Jan 22, 2018

go1.9.3 has been packaged and includes:

  • CL 88635 [release-branch.go1.9] cmd/compile: fix mapassign_fast* routines for pointer keys

The release is posted at golang.org/dl.

— golang.org/x/build/cmd/releasebot, Jan 22 21:02:56 UTC

@andybons andybons closed this Jan 22, 2018

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Feb 13, 2018

Change https://golang.org/cl/93715 mentions this issue: test: fast GC+concurrency+types verification

gopherbot pushed a commit that referenced this issue May 8, 2018

test: fast GC+concurrency+types verification
This test runs independent goroutines modifying a comprehensive variety
of local vars to look for garbage collector regressions. This test has
been verified to trigger issue 22781 on the go1.9.2 tag. This test
expands on test/fixedbugs/issue22781.go.

Tests #22781

Change-Id: Id32f8dde7ef650aea1b1b4cf518e6d045537bfdc
Reviewed-on: https://go-review.googlesource.com/93715
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment