Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: segfault on NIL Rows.releaseConn() when context is cancelled #20160

Closed
echa opened this issue Apr 28, 2017 · 2 comments
Closed

database/sql: segfault on NIL Rows.releaseConn() when context is cancelled #20160

echa opened this issue Apr 28, 2017 · 2 comments
Assignees

Comments

@echa
Copy link

@echa echa commented Apr 28, 2017

I'm seeing reproducible segmentation faults in func (rs *Rows) close(err error) error after an outside context has been canceled (e.g. due to timeout). At current master it's this line.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x40da61f]

goroutine 10962 [running]:
database/sql.(*Rows).close(0xc421780600, 0x44ae840, 0x44e9200, 0x0, 0x0)
	/usr/local/Cellar/go/1.8.1/libexec/src/database/sql/sql.go:2445 +0x15f
database/sql.(*Rows).awaitDone(0xc421780600, 0x44b1240, 0xc421e19040)
	/usr/local/Cellar/go/1.8.1/libexec/src/database/sql/sql.go:2122 +0x8e
created by database/sql.(*Rows).initContextClose
	/usr/local/Cellar/go/1.8.1/libexec/src/database/sql/sql.go:2116 +0x92
exit status 2

This looks like a race condition where the Rows.releaseConn function pointer was either never set or reset to NIL before Rows.awaitDone(ctx) got scheduled. The same bug surfaces when using lib/pg or sqlite3 as drivers. Pq at least never touches sql.Rows.releaseConn, so I assume the problem is in database/sql.

I tested with GOMAXPROCS=4 and a connection limit of 20 (idle and open are set to the same value), but the segfault also happens with GOMAXPROCS=1 and other limits. Note that sqlite3 will crash somewhere else when you don't set connection limits in the test below.

An interesting observation is that the bug seems to coincide with the use of prepared statements no matter if a statement is reused across connections or limited to a DB connection and closed right after use. I havent' seen the bug surface without prepared statements, but maybe I was just not patient enough. Here's how different types of queries behave:

Query Type lib/pg mattn/go-sqlite3
stmt.QueryContext() fast crash fast crash
stmt.QueryRowContext() eventual crash eventual crash
stmt.QueryContext() + stmt.Close() eventual crash eventual crash
db.QueryContext() no crash seen no crash seen

Here's how to reproduce the issue:

package main

import (
	"context"
	"database/sql"
	_ "github.com/lib/pq"
	_ "github.com/mattn/go-sqlite3"
	"log"
	"math/rand"
	"runtime"
	"sync"
	"time"
)

const (
	// DRIVER = "postgres"
	// DB     = "postgres://postgres:postgres@127.0.0.1:5432/postgres?sslmode=disable"
	DRIVER = "sqlite3"
	DB     = "file::memory:?mode=memory&cache=shared"
)

var (
	db      *sql.DB
	stmtMap map[string]*sql.Stmt
	mu      *sync.RWMutex
	jobs    chan struct{}
)

func prepare(ctx context.Context, query string) (*sql.Stmt, error) {
	mu.RLock()
	stmt, ok := stmtMap[query]
	mu.RUnlock()
	if ok {
		return stmt, nil
	}
	stmt, err := db.PrepareContext(ctx, query)
	if err != nil {
		return nil, err
	}
	mu.Lock()
	stmtMap[query] = stmt
	mu.Unlock()
	return stmt, nil
}

func queryStmt(ctx context.Context) error {
	stmt, err := prepare(ctx, "SELECT 1;")
	if err != nil {
		return err
	}
	rows, err := stmt.QueryContext(ctx)
	if err != nil {
		return err
	}
	rows.Close()
	return nil
}

func queryShortStmt(ctx context.Context) error {
	stmt, err := db.PrepareContext(ctx, "SELECT 1;")
	if err != nil {
		return err
	}
	rows, err := stmt.QueryContext(ctx)
	if err != nil {
		return err
	}
	rows.Close()
	stmt.Close()
	return nil
}

func queryRowStmt(ctx context.Context) error {
	stmt, err := prepare(ctx, "SELECT 1;")
	if err != nil {
		return err
	}
	var v int
	err = stmt.QueryRowContext(ctx).Scan(&v)
	if err != nil {
		return err
	}
	return nil
}

func query(ctx context.Context) error {
	rows, err := db.QueryContext(ctx, "SELECT 1;")
	if err != nil {
		return err
	}
	rows.Close()
	return nil
}

func handler(num int) {
	timeout := time.Duration(rand.Intn(100)) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	queryStmt(ctx) // <-- segfaults almost immediately
	// queryRowStmt(ctx) // <-- segfaults eventually, just be patient
	// queryShortStmt(ctx) // <-- segfaults eventually, just be patient
	// query(ctx) // <-- this one seems to behave correct
}

func main() {
	jobs = make(chan struct{}, 100)
	stmtMap = make(map[string]*sql.Stmt)
	mu = new(sync.RWMutex)
	rand.Seed(1)
	runtime.GOMAXPROCS(4)

	var err error
	db, err = sql.Open(DRIVER, DB)
	if err != nil {
		log.Fatalln(err)
	}
	db.SetMaxIdleConns(20)
	db.SetMaxOpenConns(20)

	// dispatcher
	var i int
	go func() {
		for {
			select {
			case <-jobs:
				i++
				go handler(i)
			}
		}
	}()

	// keep the job queue filled
	for {
		jobs <- struct{}{}
	}
}

For a little more context, I inserted a debug statement in sql.go right before the problematic line (returning from here proved to be a quick workaround, but I'm not sure of the consequences).

if rs.releaseConn == nil {
	fmt.Printf("NIL Rows.releaseConn() row %+v with driver conn %+v\n", rs, rs.dc)
}

and got this output (formatted)

NIL Rows.releaseConn() row &{
	dc:0xc4201be380 
	releaseConn:<nil> 
	rowsi:0xc42005b420 
	cancel:0x10ab2c0 
	closeStmt:<nil> 
	closemu:{
		w:{
			state:1 
			sema:0
		} 
		writerSem:0 
		readerSem:0 
		readerCount:-1073741824 
		readerWait:0
	} 
	closed:true 
	lasterr:{} 
	lastcols:[]
} 

with driver conn &{
	db:0xc4200940a0 
	createdAt:{
		sec:63628968112 
		nsec:642549333 
		loc:0x1324dc0
	} 
	Mutex:{state:0 sema:0} 
	ci:0xc420092f00 
	closed:false 
	finalClosed:false 
	openStmt:map[0xc420394b80:true 0xc420146a40:true 0xc42146a240:true 0xc4216b9280:true 0xc420f7ac40:true 0xc42084f640:true 0xc4214acf40:true 0xc421570b40:true 0xc42110bac0:true 0xc4211ceb00:true 0xc42017f000:true 0xc4216b8c00:true 0xc4200c8300:true 0xc420f8e000:true 0xc42110adc0:true 0xc421571c80:true] 
	inUse:true 
	onPut:[] 
	dbmuClosed:false
}

I was surprised to see that many open statements on the connection, in particular, because I had prepared and reused a single statement across all connections without closing it. However, this looks like a separate issue to me.

TL;TR

What version of Go are you using (go version)?

go version go1.8.1 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/echa/GIT/go:/Users/echa/GIT/trimmer/go"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.8.1/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.8.1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/hk/4x544vj51gvgy53xhndsjjm40000gn/T/go-build098454006=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I ran the program above.

What did you expect to see?

No output, GOMAXPROCS * 100% CPU usage until killed.

What did you see instead?

A segmentation fault.

@kardianos
Copy link
Contributor

@kardianos kardianos commented Apr 28, 2017

Thanks for the detailed and reproducible error report. I'm impressed.

In func (s *Stmt) QueryContext, the "releaseConn" is currently set on rows after initContextClose is called, leading to this issue. There seems to be a separate SQLite wrapper issue that reports a double close (probably first from ctx close, then from driver).

I was surprised to see that many open statements on the connection, in particular, because I had prepared and reused a single statement across all connections without closing it. However, this looks like a separate issue to me.

This is a common misconception. A single statement will still pull an arbitrary connection from the connection pool, and if not prepared on that connection, re-prepare it. It is unfortunate that is assumes all databases tie statements to connections, but it does (this is sometimes true, but not for all). In other words, I would expect to see as many as 20 prepare commands issued per statement in this example.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 28, 2017

CL https://golang.org/cl/42139 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.