
Spurious conn busy - bug in QueryRow() ? #635

Open
kmansoft opened this issue Nov 7, 2019 · 20 comments

kmansoft commented Nov 7, 2019

Hello,

I've just started using your library and I'm pretty impressed, especially with the JSON handling.

I'm seeing spurious conn busy errors on my development machine. Once they start, they never stop until I restart the app.

The app uses pgx from basically two modules: one "does something" and stores the result in pgsql, and the other is an HTTP API which fetches this data from pgsql. The failures always occur on the HTTP side (which obviously runs on separate goroutines from the first, "does something" part).

How can I track down what's causing these errors? I already checked my code for any leaked row sets or transactions; there don't seem to be any.

Second, and related to the above: what is the default connection pool size if I create my connection like this?

pgsqlConfig, err := pgx.ParseConfig("postgres://user:pass@localhost:5432/db")
if err != nil {
	...
}

pgsqlConnection, err := pgx.ConnectConfig(context.Background(), pgsqlConfig)
if err != nil {
	...
}

And how can I change it (connection pool size)?


kmansoft commented Nov 7, 2019

More info:

The query that fails for me (it was the first query executed on that particular code path) is an UPSERT like this:

_, err := model.db.Exec(context.Background(),
	`INSERT INTO device (key, push_tech, push_token)
		VALUES($1, $2, $3) 
	ON CONFLICT (key) DO UPDATE
		SET push_tech = $2, push_token = $3`,
	key, pushTech, pushToken)
if err != nil {
	return err
}

fmt.Printf("Upserted device: %s\n", key)

return nil

I added a query just above that, like this:

rows, err := model.db.Query(context.Background(),
	`SELECT _id, key, push_tech, push_token
	FROM device WHERE key = $1`, key)
if err != nil {
	return err
}
defer rows.Close()

if rows.Next() {
	device := mail.SyncDbDevice{}
	err = rows.Scan(&device.Id, &device.Key, &device.PushTech, &device.PushToken)
	if err != nil {
		return err
	}

	fmt.Printf("Selected device: %d, %s, %s, %s\n", device.Id, device.Key, device.PushTech, device.PushToken)
}

return nil

So it looks like this:

http_handler() {
	query() // DEBUG
	upsert()
	...
}

Whenever the upsert starts to fail, the query just above it executes just fine, no errors.

I suppose the upsert is different from the select in that it (potentially) modifies the database, while a select does not. Still, I'm not sure how to solve the conn busy error and would appreciate help.


kmansoft commented Nov 12, 2019

I did some debugging, adding trace printouts to PgConn.lock and PgConn.unlock.

I believe there is a bug in Conn.QueryRow: it calls Query behind the scenes, which locks the connection to read the data, but I can't find where QueryRow calls rows.Close(). Normally the pattern, as I understand it, is rows := conn.Query() ... rows.Next() ... rows.Close(), and rows.Close() unlocks the connection.

But QueryRow doesn't seem to Close the rows. Perhaps I've missed it, or perhaps it really is missing.

Please comment if I'm wrong, meanwhile I'll try switching from QueryRow to Query to see if the issue goes away.

Edit: disregard this comment. The simple reproducer below fails with Query() the same way it fails with QueryRow(), i.e. with "conn busy".

kmansoft changed the title from "Spurious conn busy" to "Spurious conn busy - bug in QueryRow() ?" on Nov 12, 2019

kmansoft commented Nov 12, 2019

One more thing re: my earlier comment about Query continuing to work even when the upsert started throwing "conn busy".

As it turns out, Query is missing error propagation when pgConn.lock() fails (with "conn busy").

The call stack: Query -> c.pgConn.ExecPrepared -> pgConn.execExtendedPrefix. There the `if err := pgConn.lock(); err != nil` check fails, but the original Query call does not return this error; instead rows.Next silently fails (returns false).


kmansoft commented Nov 12, 2019

A simple reproducer:

wg := sync.WaitGroup{}
wg.Add(10)
for coIter := 0; coIter < 10; coIter += 1 {
	go func() {
		for nowIter := 0; nowIter < 100; nowIter += 1 {
			var nowValue time.Time
			nowError := pgsqlConnection.QueryRow(context.Background(),
				"SELECT NOW()").Scan(&nowValue)
			if nowError != nil {
				fmt.Printf("QueryRow returned error %v\n", nowError)
				break
			}
			fmt.Printf("PG now() = %s\n", nowValue)
		}
		wg.Done()
	}()
}
wg.Wait()

You should see QueryRow return the conn busy error; on my machine it happens in about 2/3 of runs.

It can also throw a different error:

QueryRow returned error ERROR: bind message has 2 result formats but query has 1 columns (SQLSTATE 08P01)

Maybe something goes wrong in buffer / parameter management when the conn busy error is present.


durango commented Nov 14, 2019

@kmansoft I submitted a PR that should fix your issue. In the meantime, my current workaround has been to manually acquire a connection from the pool for each query and then release it manually.
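Roughly this shape (a sketch, assuming a *pgxpool.Pool named pool and pgx v4):

conn, err := pool.Acquire(context.Background())
if err != nil {
	return err
}
defer conn.Release() // hand the connection back to the pool

var now time.Time
err = conn.QueryRow(context.Background(), "SELECT NOW()").Scan(&now)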


kmansoft commented Nov 14, 2019

Thank you @durango, I'll be looking forward to it getting merged.

In the meantime I'm wondering if I should switch from a plain pgx connection to a pgx pool. My simple test passes with pgxpool with no errors.

Edit: no, pgxpool also fails, please see below.

https://godoc.org/github.com/jackc/pgx/pgxpool

But coming from Go's "sql" package, I was expecting pgx to perform pooled connection management behind the scenes. Is this assumption wrong? Should I always be using a pgxpool in a multi-goroutine setup? Or is that not necessary? A pgx API design question, really.

@kmansoft

About pgxpool: it also fails on concurrent workloads.

pgsqlConfig, err := pgxpool.ParseConfig("postgres://user:pass@localhost:5432/database")
if err != nil {
	fmt.Println(err)
	return
}

pgxPool, err := pgxpool.ConnectConfig(context.Background(), pgsqlConfig)
if err != nil {
	fmt.Println(err)
	return
}

defer func() {
	fmt.Println("Disconnecting from pgsql")
	pgxPool.Close()
}()

count := int64(0)
wg := sync.WaitGroup{}
wg.Add(10)
for coIter := 0; coIter < 20; coIter += 1 {
	go func() {
		for nowIter := 0; nowIter < 1000; nowIter += 1 {
			var nowValue time.Time
			nowError := pgxPool.QueryRow(context.Background(),
				"SELECT NOW()").Scan(&nowValue)
			if nowError != nil {
				fmt.Printf("QueryRow returned error %v\n", nowError)
				os.Exit(1)
			}
			fmt.Printf("PG now() = %s\n", nowValue)
			atomic.AddInt64(&count, 1)
		}
		wg.Done()
	}()
}
wg.Wait()
fmt.Printf("Total queries run: %d\n", count)

Result:

QueryRow returned error closed pool

The trick to triggering it is that the number of goroutines needs to be 20 (10 works with no error).


durango commented Nov 14, 2019

I was using pgxpool with Query and QueryRow and still hitting the same problem as you (I also noticed the 10 vs 20 goroutines difference on my own, so our experiences seem to be very much the same). My fix won't work because the pool connection gets released too early for QueryRow, unfortunately.


durango commented Nov 14, 2019

@kmansoft my current working solution is to drop down to pgx's stdlib and use stdlib.AcquireConn(*sql.DB) within goroutines (then defer releasing that connection). So far this has been working well in production.
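Something along these lines (a sketch; db is assumed to be a *sql.DB opened with sql.Open("pgx", dsn)):

conn, err := stdlib.AcquireConn(db) // checks a *pgx.Conn out of database/sql's pool
if err != nil {
	return err
}
defer stdlib.ReleaseConn(db, conn)

var now time.Time
err = conn.QueryRow(context.Background(), "SELECT NOW()").Scan(&now)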

@kmansoft

@durango thanks for the tip on AcquireConn... let's see what @jackc says about the issues with higher level APIs...


jackc commented Nov 16, 2019

Sorry, it took me a while to get to this. There are a number of questions that have come up in this issue and I'll try to answer them all in this post.

What is the default connection pool size

4 (https://github.com/jackc/pgx/blob/master/pgxpool/pool.go#L15)

And how can I change it (connection pool size)?

Set MaxConns in the pgxpool.Config struct or set it in the connection string with the pool_max_conns argument. (https://github.com/jackc/pgx/blob/master/pgxpool/pool.go#L183)
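For example (both forms are equivalent):

config, err := pgxpool.ParseConfig("postgres://user:pass@localhost:5432/db?pool_max_conns=10")
if err != nil {
	// ...
}
config.MaxConns = 10 // same effect as pool_max_conns in the connection string

pool, err := pgxpool.ConnectConfig(context.Background(), config)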

But QueryRow doesn't seem to Close the rows. Perhaps I've missed it, or perhaps it really is missing.

The underlying query started by QueryRow is closed when Scan is called on the returned Row. (https://github.com/jackc/pgx/blob/master/rows.go#L87)
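So the usual one-row pattern is self-cleaning:

var pushToken string
// Scan reads the row and closes the underlying query, unlocking the connection
err := conn.QueryRow(context.Background(),
	"SELECT push_token FROM device WHERE key = $1", key).Scan(&pushToken)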

But coming from Go's "sql" package, I was expecting pgx to perform pooled connection management behind the scenes. Is this assumption wrong? Should I always be using a pgxpool in a multi-goroutine setup? Or is that not necessary? A pgx API design question, really.

The pgx package itself is strictly connection based. If you want a pool you need to use pgxpool (or some other synchronization / pooling mechanism). From the docs for pgx.Conn (https://godoc.org/github.com/jackc/pgx#Conn): "Conn is a PostgreSQL connection handle. It is not safe for concurrent usage."

@kmansoft Regarding the failing test case with pgxpool: The reason it is failing is the line wg.Add(10). You tell the wait group to wait for 10 goroutines to complete, but then start 20. So as soon as 10 of the 20 goroutines complete, the main goroutine blocked on wg.Wait() continues and the defer that closes the pool executes. But there are still 10 goroutines trying to use the now-closed pool. This explains why you are seeing a closed pool error. If you correct that, it works.

As an aside, it would be less error prone to do a wg.Add(1) inside the loop instead of wg.Add(20) on the outside. Then there is only one place to change the number of workers.
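i.e. something like:

wg := sync.WaitGroup{}
for coIter := 0; coIter < 20; coIter += 1 {
	wg.Add(1) // one Add per goroutine actually started
	go func() {
		defer wg.Done()
		// ... run queries ...
	}()
}
wg.Wait() // now waits for all 20 goroutines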


durango commented Nov 16, 2019

For the record, I used max pool settings this entire time (ranging from 20 to 400) and checked for long-running queries [> 5 minutes] (there were none).

The underlying query started by QueryRow is closed when Scan is called on the returned Row. (https://github.com/jackc/pgx/blob/master/rows.go#L87)

Saw this yesterday actually; I forgot to close my PR while still trying to resolve my pgx issues.

Tbh, I haven't really found a working solution for pgx other than using it as a connection driver. I had to drop down to *sql.DB and *sql.Conn. Somewhere down the line, connections were not being released despite my closing/releasing them manually.

Fwiw I was also using v4 of pgx and have noticed fewer problems on v3 for pooling, but that could just be observation bias. I've also noticed fewer problems since I stopped using CopyFrom and inserted manually. No code examples (since I've moved on / don't have a git history of this development), but I promise I wasn't trying anything fancier than what appears in the test files (other than the stdlib.Acquire part, but that was after).

@kmansoft by any chance, are you connecting to a pgbouncer server?


kmansoft commented Nov 16, 2019

@jackc Thank you for pointing out that "plain" pgx is not intended to be used concurrently. My bad.

With the wg count in the pool test fixed, the test now runs without issues. Thanks again.

@durango no, not pgbouncer, just "plain" pg.

@sebnyberg

Ran into the same issue when closing the acquired connection from stdlib but not properly closing pgx.BatchResults after running SendBatch.
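In other words, the batch results need closing too (a sketch, assuming a *pgx.Conn named conn):

batch := &pgx.Batch{}
batch.Queue("UPDATE device SET push_token = $1 WHERE key = $2", pushToken, key)

br := conn.SendBatch(context.Background(), batch)
if _, err := br.Exec(); err != nil {
	br.Close()
	return err
}
// the connection stays busy until the batch results are closed
return br.Close()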


Zamiell commented May 16, 2020

This was a good read, thanks.

For anyone coming across this thread from Google, the TL;DR here is that in PostgreSQL land you need to use either pgxpool or the database/sql interface (via pgx's stdlib package) on top of pgx, or else you will get the dreaded "conn busy" error. As jackc mentions above, the default pool size is 4, and adding connections to vanilla pgx is not a fix, because a plain pgx connection is only designed to be used by one goroutine at a time.


warent commented Sep 3, 2020

It's embarrassing to admit this, but I want to share why this problem happened to me in case anyone else encounters it. I have an HTTP server and was trying to re-use the same connection for all requests. If you're doing this, be sure to use a single root/main-level pool, and then acquire a new connection from the pool per incoming HTTP request (or, if you're using GraphQL, a connection per resolver/dataloader).
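A sketch of that shape (handler and variable names are made up):

var pool *pgxpool.Pool // created once in main() via pgxpool.ConnectConfig

func handleNow(w http.ResponseWriter, r *http.Request) {
	conn, err := pool.Acquire(r.Context())
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer conn.Release() // return the connection when the request is done

	var now time.Time
	if err := conn.QueryRow(r.Context(), "SELECT NOW()").Scan(&now); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintln(w, now)
}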

jacksgt added a commit to jacksgt/u9k that referenced this issue Sep 22, 2020
Using the pgxpool package to create a connection pool is safer than
using a single connection, because the DB functions are concurrently
accessed from multiple goroutines.

jackc/pgx#635 (comment)

torlarse commented Nov 17, 2020

@warent do you return a pointer to the connection from the single main-level pool connect function, or do you use a global var in a separate database package?


warent commented Nov 17, 2020

You'll probably want to use a global pool var in a separate database package, and create connections from that wherever you need them.

I'm using GraphQL, so my HTTP handler and my pool are in the same file (main.go). When the GraphQL endpoint is called, I put a pointer to the pool in ctx, which is passed into the resolvers, and then every resolver acquires its own connection as needed.

Both approaches seem to work fine but in retrospect a global variable would probably be cleaner.
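The ctx variant looks roughly like this (helper and key names are made up):

type poolKey struct{}

func withPool(ctx context.Context, p *pgxpool.Pool) context.Context {
	return context.WithValue(ctx, poolKey{}, p)
}

// inside a resolver:
pool := ctx.Value(poolKey{}).(*pgxpool.Pool)
conn, err := pool.Acquire(ctx)
if err != nil {
	return err
}
defer conn.Release()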

@bfontaine

Does it make sense to keep this issue open, or can we close it?


ringerc commented Aug 10, 2023

Per my comment at #750 (comment), take care before blindly adopting pgbouncer as a proxy or reverse proxy between pgx and postgres. In session-pooling mode it won't help you much, and in transaction-pooling mode it changes the semantics of many postgres operations in ways that will be very important to account for in your app's database usage. Statement-pooling mode is even worse and is best avoided whenever possible.

If pgx works like other drivers in row-at-a-time mode, apps should try to consume the result-set promptly after a query, then close the result-set. The app should defer doing any expensive work (RPCs and other network I/O, nontrivial disk I/O, acquiring contended locks, etc) until after it reads the result and closes the result-set. Doing that should be enough to solve most problems unless the app is highly parallel and database-heavy.
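Concretely, something like this (the Device type and the slow-work function are illustrative):

rows, err := conn.Query(ctx, "SELECT _id, key FROM device")
if err != nil {
	return err
}
var devices []Device // read the whole result-set first...
for rows.Next() {
	var d Device
	if err := rows.Scan(&d.Id, &d.Key); err != nil {
		rows.Close()
		return err
	}
	devices = append(devices, d)
}
rows.Close() // the connection is free again from here on
if rows.Err() != nil {
	return rows.Err()
}

// ...and only then do the expensive work
for _, d := range devices {
	notifyPushService(d) // RPCs, disk I/O, contended locks, etc.
}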
