pgx hangs on 10k+ rows batch #374

Closed
devlo opened this issue Jan 2, 2018 · 13 comments
devlo commented Jan 2, 2018

Hello,

When I execute the same batch (the same prepared statement that does an INSERT) with exactly the same data and a queue of around 500 elements, everything works fine and executes almost instantly, but when I try with a queue of 10100 elements, pgx just hangs, never completing the Send(), and no error is returned either. In the postgres logs I can see: could not receive data from client: connection reset by peer.

jackc added a commit that referenced this issue Jan 15, 2018
jackc added the bug label Jan 15, 2018
jackc commented Jan 15, 2018

I was able to reproduce this issue.

Send writes all queued queries before reading any results. The deadlock occurs when the batched queries to be sent are so large that the PostgreSQL server cannot receive them all at once. PostgreSQL receives some of the queued queries and starts executing them. As PostgreSQL executes the queries it sends responses back. pgx will not read any of these responses until it has finished sending. Therefore, if all network buffers are full in both directions, pgx will not be able to finish sending the queries and PostgreSQL will not be able to finish sending the responses.

This is a non-trivial issue to resolve.

The simplest solution would be to establish some guaranteed-safe number of queries and limit the batch to queuing that many. However, the safe number of queries varies based on multiple factors such as the type of query and the type of connection.

Another approach would be to start a goroutine before sending the batched queries that reads and buffers responses until the send has finished. But this would introduce quite a bit more complexity to an already complex system.

Another approach would be to internally break large batches into multiple interleaved sends and receives. But again, it adds a lot of error-prone complexity to an already complex system.

For the time being, I have added documentation to Batch.Send regarding this issue.
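
For readers needing a stopgap, here is a minimal sketch of the first approach (limiting how many queries are queued per batch), written against the v4-style Batch/SendBatch API that appears later in this thread; the chunk size, row data, and table are illustrative assumptions, not guaranteed-safe values:

	// sendInChunks splits a large workload into several smaller batches so that
	// no single batch is large enough to fill the network buffers in both directions.
	func sendInChunks(ctx context.Context, conn *pgx.Conn, descriptions []string) error {
		const chunkSize = 500 // illustrative; the safe size depends on the queries and the connection

		for start := 0; start < len(descriptions); start += chunkSize {
			end := start + chunkSize
			if end > len(descriptions) {
				end = len(descriptions)
			}

			batch := &pgx.Batch{}
			for _, d := range descriptions[start:end] {
				batch.Queue("insert into ledger(description, amount) values($1, $2)", d, 1)
			}

			br := conn.SendBatch(ctx, batch)
			for range descriptions[start:end] {
				if _, err := br.Exec(); err != nil {
					br.Close()
					return err
				}
			}
			if err := br.Close(); err != nil {
				return err
			}
		}
		return nil
	}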

jackc added this to the v4 milestone Jun 29, 2019
jackc commented Jun 29, 2019

This is resolved in v4 (currently prerelease).

jackc closed this as completed Jun 29, 2019
kataras commented Dec 26, 2020

@jackc I have the same problem: SendBatch hangs at 1200 total items (100 per batch; each row contains a JSONB field). I am using v4.10.1. Should I use BeginTx and execute each insert command instead?

jackc commented Dec 26, 2020

@kataras Do you have a simple repro case? As far as I knew this issue was resolved a long time ago.

kataras commented Dec 26, 2020

Hello @jackc, unfortunately (or fortunately for you :)) the project in which I am starting to use your library is a production-level one for a company, and the repository is private. However, the insert command is very simple: I fetch the data asynchronously from another source (an external API) with a pagination of 100 per request, and after some conversion I send them to the PostgreSQL database. Each SendBatch sends exactly 100 rows, and each SendBatch uses its own unique *pgx.Batch pointer value inside the same goroutine. More than one batch operation can run in parallel, but they do not share anything (the fetching is done asynchronously, as I've noted above).

Maybe it's a configuration option inside the PostgreSQL database server itself? The record JSONB column is not huge, but it is a JSON one. The command argument for that column is just a raw json.RawMessage ([]byte) for better performance.

Update: A moment ago, I replaced Batch and SendBatch with BeginTx, Exec and Commit and it works.
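
For context, a minimal sketch of that transaction-based workaround, assuming a *pgx.Conn and a hypothetical records table with a JSONB payload column (the Record type is made up for illustration):

	type Record struct {
		ID      string
		Payload json.RawMessage // raw JSON destined for the JSONB column, as described above
	}

	func insertWithTx(ctx context.Context, conn *pgx.Conn, records []Record) error {
		tx, err := conn.BeginTx(ctx, pgx.TxOptions{})
		if err != nil {
			return err
		}
		defer tx.Rollback(ctx) // safe to call even after a successful Commit; the error is ignored

		for _, r := range records {
			if _, err := tx.Exec(ctx,
				"INSERT INTO records (id, payload) VALUES ($1, $2)",
				r.ID, r.Payload,
			); err != nil {
				return err
			}
		}
		return tx.Commit(ctx)
	}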

jackc commented Dec 28, 2020

It sounds like you might be reusing the same *pgx.Batch. I'm not even sure what that would do -- but nothing good. Make sure you are using a new *pgx.Batch for each operation.

Update: A moment ago, I replaced Batch and SendBatch with BeginTx, Exec and Commit and it works.

Well, that's a workaround, but obviously you lose a lot of performance there.
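
For comparison, a minimal sketch of the batch path using a fresh *pgx.Batch per operation and reading every queued result before closing (same hypothetical Record type and records table as the transaction sketch above):

	func insertWithBatch(ctx context.Context, conn *pgx.Conn, records []Record) error {
		batch := &pgx.Batch{} // a brand-new Batch for this operation; never reuse one across SendBatch calls
		for _, r := range records {
			batch.Queue("INSERT INTO records (id, payload) VALUES ($1, $2)", r.ID, r.Payload)
		}

		br := conn.SendBatch(ctx, batch)
		// Drain one result per queued statement, then Close to release the connection.
		for range records {
			if _, err := br.Exec(); err != nil {
				br.Close()
				return err
			}
		}
		return br.Close()
	}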

kataras commented Jan 2, 2021

Hello @jackc, happy new year!

Ofc I did use a new *pgx.Batch for each operation but it didn't work, no worries there.

I have another critical question: I want to remove database rows whose ids are inside a slice of UUIDs. I tried to use []string but it can't convert from string to uuid (that's a postgresql thing), so I am using pgtype.UUID, which implements MarshalJSON as well, so it's safe to read directly from a request body, and then I convert that to a UUIDArray. Here is a sample code:

	var payload = struct {
		IDs []pgtype.UUID `json:"ids"` // implements the json.Marshaler interface.
	}{}

Example payload:

{
    "ids": ["dcb823b8-524c-4817-87bc-b73839640c37","b5a98047-121d-4003-8778-7bff42ab7313"]
}

	var args pgtype.UUIDArray
	args.Set(ids) // ids is a type of: []pgtype.UUID

	info, err := db.Exec(queryCtx, query, args)

The query looks exactly like: "DELETE FROM table WHERE id IN ($1)". The id column is a UUIDv4 generated automatically by postgresql's gen_random_uuid().

It gives me: ERROR: incorrect binary data format in bind parameter 1 (SQLSTATE 22P03). However, a static query like this works:

"DELETE FROM tableName WHERE id IN ('dcb823b8-524c-4817-87bc-e73839640c37', 'b5a98047-121d-4003-8778-7bff42ab7313');"

My temporary solution:

func (db *DB) DeleteByIDs(ctx context.Context, tableName string, ids []string) (int64, error) {
	// Unfortunately we have to build the query like this, which is not the safest method:
	var b strings.Builder
	lastIdx := len(ids) - 1
	for i, id := range ids {
		b.WriteString(xstrconv.SingleQuote(id))
		if i < lastIdx {
			b.WriteByte(',')
		}
	}

	query := fmt.Sprintf("DELETE FROM %s WHERE id IN(%s)", tableName, b.String())
	if db.Options.Trace {
		log.Println(query, ids)
	}

	info, err := db.Exec(ctx, query)
	return info.AffectedRows(), err
}

Is there a workaround to delete one or more rows based on a list of uuids?

Note: I've already tried "...id::text IN ($1)"; that didn't work either (postgresql can't convert uuid to text error).
I also tried passing ids as []*pgtype.UUID instead; it errored with: cannot convert [0xc000613bc0 0xc000613c20] to UUID.

Thanks!

jackc commented Jan 2, 2021

The problem is that the IN expects one or more uuid parameters, but one uuid[] is being sent instead. Use any instead.

e.g. delete from t where id = any ($1)

See https://www.postgresql.org/docs/current/functions-comparisons.html#id-1.5.8.30.16.
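
Applied to the snippet above, a minimal sketch of the ANY-based delete, keeping the pgtype.UUIDArray conversion from the earlier code (the tableName and the *pgx.Conn parameter are illustrative):

	func deleteByUUIDs(ctx context.Context, conn *pgx.Conn, ids []pgtype.UUID) (int64, error) {
		var args pgtype.UUIDArray
		if err := args.Set(ids); err != nil {
			return 0, err
		}

		// "= ANY ($1)" takes a single uuid[] parameter, which is what is actually sent here;
		// "IN ($1)" expects one or more scalar uuid parameters, hence the 22P03 bind error above.
		ct, err := conn.Exec(ctx, "DELETE FROM tableName WHERE id = ANY ($1)", args)
		if err != nil {
			return 0, err
		}
		return ct.RowsAffected(), nil
	}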

apecollector commented Mar 30, 2022

I'm also running into a similar problem, using a batch with 10k updates. The program hangs and doesn't complete or return an error. I am using pgxpool however, and it is v4 ("github.com/jackc/pgx/v4/pgxpool"); here's the affected code:


	batch := &pgx.Batch{}
	for _, w := range widgets {
		batch.Queue("UPDATE widgets SET name = $1 WHERE id = $2", w.Name, w.ID)
		if err != nil {
			return err
		}
	}
	br := dbpool.SendBatch(context.Background(), batch)
	_, err = br.Exec()
	if err != nil {
		return err
	}

If I change out SendBatch to a standard dbpool.Exec() with the same sql it works, just slowly. Also if I only queue 1000 updates it works as well, so there's definitely some limit I'm hitting.

jackc commented Apr 3, 2022

I'm not sure what could be going on there. SendBatch uses a goroutine to read and write at the same time to avoid the network buffer deadlock that was happening before.

And this test works for me even when changed to 100K inserts.

pgx/batch_test.go, lines 153 to 192 at 3ce50c0:

func TestConnSendBatchMany(t *testing.T) {
	t.Parallel()

	conn := mustConnectString(t, os.Getenv("PGX_TEST_DATABASE"))
	defer closeConn(t, conn)

	sql := `create temporary table ledger(
	  id serial primary key,
	  description varchar not null,
	  amount int not null
	);`
	mustExec(t, conn, sql)

	batch := &pgx.Batch{}

	numInserts := 1000

	for i := 0; i < numInserts; i++ {
		batch.Queue("insert into ledger(description, amount) values($1, $2)", "q1", 1)
	}
	batch.Queue("select count(*) from ledger")

	br := conn.SendBatch(context.Background(), batch)

	for i := 0; i < numInserts; i++ {
		ct, err := br.Exec()
		assert.NoError(t, err)
		assert.EqualValues(t, 1, ct.RowsAffected())
	}

	var actualInserts int
	err := br.QueryRow().Scan(&actualInserts)
	assert.NoError(t, err)
	assert.EqualValues(t, numInserts, actualInserts)

	err = br.Close()
	require.NoError(t, err)

	ensureConnValid(t, conn)
}

By any chance are you using the simple protocol?

apecollector commented Apr 4, 2022

I'm not sure about the protocol; I couldn't find anything in the docs with that terminology. This is my connection setup:

	dbpool, err := pgxpool.Connect(context.Background(), connString)
	if err != nil {
		return err
	}

	defer dbpool.Close()

I see the test is using 1,000 inserts; my issue was only reproducible with a 10,000 batch queue and worked with a 1,000 batch queue. Also, maybe it's something with update vs insert?

jackc commented Apr 4, 2022

I'm not sure about the protocol; I couldn't find anything in the docs with that terminology.

The extended protocol is used by default, so if you haven't changed anything, that is what is being used.

I see the test is using 1,000 inserts; my issue was only reproducible with a 10,000 batch queue and worked with a 1,000 batch queue.

I changed it locally to 100K and it still worked for me.

Maybe try ctrl+\ when it is hung to get a stack trace where it is stuck.
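
For reference, the simple protocol has to be opted into explicitly in v4, so a plain pgxpool.Connect as shown above stays on the extended protocol. A minimal sketch of how it would be enabled (the connection string is a placeholder):

	func connectWithSimpleProtocol(ctx context.Context, connString string) (*pgxpool.Pool, error) {
		cfg, err := pgxpool.ParseConfig(connString) // e.g. "postgres://user:pass@localhost:5432/db"
		if err != nil {
			return nil, err
		}
		// PreferSimpleProtocol switches pgx v4 from the default extended protocol
		// to the simple query protocol for this pool's connections.
		cfg.ConnConfig.PreferSimpleProtocol = true
		return pgxpool.ConnectConfig(ctx, cfg)
	}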

nassibnassar commented:

This is anecdotal, but I thought I should report it. In my setup, a batch of 200K statements (a mix of updates and inserts) reproducibly caused the Postgres server process to be terminated because of "excessive memory consumption". Reducing the batch size to 200 resolved the problem.
