pgx hangs on 10k+ rows batch #374

Closed
devlo opened this issue Jan 2, 2018 · 13 comments
devlo commented Jan 2, 2018

Hello,

When I execute the same batch (the same prepared statement that does an INSERT) with exactly the same data and a queue of around 500 elements, everything works fine and executes almost instantly, but when I try with a queue of 10100 elements, pgx just hangs, never completing the Send(), and no error is returned either. In the postgres logs I can see: could not receive data from client: connection reset by peer.

jackc added a commit that referenced this issue Jan 15, 2018
jackc added the bug label Jan 15, 2018
jackc commented Jan 15, 2018

I was able to reproduce this issue.

Send writes all queued queries before reading any results. The deadlock occurs when the batched queries to be sent are so large that the PostgreSQL server cannot receive them all at once. PostgreSQL receives some of the queued queries and starts executing them. As PostgreSQL executes the queries it sends responses back. pgx will not read any of these responses until it has finished sending. Therefore, if all network buffers are full in both directions, pgx will not be able to finish sending the queries and PostgreSQL will not be able to finish sending the responses.

This is a non-trivial issue to resolve.

The simplest solution would be to establish some guaranteed-safe number of queries and limit the batch to queuing that many. However, the safe number of queries varies based on multiple factors such as the type of query and the type of connection.

Another approach would be to start a goroutine before sending the batched queries that reads and buffers responses until the send has finished. But this would introduce quite a bit more complexity to an already complex system.

Another approach would be to internally break large batches into multiple interleaved sends and receives. But again, it adds a lot of error-prone complexity to an already complex system.

For the time being, I have added documentation to Batch.Send regarding this issue.
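
For readers needing a stopgap, here is a minimal sketch of the first approach (limiting how many queries are queued per batch), written against the v4-style Batch/SendBatch API that appears later in this thread; the chunk size, row data, and table are illustrative assumptions, not guaranteed-safe values:

	// sendInChunks splits a large workload into several smaller batches so that
	// no single batch is large enough to fill the network buffers in both directions.
	func sendInChunks(ctx context.Context, conn *pgx.Conn, descriptions []string) error {
		const chunkSize = 500 // illustrative; the safe size depends on the queries and the connection

		for start := 0; start < len(descriptions); start += chunkSize {
			end := start + chunkSize
			if end > len(descriptions) {
				end = len(descriptions)
			}

			batch := &pgx.Batch{}
			for _, d := range descriptions[start:end] {
				batch.Queue("insert into ledger(description, amount) values($1, $2)", d, 1)
			}

			br := conn.SendBatch(ctx, batch)
			for range descriptions[start:end] {
				if _, err := br.Exec(); err != nil {
					br.Close()
					return err
				}
			}
			if err := br.Close(); err != nil {
				return err
			}
		}
		return nil
	}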

jackc added this to the v4 milestone Jun 29, 2019
jackc commented Jun 29, 2019

This is resolved in v4 (currently prerelease).

jackc closed this as completed Jun 29, 2019
kataras commented Dec 26, 2020

@jackc I have the same problem: SendBatch hangs at 1200 total items (100 per batch; each row contains a JSONB field). I am using v4.10.1. Should I use BeginTx and execute each insert command instead?

jackc commented Dec 26, 2020

@kataras Do you have a simple repro case? As far as I knew this issue was resolved a long time ago.

kataras commented Dec 26, 2020

Hello @jackc, unfortunately (or fortunately for you :)) the project in which I am starting to use your library is a production-level one for a company, and the repository is private. However, the insert command is very simple: I fetch the data asynchronously from another source (an external API) with a pagination of 100 per request, and after some conversion I send them to the PostgreSQL database. Each SendBatch sends exactly 100 rows, and each SendBatch uses its own unique *pgx.Batch pointer value inside the same goroutine. More than one batch operation can run in parallel, but they do not share anything (the fetching is done asynchronously, as I've noted above).

Maybe it's a configuration option inside the PostgreSQL database server itself? The record JSONB column is not huge, but it is a JSON one. The command argument for that column is just a raw json.RawMessage ([]byte) for better performance.

Update: A moment ago, I replaced Batch and SendBatch with BeginTx, Exec and Commit and it works.
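
For context, a minimal sketch of that transaction-based workaround, assuming a *pgx.Conn and a hypothetical records table with a JSONB payload column (the Record type is made up for illustration):

	type Record struct {
		ID      string
		Payload json.RawMessage // raw JSON destined for the JSONB column, as described above
	}

	func insertWithTx(ctx context.Context, conn *pgx.Conn, records []Record) error {
		tx, err := conn.BeginTx(ctx, pgx.TxOptions{})
		if err != nil {
			return err
		}
		defer tx.Rollback(ctx) // safe to call even after a successful Commit; the error is ignored

		for _, r := range records {
			if _, err := tx.Exec(ctx,
				"INSERT INTO records (id, payload) VALUES ($1, $2)",
				r.ID, r.Payload,
			); err != nil {
				return err
			}
		}
		return tx.Commit(ctx)
	}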

jackc commented Dec 28, 2020

It sounds like you might be reusing the same *pgx.Batch. I'm not even sure what that would do -- but nothing good. Make sure you are using a new *pgx.Batch for each operation.

Update: A moment ago, I replaced Batch and SendBatch with BeginTx, Exec and Commit and it works.

Well, that's a workaround, but obviously you lose a lot of performance there.
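
For comparison, a minimal sketch of the batch path using a fresh *pgx.Batch per operation and reading every queued result before closing (same hypothetical Record type and records table as the transaction sketch above):

	func insertWithBatch(ctx context.Context, conn *pgx.Conn, records []Record) error {
		batch := &pgx.Batch{} // a brand-new Batch for this operation; never reuse one across SendBatch calls
		for _, r := range records {
			batch.Queue("INSERT INTO records (id, payload) VALUES ($1, $2)", r.ID, r.Payload)
		}

		br := conn.SendBatch(ctx, batch)
		// Drain one result per queued statement, then Close to release the connection.
		for range records {
			if _, err := br.Exec(); err != nil {
				br.Close()
				return err
			}
		}
		return br.Close()
	}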

kataras commented Jan 2, 2021

Hello @jackc, happy new year!

Ofc I did use a new *pgx.Batch for each operation but it didn't work, no worries there.

I have another critical question: I want to remove database rows whose ids are inside a slice of UUIDs. I tried to use []string but it can't convert from string to uuid (that's a postgresql thing), so I am using pgtype.UUID, which implements MarshalJSON as well, so it's safe to read directly from a request body, and then I convert that to a UUIDArray. Here is a sample code:

	var payload = struct {
		IDs []pgtype.UUID `json:"ids"` // implements the json.Marshaler interface.
	}{}

Example payload:

{
    "ids": ["dcb823b8-524c-4817-87bc-b73839640c37","b5a98047-121d-4003-8778-7bff42ab7313"]
}

	var args pgtype.UUIDArray
	args.Set(ids) // ids is a type of: []pgtype.UUID

	info, err := db.Exec(queryCtx, query, args)

The query looks exactly like: "DELETE FROM table WHERE id IN ($1)". The id column is a UUIDv4 generated automatically by postgresql's gen_random_uuid().

It gives me: ERROR: incorrect binary data format in bind parameter 1 (SQLSTATE 22P03). However, a static query like this works:

"DELETE FROM tableName WHERE id IN ('dcb823b8-524c-4817-87bc-e73839640c37', 'b5a98047-121d-4003-8778-7bff42ab7313');"

My temporary solution:

func (db *DB) DeleteByIDs(ctx context.Context, tableName string, ids []string) (int64, error) {
	// Unfortunately we have to build the query like this, which is not the safest method:
	var b strings.Builder
	lastIdx := len(ids) - 1
	for i, id := range ids {
		b.WriteString(xstrconv.SingleQuote(id))
		if i < lastIdx {
			b.WriteByte(',')
		}
	}

	query := fmt.Sprintf("DELETE FROM %s WHERE id IN(%s)", tableName, b.String())
	if db.Options.Trace {
		log.Println(query, ids)
	}

	info, err := db.Exec(ctx, query)
	return info.AffectedRows(), err
}

Is there a workaround to delete one or more rows based on a list of uuids?

Note: I've already tried "...id::text IN ($1)"; that didn't work either (postgresql can't convert uuid to text error).
I also tried passing ids as []*pgtype.UUID instead; it errored with: cannot convert [0xc000613bc0 0xc000613c20] to UUID.

Thanks!

jackc commented Jan 2, 2021

The problem is that the IN expects one or more uuid parameters, but one uuid[] is being sent instead. Use any instead.

e.g. delete from t where id = any ($1)

See https://www.postgresql.org/docs/current/functions-comparisons.html#id-1.5.8.30.16.
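
Applied to the snippet above, a minimal sketch of the ANY-based delete, keeping the pgtype.UUIDArray conversion from the earlier code (the tableName and the *pgx.Conn parameter are illustrative):

	func deleteByUUIDs(ctx context.Context, conn *pgx.Conn, ids []pgtype.UUID) (int64, error) {
		var args pgtype.UUIDArray
		if err := args.Set(ids); err != nil {
			return 0, err
		}

		// "= ANY ($1)" takes a single uuid[] parameter, which is what is actually sent here;
		// "IN ($1)" expects one or more scalar uuid parameters, hence the 22P03 bind error above.
		ct, err := conn.Exec(ctx, "DELETE FROM tableName WHERE id = ANY ($1)", args)
		if err != nil {
			return 0, err
		}
		return ct.RowsAffected(), nil
	}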

apecollector commented Mar 30, 2022

I'm also running into a similar problem, using a batch with 10k updates. The program hangs and doesn't complete or return an error. I am using pgxpool however, and it is v4 ("github.com/jackc/pgx/v4/pgxpool"); here's the affected code:


	batch := &pgx.Batch{}
	for _, w := range widgets {
		batch.Queue("UPDATE widgets SET name = $1 WHERE id = $2", w.Name, w.ID)
		if err != nil {
			return err
		}
	}
	br := dbpool.SendBatch(context.Background(), batch)
	_, err = br.Exec()
	if err != nil {
		return err
	}

If I change out SendBatch to a standard dbpool.Exec() with the same sql it works, just slowly. Also if I only queue 1000 updates it works as well, so there's definitely some limit I'm hitting.

jackc commented Apr 3, 2022

I'm not sure what could be going on there. SendBatch uses a goroutine to read and write at the same time to avoid the network buffer deadlock that was happening before.

And this test works for me even when changed to 100K inserts.

pgx/batch_test.go, lines 153 to 192 at 3ce50c0:

func TestConnSendBatchMany(t *testing.T) {
	t.Parallel()

	conn := mustConnectString(t, os.Getenv("PGX_TEST_DATABASE"))
	defer closeConn(t, conn)

	sql := `create temporary table ledger(
	  id serial primary key,
	  description varchar not null,
	  amount int not null
	);`
	mustExec(t, conn, sql)

	batch := &pgx.Batch{}

	numInserts := 1000

	for i := 0; i < numInserts; i++ {
		batch.Queue("insert into ledger(description, amount) values($1, $2)", "q1", 1)
	}
	batch.Queue("select count(*) from ledger")

	br := conn.SendBatch(context.Background(), batch)

	for i := 0; i < numInserts; i++ {
		ct, err := br.Exec()
		assert.NoError(t, err)
		assert.EqualValues(t, 1, ct.RowsAffected())
	}

	var actualInserts int
	err := br.QueryRow().Scan(&actualInserts)
	assert.NoError(t, err)
	assert.EqualValues(t, numInserts, actualInserts)

	err = br.Close()
	require.NoError(t, err)

	ensureConnValid(t, conn)
}

By any chance are you using the simple protocol?

apecollector commented Apr 4, 2022

I'm not sure about the protocol; I couldn't find anything in the docs with that terminology. This is my connection setup:

	dbpool, err := pgxpool.Connect(context.Background(), connString)
	if err != nil {
		return err
	}

	defer dbpool.Close()

I see the test is using 1,000 inserts; my issue was only reproducible with a 10,000 batch queue and worked with a 1,000 batch queue. Also, maybe it's something with update vs insert?

jackc commented Apr 4, 2022

I'm not sure about the protocol; I couldn't find anything in the docs with that terminology.

The extended protocol is used by default, so if you haven't changed anything, that is what is being used.

I see the test is using 1,000 inserts; my issue was only reproducible with a 10,000 batch queue and worked with a 1,000 batch queue.

I changed it locally to 100K and it still worked for me.

Maybe try ctrl+\ when it is hung to get a stack trace where it is stuck.
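
For reference, the simple protocol has to be opted into explicitly in v4, so a plain pgxpool.Connect as shown above stays on the extended protocol. A minimal sketch of how it would be enabled (the connection string is a placeholder):

	func connectWithSimpleProtocol(ctx context.Context, connString string) (*pgxpool.Pool, error) {
		cfg, err := pgxpool.ParseConfig(connString) // e.g. "postgres://user:pass@localhost:5432/db"
		if err != nil {
			return nil, err
		}
		// PreferSimpleProtocol switches pgx v4 from the default extended protocol
		// to the simple query protocol for this pool's connections.
		cfg.ConnConfig.PreferSimpleProtocol = true
		return pgxpool.ConnectConfig(ctx, cfg)
	}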

nassibnassar commented:

This is anecdotal, but I thought I should report it. In my setup, a batch of 200K statements (a mix of updates and inserts) reproducibly caused the Postgres server process to be terminated because of "excessive memory consumption". Reducing the batch size to 200 resolved the problem.
