Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge unread notifications combined with huge Exec may deadlock #27

Closed
jackc opened this issue Feb 5, 2020 · 2 comments
Closed

Huge unread notifications combined with huge Exec may deadlock #27

jackc opened this issue Feb 5, 2020 · 2 comments

Comments

@jackc
Copy link
Owner

jackc commented Feb 5, 2020

This is an extreme corner case that is extremely unlikely to occur without purposely triggering it.

If the PostgreSQL server sends more data than can fit in the client's incoming network buffers and the client then sends more data than can fit in the server's incoming network buffer a deadlock will occur. The exact amount depends on the platform. On MacOS Catalina 1MB in each direction is enough to deadlock when using TCP on localhost. Less is necessary when using a Unix socket.

The only known way to trigger this is for a connection to listen on a channel and not perform any action until the network buffers are full of unread notifications. Then it needs to send a huge query that overfills the network buffers the other direction.

There are two possible solutions:

  1. Read before any writes. As there is no non-blocking read available (proposal: net: non-blocking Read on Conn golang/go#36973) it is unclear how much performance cost there would be or if it could be done reliably. (What is the magic time we wait for the read deadline? How short is too short?)
  2. Use goroutines for network IO. The problems with this approach are performance (a proof of concept test had a 20+% penalty in one test), additional complexity, and backwards compatibility. On the plus side, this would remove the need for the special casing to avoid deadlocks in batch operations and the copy protocol. It could also make it easy to detect network interruptions or database shutdowns (Database restarts are not handled gracefully pgx#672).

Both solutions have a large cost so it is unclear if this corner case is worth fixing.

func TestConnExecNotificationsFillReadBufferBeforeAttemptingWrite(t *testing.T) {
	t.Parallel()

	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()

	pgConn, err := pgconn.Connect(ctx, os.Getenv("PGX_TEST_CONN_STRING"))
	require.NoError(t, err)
	defer closeConn(t, pgConn)

	_, err = pgConn.Exec(ctx, `listen a`).ReadAll()
	require.NoError(t, err)

	notifierConn, err := pgconn.Connect(ctx, os.Getenv("PGX_TEST_CONN_STRING"))
	require.NoError(t, err)
	defer closeConn(t, notifierConn)

	kilobyte := strings.Repeat("x", 1024)

	mrr := notifierConn.Exec(ctx, fmt.Sprintf(`select pg_notify('a', '%s' || n) from generate_series(1, %d) n`, kilobyte, 1000))
	err = mrr.Close()
	require.NoError(t, err)

	buf := &bytes.Buffer{}
	for i := 0; i < 1000; i++ {
		buf.WriteString(fmt.Sprintf("select '%s';\n", kilobyte))
	}

	_, err = pgConn.Exec(ctx, buf.String()).ReadAll()
	require.NoError(t, err, "apparent deadlock when PostgreSQL has sent too many messages before Exec")
}
@jackc
Copy link
Owner Author

jackc commented Jan 20, 2022

A slightly more realistic way to trigger this is to set client_min_messages = debug5. Then run try to send a very large query. For example, TestConnExecParamsMaxNumberOfParams hands with client_min_messages = debug5.

jackc added a commit to jackc/pgx that referenced this issue Jun 25, 2022
This eliminates an edge case that can cause a deadlock and is a
prerequisite to cheaply testing connection liveness and to recoving a
connection after a timeout.

jackc/pgconn#27

Squashed commit of the following:

commit 0d7b0dd
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 25 13:15:05 2022 -0500

    Add test for non-blocking IO preventing deadlock

commit 79d68d2
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 18 18:23:24 2022 -0500

    Release CopyFrom buf when done

commit 95a4313
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 18 18:22:32 2022 -0500

    Avoid allocations with non-blocking write

commit 6b63cee
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 18 17:46:49 2022 -0500

    Simplify iobufpool usage

commit 60ecdda
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 18 11:51:59 2022 -0500

    Add true non-blocking IO

commit 7dd26a3
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 20:28:23 2022 -0500

    Fix block when reading more than buffered

commit afa7022
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 20:10:23 2022 -0500

    More TLS support

commit 51655bf
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 17:46:00 2022 -0500

    Steps toward TLS

commit 2b80beb
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 13:06:29 2022 -0500

    Litle more TLS support

commit 765b2c6
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 12:29:30 2022 -0500

    Add testing of TLS

commit 5b64432
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 09:48:19 2022 -0500

    Introduce testVariants in prep for TLS

commit ecebd7b
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 09:32:14 2022 -0500

    Handle and test read of previously buffered data

commit 09c64d8
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 09:04:48 2022 -0500

    Rename nbbconn to nbconn

commit 73398bc
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 08:59:53 2022 -0500

    Remove backup files

commit f1df39a
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 08:58:05 2022 -0500

    Initial passing tests

commit ea3cdab
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat Jun 4 08:38:57 2022 -0500

    Fix connect timeout

commit ca22396
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Thu Jun 2 19:32:55 2022 -0500

    wip

commit 2e7b46d
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Mon May 30 08:32:43 2022 -0500

    Update comments

commit 7d04dc5
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat May 28 19:43:23 2022 -0500

    Fix broken test

commit bf1edc7
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat May 28 19:40:33 2022 -0500

    fixed putting wrong size bufs

commit 1f7a855
Author: Jack Christensen <jack@jackchristensen.com>
Date:   Sat May 28 18:13:47 2022 -0500

    initial not quite working non-blocking conn
@jackc
Copy link
Owner Author

jackc commented Jun 25, 2022

Fixed in pgx v5.

@jackc jackc closed this as completed Jun 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant