Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FATAL in function client_proto(): bad client state: 6/7 #928

Merged
merged 1 commit into from Aug 18, 2023

Conversation

JelteF
Copy link
Member

@JelteF JelteF commented Aug 17, 2023

The cancel request protocol is very simple:

  1. Clients send a cancel key
  2. Server cancels the query running on the matching backend
  3. The server closes the socket to indicate the cancel request was
    handled

At no point after sending the cancel key the client is required to send
any data, and if it does it won't get a response anyway because the
server would close the connection. So most clients don't, specifically
all libpq based client don't.

The Go client however, sends a Terminate ('X') packet before closing the
cancel socket from its own side (usually due to a 10 second timeout).
This would cause a crash in PgBouncer, because PgBouncer would still
listen on the client socket, but on receiving any data we wouldreport a
fatal error. This PR fixes that fatal error by stopping to listen on the
client socket once we've received the cancel key (i.e. when we change
the client state to CL_WAITING_CANCEL).

I was able to reproduce both of the errors reported by @raymasson and
confirmed that after this change the errors would not occur again.

Reproducing this with libpq based clients is not possible, except when
manually attaching to the client using gdb and manually running send
on the cancel request its socket. So no test is added, since our test
suite uses psycopg which uses libpq under the hood.

Fixes #904

Related to #717

The cancel request protocol is very simple:
1. Clients send a cancel key
2. Server cancels the query running on the matching backend
3. The server closes the socket to indicate the cancel request was
   handled

At no point after sending the cancel key the client is required to send
any data, and if it does it won't get a response anyway because the
server would close the connection. So most clients don't, specifically
all libpq based client don't.

The Go client however, sends a Terminate ('X') packet before closing the
cancel socket from its own side (usually due to a 10 second timeout).
This would cause a crash in PgBouncer, because PgBouncer would still
listen on the client socket, but on receiving any data we wouldreport a
fatal error. This PR fixes that fatal error by stopping to listen on the
client socket once we've received the cancel key (i.e. when we change
the client state to CL_WAITING_CANCEL).

I was able to reproduce both of the errors reported by @raymasson and
confirmed that after this change the errors would not occur again.

Reproducing this with libpq based clients is not possible, except when
manually attaching to the client using `gdb` and manually running `send`
on the cancel request its socket. So no test is added, since our test
suite uses psycopg which uses libpq under the hood.

Fixes pgbouncer#904

Related to pgbouncer#717
@JelteF JelteF requested a review from eulerto August 17, 2023 07:41
@eulerto
Copy link
Member

eulerto commented Aug 17, 2023

Looks sane to me. It would be good if the reporter tests it.

@JelteF JelteF merged commit 5b9434e into pgbouncer:master Aug 18, 2023
23 checks passed
@JelteF JelteF deleted the fix-fatal-904 branch August 18, 2023 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FATAL @src/client.c:1092 in function client_proto(): bad client state: 6
2 participants