Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid sending Z packet in the middle of extended protocol packet sequence if we fail to get connection form pool #137

Merged
merged 22 commits into from
Aug 23, 2022

Conversation

drdrsh
Copy link
Collaborator

@drdrsh drdrsh commented Aug 22, 2022

Sending a query in the extended protocol is done by transmitting a sequence of discrete protocol messages (in contrast with simple mode where the entire query is sent in a Q packet).

This poses a problem to Pgcat if the pool is busy because if we are unable to checkout a connection from the pool we send back to the client a SystemError and a ReadyForQuery packets. This is unexpected by client (Getting ReadyForQuery before sending S). This resulted in error messages like

FATAL:  could not get connection from the pool
message type 0x5a arrived from server while idle
FATAL:  could not get connection from the pool
message type 0x5a arrived from server while idle
FATAL:  could not get connection from the pool
message type 0x5a arrived from server while idle
FATAL:  could not get connection from the pool
message type 0x5a arrived from server while idle
FATAL:  could not get connection from the pool
message type 0x5a arrived from server while idle
FATAL:  could not get connection from the pool
message type 0x5a arrived from server while idle
FATAL:  could not get connection from the pool
message type 0x5a arrived from server while idle
ERROR:  portal "" does not exist
message type 0x5a arrived from server while idle
message type 0x31 arrived from server while idle
message type 0x32 arrived from server while idle
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x31 arrived from server while idle
message type 0x32 arrived from server while idle
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x31 arrived from server while idle
message type 0x32 arrived from server while idle
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x31 arrived from server while idle
message type 0x32 arrived from server while idle
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x31 arrived from server while idle
message type 0x32 arrived from server while idle
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle

In this fix, we will hold off on sending back the error and the ReadyForQuery packets if we fail to get a connection from the pool while we are processing extended protocol messages other than S. This will expose us to an edge case if all extended protocol messages end up being discarded because we have no available connections and only S goes through in which case the query will fail (which is appropriate) with the error message unnamed prepared statement does not exist, because Pgcat has discarded the packets that were supposed to build the prepared statement which S wants to run.

Pgbouncer doesn't handle this case simply because failure to obtain a connection from the pool is a fatal error in Pgbouncer that terminates the client connection.

conn_str = "postgres://sharding_user:sharding_user@127.0.0.1:6432/sharded_db"
conn_under_test = PG::connect(conn_str)
50.times do
Thread.new do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why you need threads here, I thought async_exec is asynchronous so you could issue multiple queries without waiting for the result? Maybe it's not named correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that async_exec blocks the caller but releases the GVL whereas sync_exec blocks the client and does not release the GVL. I haven't studied the code closely so I could be wrong.

@drdrsh
Copy link
Collaborator Author

drdrsh commented Aug 23, 2022

Okay, I have a test failing without my fix.

Screen Shot 2022-08-23 at 12 29 50 PM

@drdrsh drdrsh changed the title Failing test Avoid sending Z packet in the middle of extended protocol packet sequence if we fail to get connection form pool Aug 23, 2022
@drdrsh drdrsh marked this pull request as ready for review August 23, 2022 17:54
Comment on lines +22 to +23
wget -O toxiproxy-2.4.0.deb https://github.com/Shopify/toxiproxy/releases/download/v2.4.0/toxiproxy_2.4.0_linux_$(dpkg --print-architecture).deb
sudo dpkg -i toxiproxy-2.4.0.deb
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes it possible to run this test script on any architecture

Copy link
Collaborator

@levkk levkk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good find, thank you.

@levkk levkk merged commit c054ff0 into postgresml:main Aug 23, 2022
@levkk levkk mentioned this pull request Aug 23, 2022
@chhetripradeep
Copy link
Contributor

Great debugging @drdrsh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants