Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
PgJDBC does not pipeline batches that return generated keys #195
PgJDBC disables batch pipelining if a batch requests generated keys. It forces a Sync message and result consumption after each query, before issuing the next. This means that batches with generated keys are not effectively batched, and perform the same as if the queries were executed in a loop by the client application. This results in a huge round-trip penalty.
Whether batch execution is performed or not is controlled by the
It is set by
The relevant commit appears to be 985c047.
which gives, in brief, the rationale for the change: avoiding a potential deadlock.
The deadlock the commit message refers to is a well-recognised issue in PgJDBC, where the PgJDBC send-buffer can fill up with queued work, then the PostgreSQL server can send enough data to fill PgJDBC's receive-buffer as well. At this point PgJDBC is blocked waiting to send data to PostgreSQL, and PostgreSQL is blocked waiting to send data from PGJDBC. I've documented tihs in more detail in issue #194.
This batching behaviour can be demonstrated by observing the on-the-wire client/server exchange when running
For batching the driver is currently using the most conservative possible limit, queuing only one query at a time. That'll be partly because it doesn't differentiate between
PgJDBC doesn't know how much data to expect for each response and how many queries it can safely queue before risking running out of receive buffer space, and it doesn't attempt to manage its send buffer to avoid overflowing it at all. So it takes the most conservative possible course and queues one statement at a time.
The ideal solution to this problem is one I've wanted to tackle for some time, but one that's somewhat intrusive in terms of changes to the driver: split the sending and receiving sides of the protocol handler into separate worker threads that exchange messages and synchronize only
A lower-impact but potentially less reliable and performant solution would be to send batches of work that're bigger than one query, but are still limited in size and spaced out by sync points. The driver could use the Describe message result and the list of returned columns to estimate the possible size of the result message for each query execution, then size batches to stay within the receive buffer plus a reasonable margin.
However, this still relies on the underlying heuristics about the server's response sizes being otherwise correct, which they aren't necessarily even without generated keys, per issue #194. So it's a workaround that doesn't really fully solve the underlying bug.