PgJDBC does not pipeline batches that return generated keys #195
For batching, the driver currently uses the most conservative possible limit, queuing only one query at a time. That is partly because PgJDBC doesn't know how much data to expect in each response, so it can't tell how many queries it can safely queue before risking running out of receive-buffer space, and it doesn't attempt to manage its send buffer to avoid overflowing it at all. So it takes the most conservative possible course and queues one statement at a time. The ideal solution is one I've wanted to tackle for some time, but it's somewhat intrusive in terms of changes to the driver: split the sending and receiving sides of the protocol handler into separate worker threads that exchange messages and synchronize only where necessary.
A lower-impact but potentially less reliable and less performant solution would be to send batches of work larger than one query, still limited in size and spaced out by Sync points. The driver could use the Describe message result and the list of returned columns to estimate the likely size of the result message for each query execution, then size batches to stay within the receive buffer plus a reasonable margin. However, this still relies on the driver's heuristics about the server's response sizes being correct, which they aren't necessarily even without generated keys, per issue #194. So it's a workaround that doesn't fully solve the underlying bug.
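The windowed approach described above can be sketched as pure sizing logic. This is a hypothetical illustration, not PgJDBC code: the class, the method names, the safety-margin parameter, and the per-statement reply estimate are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWindowSizer {
    /**
     * Given an estimated reply size per statement (derived, e.g., from a
     * Describe response and the list of returned columns), return how many
     * statements fit in one window so the queued replies stay under a
     * fraction of the receive buffer. Always at least 1.
     */
    static int statementsPerWindow(int estimatedReplyBytes, int receiveBufferBytes,
                                   double safetyMargin) {
        int budget = (int) (receiveBufferBytes * safetyMargin);
        return Math.max(1, budget / Math.max(1, estimatedReplyBytes));
    }

    /** Split a batch of n statements into windows, each followed by a Sync. */
    static List<Integer> windows(int totalStatements, int perWindow) {
        List<Integer> result = new ArrayList<>();
        int remaining = totalStatements;
        while (remaining > 0) {
            int w = Math.min(perWindow, remaining);
            result.add(w);
            remaining -= w;
        }
        return result;
    }
}
```

As the comment in the original text notes, this only helps if the reply-size estimate is actually an upper bound; a single unexpectedly large result row can still blow the budget.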
Per #194, I think the right solution is proper send-buffer sizing. It'll fix the underlying deadlock and make batching with generated keys safe (and probably faster).
PgJDBC disables batch pipelining if a batch requests generated keys. It forces a Sync message and result consumption after each query, before issuing the next. This means that batches with generated keys are not effectively batched, and perform the same as if the queries were executed in a loop by the client application. This results in a huge round-trip penalty.
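For reference, the JDBC usage pattern that hits this path looks roughly like the following. This is a sketch under stated assumptions: the connection URL, table name, and column are placeholders, and actually running it requires a live PostgreSQL server.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class GeneratedKeysBatch {
    // Placeholder URL; point this at a real server to run the example.
    static final String URL = "jdbc:postgresql://localhost:5432/test";

    static void runBatch(Connection conn) throws SQLException {
        // Requesting generated keys is what sets wantsGeneratedKeysAlways,
        // which in turn makes executeBatch() disable pipelining.
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO items (name) VALUES (?)",
                Statement.RETURN_GENERATED_KEYS)) {
            for (int i = 0; i < 1000; i++) {
                ps.setString(1, "item-" + i);
                ps.addBatch();
            }
            // Despite being one "batch", each statement is sent, synced, and
            // its result consumed before the next is issued: ~1000 round trips.
            ps.executeBatch();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                while (keys.next()) {
                    long id = keys.getLong(1); // use the generated id
                }
            }
        }
    }
}
```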
Whether batch execution is performed or not is controlled by the `QUERY_DISALLOW_BATCHING` flag in `org/postgresql/core/QueryExecutor.java`. It is set by `AbstractJdbc2Statement.executeBatch()` when `wantsGeneratedKeysAlways` is set. That public member (which should really be protected) is set by `AbstractJdbc3Connection`'s `prepareStatement` variants when generated keys are requested. When `QUERY_DISALLOW_BATCHING` is set, `org.postgresql.core.v3.QueryExecutorImpl.sendQuery(...)` sets `disallowBatching`, which forces a Sync and waits for results before proceeding with the next query.

The relevant commit appears to be 985c047, which gives, in brief, the rationale for the change: avoiding a potential deadlock.
The deadlock the commit message refers to is a well-recognised issue in PgJDBC: the driver's send buffer can fill up with queued work, and then the PostgreSQL server can send enough result data to fill PgJDBC's receive buffer as well. At that point PgJDBC is blocked waiting to send data to PostgreSQL, and PostgreSQL is blocked waiting to send data to PgJDBC, so neither side makes progress. I've documented this in more detail in issue #194.
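The deadlock condition can be stated as simple buffer arithmetic. The following is a hypothetical sketch, not driver code; the buffer sizes and byte counts in the usage are illustrative, not measured.

```java
public class DeadlockCheck {
    /**
     * A blocking write stalls once the bytes in flight exceed what the
     * writer's socket send buffer plus the peer's receive buffer can absorb.
     */
    static boolean writeBlocked(long inFlightBytes, long sendBuf, long peerRecvBuf) {
        return inFlightBytes > sendBuf + peerRecvBuf;
    }

    /**
     * The distributed deadlock needs both directions blocked at once: the
     * client can't finish sending its queued statements, and the server
     * can't finish sending results because the client isn't reading yet.
     */
    static boolean deadlocked(long queuedQueryBytes, long clientSendBuf, long serverRecvBuf,
                              long pendingResultBytes, long serverSendBuf, long clientRecvBuf) {
        return writeBlocked(queuedQueryBytes, clientSendBuf, serverRecvBuf)
            && writeBlocked(pendingResultBytes, serverSendBuf, clientRecvBuf);
    }
}
```

With, say, 64 KiB buffers on each side, roughly 128 KiB of unread traffic in each direction is enough to wedge both ends, which is why queuing only one statement at a time trivially avoids it.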
This batching behaviour can be demonstrated by observing the on-the-wire client/server exchange when running `testPreparedBatchResultSet` in `TestBatch.java` from https://github.com/ringerc/pgjdbc-batchtest.