PgJDBC does not pipeline batches that return generated keys #195

ringerc · 2014-10-02T13:10:49Z

PgJDBC disables batch pipelining if a batch requests generated keys. It forces a Sync message and result consumption after each query, before issuing the next. This means that batches with generated keys are not effectively batched, and perform the same as if the queries were executed in a loop by the client application. This results in a huge round-trip penalty.

Whether batch execution is performed or not is controlled by the QUERY_DISALLOW_BATCHING flag, in org/postgresql/core/QueryExecutor.java.

It is set by AbstractJdbc2Statement.executeBatch() when wantsGeneratedKeysAlways is set. That public member (which should be protected, really) is set by AbstractJdbc3Connection's prepareStatement variants when generated keys are requested.

When QUERY_DISALLOW_BATCHING is set, org.postgresql.core.v3.QueryExecutorImpl.sendQuery(...) sets disallowBatching, which forces a Sync and waits for results before proceeding with the next query.

The relevant commit appears to be 985c047.

which gives, in brief, the rationale for the change: avoiding a potential deadlock.

The deadlock the commit message refers to is a well-recognised issue in PgJDBC, where the PgJDBC send-buffer can fill up with queued work, then the PostgreSQL server can send enough data to fill PgJDBC's receive-buffer as well. At this point PgJDBC is blocked waiting to send data to PostgreSQL, and PostgreSQL is blocked waiting to send data from PGJDBC. I've documented tihs in more detail in issue #194.

This batching behaviour can be demonstrated by observing the on-the-wire client/server exchange when running testPreparedBatchResultSet in TestBatch.java from https://github.com/ringerc/pgjdbc-batchtest .

The text was updated successfully, but these errors were encountered:

ringerc · 2014-10-02T13:14:04Z

For batching the driver is currently using the most conservative possible limit, queuing only one query at a time. That'll be partly because it doesn't differentiate between RETURNING * usage triggered by use of the Statement.RETURN_GENERATED_KEYS flag to Connection.prepareStatement (where lots of data may be returned for each execution) and the case where an application has requested that only a couple of small columns should be returned as generated keys by using the String[] form of Connection.prepareStatement.

PgJDBC doesn't know how much data to expect for each response and how many queries it can safely queue before risking running out of receive buffer space, and it doesn't attempt to manage its send buffer to avoid overflowing it at all. So it takes the most conservative possible course and queues one statement at a time.

The ideal solution to this problem is one I've wanted to tackle for some time, but one that's somewhat intrusive in terms of changes to the driver: split the sending and receiving sides of the protocol handler into separate worker threads that exchange messages and synchronize only
at protocol Sync points. Or use non-blocking I/O with nio, per issue #163. Both these options completely eliminate any possible deadlock.

ringerc · 2014-10-02T13:16:01Z

A lower-impact but potentially less reliable and performant solution would be to send batches of work that're bigger than one query, but are still limited in size and spaced out by sync points. The driver could use the Describe message result and the list of returned columns to estimate the possible size of the result message for each query execution, then size batches to stay within the receive buffer plus a reasonable margin.

However, this still relies on the underlying heuristics about the server's response sizes being otherwise correct, which they aren't necessarily even without generated keys, per issue #194. So it's a workaround that doesn't really fully solve the underlying bug.

ringerc · 2014-10-02T14:28:13Z

Per #194 - I think the right solution is proper send-buffer sizing. It'll fix the underlying deadlock and make batching with generated keys safe (and probably faster).

Add limited support for returning generated columns in batches See individual commits for notes on the limitations. This isn't a perfect solution to issue #194 and #195, but it'll work well for most people.

davecramer · 2019-05-22T14:03:30Z

stale

ringerc added the enhancement label Oct 2, 2014

ringerc mentioned this issue Oct 2, 2014

PgJDBC can experience client/server deadlocks during batch execution #194

Open

ringerc self-assigned this Oct 2, 2014

This was referenced Oct 2, 2014

getGeneratedKeys returns all columns for Statement.RETURN_GENERATED_KEYS #99

Closed

PgJDBC issues an unnecessary Describe for each execution of a prepared statement #196

Closed

Enable batching when returning generated keys #15

Closed

ringerc mentioned this issue Oct 10, 2014

Batch returning support #204

Merged

royceness mentioned this issue Apr 10, 2015

Deadlock in processDeadParsedQueries() #279

Closed

ringerc mentioned this issue Dec 26, 2015

Support for Simple protocol. #365

Closed

davecramer closed this as completed May 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PgJDBC does not pipeline batches that return generated keys #195

PgJDBC does not pipeline batches that return generated keys #195

ringerc commented Oct 2, 2014

ringerc commented Oct 2, 2014

ringerc commented Oct 2, 2014

ringerc commented Oct 2, 2014

davecramer commented May 22, 2019

PgJDBC does not pipeline batches that return generated keys #195

PgJDBC does not pipeline batches that return generated keys #195

Comments

ringerc commented Oct 2, 2014

ringerc commented Oct 2, 2014

ringerc commented Oct 2, 2014

ringerc commented Oct 2, 2014

davecramer commented May 22, 2019