Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PgJDBC does not pipeline batches that return generated keys #195

Closed
ringerc opened this issue Oct 2, 2014 · 4 comments
Closed

PgJDBC does not pipeline batches that return generated keys #195

ringerc opened this issue Oct 2, 2014 · 4 comments
Assignees

Comments

@ringerc
Copy link
Member

ringerc commented Oct 2, 2014

PgJDBC disables batch pipelining if a batch requests generated keys. It forces a Sync message and result consumption after each query, before issuing the next. This means that batches with generated keys are not effectively batched, and perform the same as if the queries were executed in a loop by the client application. This results in a huge round-trip penalty.

Whether batch execution is performed or not is controlled by the QUERY_DISALLOW_BATCHING flag, in org/postgresql/core/QueryExecutor.java.

It is set by AbstractJdbc2Statement.executeBatch() when wantsGeneratedKeysAlways is set. That public member (which should be protected, really) is set by AbstractJdbc3Connection's prepareStatement variants when generated keys are requested.

When QUERY_DISALLOW_BATCHING is set, org.postgresql.core.v3.QueryExecutorImpl.sendQuery(...) sets disallowBatching, which forces a Sync and waits for results before proceeding with the next query.

The relevant commit appears to be 985c047.

which gives, in brief, the rationale for the change: avoiding a potential deadlock.

The deadlock the commit message refers to is a well-recognised issue in PgJDBC, where the PgJDBC send-buffer can fill up with queued work, then the PostgreSQL server can send enough data to fill PgJDBC's receive-buffer as well. At this point PgJDBC is blocked waiting to send data to PostgreSQL, and PostgreSQL is blocked waiting to send data from PGJDBC. I've documented tihs in more detail in issue #194.

This batching behaviour can be demonstrated by observing the on-the-wire client/server exchange when running testPreparedBatchResultSet in TestBatch.java from https://github.com/ringerc/pgjdbc-batchtest .

@ringerc
Copy link
Member Author

ringerc commented Oct 2, 2014

For batching the driver is currently using the most conservative possible limit, queuing only one query at a time. That'll be partly because it doesn't differentiate between RETURNING * usage triggered by use of the Statement.RETURN_GENERATED_KEYS flag to Connection.prepareStatement (where lots of data may be returned for each execution) and the case where an application has requested that only a couple of small columns should be returned as generated keys by using the String[] form of Connection.prepareStatement.

PgJDBC doesn't know how much data to expect for each response and how many queries it can safely queue before risking running out of receive buffer space, and it doesn't attempt to manage its send buffer to avoid overflowing it at all. So it takes the most conservative possible course and queues one statement at a time.

The ideal solution to this problem is one I've wanted to tackle for some time, but one that's somewhat intrusive in terms of changes to the driver: split the sending and receiving sides of the protocol handler into separate worker threads that exchange messages and synchronize only
at protocol Sync points. Or use non-blocking I/O with nio, per issue #163. Both these options completely eliminate any possible deadlock.

@ringerc
Copy link
Member Author

ringerc commented Oct 2, 2014

A lower-impact but potentially less reliable and performant solution would be to send batches of work that're bigger than one query, but are still limited in size and spaced out by sync points. The driver could use the Describe message result and the list of returned columns to estimate the possible size of the result message for each query execution, then size batches to stay within the receive buffer plus a reasonable margin.

However, this still relies on the underlying heuristics about the server's response sizes being otherwise correct, which they aren't necessarily even without generated keys, per issue #194. So it's a workaround that doesn't really fully solve the underlying bug.

@ringerc
Copy link
Member Author

ringerc commented Oct 2, 2014

Per #194 - I think the right solution is proper send-buffer sizing. It'll fix the underlying deadlock and make batching with generated keys safe (and probably faster).

@ringerc ringerc self-assigned this Oct 2, 2014
ringerc added a commit that referenced this issue Dec 1, 2014
Add limited support for returning generated columns in batches

See individual commits for notes on the limitations. This isn't a perfect solution to issue #194 and #195, but it'll work well for most people.
@davecramer
Copy link
Member

stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants