PgJDBC can experience client/server deadlocks during batch execution #194
Comments
To reduce the likelihood of tripping this bug, PgJDBC doesn't queue batches that return result sets, such as a batch of statements that request generated keys.

One possible option for making deadlocks impossible is covered in brief by issue #163 - using non-blocking sockets with java.nio. However, it's likely to be intrusive.
An alternative to completely changing the data exchange mechanism is to instead get PgJDBC to manage its send buffer properly. PgJDBC currently ignores its own send buffer and tries to manage the server's buffer. This is backwards: the only buffer PgJDBC can completely control is its own send buffer. So what we really need to do is avoid blocking on writes to that buffer when we know there's already a pending query response. (If there's no pending query it's fine to block; the server will continue consuming our input even if there's an error.)

**Using non-blocking reads/writes with java.nio streams?**

Java doesn't expose any API to query the available space in the TCP send buffer, and there's no portable way to query it from the underlying platform; you need Linux-specific hacks like SIOCOUTQ. In java.nio (since Java 1.4) there's the option of creating a non-blocking SocketChannel, but SSL is a problem for non-blocking channels. We could guarantee that it's safe to read from the receive stream by first writing a message that forces the server to send more data. Even if we solved the SSL issue and got a guaranteed non-blocking input stream too, we'd have to muck around with a control loop that select()s the next readable/writeable socket and pipelines more data. This is complicated by the fact that the output socket might still be writable, just not with the message size we want. So doing this with a non-blocking approach would require a pretty major change to the driver.

**Writing up to the send buffer size, then syncing and flushing**

Instead, we can just avoid blocking on the socket by never filling the send buffer without ending the queued messages in a Sync. This is deadlock-proof, but greatly limits the number of big queries that PgJDBC can pipeline in a batch. Currently, with an assumed 250 byte reply and a 64k buffer, PgJDBC assumes it can safely pipeline 64000 / 250 = 256 queries before needing to sync and consume input. If we instead use the real send buffer size on a typical system, as determined by poking in the driver's guts reflectively, e.g.:
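(The snippet originally shown here isn't preserved in this copy; the following is a minimal sketch of that reflective poke, modelled on the reflection code posted later in this thread. The field names queryExecutor and pgStream are PgJDBC internals and may differ between driver versions.)

```java
import java.lang.reflect.Field;
import java.net.Socket;
import java.sql.Connection;

import org.postgresql.core.PGStream;
import org.postgresql.core.QueryExecutorBase;
import org.postgresql.jdbc.PgConnection;

final class SendBufferProbe {
    /** Returns the OS-reported TCP send buffer size for a PgJDBC connection. */
    static int sendBufferSize(Connection con) throws Exception {
        Field queryExecutor = PgConnection.class.getDeclaredField("queryExecutor");
        queryExecutor.setAccessible(true);
        Object executor = queryExecutor.get(con.unwrap(PgConnection.class));

        Field pgStream = QueryExecutorBase.class.getDeclaredField("pgStream");
        pgStream.setAccessible(true);
        Socket socket = ((PGStream) pgStream.get(executor)).getSocket();

        return socket.getSendBufferSize();
    }
}
```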
I can see that my default is considerably larger than the assumed 64k. That's still a lot of sanely sized queries, and bigger queries will be less affected by round-trip costs anyway. So we should consider moving deadlock prevention logic from attempting to control the server's send buffer to controlling our own send buffer. That's much safer, and lets us safely batch prepared statements that request generated keys.
Per GitHub issue pgjdbc#194 and the comments above MAX_BUFFERED_QUERIES, we're using a pretty rough heuristic for receive buffer management. The current approach can't account for the data in prepared queries that return generated keys; it assumes a flat 250 bytes per query response. Change it to count the buffer in bytes, up to an estimated MAX_BUFFERED_RECV_BYTES (still 64k, same as before), with an estimated NODATA_QUERY_RESPONSE_SIZE_BYTES of 250 bytes per query. Behaviour is not changed; we're just counting bytes instead of queries. This means that it's now possible, in a subsequent patch, to adjust the baseline NODATA_QUERY_RESPONSE_SIZE_BYTES for the size of any returned columns in a query with a RETURNING clause.
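A rough sketch of what that change amounts to, not the actual PgJDBC code: instead of a fixed query count, keep a running estimate of pending response bytes and place a Sync whenever the next query's estimated response would push the total past the assumed server send buffer. The constant names follow the commit message; everything else here (the planner class and the per-query extra-bytes input) is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSyncPlanner {
    static final int MAX_BUFFERED_RECV_BYTES = 64_000;        // assumed server send buffer
    static final int NODATA_QUERY_RESPONSE_SIZE_BYTES = 250;  // assumed per-query response size

    /**
     * Given extra estimated response bytes per query (0 for plain no-data queries,
     * larger for RETURNING / generated-key queries), return the indices of queries
     * after which a Sync should be sent so pending responses never exceed the budget.
     */
    static List<Integer> syncPoints(int[] extraResponseBytes) {
        List<Integer> syncAfter = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < extraResponseBytes.length; i++) {
            int estimate = NODATA_QUERY_RESPONSE_SIZE_BYTES + extraResponseBytes[i];
            if (pending > 0 && pending + estimate > MAX_BUFFERED_RECV_BYTES) {
                syncAfter.add(i - 1);  // sync (and consume results) before sending query i
                pending = 0;
            }
            pending += estimate;
        }
        return syncAfter;
    }
}
```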
Just FYI: We had this problem with Maven artifact org.postgresql:postgresql:9.4.1208.jre6. The stacktrace on the client is:
The server just shows in pg_stat_activity that it is executing some SQL statement that is part of the batch, with waiting == false. Once, the statement in question was even supposed to return zero rows (because the corresponding table was empty). We only seem to have the problem when the client runs on the same host as the server (connecting over TCP from some IP address to the same IP address, not necessarily 127.0.0.1). We have only ever seen this at our clients that use Windows (but it might also occur on other operating systems). We are trying to find a way to tweak the settings so as to avoid this problem.
Thx for the report. I would suggest upgrading, as rewriteInsert in 1209 makes batch inserts much faster.
Dave Cramer
FTR: Upgrading to version 1210 didn’t fix the problem. After setting sendBufferSize and recvBufferSize to large values (512 kB; although I guess we only really needed to increase sendBufferSize), the problem disappeared.
Pretty sure this has been resolved.
Pretty sure it has not. The resolution would either use a non-blocking API or use a separate thread to pump the data.
Agreed that it's not resolved. I looked into using a separate thread, but couldn't find much clarity on how threads interact with JDBC drivers and what the rules are there. How would we reliably ensure our receive-pumping thread was terminated when the connection was GC'd and closed, etc.? But I expect we can rely on the shared TCP socket for that.

It's probably not that hard, and likely the sensible solution. Java is already so heavily threaded that nobody's going to get upset if we spawn a thread. Some care will be required to make sure the new thread gets the same classloader as the spawning thread so it works properly in containerized environments, but that's well established. I'm a bit unsure why I dismissed a threaded solution when I looked into this before.
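As a tiny illustration of the classloader point (a hypothetical helper, not driver code): a receive-pumping thread would be created as a daemon and given the spawning thread's context classloader so it behaves in application servers.

```java
final class RxPumpStarter {
    /** Start a hypothetical receive-pumping thread for a connection. */
    static Thread start(Runnable receivePump) {
        Thread rx = new Thread(receivePump, "pgjdbc-rx-pump");
        rx.setDaemon(true);  // never keep the JVM alive on behalf of a forgotten connection
        // propagate the caller's context classloader so the pump behaves in containers
        rx.setContextClassLoader(Thread.currentThread().getContextClassLoader());
        rx.start();
        return rx;
    }
}
```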
Ah, it was related to the work I did here: https://stackoverflow.com/q/8514725/398670.
@ringerc org.postgresql.Driver already uses a thread during connection establishment (ConnectThread in Driver::connect). How is that any different from your case?
Had this problem too, on the latest version of the driver (42.2.5). The thread is stuck with this stack trace:
Native stack trace:
One of the TCP connections (used by the stuck thread) has a large amount of unsent data queued in its send buffer.
Are there any known workarounds? Maybe setting a query timeout can help?
@turbanoff I think the workaround is to increase the size of the send buffer, as per the comments above.
Is it the same issue? The driver version is 42.2.4.
@zistrong are you attempting to use the same connection in a multi-threaded app?
We've observed this using 42.2.8. Is there a way to code around this issue? Will setting a timeout work?
Changing the size of the output buffer has been shown to help, but currently changing timeouts won't help.
I increasingly think I need to find the time to make PgJDBC use separate threads for tx and rx so the method-caller's thread can retain control. PgJDBC can then wait on multiple events in possible deadlock scenarios, i.e. "when tx buffer is full, sleep until tx buffer writeable OR rx buffer readable". Shouldn't even be that hard.
Actually, we can probably break the deadlock more simply. I still wonder if it's better to just switch to NIO and SocketChannel. It's probably not as hard as I thought above. We might need to wrap the rx channel in a buffer so we can push unconsumed input back onto it, but that's not too bad.
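To make the "sleep until the tx buffer is writable OR the rx buffer is readable" idea concrete, here is a sketch using a non-blocking SocketChannel and a Selector. This is purely illustrative (PgJDBC does not work this way today), and a real driver would parse the bytes it reads rather than discard them.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class DuplexPump {
    static void send(SocketChannel channel, ByteBuffer toSend) throws IOException {
        channel.configureBlocking(false);
        ByteBuffer scratch = ByteBuffer.allocate(8192);
        try (Selector selector = Selector.open()) {
            SelectionKey key = channel.register(selector,
                    SelectionKey.OP_READ | SelectionKey.OP_WRITE);
            while (toSend.hasRemaining()) {
                selector.select();              // block until readable or writable
                if (key.isReadable()) {
                    channel.read(scratch);      // drain server output so it can keep progressing
                    scratch.clear();            // a real driver would parse, not discard
                }
                if (key.isWritable()) {
                    channel.write(toSend);      // push as much of the batch as currently fits
                }
                selector.selectedKeys().clear();
            }
        }
    }
}
```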
I have to agree. There are other good reasons to do this as well, such as the issues we run into in the replication protocol with socket timeouts.
+1 from me
Can you please explain how to change the size of the output buffer? :)
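For what it's worth, PgJDBC exposes sendBufferSize and receiveBufferSize connection properties (values in bytes), which is the least invasive way I know of to apply the workaround mentioned above. The 512 kB figure below is just the value reported earlier in this thread, not a recommendation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

final class BufferTunedConnection {
    static Connection open(String url, String user, String password) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("sendBufferSize", String.valueOf(512 * 1024));     // socket write buffer, bytes
        props.setProperty("receiveBufferSize", String.valueOf(512 * 1024));  // socket read buffer, bytes
        return DriverManager.getConnection(url, props);
    }
}
```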
Is there a workaround for avoiding this behavior? Avoid batching? |
We've found that reducing the batch size avoids the behaviour. In addition, we made sure the operating system's TCP configuration has larger allocated buffers, e.g. net.core.rmem_max = 134217728. This is really just fiddling around the edges and trying to second-guess the behaviour; it isn't a solution.
I've encountered the same issue in our application as well.
Using driver version 42.2.14:

```java
// Relies on PgJDBC internal classes PgConnection, QueryExecutorImpl, QueryExecutorBase
// and PGStream, plus java.lang.reflect.Field, java.net.Socket and java.net.SocketException.
// byteSize is a field computed while binding parameters (see the switch further down).
private void setBufferSize(Connection con) {
Field queryExecutorField = null;
try {
queryExecutorField = PgConnection.class.getDeclaredField("queryExecutor");
} catch (NoSuchFieldException | SecurityException e) {
e.printStackTrace();
}
queryExecutorField.setAccessible(true);
QueryExecutorImpl pc = null;
try {
pc = (QueryExecutorImpl)queryExecutorField.get(con);
} catch (IllegalArgumentException | IllegalAccessException e) {
e.printStackTrace();
}
Field pgstreamField = null;
try {
pgstreamField = QueryExecutorBase.class.getDeclaredField("pgStream");
} catch (NoSuchFieldException | SecurityException e) {
e.printStackTrace();
}
pgstreamField.setAccessible(true);
PGStream pgs = null;
try {
pgs = (PGStream) pgstreamField.get(pc);
} catch (IllegalArgumentException | IllegalAccessException e) {
e.printStackTrace();
}
Socket s = pgs.getSocket();
try {
System.err.println("PgJDBC send buffer size is: " + s.getSendBufferSize());
s.setSendBufferSize((int)byteSize+5000000);
} catch (SocketException e) {
e.printStackTrace();
    }
}
```

Apparently I'm somehow calculating my byteSize for the buffer wrong.

```java
switch (dataType) {
case CHAR:
case NCHAR:
case VARCHAR:
case NVARCHAR:
byteSize += columnValue.getBytes().length;
prepStatement.setString(columnIndex, columnValue);
break;
case TIMESTAMP:
Timestamp timestamp = null;
try {
Long timeStamp = Long.parseLong(columnValue);
timestamp = new Timestamp(timeStamp);
} catch(NumberFormatException e) {
LOG.error("couldn't parse timestamp from csv", e);
}
byteSize += 10;
prepStatement.setTimestamp(columnIndex, timestamp);
break;
case NUMBER:
byteSize += 8;
long number = Long.parseLong(columnValue);
prepStatement.setLong(columnIndex, number);
break;
case BLOB:
case CLOB:
byteSize += columnValue.getBytes().length;
ByteArrayInputStream is = new ByteArrayInputStream(columnValue.getBytes());
prepStatement.setBinaryStream(columnIndex, is);
break;
default:
break;
}
```

Only after I add 5,000,000 does it work as intended.
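One guess at the discrepancy (nothing in this thread confirms it): the sum of raw column bytes ignores the protocol framing each batched row carries on the wire (Parse/Bind/Execute message headers, per-parameter length words), so the actual data written is larger than byteSize. A padded estimate might look like the sketch below; both constants are hypothetical, not PgJDBC values.

```java
final class WireSizeEstimate {
    static final int ASSUMED_PER_ROW_OVERHEAD_BYTES = 64;    // message headers per batched row (guess)
    static final int ASSUMED_PER_PARAM_OVERHEAD_BYTES = 8;   // length/format words per parameter (guess)

    static long estimateRowBytes(long columnPayloadBytes, int paramCount) {
        return columnPayloadBytes
                + ASSUMED_PER_ROW_OVERHEAD_BYTES
                + (long) paramCount * ASSUMED_PER_PARAM_OVERHEAD_BYTES;
    }
}
```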
PgJDBC can encounter client/server deadlocks during batch execution, where the server is waiting for the client and the client is waiting for the server. Neither can progress and one must be terminated.
The client cannot continue until the server consumes some input from the server's receive buffer (the client's send buffer).
The server cannot continue until the client consumes some input from the client's receive buffer (the server's send buffer).
Each is blocked trying to send to the other. Neither can receive until the other sends.
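The same deadlock can be reproduced without PostgreSQL at all, which may make the mechanism clearer. The sketch below (deliberately unrelated to PgJDBC's code) runs an echo "server" that writes every byte back and a "client" that keeps writing without ever reading; once both TCP buffers fill, both sides block in write() forever. It is intended to hang, so don't run it anywhere that matters.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public final class TcpDeadlockDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);   // any free local port

        Thread echo = new Thread(() -> {
            try (Socket s = server.accept()) {
                InputStream in = s.getInputStream();
                OutputStream out = s.getOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);            // blocks once the client's receive buffer fills
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, "echo-server");
        echo.start();

        try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            OutputStream out = client.getOutputStream();
            byte[] chunk = new byte[8192];
            long sent = 0;
            while (true) {                           // "batch": write, never read
                out.write(chunk);                    // eventually blocks: our send buffer is full
                sent += chunk.length;                // and the server is blocked writing echoes
                System.out.println("queued " + sent + " bytes without reading");
            }
        }
    }
}
```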
PgJDBC tries to prevent this case from arising with some heuristics in its batch facilities, where it attempts to limit the number of queries that may be queued; see org.postgresql.core.v3.QueryExecutorImpl and the comments around MAX_BUFFERED_QUERIES. The coarse heuristic of assuming 250 bytes per server reply and a 64kb server send buffer can be defeated by large numbers of asynchronous messages like NOTIFYs or non-fatal RAISE messages. It was introduced in commit c1a939f, with a followup commit 985c047 restricting batching to queries that don't return generated keys.

The main reason that deadlocks are rare is that the 64k buffer size is now unrealistically small; on my Linux system, default buffers are 200kb for both send and receive, giving us 400kb of buffer space to work with.
I've produced a very artificial test case showing that a deadlock can still occur; see TestDeadlock.java in https://github.com/ringerc/pgjdbc-batchtest . While that test is very heavy-handed in producing the deadlock, real-world cases can and do arise.

The client's stack looks like:
The server's stack looks something like: