PgJDBC can experience client/server deadlocks during batch execution #194

Open
ringerc opened this Issue Oct 2, 2014 · 10 comments

@ringerc
Member

ringerc commented Oct 2, 2014

PgJDBC can encounter client/server deadlocks during batch execution, where the server is waiting for the client and the client is waiting for the server. Neither can progress and one must be terminated.

The client cannot continue until the server consumes some input from the server's receive buffer (the client's send buffer).

The server cannot continue until the client consumes some input from the client's receive buffer (the server's send buffer).

Each is blocked trying to send to the other. Neither can receive until the other sends.

PgJDBC tries to prevent this deadlock from arising with some heuristics in its batch facilities, where it attempts to limit the number of queries that may be queued; see org.postgresql.core.v3.QueryExecutorImpl and the comments around MAX_BUFFERED_QUERIES. The coarse heuristic of assuming 250 bytes per server reply and a 64kB server send buffer can be defeated by large numbers of asynchronous messages like NOTIFYs or non-fatal RAISE messages. It was introduced in commit c1a939f, with a followup commit 985c047 restricting batching to queries that don't return generated keys.
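In back-of-the-envelope terms the heuristic amounts to the following (a sketch only, following the comments in QueryExecutorImpl; this is not the driver's actual code):

    // Illustrative sketch of the MAX_BUFFERED_QUERIES heuristic; the
    // figures are the driver's assumptions, not measured values.
    int assumedServerSendBuffer = 64000; // assumed 64kB server send buffer
    int assumedReplySize = 250;          // assumed bytes per server reply
    int maxBufferedQueries =
            assumedServerSendBuffer / assumedReplySize; // = 256 queries
    // A single NOTIFY or non-fatal RAISE reply can be far larger than
    // 250 bytes, which is how the estimate gets defeated.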

The main reason deadlocks are rare in practice is that the assumed 64kB buffer size is now unrealistically small; on my Linux system, the default buffers are 200kB for both send and receive, giving us 400kB of buffer space to work with.

I've produced a very artificial test case showing that a deadlock can still occur; see TestDeadlock.java in https://github.com/ringerc/pgjdbc-batchtest . While that test is very heavy-handed in producing the deadlock, real-world cases can and do arise.

The client's stack looks like:

SocketOutputStream.socketWrite(byte[], int, int) line: 113  
SocketOutputStream.write(byte[], int, int) line: 159    
BufferedOutputStream.write(byte[], int, int) line: 122  
BufferedOutputStream(FilterOutputStream).write(byte[]) line: 97 
PGStream.Send(byte[]) line: 224 
SimpleParameterList.writeV3Value(int, PGStream) line: 344   
QueryExecutorImpl.sendBind(SimpleQuery, SimpleParameterList, Portal, boolean) line: 1387    
QueryExecutorImpl.sendOneQuery(SimpleQuery, SimpleParameterList, int, int, int) line: 1622  
QueryExecutorImpl.sendQuery(V3Query, V3ParameterList, int, int, int, QueryExecutorImpl$ErrorTrackingResultHandler) line: 1137   
QueryExecutorImpl.execute(Query[], ParameterList[], ResultHandler, int, int, int) line: 396 
Jdbc4PreparedStatement(AbstractJdbc2Statement).executeBatch() line: 2897

The server's stack looks something like:

#0  0x0000003bc68f5dad in __libc_send (fd=10, buf=0x20eee90, n=8192, flags=-1, flags@entry=0) at ../sysdeps/unix/sysv/linux/x86_64/send.c:27
#1  0x00000000005a6c47 in secure_write (port=<optimized out>, ptr=ptr@entry=0x20eee90, len=len@entry=8192) at be-secure.c:396
#2  0x00000000005b008b in internal_flush () at pqcomm.c:1252
#3  0x00000000005b01dd in internal_putbytes (
    s=0x2b452a1 "abcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefgh"..., s@entry=0x2ae52c0 "SWARNING", len=655444) at pqcomm.c:1198
#4  0x00000000005b11f0 in pq_putmessage (msgtype=78 'N', s=0x2ae52c0 "SWARNING", len=<optimized out>) at pqcomm.c:1395
#5  0x00000000005b1e64 in pq_endmessage (buf=buf@entry=0x7fff12cf9300) at pqformat.c:346
#6  0x0000000000721ff7 in send_message_to_frontend (edata=<optimized out>) at elog.c:2887
#7  EmitErrorReport () at elog.c:1429
#8  0x000000000072190d in errfinish (dummy=<optimized out>) at elog.c:488
#9  0x00007fc526f1f47b in exec_stmt_raise (estate=estate@entry=0x7fff12cf9780, stmt=stmt@entry=0x21eecb0) at pl_exec.c:3027
#10 0x00007fc526f20e55 in exec_stmt (stmt=0x21eecb0, estate=0x7fff12cf9780) at pl_exec.c:1442
#11 exec_stmts (estate=0x7fff12cf9780, stmts=<optimized out>) at pl_exec.c:1345
#12 0x00007fc526f22d1b in exec_stmt_block (estate=estate@entry=0x7fff12cf9780, block=0x21ef758) at pl_exec.c:1283
#13 0x00007fc526f237c3 in plpgsql_exec_trigger (func=func@entry=0x21e9310, trigdata=0x7fff12cf9f30) at pl_exec.c:689
#14 0x00007fc526f18b3a in plpgsql_call_handler (fcinfo=0x7fff12cf9ac0) at pl_handler.c:120
#15 0x00000000005672c7 in ExecCallTriggerFunc (trigdata=trigdata@entry=0x7fff12cf9f30, tgindx=tgindx@entry=0, finfo=finfo@entry=0x21eb240, instr=instr@entry=0x0, 
    per_tuple_context=per_tuple_context@entry=0x21e90f0) at trigger.c:1871
#16 0x000000000056857a in AfterTriggerExecute (trigdesc=0x21eb320, trigdesc=0x21eb320, per_tuple_context=0x21e90f0, instr=0x0, finfo=0x21eb240, rel=0x7fc527133858, 
    event=0x21e7100) at trigger.c:3454
#17 afterTriggerInvokeEvents (events=events@entry=0x21c1420, firing_id=4, estate=estate@entry=0x21eb100, delete_ok=delete_ok@entry=1 '\001') at trigger.c:3633
#18 0x000000000056c8be in AfterTriggerEndQuery (estate=estate@entry=0x21eb100) at trigger.c:3823
#19 0x000000000057f3dc in standard_ExecutorFinish (queryDesc=0x21d0998) at execMain.c:379
#20 0x0000000000658fe8 in ProcessQuery (plan=<optimized out>, sourceText=0x21d08a0 "INSERT INTO deadlock_demo1(id, largetext) VALUES ($1,$2)", params=0x21d0908, 
    dest=0xb1a160 <donothingDR>, completionTag=0x7fff12cfa420 "INSERT 0 1") at pquery.c:225
#21 0x000000000065918c in PortalRunMulti (portal=portal@entry=0x21b1060, isTopLevel=isTopLevel@entry=1 '\001', dest=0xb1a160 <donothingDR>, dest@entry=0x217f1a0, 
    altdest=0xb1a160 <donothingDR>, altdest@entry=0x217f1a0, completionTag=completionTag@entry=0x7fff12cfa420 "INSERT 0 1") at pquery.c:1279
#22 0x0000000000659c7c in PortalRun (portal=0x21b1060, count=1, isTopLevel=<optimized out>, dest=0x217f1a0, altdest=0x217f1a0, completionTag=0x7fff12cfa420 "INSERT 0 1")
    at pquery.c:816
#23 0x00000000006573bc in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1931
#24 0x000000000045d9ad in BackendRun (port=0x20ed950) at postmaster.c:4011
#25 BackendStartup (port=0x20ed950) at postmaster.c:3685
#26 ServerLoop () at postmaster.c:1586
#27 0x00000000006161f3 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x20c72f0) at postmaster.c:1253
#28 0x000000000045e558 in main (argc=5, argv=0x20c72f0) at main.c:226
@ringerc

Member

ringerc commented Oct 2, 2014

To reduce the likelihood of tripping this bug, PgJDBC doesn't queue batches that return result sets, such as a PreparedStatement that requests generated keys; see issue #195.

One possible option for making deadlocks impossible is covered in brief by issue #163 - using non-blocking sockets with java.nio. However, it's likely to be intrusive.

@ringerc

Member

ringerc commented Oct 2, 2014

An alternative to completely changing the data exchange mechanism is to get PgJDBC to manage its own send buffer properly. PgJDBC currently ignores its send buffer and tries to manage the server's buffer instead. This is backwards.

The only buffer PgJDBC can completely control is its own send buffer. So what we really need to do is avoid blocking on writes to that buffer whenever we know there's already a pending query response. (If there's no pending query it's fine to block; the server will continue consuming our input even if an error occurs.)

Using non-blocking reads/writes with java.nio streams?

Java doesn't expose any API to query the available space in the TCP send buffer, and there's no portable way to query it from the underlying platform. You need Linux-specific hacks like SIOCOUTQ.

In java.nio (since Java 1.4) there's the option of creating a non-blocking Channel, then wrapping it up as a stream socket with the Channels class, which provides a stream-socket-compatible API. Writes to the output stream of such a socket throw IllegalBlockingModeException instead of blocking. So we could potentially just write(byte[]) whole messages until we get an exception, then consume input from the socket. However, non-blocking reads are a problem with SSL, so we might just block on the read side and get deadlocked there instead, with the server waiting for us to send more data and us waiting for the server to send more data.
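A minimal sketch of the write side of that idea, assuming a plain non-SSL connection (host, port and message contents are placeholders; exactly when a given JDK throws IllegalBlockingModeException on such a stream needs verifying):

    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.channels.Channels;
    import java.nio.channels.IllegalBlockingModeException;
    import java.nio.channels.SocketChannel;

    // Wrap a non-blocking SocketChannel as an OutputStream. Writes that
    // can't complete throw IllegalBlockingModeException rather than
    // blocking, which is the behaviour described above.
    public class NonBlockingWriteSketch {
        public static void main(String[] args) throws Exception {
            SocketChannel channel = SocketChannel.open(
                    new InetSocketAddress("localhost", 5432)); // placeholder
            channel.configureBlocking(false);
            OutputStream out = Channels.newOutputStream(channel);

            byte[] message = new byte[8192]; // stand-in protocol message
            try {
                out.write(message);          // whole-message write
            } catch (IllegalBlockingModeException e) {
                // Couldn't write without blocking: stop writing and
                // drain the server's replies before queueing more.
            }
        }
    }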

We could guarantee that it's safe to read from the receive stream by forcing the server to send more data by writing a Sync message. However, we can't guarantee that there's enough space in the send buffer to write that after an IllegalBlockingModeException. Writing the Sync message might just fail too, leaving us unable to be sure if it's safe to read from the input channel.

Even if we solved the SSL issue and got a guaranteed non-blocking input stream too, we'd have to muck around with a control loop that select()s the next readable/writeable socket and pipelines more data. This is complicated by the fact that the output socket might still be writable, just not with the message size we want. So doing this with a non-blocking approach would require a pretty major change to the driver.
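For reference, such a control loop would look something like this skeleton (illustrative only; a real implementation would also have to track partially-written messages and actually parse the replies it reads):

    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    public class SelectLoopSketch {
        // channel must already be connected and in non-blocking mode.
        static void pump(SocketChannel channel, ByteBuffer toSend,
                         ByteBuffer recvBuf) throws Exception {
            Selector selector = Selector.open();
            channel.register(selector,
                    SelectionKey.OP_READ | SelectionKey.OP_WRITE);
            while (toSend.hasRemaining()) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isWritable()) {
                        channel.write(toSend); // writes only what fits
                    }
                    if (key.isReadable()) {
                        channel.read(recvBuf); // drain server replies
                        recvBuf.clear();       // a real driver would parse
                    }
                }
                selector.selectedKeys().clear();
            }
            selector.close();
        }
    }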

Writing up to the send buffer size, then syncing and flushing

Instead, we can avoid blocking on the socket by never filling the send buffer without ending the queued data with a Sync message. If PgJDBC stops short of filling its send buffer whenever one or more queries are already pipelined, and instead forces a Sync + flush and consumes input before continuing, it can know for certain that its send buffer is empty and that the server's send buffer is empty (or nearly so; there could be asynchronous messages/notifications sent after the Sync). It then knows it's safe to write up to the buffer size again before switching to consuming input.
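In rough pseudocode the send path would become something like this (all names here are hypothetical, not existing driver API; sendBufferSize would come from Socket.getSendBufferSize()):

    // Hypothetical sketch of send-side accounting; writeMessage(),
    // writeSync(), flush() and consumeInput() are stand-ins for the
    // driver's internals, not real method names.
    void sendBatch(java.util.List<byte[]> messages, int sendBufferSize)
            throws java.io.IOException {
        int queuedBytes = 0;
        for (byte[] msg : messages) {
            if (queuedBytes + msg.length > sendBufferSize) {
                writeSync();    // server will flush its replies
                flush();        // push our queued bytes onto the wire
                consumeInput(); // drain replies; both buffers now ~empty
                queuedBytes = 0;
            }
            writeMessage(msg);
            queuedBytes += msg.length;
        }
        writeSync();
        flush();
        consumeInput();
    }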

This is deadlock-proof, but it greatly limits the number of big queries that PgJDBC can pipeline in a batch. Currently, with an assumed 250-byte reply and a 64kB buffer, PgJDBC assumes it can safely pipeline 64000 / 250 = 256 queries before needing to sync and consume input.

If we instead use the real send buffer size on a typical system, as determined by poking in the driver's guts reflectively, e.g.:

    // conn is an open org.postgresql JDBC connection.
    // We must use AbstractJdbc2Connection directly, as that's the declaring class.
    Field pgProtoConnField = AbstractJdbc2Connection.class.getDeclaredField("protoConnection");
    pgProtoConnField.setAccessible(true);

    ProtocolConnection pc = (ProtocolConnection) pgProtoConnField.get(conn);
    Field pgstreamField = pc.getClass().getDeclaredField("pgStream");
    pgstreamField.setAccessible(true);

    PGStream pgs = (PGStream) pgstreamField.get(pc);
    Socket s = pgs.getSocket();
    System.err.println("PgJDBC send buffer size is: " + s.getSendBufferSize());

I can see that my default is 1313280 bytes, i.e. about 1.25MB. So that's the most we could queue up in a sub-batch with total safety unless we have non-blocking writes and reads or separate reader/writer threads. If each query sends 100kB of data, that's 12 queries; if each query sends a more reasonable 5kB, it's around 250 queries, the current limit. For small queries of around 500 bytes, we could queue up 2500, far more than the current limit.

That's still a lot of sanely-sized queries, and bigger queries are less affected by round-trip costs anyway. So we should consider moving the deadlock prevention logic from attempting to control the server's send buffer to controlling our own send buffer. That's much safer, and it lets us safely batch prepared statements that request generated keys.

@ringerc ringerc self-assigned this Oct 2, 2014

@ringerc ringerc added the enhancement label Oct 3, 2014

ringerc added a commit to ringerc/pgjdbc that referenced this issue Oct 3, 2014

Change MAX_BUFFERED_QUERIES to a byte measure and count buffer bytes
Per GitHub issue pgjdbc#194 and the comments above MAX_BUFFERED_QUERIES,
we're using a pretty rough heuristic for receive buffer management.

This approach can't account for the data in prepared queries that
return generated keys; it assumes a flat 250 bytes per query
response.

Change that to measure the buffer in bytes, up to an estimated
MAX_BUFFERED_RECV_BYTES (still 64k, same as before), with an estimated
NODATA_QUERY_RESPONSE_SIZE_BYTES of 250 bytes per query.

Behaviour is not changed; we're just counting bytes instead of queries.
This means it's now possible to adjust the baseline
NODATA_QUERY_RESPONSE_SIZE_BYTES for the size of any returned columns
in a query with a RETURNING clause in a subsequent patch.

ringerc added a commit that referenced this issue Dec 1, 2014

Merge pull request #204 from 2ndQuadrant/batch-returning-support
Add limited support for returning generated columns in batches

See individual commits for notes on the limitations. This isn't a perfect solution to issues #194 and #195, but it'll work well for most people.
@nicolasbarbier

nicolasbarbier commented Sep 16, 2016

Just FYI:

We had this problem with Maven artifact org.postgresql:postgresql:9.4.1208.jre6. The stacktrace on the client is:

"SqlExecutor-remote" daemon prio=6 tid=0x000000000cbaf000 nid=0x11dc runnable [0x0000000013d0e000] java.lang.Thread.State: RUNNABLE
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000b861f3b8> (a java.io.BufferedOutputStream)
at org.postgresql.core.PGStream.flush(PGStream.java:525)
at org.postgresql.core.v3.QueryExecutorImpl.sendSync(QueryExecutorImpl.java:1255)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:199)
- locked <0x00000000b861f580> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:424)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:161)
at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:155)

The server just shows in pg_stat_activity that it is executing some SQL statement that is part of the batch, with waiting == false. In one case the statement in question was even supposed to return zero rows (because the corresponding table was empty).

We only seem to have the problem when the client runs on the same host as the server (connection using TCP from some IP address to the same IP address, not necessarily 127.0.0.1).

We have only ever seen this at our clients that use Windows (but it might also occur on other operating systems).

We are trying to find a way to tweak the settings so as to avoid this problem.

@davecramer

Member

davecramer commented Sep 16, 2016

Thx for the report.

I would suggest upgrading, as rewriteInsert in 1209 makes batch inserts much faster.

Dave Cramer


@nicolasbarbier

nicolasbarbier commented Sep 21, 2016

FTR: Upgrading to version 1210 didn’t fix the problem. After setting sendBufferSize and recvBufferSize to large values (512 kB; although I guess we only really needed to increase sendBufferSize), the problem disappeared.
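For anyone wanting to reproduce the workaround, the settings amount to connection properties along these lines (an example URL; the receive-side property is spelled receiveBufferSize in current driver documentation):

    jdbc:postgresql://localhost:5432/mydb?sendBufferSize=524288&receiveBufferSize=524288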

@davecramer

Member

davecramer commented Oct 26, 2017

pretty sure this has been resolved

@davecramer davecramer closed this Oct 26, 2017

@vlsi

Member

vlsi commented Oct 26, 2017

Pretty sure it has not.

The resolution would be to either use a non-blocking API or use a separate thread to pump the data.

@vlsi vlsi reopened this Oct 26, 2017

@ringerc

Member

ringerc commented Jul 31, 2018

Agreed that it's not resolved.

I looked into using a separate thread, but couldn't find much clarity on how threads interact with JDBC drivers and what the rules are there, e.g. how would we reliably ensure our receive-pumping thread is terminated when the connection is GC'd and closed? But I expect we can rely on the shared TCP socket for that.

It's probably not that hard, and it's likely the sensible solution. Java is already so heavily threaded that nobody's going to get upset if we spawn a thread. Some care will be required to make sure the new thread gets the same classloader as the spawning thread so it works properly in containerized environments, but that's well-established practice; something like the sketch below.
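A rough sketch of the spawning side (hypothetical, not driver code; the pump body is elided):

    // Hypothetical sketch: spawn a daemon receive-pumping thread that
    // explicitly carries the spawning thread's context classloader so
    // it behaves correctly in containerized environments.
    Thread pump = new Thread(() -> {
        // read from the connection's input stream and buffer replies here
    });
    pump.setDaemon(true); // don't keep the JVM alive just for the pump
    pump.setContextClassLoader(Thread.currentThread().getContextClassLoader());
    pump.setName("pgjdbc-receive-pump"); // hypothetical name
    pump.start();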

I'm a bit unsure why I dismissed a threaded solution when I looked into this before.

@ringerc

Member

ringerc commented Jul 31, 2018

Ah, it was related to the work I did here https://stackoverflow.com/q/8514725/398670 .

@sasavilic

sasavilic commented Oct 3, 2018

@ringerc org.postgresql.Driver already uses a thread during connection establishment (ConnectThread in Driver::connect). How is that any different from your case?
