
PgJDBC can experience client/server deadlocks during batch execution #194

Open
ringerc opened this issue Oct 2, 2014 · 23 comments

@ringerc ringerc commented Oct 2, 2014

PgJDBC can encounter client/server deadlocks during batch execution, where the server is waiting for the client and the client is waiting for the server. Neither can progress and one must be terminated.

The client cannot continue until the server consumes some input from the server's receive buffer (the client's send buffer).

The server cannot continue until the client consumes some input from the client's receive buffer (the server's send buffer).

Each is blocked trying to send to the other. Neither can receive until the other sends.
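To make the failure mode concrete, here is a minimal standalone sketch (not PgJDBC code; all names are illustrative) that reproduces the mutual blocking with a plain TCP socket pair: both ends write without ever reading, and once the buffers in both directions fill, both writers block indefinitely.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class MutualWriteDeadlock {

    // True if both writers are still stuck in write() after the buffers fill.
    public static boolean bothWritersBlock() throws Exception {
        try (ServerSocket server = new ServerSocket()) {
            server.setReceiveBufferSize(4096); // inherited by accepted sockets
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (Socket client = new Socket()) {
                client.setSendBufferSize(4096);
                client.setReceiveBufferSize(4096);
                client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()));
                try (Socket peer = server.accept()) {
                    peer.setSendBufferSize(4096);
                    Thread a = writer(client);
                    Thread b = writer(peer);
                    a.join(2000);
                    b.join(2000);
                    // Neither thread finished: each is blocked in write(),
                    // waiting for the other side to read. That is the deadlock.
                    return a.isAlive() && b.isAlive();
                }
            }
        }
    }

    private static Thread writer(Socket s) {
        Thread t = new Thread(() -> {
            try {
                OutputStream out = s.getOutputStream();
                byte[] chunk = new byte[8192];
                while (true) {
                    out.write(chunk); // blocks once both TCP buffers are full
                }
            } catch (IOException ignored) {
                // socket closed by the caller once the experiment is over
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("deadlocked=" + bothWritersBlock());
    }
}
```

The small SO_SNDBUF/SO_RCVBUF values just make the stall happen quickly; with default buffer sizes the same thing happens, only after more data has been queued.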

PgJDBC tries to prevent this case from arising with a heuristic in its batch facilities that limits the number of queries that may be queued; see org.postgresql.core.v3.QueryExecutorImpl and the comments around MAX_BUFFERED_QUERIES. The coarse assumption of 250 bytes per server reply and a 64 kB server send buffer can be defeated by large numbers of asynchronous messages like NOTIFYs or non-fatal RAISE messages. The heuristic was introduced in commit c1a939f, with a follow-up commit 985c047 restricting batching to queries that don't return generated keys.
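The gap in the heuristic is easy to quantify. A small illustrative calculation (the constant names here are hypothetical, chosen to mirror the description above): if every query reply is accompanied by, say, a 650-byte RAISE WARNING, the roughly 256 queries the heuristic permits generate about 230 kB of reply traffic against an assumed 64 kB server send buffer.

```java
public class HeuristicGap {
    // Hypothetical constants mirroring the heuristic described above.
    static final int ASSUMED_SERVER_SEND_BUFFER = 64_000; // bytes
    static final int ASSUMED_REPLY_SIZE = 250;            // bytes per query reply

    public static int maxBufferedQueries() {
        return ASSUMED_SERVER_SEND_BUFFER / ASSUMED_REPLY_SIZE; // 256
    }

    // Bytes the server must actually send if every query also emits an
    // asynchronous message (e.g. a non-fatal RAISE WARNING) of noticeSize bytes.
    public static int actualReplyBytes(int queries, int noticeSize) {
        return queries * (ASSUMED_REPLY_SIZE + noticeSize);
    }

    public static void main(String[] args) {
        int n = maxBufferedQueries();
        System.out.println(n + " queries with 650-byte warnings need "
                + actualReplyBytes(n, 650) + " bytes, vs an assumed buffer of "
                + ASSUMED_SERVER_SEND_BUFFER);
    }
}
```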

The main reason that deadlocks are rare is that the assumed 64 kB buffer size is now unrealistically small; on my Linux system, default buffers are 200 kB for both send and receive, giving us 400 kB of buffer space to work with.

I've produced a very artificial test case showing that a deadlock can still occur; see TestDeadlock.java in https://github.com/ringerc/pgjdbc-batchtest . While that test is very heavy-handed in producing the deadlock, real-world cases can and do arise.

The client's stack looks like:

SocketOutputStream.socketWrite(byte[], int, int) line: 113  
SocketOutputStream.write(byte[], int, int) line: 159    
BufferedOutputStream.write(byte[], int, int) line: 122  
BufferedOutputStream(FilterOutputStream).write(byte[]) line: 97 
PGStream.Send(byte[]) line: 224 
SimpleParameterList.writeV3Value(int, PGStream) line: 344   
QueryExecutorImpl.sendBind(SimpleQuery, SimpleParameterList, Portal, boolean) line: 1387    
QueryExecutorImpl.sendOneQuery(SimpleQuery, SimpleParameterList, int, int, int) line: 1622  
QueryExecutorImpl.sendQuery(V3Query, V3ParameterList, int, int, int, QueryExecutorImpl$ErrorTrackingResultHandler) line: 1137   
QueryExecutorImpl.execute(Query[], ParameterList[], ResultHandler, int, int, int) line: 396 
Jdbc4PreparedStatement(AbstractJdbc2Statement).executeBatch() line: 2897

The server's stack looks something like:

#0  0x0000003bc68f5dad in __libc_send (fd=10, buf=0x20eee90, n=8192, flags=-1, flags@entry=0) at ../sysdeps/unix/sysv/linux/x86_64/send.c:27
#1  0x00000000005a6c47 in secure_write (port=<optimized out>, ptr=ptr@entry=0x20eee90, len=len@entry=8192) at be-secure.c:396
#2  0x00000000005b008b in internal_flush () at pqcomm.c:1252
#3  0x00000000005b01dd in internal_putbytes (
    s=0x2b452a1 "abcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefghabcdefgh"..., s@entry=0x2ae52c0 "SWARNING", len=655444) at pqcomm.c:1198
#4  0x00000000005b11f0 in pq_putmessage (msgtype=78 'N', s=0x2ae52c0 "SWARNING", len=<optimized out>) at pqcomm.c:1395
#5  0x00000000005b1e64 in pq_endmessage (buf=buf@entry=0x7fff12cf9300) at pqformat.c:346
#6  0x0000000000721ff7 in send_message_to_frontend (edata=<optimized out>) at elog.c:2887
#7  EmitErrorReport () at elog.c:1429
#8  0x000000000072190d in errfinish (dummy=<optimized out>) at elog.c:488
#9  0x00007fc526f1f47b in exec_stmt_raise (estate=estate@entry=0x7fff12cf9780, stmt=stmt@entry=0x21eecb0) at pl_exec.c:3027
#10 0x00007fc526f20e55 in exec_stmt (stmt=0x21eecb0, estate=0x7fff12cf9780) at pl_exec.c:1442
#11 exec_stmts (estate=0x7fff12cf9780, stmts=<optimized out>) at pl_exec.c:1345
#12 0x00007fc526f22d1b in exec_stmt_block (estate=estate@entry=0x7fff12cf9780, block=0x21ef758) at pl_exec.c:1283
#13 0x00007fc526f237c3 in plpgsql_exec_trigger (func=func@entry=0x21e9310, trigdata=0x7fff12cf9f30) at pl_exec.c:689
#14 0x00007fc526f18b3a in plpgsql_call_handler (fcinfo=0x7fff12cf9ac0) at pl_handler.c:120
#15 0x00000000005672c7 in ExecCallTriggerFunc (trigdata=trigdata@entry=0x7fff12cf9f30, tgindx=tgindx@entry=0, finfo=finfo@entry=0x21eb240, instr=instr@entry=0x0, 
    per_tuple_context=per_tuple_context@entry=0x21e90f0) at trigger.c:1871
#16 0x000000000056857a in AfterTriggerExecute (trigdesc=0x21eb320, trigdesc=0x21eb320, per_tuple_context=0x21e90f0, instr=0x0, finfo=0x21eb240, rel=0x7fc527133858, 
    event=0x21e7100) at trigger.c:3454
#17 afterTriggerInvokeEvents (events=events@entry=0x21c1420, firing_id=4, estate=estate@entry=0x21eb100, delete_ok=delete_ok@entry=1 '\001') at trigger.c:3633
#18 0x000000000056c8be in AfterTriggerEndQuery (estate=estate@entry=0x21eb100) at trigger.c:3823
#19 0x000000000057f3dc in standard_ExecutorFinish (queryDesc=0x21d0998) at execMain.c:379
#20 0x0000000000658fe8 in ProcessQuery (plan=<optimized out>, sourceText=0x21d08a0 "INSERT INTO deadlock_demo1(id, largetext) VALUES ($1,$2)", params=0x21d0908, 
    dest=0xb1a160 <donothingDR>, completionTag=0x7fff12cfa420 "INSERT 0 1") at pquery.c:225
#21 0x000000000065918c in PortalRunMulti (portal=portal@entry=0x21b1060, isTopLevel=isTopLevel@entry=1 '\001', dest=0xb1a160 <donothingDR>, dest@entry=0x217f1a0, 
    altdest=0xb1a160 <donothingDR>, altdest@entry=0x217f1a0, completionTag=completionTag@entry=0x7fff12cfa420 "INSERT 0 1") at pquery.c:1279
#22 0x0000000000659c7c in PortalRun (portal=0x21b1060, count=1, isTopLevel=<optimized out>, dest=0x217f1a0, altdest=0x217f1a0, completionTag=0x7fff12cfa420 "INSERT 0 1")
    at pquery.c:816
#23 0x00000000006573bc in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1931
#24 0x000000000045d9ad in BackendRun (port=0x20ed950) at postmaster.c:4011
#25 BackendStartup (port=0x20ed950) at postmaster.c:3685
#26 ServerLoop () at postmaster.c:1586
#27 0x00000000006161f3 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x20c72f0) at postmaster.c:1253
#28 0x000000000045e558 in main (argc=5, argv=0x20c72f0) at main.c:226

@ringerc ringerc commented Oct 2, 2014

To reduce the likelihood of tripping this bug, PgJDBC doesn't queue batches that return result sets, such as a PreparedStatement that requests generated keys; see issue #195.

One possible option for making deadlocks impossible is covered in brief by issue #163 - using non-blocking sockets with java.nio. However, it's likely to be intrusive.


@ringerc ringerc commented Oct 2, 2014

An alternative to completely changing the data exchange mechanism is to instead get PgJDBC to manage its send buffer properly. PgJDBC currently ignores its send buffer and tries to manage the server's buffer. This is backwards.

The only buffer PgJDBC can completely control is its own send buffer. So what we really need to do is avoid blocking on writes to it when we know there's already a pending query response. (If there's no pending query it's fine to block; the server will continue consuming our input even if there's an error.)

Using non-blocking reads/writes with java.nio streams?

Java doesn't expose any API to query the available space in the TCP send buffer, and there's no portable way to query it from the underlying platform. You need Linux-specific hacks like SIOCOUTQ.

In java.nio (since Java 1.4) there's the option of creating a non-blocking Channel, then wrapping it up as a stream socket with the Channels class, providing a stream-socket-compatible API. Writes to the output stream on such a socket throw IllegalBlockingModeException instead of blocking. So we could potentially just write(byte[]) whole messages until we get an exception, then switch to consuming input from the socket. However, non-blocking reads are a problem with SSL, so we might just block on the read side and get deadlocked there instead, with the server waiting for us to send more data and us waiting for the server to send more data.

We could guarantee that it's safe to read from the receive stream by forcing the server to send more data by writing a Sync message. However, we can't guarantee that there's enough space in the send buffer to write that after an IllegalBlockingModeException. Writing the Sync message might just fail too, leaving us unable to be sure if it's safe to read from the input channel.

Even if we solved the SSL issue and got a guaranteed non-blocking input stream too, we'd have to muck around with a control loop that select()s the next readable/writeable socket and pipelines more data. This is complicated by the fact that the output socket might still be writable, just not with the message size we want. So doing this with a non-blocking approach would require a pretty major change to the driver.
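For reference, the non-blocking signal itself is easy to observe with java.nio directly (a sketch, not driver code): a write on a non-blocking SocketChannel returns 0 once the send buffer is full, which is exactly the point where a blocking stream would have stalled, and where the driver would instead need to switch to consuming input.

```java
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingWriteProbe {

    // Write until the kernel refuses more; returns true once a write returns 0,
    // i.e. the point at which a blocking stream would have blocked.
    public static boolean sendBufferFills() throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            SocketChannel out = SocketChannel.open(server.getLocalAddress());
            try (SocketChannel in = server.accept(); SocketChannel o = out) {
                o.setOption(StandardSocketOptions.SO_SNDBUF, 4096);
                o.configureBlocking(false);
                ByteBuffer chunk = ByteBuffer.allocate(8192);
                for (int i = 0; i < 10_000; i++) {
                    chunk.clear();
                    if (o.write(chunk) == 0) {
                        // Send buffer full. Instead of blocking here, the
                        // driver could drain responses from its input side.
                        return true;
                    }
                }
                return false; // never filled (should not happen: peer never reads)
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("wouldBlock=" + sendBufferFills());
    }
}
```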

Writing up to the send buffer size, then syncing and flushing

Instead, we can avoid blocking on the socket by never filling the send buffer without ending the data in it with a Sync message. If PgJDBC, whenever one or more queries are already pipelined, refuses to fill its send buffer without first forcing a Sync, flushing, and consuming input before continuing, then it knows for certain that its send buffer is empty and the server's send buffer is empty (or nearly so; there could be asynchronous messages/notifications sent after the Sync). It then knows it's safe to write up to the buffer size again before switching back to consuming input.

This is deadlock-proof, but it greatly limits the number of big queries that PgJDBC can pipeline in a batch. Currently, with an assumed 250-byte reply and a 64 kB buffer, PgJDBC assumes it can safely pipeline 64000 / 250 = 256 queries before needing to sync and consume input.
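The bookkeeping this strategy requires is modest. A sketch of the decision logic (class, field, and method names are hypothetical, not the driver's):

```java
public class SendBufferBudget {
    private final int sendBufferSize; // from Socket.getSendBufferSize()
    private int queuedBytes;          // bytes written since the last Sync+flush
    private int pipelinedQueries;     // queries with results still pending

    public SendBufferBudget(int sendBufferSize) {
        this.sendBufferSize = sendBufferSize;
    }

    // True if we must Sync, flush, and consume input BEFORE sending a message
    // of msgBytes, to guarantee the write cannot block indefinitely.
    public boolean mustSyncFirst(int msgBytes) {
        // Blocking is only dangerous when a reply is already pending.
        return pipelinedQueries > 0 && queuedBytes + msgBytes > sendBufferSize;
    }

    public void recordSend(int msgBytes, boolean isQuery) {
        queuedBytes += msgBytes;
        if (isQuery) pipelinedQueries++;
    }

    public void recordSyncAndDrain() {
        // After a Sync is flushed and all results are consumed, both our send
        // buffer and the server's send buffer are (nearly) empty again.
        queuedBytes = 0;
        pipelinedQueries = 0;
    }

    public static void main(String[] args) {
        SendBufferBudget b = new SendBufferBudget(64_000);
        b.recordSend(60_000, true);
        System.out.println("must sync before next 10kB message: "
                + b.mustSyncFirst(10_000));
    }
}
```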

If we instead use the real send buffer size on a typical system, as determined by poking in the driver's guts reflectively (this needs java.lang.reflect.Field, java.net.Socket, and the org.postgresql.core classes), e.g.:

                // We must use AbstractJdbc2Connection directly, as that's the declaring class
                Field pgProtoConnField = AbstractJdbc2Connection.class.getDeclaredField("protoConnection");
                pgProtoConnField.setAccessible(true);

                ProtocolConnection pc = (ProtocolConnection) pgProtoConnField.get(conn);
                Field pgstreamField = pc.getClass().getDeclaredField("pgStream");
                pgstreamField.setAccessible(true);

                PGStream pgs = (PGStream) pgstreamField.get(pc);
                Socket s = pgs.getSocket();
                System.err.println("PgJDBC send buffer size is: " + s.getSendBufferSize());

I can see that my default is 1313280 bytes, i.e. about 1.25 MB. So that's the most we could queue up in a sub-batch with total safety unless we have non-blocking writes and reads or separate reader/writer threads. If each query sends 100 kB of data that's 12 queries, but at a more reasonable 5 kB per query it's about 250 queries, roughly the current limit. For small queries of around 500 bytes, we could queue up about 2600, far more than the current limit.

That's still a lot of sanely sized queries, and bigger queries will be less affected by round trip costs anyway. So we should consider moving deadlock prevention logic from attempting to control the server's send buffer to trying to control our own send buffer. That's much safer, and lets us safely batch prepared statements that request generated keys.

@ringerc ringerc self-assigned this Oct 2, 2014
@ringerc ringerc added the enhancement label Oct 3, 2014
ringerc added a commit to ringerc/pgjdbc that referenced this issue Oct 3, 2014
Per GitHub issue pgjdbc#194 and the comments above MAX_BUFFERED_QUERIES,
we're using a pretty rough heuristic for receive buffer management.

This approach can't account for the data returned by prepared queries that return generated keys; it assumes a flat 250 bytes per query response.

Change that to count the buffer in bytes, up to an estimated MAX_BUFFERED_RECV_BYTES (still 64k, same as before), with an estimated NODATA_QUERY_RESPONSE_SIZE_BYTES (250 bytes) per query.

Behaviour is not changed; we're just counting bytes instead of queries. This makes it possible for a subsequent patch to adjust the baseline NODATA_QUERY_RESPONSE_SIZE_BYTES for the size of any columns returned by a query with a RETURNING clause.
ringerc added a commit that referenced this issue Dec 1, 2014
Add limited support for returning generated columns in batches

See individual commits for notes on the limitations. This isn't a perfect solution to issue #194 and #195, but it'll work well for most people.

@nicolasbarbier nicolasbarbier commented Sep 16, 2016

Just FYI:

We had this problem with Maven artifact org.postgresql:postgresql:9.4.1208.jre6. The stacktrace on the client is:

"SqlExecutor-remote" daemon prio=6 tid=0x000000000cbaf000 nid=0x11dc runnable [0x0000000013d0e000] java.lang.Thread.State: RUNNABLE
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000b861f3b8> (a java.io.BufferedOutputStream)
at org.postgresql.core.PGStream.flush(PGStream.java:525)
at org.postgresql.core.v3.QueryExecutorImpl.sendSync(QueryExecutorImpl.java:1255)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:199)
- locked <0x00000000b861f580> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:424)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:161)
at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:155)

The server just shows in pg_stat_activity that it is executing some SQL statement that is part of the batch, with waiting == false. Once, the statement in question was even supposed to return zero rows (because the corresponding table was empty).

We only seem to have the problem when the client runs on the same host as the server (connection using TCP from some IP address to the same IP address, not necessarily 127.0.0.1).

We only ever had this at our clients that use Windows (but it might also occur on other operating systems).

We are trying to find a way to tweak the settings so as to avoid this problem.


@davecramer davecramer commented Sep 16, 2016

Thx for the report.

I would suggest upgrading, as rewriteInsert in 1209 makes batch inserts much faster.

Dave Cramer



@nicolasbarbier nicolasbarbier commented Sep 21, 2016

FTR: Upgrading to version 1210 didn't fix the problem. After setting the sendBufferSize and receiveBufferSize connection properties to large values (512 kB; although I guess we only really needed to increase sendBufferSize), the problem disappeared.
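For anyone applying this workaround: the driver exposes the socket buffer sizes as the sendBufferSize and receiveBufferSize connection properties (values in bytes). A sketch of setting them; the URL, user, and password are placeholders, and connecting requires a live server, so that part is shown only in a comment:

```java
import java.util.Properties;

public class BufferSizeWorkaround {

    // Builds connection properties requesting 512 kB socket buffers.
    public static Properties largeBufferProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("sendBufferSize", "524288");    // 512 kB
        props.setProperty("receiveBufferSize", "524288"); // 512 kB
        return props;
    }

    // Usage against a real server (placeholder URL/credentials):
    // Connection conn = DriverManager.getConnection(
    //         "jdbc:postgresql://localhost/mydb",
    //         largeBufferProps("me", "secret"));

    public static void main(String[] args) {
        System.out.println(largeBufferProps("me", "secret"));
    }
}
```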


@davecramer davecramer commented Oct 26, 2017

pretty sure this has been resolved

@davecramer davecramer closed this Oct 26, 2017

@vlsi vlsi commented Oct 26, 2017

Pretty sure it has not.

The resolution would either use non-blocking API or use a separate thread to pump the data.

@vlsi vlsi reopened this Oct 26, 2017

@ringerc ringerc commented Jul 31, 2018

Agreed that it's not resolved.

I looked into using a separate thread, but couldn't find much clarity on how threads interact with JDBC drivers and what the rules are there. For example, how would we reliably ensure our receive-pumping thread is terminated when the connection is closed or GC'd? But I expect we can rely on the shared TCP socket for that.

It's probably not that hard, and likely the sensible solution. Java is already so heavily threaded that nobody's going to get upset if we spawn a thread. Some care will be required to make sure the new thread gets the same classloader as the spawning thread to work properly in containerized environments, but that's well established.

I'm a bit unsure why I dismissed a threaded solution when I looked into this before.


@ringerc ringerc commented Jul 31, 2018

Ah, it was related to the work I did here https://stackoverflow.com/q/8514725/398670 .

@sasavilic sasavilic commented Oct 3, 2018

@ringerc org.postgresql.Driver already uses a thread during connection establishment (ConnectThread in Driver::connect). How is that any different from your case?

@turbanoff turbanoff commented Apr 25, 2019

Had this problem too on the latest driver version, 42.2.5. The thread is stuck with this stack trace:

java.lang.Thread.State: RUNNABLE
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	- locked <0x00000006c09dd2d8> (a java.io.BufferedOutputStream)
	at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
	at org.postgresql.core.PGStream.sendInteger4(PGStream.java:226)
	at org.postgresql.core.v3.QueryExecutorImpl.sendBind(QueryExecutorImpl.java:1559)
	at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1805)
	at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1328)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:467)
	- locked <0x00000006c09daf40> (a org.postgresql.core.v3.QueryExecutorImpl)
	at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:840)
	at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1538)
	at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:2437)

Native stack trace:

Thread 79 (Thread 0x7fb146ff1700 (LWP 7547)):
#0  0x00007fb227d7acfb in send () from /lib64/libpthread.so.0
#1  0x00007fb200de919e in NET_Send () from /opt/java/jdk1.8.0_40/jre/lib/amd64/libnet.so
#2  0x00007fb200de8d5a in Java_java_net_SocketOutputStream_socketWrite0 () from /opt/java/jdk1.8.0_40/jre/lib/amd64/libnet.so
#3  0x00007fb21190661a in ?? ()
#4  0x00000006c09db230 in ?? ()
#5  0x00000006c09db180 in ?? ()
#6  0x00000006c09dd2f0 in ?? ()
#7  0x00000006c09dd2d8 in ?? ()
#8  0x0000000000000b8b in ?? ()
#9  0x0000000000000000 in ?? ()

One of the TCP connections (used by the stuck thread) has a large Recv-Q and Send-Q:

$ netstat -atpW | grep 3498 | grep postgre
Proto Recv-Q Send-Q Local Address    Foreign Address        State       PID/Program name
tcp6       0      0 somelonghostname.mycompa.com:39994 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39954 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39992 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39998 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39968 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:38214 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:40002 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:40018 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39964 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39956 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39970 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39986 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39972 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39990 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39978 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:40010 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:40008 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39996 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6  149016  53616 somelonghostname.mycompa.com:40012 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39980 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39982 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java
tcp6       0      0 somelonghostname.mycompa.com:39966 myaqadb.in.somesompany.com:postgres ESTABLISHED 3498/java

Are there any known workarounds? Maybe setting a query timeout can help?


@davecramer davecramer commented May 22, 2019

@turbanoff I think the workaround is to increase the size of the send buffer, as per the comments above.

@davecramer davecramer assigned davecramer and unassigned ringerc May 22, 2019
@davecramer davecramer added this to the 42.3.0 milestone May 22, 2019
@sehrope sehrope mentioned this issue Jul 15, 2019
1 of 2 tasks complete
@zistrong zistrong commented Jul 25, 2019

"__ejb-thread-pool13" daemon prio=10 tid=0x00007f9f135d0000 nid=0x413e runnable [0x00007f9f0be35000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
	at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
	at org.postgresql.core.PGStream.receiveChar(PGStream.java:293)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1952)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
	- locked <0x00000005b56f9d80> (a org.postgresql.core.v3.QueryExecutorImpl)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:150)
	at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:127)
	at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)

"http-thread-pool-8080(13)" daemon prio=10 tid=0x00007f9f58028800 nid=0x41ff waiting for monitor entry [0x00007f9f0b7db000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.postgresql.core.QueryExecutorBase.getStandardConformingStrings(QueryExecutorBase.java:247)
	- waiting to lock <0x00000005b56f9d80> (a org.postgresql.core.v3.QueryExecutorImpl)
	at org.postgresql.core.CachedQueryCreateAction.create(CachedQueryCreateAction.java:40)
	at org.postgresql.core.CachedQueryCreateAction.create(CachedQueryCreateAction.java:17)
	at org.postgresql.util.LruCache.borrow(LruCache.java:115)
	- locked <0x00000005c5de8cd8> (a org.postgresql.util.LruCache)
	at org.postgresql.core.QueryExecutorBase.borrowQuery(QueryExecutorBase.java:271)
	at org.postgresql.jdbc.PgConnection.borrowQuery(PgConnection.java:146)
	at org.postgresql.jdbc.PgPreparedStatement.<init>(PgPreparedStatement.java:85)
	at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1223)
	at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1681)

Is it the same issue? The driver version is 42.2.4.


@davecramer davecramer commented Jul 25, 2019

@zistrong Are you attempting to use the same connection in a multi-threaded app?
From what I can see it is not the same issue.

@oobles oobles commented Dec 4, 2019

We've observed this using 42.2.8. Is there a way to code around this issue? Will setting a timeout work?

	at java.net.SocketOutputStream.socketWrite0(java.base@11.0.5/Native Method)
	at java.net.SocketOutputStream.socketWrite(java.base@11.0.5/Unknown Source)
	at java.net.SocketOutputStream.write(java.base@11.0.5/Unknown Source)
	at sun.security.ssl.SSLSocketOutputRecord.deliver(java.base@11.0.5/SSLSocketOutputRecord.java:319)
	at sun.security.ssl.SSLSocketImpl$AppOutputStream.write(java.base@11.0.5/SSLSocketImpl.java:983)
	at java.io.BufferedOutputStream.flushBuffer(java.base@11.0.5/BufferedOutputStream.java:81)
	at java.io.BufferedOutputStream.write(java.base@11.0.5/BufferedOutputStream.java:127)
	at java.io.FilterOutputStream.write(java.base@11.0.5/FilterOutputStream.java:108)
	at org.postgresql.core.PGStream.sendInteger4(org.postgresql.jdbc@42.2.8/PGStream.java:255)
	at org.postgresql.core.v3.QueryExecutorImpl.sendBind(org.postgresql.jdbc@42.2.8/QueryExecutorImpl.java:1611)
	at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(org.postgresql.jdbc@42.2.8/QueryExecutorImpl.java:1853)
	at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(org.postgresql.jdbc@42.2.8/QueryExecutorImpl.java:1378)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(org.postgresql.jdbc@42.2.8/QueryExecutorImpl.java:496)
	at org.postgresql.jdbc.PgStatement.executeBatch(org.postgresql.jdbc@42.2.8/PgStatement.java:853)
	at org.postgresql.jdbc.PgPreparedStatement.executeBatch(org.postgresql.jdbc@42.2.8/PgPreparedStatement.java:1546)
	at com.zaxxer.hikari.pool.ProxyStatement.executeBatch(com.zaxxer.hikari@3.3.1/ProxyStatement.java:128)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(com.zaxxer.hikari@3.3.1/HikariProxyPreparedStatement.java)


@davecramer davecramer commented Dec 4, 2019

Changing the size of the output buffer has been shown to help, but currently changing timeouts won't help.


@ringerc ringerc commented Jan 22, 2020

I increasingly think I need to find the time to make PgJDBC use separate threads for tx and rx so the method-caller's thread can retain control. PgJDBC can then wait on multiple events in possible deadlock scenarios, i.e. "when tx buffer is full, sleep until tx buffer writeable OR rx buffer readable".

Shouldn't even be that hard.
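For what it's worth, a separate rx thread does break the deadlock in a toy reproduction: as long as something is always draining replies, the server can always make progress, so the client's writes eventually complete. A standalone sketch (illustrative only, not driver code); the "server" here echoes a reply chunk for every chunk received, standing in for query results:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ReaderPumpSketch {

    // The client writes a large "batch" while a pump thread drains replies.
    // Without the pump, this write would block exactly as in the bug report.
    public static boolean batchCompletes() throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket peer = server.accept()) {

            // Server side: echo every chunk back, like per-query results.
            daemon(() -> {
                try {
                    byte[] buf = new byte[8192];
                    InputStream in = peer.getInputStream();
                    OutputStream out = peer.getOutputStream();
                    int n;
                    while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
                } catch (IOException ignored) { }
            });

            // Client rx pump: keep draining replies so the server never blocks.
            daemon(() -> {
                try {
                    byte[] buf = new byte[8192];
                    InputStream in = client.getInputStream();
                    while (in.read(buf) != -1) { /* discard replies */ }
                } catch (IOException ignored) { }
            });

            // Client tx: a 16 MB batch, far beyond any socket buffer.
            OutputStream out = client.getOutputStream();
            byte[] chunk = new byte[8192];
            for (int i = 0; i < 2048; i++) out.write(chunk);
            out.flush();
            return true; // reached only because the pump kept draining
        }
    }

    private static void daemon(Runnable r) {
        Thread t = new Thread(r);
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed=" + batchCompletes());
    }
}
```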


@ringerc ringerc commented Jan 22, 2020

Actually, we can probably break the deadlock by using Socket.setSoTimeout(int timeout). A whole lot of blocking calls would need wrapping, so it might make sense to start by doing this only in places where a deadlock is known to be possible. On a timeout, try to consume input from the socket, then resume the send attempt.

I still wonder if it's better to just switch to NIO and SocketChannel. It's probably not as hard as I thought above. We might need to wrap the rx channel in a buffer so we can push unconsumed input back onto it, but that's not too bad.


@davecramer davecramer commented Jan 22, 2020


@turbanoff turbanoff commented Apr 1, 2020

Changing the size of the output buffer has shown to help, but currently changing timeouts won't help.

Can you please explain how to change the size of the output buffer? :)

@sposerina sposerina commented Jul 20, 2020

Is there a workaround for avoiding this behavior? Avoid batching?

@oobles oobles commented Jul 21, 2020

We've found that reducing the batch size avoids the behaviour. In addition, we made sure the operating system's TCP configuration has larger allocated buffers, e.g.

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 134217728
net.core.wmem_default = 134217728
net.ipv4.tcp_mem = 18569037 24758719 37138074
net.ipv4.tcp_rmem= 10240 87380 134217728
net.ipv4.tcp_wmem= 10240 87380 134217728

This is really just fiddling around the edges and trying to second guess the behaviour. It isn't a solution.

@hitsuyagaa hitsuyagaa commented Aug 4, 2020

I've encountered the same issue in our application as well.
Using a batchSize of 5000 always reliably reproduces the issue on a table with 54 columns.
Setting the batchSize to 2000 solves the problem, but I suppose only until an even bigger data set comes around with more columns and longer text fields. (This is insert batching.)
I don't see how to change the output buffer size. Could you elaborate?

    owns: BufferedOutputStream  (id=60)	
    owns: QueryExecutorImpl  (id=61)	
    SocketOutputStream.socketWrite0(FileDescriptor, byte[], int, int) line: not available [native method] [local variables unavailable]	
    SocketOutputStream.socketWrite(byte[], int, int) line: 111	
    SocketOutputStream.write(byte[], int, int) line: 155	
    BufferedOutputStream.flushBuffer() line: 82	
    BufferedOutputStream.write(byte[], int, int) line: 126	
    BufferedOutputStream(FilterOutputStream).write(byte[]) line: 97	
    PGStream.sendInteger4(int) line: 264	
    QueryExecutorImpl.sendParse(SimpleQuery, SimpleParameterList, boolean) line: 1537	
    QueryExecutorImpl.sendOneQuery(SimpleQuery, SimpleParameterList, int, int, int) line: 1858	
    QueryExecutorImpl.sendQuery(Query, V3ParameterList, int, int, int, ResultHandler, BatchResultHandler) line: 1421	
    QueryExecutorImpl.execute(Query[], ParameterList[], BatchResultHandler, int, int, int) line: 496	
    PgPreparedStatement(PgStatement).internalExecuteBatch() line: 851	
    PgPreparedStatement(PgStatement).executeBatch() line: 874	
    PgPreparedStatement.executeBatch() line: 1563	
    Postgres(Database).executeBatchedInsert(Connection, PreparedStatement, String[], int, CsvRow[], String) line: 99	
    DatabaseHandler.prepareAndExecuteBatchedInsert(Connection, List<CsvRow>, int, String[], String[], String) line: 305	
    ImportManager.importTable(String[], String[], String) line: 238	
    ImportManager.importTable(String) line: 126	
    Tester.main(String[]) line: 27

Using driver version 42.2.14

private void setBufferSize(Connection con) {
        Field queryExecutorField = null;
        try {
            queryExecutorField = PgConnection.class.getDeclaredField("queryExecutor");
        } catch (NoSuchFieldException | SecurityException e) { 
            e.printStackTrace();
        }
        queryExecutorField.setAccessible(true);

        QueryExecutorImpl pc = null;
        try {
            pc = (QueryExecutorImpl)queryExecutorField.get(con);
        } catch (IllegalArgumentException | IllegalAccessException e) {
            e.printStackTrace();
        } 
        Field pgstreamField = null;
        try {
            pgstreamField = QueryExecutorBase.class.getDeclaredField("pgStream");
        } catch (NoSuchFieldException | SecurityException e) {
            e.printStackTrace();
        }
        pgstreamField.setAccessible(true);

        PGStream pgs = null;
        try {
            pgs = (PGStream) pgstreamField.get(pc);
        } catch (IllegalArgumentException | IllegalAccessException e) {
            e.printStackTrace();
        }
        Socket s = pgs.getSocket();
        try {
            System.err.println("PgJDBC send buffer size is: " + s.getSendBufferSize());
            s.setSendBufferSize((int)byteSize+5000000);
        } catch (SocketException e) {
            e.printStackTrace();
        }
}

Apparently I'm somehow calculating my byteSize for the buffer wrong.

switch (dataType) {
    case CHAR:
    case NCHAR:
    case VARCHAR:
    case NVARCHAR:
        byteSize += columnValue.getBytes().length;
        prepStatement.setString(columnIndex, columnValue);
        break;
    case TIMESTAMP:
        Timestamp timestamp = null;
        try {
            Long timeStamp = Long.parseLong(columnValue);
            timestamp = new Timestamp(timeStamp);
        } catch (NumberFormatException e) {
            LOG.error("couldn't parse timestamp from csv", e);
        }
        byteSize += 10;
        prepStatement.setTimestamp(columnIndex, timestamp);
        break;
    case NUMBER:
        byteSize += 8;
        long number = Long.parseLong(columnValue);
        prepStatement.setLong(columnIndex, number);
        break;
    case BLOB:
    case CLOB:
        byteSize += columnValue.getBytes().length;
        ByteArrayInputStream is = new ByteArrayInputStream(columnValue.getBytes());
        prepStatement.setBinaryStream(columnIndex, is);
        break;
    default:
        break;
}

Only after I add 5000000 does it work as intended.
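For what it's worth, summing raw field bytes will always undercount: each batched row also carries v3-protocol framing on the wire (Parse/Bind/Describe/Execute message headers, name strings, and a 4-byte length word per parameter), and the text rendering of numbers and timestamps is usually larger than 8 or 10 bytes. A rough sketch of just the Bind framing; BindSizeEstimate is an illustrative helper, not a PgJDBC API, and it assumes empty portal/statement names and no explicit format codes:

```java
public class BindSizeEstimate {
    // Rough lower bound on the wire size of one v3-protocol Bind message.
    // Parse, Describe, Execute and text rendering of values add more on top,
    // which is why summing raw field bytes undercounts the send-buffer space
    // a batch really needs.
    static long estimateBindBytes(int paramCount, long paramDataBytes) {
        long header = 1 + 4;                 // message type byte + Int32 length
        long names = 1 + 1;                  // two empty NUL-terminated strings
        long formatCounts = 2 + 2;           // param + result format-code counts
        long params = 2 + 4L * paramCount;   // Int16 count + Int32 length per value
        return header + names + formatCounts + params + paramDataBytes;
    }
}
```

For a 54-column row that is over 200 bytes of framing before any data, per row, which may explain why the raw-byte estimate was so far off.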
