New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Statement's execution ( PgStatement.execute() ) may hung in PgStatement.killTimerTask() forever if connectuion pool will try to close statement before it will finish execution #1022

Closed
alextomaili opened this Issue Nov 25, 2017 · 5 comments

Comments

Projects
None yet
3 participants
@alextomaili

alextomaili commented Nov 25, 2017

I've used com.mchange.v2.c3p0.ComboPooledDataSource with enabled "unreturnedConnectionTimeout"
One of my prepared statements have got timeout large than "unreturnedConnectionTimeout"
of connection pool.

I've used myStatement.setQueryTimeout(queryTimeout) and
"queryTimeout" > "unreturnedConnectionTimeout"

As result connection pool has called close() for my prepared statement before query execution was finished and before than cancelTimerTask of PgStatement call cancel() for my prepared statement.
As result cleanupTimer() method which was called from PgStatement.close() has setup cancelTimerTask to null, but timeout filed was been leave as is, greater than zero.

Then, connection to postgres has not been closed, because thread of connection pool which was responsible to that was hung on lock of QueryExecutor when it tries to setup autoCommit flag of connection to default state.

Thread [139] BLOCKED at 2017-11-22T19:04:12.507 - C3P0PooledConnectionPoolManager[identityToken->1hgec119r1rirvs7z15ezq|5243960d]-HelperThread-#1
org.postgresql.core.QueryExecutorBase.getTransactionState(QueryExecutorBase.java:247) 
         <------------------ synchronized on QueryExecutorImpl

org.postgresql.jdbc.PgConnection.commit(PgConnection.java:796)
org.postgresql.jdbc.PgConnection.setAutoCommit(PgConnection.java:757)
com.mchange.v2.c3p0.impl.C3P0ImplUtils.resetTxnState(C3P0ImplUtils.java:252)
com.mchange.v2.c3p0.impl.NewPooledConnection.close(NewPooledConnection.java:629)
com.mchange.v2.c3p0.impl.NewPooledConnection.closeMaybeCheckedOut(NewPooledConnection.java:255)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.destroyResource(C3P0PooledConnectionPool.java:622)
com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask.run(BasicResourcePool.java:1076)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)

thread which has runs my query hold lock of QueryExecutor and hung on socket

Thread [136] RUNNABLE at 2017-11-22T19:04:12.507 - catalina-exec-1
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
org.postgresql.core.PGStream.receiveChar(PGStream.java:280)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1916)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288)    
              <------------------ synchronized on QueryExecutorImpl

org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430)
org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)
org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)
org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:116)
com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:353)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
net.bull.javamelody.JdbcWrapper.doExecute(JdbcWrapper.java:404)
net.bull.javamelody.JdbcWrapper$StatementInvocationHandler.invoke(JdbcWrapper.java:129)
net.bull.javamelody.JdbcWrapper$DelegatingInvocationHandler.invoke(JdbcWrapper.java:286)
com.sun.proxy.$Proxy73.executeQuery(Unknown Source)
com.tradefxl.reports.sas.dao.TransfersDAO.getTransfers(TransfersDAO.java:73)

This is looks like deadlock by itself.

Of course this is behavior of connection pool. May be it breaks API contract. It mustn't try to call
close() for statements from own thread before call close() for connection to close socket and
unblock thread which are waiting for a data from postgres, i don't know.
Just to be on safe side this is close() method of "connection wrapper" NewPooledConnection which used
to wrap JDBC connection inside connection pool

com.mchange.v2.c3p0.impl.NewPooledConnection.close

    private void close( Throwable cause, boolean forced ) throws SQLException
    {
	assert Thread.holdsLock( this );

        if ( this.invalidatingException == null )
        {
            List closeExceptions = new LinkedList();

            // cleanup ResultSets
            cleanupResultSets( closeExceptions );

            // cleanup uncached Statements
	    // System.err.println(this + ".close( ... ) -- uncachedActiveStatements: " + uncachedActiveStatements);
            cleanupUncachedStatements( closeExceptions );

            // cleanup cached Statements
            try
<<<CANCEL TIMER HERE>>>
            { closeAllCachedStatements(); } 
<<<CANCEL TIMER HERE>>>
            catch ( SQLException e )
            { closeExceptions.add(e); }

	    if ( forced )
		{
		    // reset transaction state
<<<TRY TO SET AUTOCOMMIT HERE>>>
		    try { C3P0ImplUtils.resetTxnState( physicalConnection, forceIgnoreUnresolvedTransactions, autoCommitOnClose, false ); } <-- stuck here
<<<TRY TO SET AUTOCOMMIT HERE>>>
		    catch (Exception e)
			{
			    if (logger.isLoggable( MLevel.FINER ))
				logger.log( MLevel.FINER, 
					    "Failed to reset the transaction state of  " + physicalConnection + "just prior to close(). " +
					    "Only relevant at all if this was a Connection being forced close()ed midtransaction.", 
					    e );
			}
		}

            // cleanup physicalConnection
            try
<<<TRY TO CLOSE CONNECTION HERE>>>
            { physicalConnection.close(); } 
<<<TRY TO CLOSE CONNECTION HERE>>>
            catch ( SQLException e )
            {
                if (logger.isLoggable( MLevel.FINER ))
                    logger.log( MLevel.FINER, "Failed to close physical Connection: " + physicalConnection, e );

                closeExceptions.add(e); 
            }
   

I don't think what this is bug itself, because similar behavior exists in other JDBC drivers, i speak about
synchronized access to socket from different threads. c3p0 connetcion pool expects what service thread can stuck, and has protection against this
maxAdministrativeTaskTime

But really PgStatement is in invalid state now: cancelTimerTask is null and timeout > 0
As result when connection to postgres was been closed from postgres side (this is may be query timeout for postgres session or pg_terminate_backend() from DBA ) thread which is execute query was stuck forever in PgStatement.killTimerTask() into this loop

while (!STATE_UPDATER.compareAndSet(this, StatementCancelState.CANCELLED, StatementCancelState.IDLE)) {
      synchronized (connection) {
        try {
          // Note: wait timeout here is irrelevant since synchronized(connection) would block until
          // .cancel finishes
          connection.wait(10);
        } catch (InterruptedException e) { // NOSONAR
          // Either re-interrupt this method or rethrow the "InterruptedException"
          interrupted = true;
        }
      }

stack trace of thread

"catalina-exec-1" #136 daemon prio=5 os_prio=0 tid=0x00007f1424007000 nid=0x1831 in Object.wait() [0x00007f1418fd6000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.postgresql.jdbc.PgStatement.killTimerTask(PgStatement.java:932)
	- locked <0x00000000fee96f10> (a org.postgresql.jdbc.PgConnection)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:433)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)
	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:116)
	at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:353)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at net.bull.javamelody.JdbcWrapper.doExecute(JdbcWrapper.java:404)
	at net.bull.javamelody.JdbcWrapper$StatementInvocationHandler.invoke(JdbcWrapper.java:129)
	at net.bull.javamelody.JdbcWrapper$DelegatingInvocationHandler.invoke(JdbcWrapper.java:286)
	at com.sun.proxy.$Proxy73.executeQuery(Unknown Source)
	at com.tradefxl.reports.sas.dao.TransfersDAO.getTransfers(TransfersDAO.java:73)

This has happens because cleanupTimer() method returns false from here

  private boolean cleanupTimer() {
    TimerTask timerTask = CANCEL_TIMER_UPDATER.get(this);
    if (timerTask == null) {
      // If timeout is zero, then timer task did not exist, so we safely report "all clear"
      return timeout == 0;
    }
...

and then this code inside killTimerTask()

    boolean timerTaskIsClear = cleanupTimer();
    // The order is important here: in case we need to wait for the cancel task, the state must be
    // kept StatementCancelState.IN_QUERY, so cancelTask would be able to cancel the query.
    // It is believed that this case is very rare, so "additional cancel and wait below" would not
    // harm it.
    if (timerTaskIsClear && STATE_UPDATER.compareAndSet(this, StatementCancelState.IN_QUERY, StatementCancelState.IDLE)) {
      return;
    }

can't decide that timer already was canceled but statements state is IN QUERY and goes to infinite loop

Looks like that PgStatement.cleanupTimer() must return false only if task is really pending and
current thread must wait for it's finish.

In general, several threads can call cleanupTimer(), thread which runs query, timer's thread and any external thread, for example service thread of connection pool.
PgStatement.close() is public so contract doesn't forbid this.

maybe

  private boolean cleanupTimer() {
    TimerTask timerTask = CANCEL_TIMER_UPDATER.get(this);
    if (timerTask == null) {
      // If timeout is zero, then timer task did not exist, so we safely report "all clear"
      return timeout == 0;
    }

should be replaced by this

  private boolean cleanupTimer() {
    TimerTask timerTask = CANCEL_TIMER_UPDATER.get(this);
    if (timerTask == null) {
      // task was already cleared, nothing to wait   
      return true;
    }

@davecramer

This comment has been minimized.

Member

davecramer commented Nov 25, 2017

Have you tested your solution ?

@vlsi

This comment has been minimized.

Member

vlsi commented Nov 25, 2017

@alextomaili , great analysis, unfortunately it is not flawless.

The below change is not sufficient.

      // task was already cleared, nothing to wait   
      return true;

CANCEL_TIMER_UPDATER==null && timeout != 0 might be caused by the following sequence:
TL;DR:

if (!CANCEL_TIMER_UPDATER.compareAndSet(PgStatement.this, this, null)) {

  1. Timeout is set to non-zero value
  2. Query gets fired
  3. Timeout fires, and
  4. It updates CANCEL_TIMER_UPDATER to null (it does so multiple cancel calls can tell which one should perform PgStatement.this.cancel()

PS. Currently PgStatement.close() is not thread-safe (see #1012 )

@alextomaili

This comment has been minimized.

alextomaili commented Nov 25, 2017

The below change is not sufficient.

Thanks i see.

I think main purpose of waiting loop in PgStatement.killTimerTask() to block
current thread to prevent reuse of current connection to run other query before
"cancel task" of current query will be guaranteed finished. Otherwise "cancel task" may break
other query.
Return "true" from cleanupTimer() as i offered above can break this scenario

But problem of hung is really exist.

@vlsi

This comment has been minimized.

Member

vlsi commented Nov 25, 2017

What if we call cancel() in the stmt.close() implementation?
I think it is expected for the close call to cancel the statement if pending.

@alextomaili

This comment has been minimized.

alextomaili commented Nov 26, 2017

What if we call cancel() in the stmt.close() implementation?
I think it is expected for the close call to cancel the statement if pending.

Of course, this is expected behavior.
It will prevent hung in spinning loop and kill running statement in postgres, this is fine

vlsi added a commit to vlsi/pgjdbc that referenced this issue Nov 26, 2017

fix: prevent statement hang in case close() called when query is in p…
…rogress

Note: Statement#close() is still not thread-safe, however it is much more robust with the fix

fixes pgjdbc#1022

vlsi added a commit to vlsi/pgjdbc that referenced this issue Nov 26, 2017

fix: prevent statement hang in case close() called when query is in p…
…rogress

Note: Statement#close() is still not thread-safe, however it is much more robust with the fix

fixes pgjdbc#1022

vlsi added a commit to vlsi/pgjdbc that referenced this issue Nov 26, 2017

fix: prevent statement hang in case close() called when query is in p…
…rogress

Note: Statement#close() is still not thread-safe, however it is much more robust with the fix

fixes pgjdbc#1022

@vlsi vlsi closed this in #1023 Nov 27, 2017

vlsi added a commit that referenced this issue Nov 27, 2017

fix: prevent statement hang in case close() called when query is in p…
…rogress

Note: Statement#close() is still not thread-safe, however it is much more robust with the fix

fixes #1022

@vlsi vlsi added this to the 42.2.0 milestone Jan 8, 2018

rhavermans added a commit to bolcom/pgjdbc that referenced this issue Jul 13, 2018

fix: prevent statement hang in case close() called when query is in p…
…rogress

Note: Statement#close() is still not thread-safe, however it is much more robust with the fix

fixes pgjdbc#1022

rhavermans added a commit to bolcom/pgjdbc that referenced this issue Jul 13, 2018

fix: prevent statement hang in case close() called when query is in p…
…rogress

Note: Statement#close() is still not thread-safe, however it is much more robust with the fix

fixes pgjdbc#1022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment