Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erroneously inserted rows with statement pooling enabled and when a Connection gets re-used from the ConnectionPool #2356

Closed
PriyadharshiniP opened this issue Mar 20, 2024 · 10 comments · Fixed by #2361
Labels
Bug A bug in the driver. A high priority item that one can expect to be addressed quickly.
Milestone

Comments

@PriyadharshiniP
Copy link
Contributor

PriyadharshiniP commented Mar 20, 2024

Driver version

com.microsoft.sqlserver mssql-jdbc 12.6.1.jre11

SQL Server version

Microsoft SQL Server 2019 (RTM-CU18) (KB5017593) - 15.0.4261.1 (X64) 
	Sep 12 2022 15:07:06 
	Copyright (C) 2019 Microsoft Corporation
	Developer Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 19045: )

Client Operating System

Windows 10 Pro

JAVA/JVM version

Java 17

Problem description

There are some duplicate inserts and some missing entries when a custom connection pool is being utilized and server side statement pooling is enabled (using the setDisableStatementPooling flag). When the connection is re-used from the connection pool and the flag setDisableStatementPooling is set to false this issue seems to happen.

To test this instance, I was able to simulate a very basic stand alone as below :
Please see the following test program, where I intentionally return the previously created connection object from the _userToConnectionsMap cache map for subsequent runs.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Collection;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.sql.PooledConnection;

public class SQLConnectionPool {

	 final private ReentrantLock _connLock = new ReentrantLock();
	 final private Map _userToConnectionsMap = new ConcurrentHashMap();
	 static String tableName = "batchTest";
	/**
	 * Crate a new connection
	 * 
	 * @return the new created connection
	 * @throws SQLException When fail to create a new connection.
	 */
	private Connection createNewConnection(final String username, final String password) throws SQLException {

		com.microsoft.sqlserver.jdbc.SQLServerConnectionPoolDataSource msDs = new com.microsoft.sqlserver.jdbc.SQLServerConnectionPoolDataSource();
		msDs.setUser(username);
		msDs.setApplicationName("SQL Server Duplicate Row");
		msDs.setPassword(password);
		msDs.setServerName("<>");
		msDs.setPortNumber(0);
		msDs.setDatabaseName("<>");
		msDs.setInstanceName("<>");
		msDs.setEncrypt("false");
		msDs.setDisableStatementPooling(false);
		msDs.setStatementPoolingCacheSize(20);
		msDs.setServerPreparedStatementDiscardThreshold(20);
		msDs.setSendTemporalDataTypesAsStringForBulkCopy(false);
		// ... create the ACTUAL new connection...
		PooledConnection actual = msDs.getPooledConnection(username, password);
		// ... cache it...
		Collection conns = (Collection) _userToConnectionsMap.get(username);
		if (conns == null || conns.isEmpty()) {
			conns = new CopyOnWriteArrayList();
			_userToConnectionsMap.put(username, conns);
		}
		
		conns.add(actual);
		return actual.getConnection();
	}

	final public Connection getConnection(final String username, final String password)
			throws SQLException, InterruptedException {

		if (username == null) {
			throw new NullPointerException("The username cannot be null.");
		}
		if (password == null) {
			throw new NullPointerException("The password cannot be null.");
		}

		// Acquire a lock...
		_connLock.lock();
		// Declare the connection to be returned...
		Connection conn = null;
		try {

			conn = getAvailableConnection(username);
			// If none found...
			if (conn == null) {
				conn = createNewConnection(username, password);
			}
//	           
		} finally {
			// Ensure that the lock is released...
			_connLock.unlock();
		}
		// Return the connection...
		return conn;
	}

	private Connection getAvailableConnection(final String username) throws SQLException {
		// Get the connections for the specified user...
		Collection conns = (Collection) _userToConnectionsMap.get(username);
		// If none found, then NO connections are available...
		if (conns == null || conns.isEmpty()) {
			return null;
		}
		// ... otherwise, check to see if any of the connections are available...
		boolean isAvailable = false;
		final Iterator iterator = conns.iterator();
		PooledConnection pconn = null;
		while (iterator.hasNext()) {			
			pconn = (PooledConnection) iterator.next();

			return pconn.getConnection();
			// Is this connection available? ignore this check for this sample test
			
		}
		return null;
	}

	public static void main(String[] args) throws SQLException, ClassNotFoundException, InterruptedException {

		Logger logger = Logger.getLogger("com.microsoft.sqlserver.jdbc.Statement");
		logger.setLevel(Level.FINER);
		Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");

		SQLConnectionPool pool = new SQLConnectionPool();

		try {
		
			Connection conn = pool.getConnection("<>", "<>");
			try (Statement stmt = conn.createStatement();) {

				String dropSql = "if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[" + tableName
						+ "]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) DROP TABLE [" + tableName + "]";
				stmt.execute(dropSql);

				String createSql = "create table " + tableName + " (c1 int, c2 varchar(20))";
				stmt.execute(createSql);
			}
			for (int j = 0; j < 5; j++) {

				conn = pool.getConnection("<>", "<>");
				conn.setAutoCommit(false);
				try (PreparedStatement pstmt = conn.prepareStatement("insert into " + tableName + " values (?, ?)");) {
					for (int i = 0; i < 3; i++) {
						pstmt.setInt(1, i);
						pstmt.setString(2, "test" + i);
						pstmt.addBatch();
					}

					long start = System.currentTimeMillis();
					pstmt.executeBatch();
					conn.commit();
					long end = System.currentTimeMillis();

					System.out.println("Finished. Time taken : " + (end - start) + " milliseconds.");
				}
				conn.close();
			}
		} finally {
			// nothing for this test
		}

	}
}

This program results in incorrect entries being inserted to the test table with last item of the batch being repeated and the first item of the batch missed out.

Expected behavior

Caching and connection pool shouldn't lead to incorrect insert to the table. Should see no duplicates and no missing rows.

Actual behavior

At the end of the stand alone program run, could see there are (2*j) - 1 = 9 rows instead of 5 rows for column c1 : 2 and (j-1) = 4 entries missing for column c1 : 0

image

Error message/stack trace

With FINER logging enabled for SQL driver , could see the following exception : SQLException: com.microsoft.sqlserver.jdbc.SQLServerException: Could not find prepared statement with handle

image

Any other details that can be helpful

This issue does not happen when I comment the call to sp_reset_connection in SQLServerPooledConnection.java
image

JDBC trace logs

SQL Profiler data

image

@PriyadharshiniP PriyadharshiniP changed the title Duplicate row seems to be inserted with prepared statement pooling enabled and when a Connection gets re-used from the ConnectionPool Erroneously inserted rows with prepared statement pooling enabled and when a Connection gets re-used from the ConnectionPool Mar 20, 2024
@PriyadharshiniP PriyadharshiniP changed the title Erroneously inserted rows with prepared statement pooling enabled and when a Connection gets re-used from the ConnectionPool Erroneously inserted rows with statement pooling enabled and when a Connection gets re-used from the ConnectionPool Mar 20, 2024
@lilgreenbird
Copy link
Contributor

hi @PriyadharshiniP

thanks for the repro info we'll do some investigation and get back to you.

@lilgreenbird lilgreenbird added this to the 12.7.0 milestone Mar 21, 2024
@lilgreenbird lilgreenbird added the Bug A bug in the driver. A high priority item that one can expect to be addressed quickly. label Mar 21, 2024
@lilgreenbird
Copy link
Contributor

hi @PriyadharshiniP

thanks for the repro we can see the issue will be implementing a fix for this in the upcoming preview.

@PriyadharshiniP
Copy link
Contributor Author

Thanks for looking into this.

  1. Additionally, it appears that the following modifications to the SQLServerPreparedStatement : by resetting the batchParam value to fetch the metadata of first row(0) instead of the last row seems to prevent duplicates from happening in this case.
    But not entirely sure of the consequences.
    Please have a look as it may help you fix this issue :
    image

  2. I'm also not clear why, when we re-use the same connection from the pool, a Could not find prepared statement with handle exception is thrown. I would really appreciate any clarifications to assist me better comprehend statement pooling.

@lilgreenbird
Copy link
Contributor

I believe this is happening cos the cache is messed up. If I clear the preparedStatementHandleCache in resetPooledConnection() then this works. I'm just doing some testing now.

@PriyadharshiniP
Copy link
Contributor Author

PriyadharshiniP commented Mar 22, 2024

I appreciate you committing the fix.
It appears to prevent duplicates from happening by creating a new prepared statement handler for every usage of same connection from the connection pool. But I don't fully understand this fix.
I am aware that this statement pooling is per-connection and not server-wide.
But the prepared statement handler should be retained for re-using the connection having the same user name and password from pool : SQLServerPooledConnection , is it not ?
If statement caching was implemented and a connection pool was in use, it seems to be a waste in both client and server side and no real performance improvements with statement cache?
Please clarify.

Also this duplicates error may happen in any other scenario where the SQLException: com.microsoft.sqlserver.jdbc.SQLServerException: Could not find prepared statement with handle gets thrown due to server side exceptions and in that cases, duplicates is still possible, is it not ?

@David-Engel
Copy link
Collaborator

Statement pooling is per connection, yes, but sp_reset_connection (resetPooledConnection) clears any prepared statements in the session on the server (among many other cleanup tasks). That's why you see the Could not find prepared statement with handle error. Clearing the cache would keep the driver from re-using invalid prepared statement handles. But I'd be worried that the driver is executing some of the batch or consuming/flushing results on the wrong side of the pool use boundary. ExecuteBatch() should be executing and consuming all the results from the server immediately and I wonder if it's leaving some hanging until the next statement execution (which is happening after the connection goes back into and then comes out of the pool again).

@PriyadharshiniP
Copy link
Contributor Author

Thanks for clarifying that the statement pooling is per connection and cannot be shared even when the connection is used , goes back and is re-used from pool.
It appears that the un prepare call is not being initiated explicitly, and presumably, there can still be a performance improvement if statement pooling and pooling are enabled.

With the fix under this issue, it seems to have fixed the corrupt client side prepared statement cache by clearing it when the connection gets re-used from the pool.
But I am still not sure, if there can be some duplicates or entries from a batch mistakenly inserted due to any other scenarios where there is a corrupt prepared statement cache or no server prepared handle with a retry. I cannot replicate such a scenario yet with 8179 and 586 exceptions. However it is in the exception handling, I think it may be possible.
The row in batchParam ought to have also been reset to the index of numBatchesPrepared rather than the last row if the retryBasedOnFailedReuseOfCachedHandle method returns true, right? I'm saying that there may still be instances where this exception stack trace occurs, and that the batch should be correctly retried.
Please advise.
image

@lilgreenbird
Copy link
Contributor

yes we agree there may be other issues there we will take a closer look into this when we can, thanks

@PriyadharshiniP
Copy link
Contributor Author

Hi @lilgreenbird
Thanks for fixing. May I know when a stable release (12.7.0) for this be available ?

@Jeffery-Wasty
Copy link
Contributor

The 12.7 preview release is available. The next stable release to include this will either be the 12.6.2 hotfix and/or the 12.8 GA release. We have not finalized a date for the hotfix, but the GA can be expected at the end of July (July 31).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug A bug in the driver. A high priority item that one can expect to be addressed quickly.
Projects
Status: Closed Issues
Development

Successfully merging a pull request may close this issue.

4 participants