New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Microsoft SQL Server Resource Adaptor(JDBC Driver) returning XAException with error code XAER_RMERR in case of fail-over of SQL Server AG from primary node to standby node while committing an XA Transaction #1432
Comments
hi @pa1484 I could find anything in the XA specification that says XAER_RMERR is non-retriable? Similarly for javax.transaction.xa.XAException: Seems both these are valid and it's up to the application to interpret what actions to take for each return code. It's great that you could patch your application to handle the XAER_RMERR code since changing the return code in the driver would be a breaking change for those applications which may be relying on the current XAER_RMERR return value. |
Hi, In the reference specification page 34, the semantics for the different responses to xa_commit are listed. [XAER_RMERR] Note that in this case the RM has rolled back the transaction after concluding that it can never be committed. From the transaction manager perspective it is not retriable.7 Thanks, |
oh I see. I was going by the line numbers from your log above which is xa_forget in current version. You're right in previous version that was the line of xa_commit. Yes from the description in the specification it does look like RMFAIL would be a more appropriate return code in this scenario. |
Hi @pa1484 I just checked actually the error code did not originate from the driver. Distributed transactions is handled by the SQL Server which uses the Microsoft Distributed Transaction Coordinator (MDTC). The driver uses the msqlsrvxa1 library interface to communicate to the DTC and merely returns the result from the call. I could not find anything in their documentation what error should be returned in the case of an AG failover scenario but in this case the server returned RMERR which the driver just passed along. The driver does not do any interpretation of the result. |
So, how do we get MDTC to fix this bug? Do we need to raise a new issue in another repo? |
I'm not sure if this is a bug? As per your msg above the spec says it should return RMERR if "An error occurred in committing the work performed on behalf of the transaction branch and the branch’s work has been rolled back. Note that returning this error signals a catastrophic event to a transaction manager since other resource managers may successfully commit their work on behalf of this branch. This error should be returned only when a resource manager concludes that it can never commit the branch and that it cannot hold the branch’s resources in a prepared state. Otherwise, [XA_RETRY] should be returned." Or an RMFAIL if "An error occurred that make the resource manager unavailable." Seems up to interpretation what kind of an error a failover is looks MDTC chose the former. |
Hi, I tried assuming XAER_RMERR as a XAER_RMFAIL which is retriable so, WebSphere retried committing the transaction branch and the resource manager successfully committed the transaction. The spec says, an XAER_RMERR error code should be returned only when a resource manager concludes that it can never commit the transaction branch. But in this case upon retry resource manager could be able to commit the transaction branch. So, I believe this is a bug in either SQL server JDBC driver or MDTC which is returning wrong error code to transaction manager. Please let us know if we need raise a new issue in another repo. |
hi @pa1484 you will have to go through MS support, who will file an RFC with the appropriate team. |
Let us know if anyone still needs help from the JDBC team, otherwise I will close this issue in due time. |
Does anyone know if this was fixed? |
This is not reproducible on our end. You will have to go through MS support as was recommended above, to get the most up-to-date information. |
Looking at the JDBC code, this looks like it is coming from the driver due to the connection being dropped and the socket write failing. The error is not from the server. The error is because the socket write failed when we tried to call the XA function on the server. We should be able to detect this and return RMFAIL instead of RMERR when we can't even make the call as this is different from an actual error sent from the server. |
Thanks for re-examining this, we have a mutual customer that is trying to get AlwaysOn setup and this issue is holding up their progress. If we can do anything to help, let us know |
Driver version
SQL Server JDBC Driver 7.0 & 8.2
SQL Server version
SQL Server AG 2017
Client Operating System
Windows Server 2016, version 10.0
JAVA/JVM version
IBM J9 VM
Table schema
Problem description
During a fail-over test for one of the product when using SQL server AG as back end, we found that there are few XA transactions in SQL Server DTC are in in-doubt state.
After analyzing the trace files, we found that the SQL Server JDBC driver was returning an XAException with an error code of XAER_RMERR on the XAResource.commit due to SQL Server AG fail-over. Websphere application server treats XAER_RMERR as a non-retriable condition as per the XA specification.
We tested with a diagnostic patch on Websphere application server to treat XAER_RMERR as a XAER_RMFAIL(which is retriable) and recreated the scenario.Due to the diagnostic patch Websphere retries the commits which are eventually successful and we no longer finding in-doubt transactions in SQL server DTC.
SQL Server Resource Adaptor(JDBC Driver) should return an XAException with a retriable error code such as XAER_RMFAIL in case fail-over of SQL Server AG from primary node to standby node while committing an XA Transaction.
SQL Server Resource Adaptor(JDBC Driver) returning XAException with error code XAER_RMERR in case fail-over of SQL Server AG from primary node to standby node while committing an XA Transaction
Below is the trace that shows XAException with XAER_RMERR error code returned by SQLServer JDBC driver during fail-over.
[7/30/20 17:20:37:254 CST] 00000369 WSRdbXaResour > commit Entry
com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl@5e889f7e
WSRdbManagedConnectionImpl@3a76217e
com.ibm.ws.tx.jta.XidImpl@e2696786
{XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54),
data(000001739f05fd25000000017aa729486c90e9918373568c6d645a1d91c7913943fe1a86000001739f05fd25000000017aa729486c90e9918373568c6d645a1d91c7913943fe1a86000000010000000000000000000000000001)}
Global Transaction ID: 00 00 01 73 9f 05 fd 25 00 00 00 01 7a a7 29 48 6c 90 e9 91 83 73 56 8c 6d 64 5a 1d 91 c7 91 39 43 fe 1a 86
Branch Qualifier: 00 00 01 73 9f 05 fd 25 00 00 00 01 7a a7 29 48 6c 90 e9 91 83 73 56 8c 6d 64 5a 1d 91 c7 91 39 43 fe 1a 86 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 01
Format ID: 1463898948
TWO PHASE
[7/30/20 17:20:37:286 CST] 00000369 WSRdbXaResour E DSRA0302E: XAException occurred. Error code is: XAER_RMERR (-3). Exception is: com.microsoft.sqlserver.jdbc.SQLServerException: SQL Server did not return a response. The connection has been closed.
[7/30/20 17:20:37:286 CST] 00000369 WSRdbXaResour 3 XAException was not a connection error for XAResource com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl@5e889f7e
[7/30/20 17:20:37:286 CST] 00000369 WSRdbXaResour < commit Exit
javax.transaction.xa.XAException: com.microsoft.sqlserver.jdbc.SQLServerException: SQL Server did not return a response. The connection has been closed.
at com.microsoft.sqlserver.jdbc.SQLServerXAResource.DTC_XA_Interface(SQLServerXAResource.java:738)
at com.microsoft.sqlserver.jdbc.SQLServerXAResource.commit(SQLServerXAResource.java:807)
at com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl.commit(WSRdbXaResourceImpl.java:361)
at com.ibm.ejs.j2c.XATransactionWrapper.commit(XATransactionWrapper.java:496)
at com.ibm.tx.jta.impl.JTAXAResourceImpl.commit(JTAXAResourceImpl.java:308)
at com.ibm.tx.jta.impl.RegisteredResources.deliverOutcome(RegisteredResources.java:1644)
at com.ibm.tx.jta.impl.RegisteredResources.distributeOutcome(RegisteredResources.java:2003)
at com.ibm.tx.jta.impl.RegisteredResources.distributeCommit(RegisteredResources.java:2315)
at com.ibm.tx.jta.impl.TransactionImpl.internalCommit(TransactionImpl.java:1879)
at com.ibm.tx.jta.impl.TransactionImpl.internalCommit(TransactionImpl.java:1855)
at com.ibm.tx.jta.impl.TransactionImpl.coreStage2CommitProcessing(TransactionImpl.java:1135)
at com.ibm.tx.jta.impl.TransactionImpl.stage2CommitProcessing(TransactionImpl.java:1173)
at com.ibm.tx.jta.impl.TransactionImpl.processCommit(TransactionImpl.java:1034)
at com.ibm.tx.jta.impl.TransactionImpl.commit(TransactionImpl.java:964)
at com.ibm.ws.tx.jta.TranManagerImpl.commit(TranManagerImpl.java:439)
at com.ibm.tx.jta.impl.TranManagerSet.commit(TranManagerSet.java:191)
at com.ibm.ws.uow.UOWManagerImpl.uowCommit(UOWManagerImpl.java:807)
.....
.....
.....
Below is the trace where XAER_RMERR error code is converted to XAER_RMFAIL error code, later retried and committed successfully on a different thread.
[8/3/20 13:49:45:363 CST] 0000026e WSRdbXaResour > commit Entry
com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl@1909f879
WSRdbManagedConnectionImpl@85bb6fef
com.ibm.ws.tx.jta.XidImpl@aad4067c
{XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54),
data(00000173b2de55fe000000014b323c4c6b4650ebd3cd42e87a42c9c60b9b3d3d64a9b46a00000173b2de55fe000000014b323c4c6b4650ebd3cd42e87a42c9c60b9b3d3d64a9b46a000000010000000000000000000000000001)}
Global Transaction ID: 00 00 01 73 b2 de 55 fe 00 00 00 01 4b 32 3c 4c 6b 46 50 eb d3 cd 42 e8 7a 42 c9 c6 0b 9b 3d 3d 64 a9 b4 6a
Branch Qualifier: 00 00 01 73 b2 de 55 fe 00 00 00 01 4b 32 3c 4c 6b 46 50 eb d3 cd 42 e8 7a 42 c9 c6 0b 9b 3d 3d 64 a9 b4 6a 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 01
Format ID: 1463898948
TWO PHASE
[8/3/20 13:49:45:363 CST] 0000026e WSRdbXaResour W Converting XAER_RMERR to XAER_RMFAIL
[8/3/20 13:49:45:379 CST] 0000026e FfdcProvider W com.ibm.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I: FFDC Incident emitted on C:\IBM\BAW\profiles\Node1Profile\logs\ffdc\AppClusterMember1_ad159ec9_20.08.03_13.49.45.3635376806510343302270.txt com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl.commit 126
[8/3/20 13:49:45:379 CST] 0000026e WSRdbXaResour E DSRA0302E: XAException occurred. Error code is: XAER_RMFAIL (-7). Exception is: com.microsoft.sqlserver.jdbc.SQLServerException: SQL Server did not return a response. The connection has been closed.
.....
.....
[8/3/20 13:49:45:520 CST] 0000026e WSRdbXaResour < commit Exit
javax.transaction.xa.XAException
at com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl.commit(WSRdbXaResourceImpl.java:386)
at com.ibm.ejs.j2c.XATransactionWrapper.commit(XATransactionWrapper.java:496)
at com.ibm.tx.jta.impl.JTAXAResourceImpl.commit(JTAXAResourceImpl.java:308)
at com.ibm.tx.jta.impl.RegisteredResources.deliverOutcome(RegisteredResources.java:1644)
at com.ibm.tx.jta.impl.RegisteredResources.distributeOutcome(RegisteredResources.java:2003)
at com.ibm.tx.jta.impl.RegisteredResources.distributeCommit(RegisteredResources.java:2315)
at com.ibm.tx.jta.impl.TransactionImpl.internalCommit(TransactionImpl.java:1879)
at com.ibm.tx.jta.impl.TransactionImpl.internalCommit(TransactionImpl.java:1855)
at com.ibm.tx.jta.impl.TransactionImpl.coreStage2CommitProcessing(TransactionImpl.java:1135)
at com.ibm.tx.jta.impl.TransactionImpl.stage2CommitProcessing(TransactionImpl.java:1173)
at com.ibm.tx.jta.impl.TransactionImpl.processCommit(TransactionImpl.java:1034)
at com.ibm.tx.jta.impl.TransactionImpl.commit(TransactionImpl.java:964)
at com.ibm.ws.tx.jta.TranManagerImpl.commit(TranManagerImpl.java:439)
at com.ibm.tx.jta.impl.TranManagerSet.commit(TranManagerSet.java:191)
at com.ibm.ws.uow.UOWManagerImpl.uowCommit(UOWManagerImpl.java:807)
....
....
....
[8/3/20 13:50:55:489 CST] 00000088 WSRdbXaResour > commit Entry
com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl@e862f1d8
WSRdbManagedConnectionImpl@7ab88f11
com.ibm.ws.tx.jta.XidImpl@aad4067c
{XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54),
data(00000173b2de55fe000000014b323c4c6b4650ebd3cd42e87a42c9c60b9b3d3d64a9b46a00000173b2de55fe000000014b323c4c6b4650ebd3cd42e87a42c9c60b9b3d3d64a9b46a000000010000000000000000000000000001)}
Global Transaction ID: 00 00 01 73 b2 de 55 fe 00 00 00 01 4b 32 3c 4c 6b 46 50 eb d3 cd 42 e8 7a 42 c9 c6 0b 9b 3d 3d 64 a9 b4 6a
Branch Qualifier: 00 00 01 73 b2 de 55 fe 00 00 00 01 4b 32 3c 4c 6b 46 50 eb d3 cd 42 e8 7a 42 c9 c6 0b 9b 3d 3d 64 a9 b4 6a 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 01
Format ID: 1463898948
TWO PHASE
[8/3/20 13:50:55:489 CST] 00000088 WSRdbXaResour 3 XAResource.start was never issued; allowing commit for recovery.
[8/3/20 13:50:55:505 CST] 00000088 WSRdbXaResour < commit Exit
AG nodes are configured 'sync-commit'
in-doubt xact resolution set to 0
JDBC trace logs
Reproduction code
The text was updated successfully, but these errors were encountered: