Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong transaction state for broken database connection #40489

Open
timonzi opened this issue May 7, 2024 · 4 comments
Open

Wrong transaction state for broken database connection #40489

timonzi opened this issue May 7, 2024 · 4 comments
Labels
area/hibernate-orm Hibernate ORM area/narayana Transactions / Narayana kind/bug Something isn't working

Comments

@timonzi
Copy link

timonzi commented May 7, 2024

Describe the bug

If the database connection is interrupted while we are persisting an entity in a transaction, it may be that the entity has been persisted in the database, but we are not aware of this via our transaction observers, or that the wrong observer is being called.

Example code

We have a simple method which creates/persists the entity and fires an event:

@Inject
Event<TestEvent> event;

@Transactional
public void createEntity(final int i) {
    event.fire(new TestEvent("entity" + i));

    final var entity = new TestEntity();
    entity.setId("entity" + i);

    entityManager.persist(entity);
}

Within a bean, we have multiple observers:

public void afterSuccess(@Observes(during = TransactionPhase.AFTER_SUCCESS) final TestEvent event) {
    // do something
}


public void afterFailure(@Observes(during = TransactionPhase.AFTER_FAILURE) final TestEvent event) {
    // do something
}


public void afterCompletion(@Observes(during = TransactionPhase.AFTER_COMPLETION) final TestEvent event) {
    // do something
}

Now we are faced with the problem in our (real) applications that if we have a problem with the database connection during the persistence of the entity (e.g. because the database was restarted, a network error occurred, etc.), the AFTER_FAILURE observer is called, but the entity was successfully persisted.

This is an inconsistent state and we cannot know whether the entity has really been persisted or not - without performing another check on the database. This might not be a big problem for simple applications, but for generic implementations, or if you don't know what may or may not have been (partially) persisted in the transaction, then this is a very big problem for us.

Hint: We have the same problem when using an implementation of jakarta.transaction.Synchronization. Also here the status in the afterCompletion method is the wrong one (Status.STATUS_ROLLEDBACK instead of Status.STATUS_COMMITTED).

Please let me know if I need to create an issue at another location / project.

Expected behavior

The status of the transaction must reflect whether entities have been persisted or not. It must not be the case that entities have been persisted but we are not aware of this. The same applies in the opposite case, of course.

Or to put it another way: We must be sure that the status of the observers is consistent with regard to the database operations.

Actual behavior

Entities may have been successfully persisted to the database, but we get the information that the transaction failed.

How to Reproduce?

Reproducer: https://github.com/timonzi/transaction-phase-reproducer

Execute the PersistenceTest.

What it does:

  • It starts a Postgres:15.5 instance as Quarkus test resource
  • It creates a thread, which kills all connections every 100 ms
  • It tries to create and persist 100000 entities

Hint: We create so many entities and kill the connections so often because it is not clear to me at which point the problem exists. It can happen of course also when the database connection is lost only once (as said, because of Database restart, network problem, etc.)

What can we see:

  • There are more persisted entities on the database than expected

Example log output:

COUNT = 100000; // Number of entities which should be persisted in a loop
exceptionCounter = 141; // Number of exceptions which occurred in the loop
COUNT - exceptionCounter = 99859; // Number of entities which we expect to be persisted, because of the exceptions
entityCount = 99869; // Number of entities in the database
observerBean.getAfterCompletionCount() = 100000; // Number of calls of after completion observer
observerBean.getAfterSuccessCount() = 99859; // Number of calls of after success observer
observerBean.getAfterFailureCount() = 141; // // Number of calls of after failure observer

Output of uname -a or ver

Linux nb-timonz 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)

Quarkus version or git rev

3.9.3 (but I think the problem occurs in all Quarkus versions)

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.3

Additional information

No response

@timonzi timonzi added the kind/bug Something isn't working label May 7, 2024
@gastaldi gastaldi added area/narayana Transactions / Narayana and removed triage/needs-triage labels May 9, 2024
@quarkus-bot
Copy link

quarkus-bot bot commented May 9, 2024

/cc @mmusgrov (narayana)

@gastaldi gastaldi added the area/hibernate-orm Hibernate ORM label May 9, 2024
@mmusgrov
Copy link
Contributor

The after failure observer method should only be called if the transaction rolls back, but if it rolled back then the database will be instructed to rollback the entity state that got written to the db. So if the entity did get committed then I see a few possibilities:

  1. There was a heuristic outcome where some resources commit and others rollback. This would happen if Hibernate committed the state before being instructed to, although this is unlikely could someone from the hibernate team comment on that scenario (and @timonzi too if you could be in that situation).

  2. There is a problem with the CDI event code. Arc is responsible for delivering CDI transaction lifecycle notifications and it registers a Synchronization with the transaction manager to achieve that [1]. Since the bug report says an AFTER_FAILURE notification was delivered this is unlikely to be the cause.

  3. The connection handling code did not detect that there was a problem with the connection - I don't know enough about connection management to say any useful here though.

Since Narayana is only responsible for registering and calling the Arc Synchronization, and since it invoked the notification on the Arc Synchronization (the bug report said it received an AFTER_FAILURE notification), I plan to remove the "area/narayana" label.

[1] https://github.com/quarkusio/quarkus/blob/main/independent-projects/arc/runtime/src/main/java/io/quarkus/arc/impl/EventImpl.java#L298-L304

@mmusgrov
Copy link
Contributor

I guess we need some logs to make further progress (hibernate and narayana and the connection management code should be sufficient).

@timonzi
Copy link
Author

timonzi commented May 14, 2024

To be honest, I don't know which logs I could provide here, as there are a lot of exceptions in the log due to the frequent connection terminations, for example. I'm not sure which of these are to be expected and which might indicate the problem. I thought my reproducer could help you with this.

Perhaps the following warning is relevant, but I'm not that familiar with the topic:

2024-05-14 09:37:59,349 WARN  [com.arj.ats.jta] (main) ARJUNA016045: attempted rollback of < formatId=131077, gtrid_length=35, bqual_length=36, tx_uid=0:ffffc0a8b228:90df:664314c5:c353e, node_name=quarkus, branch_uid=0:ffffc0a8b228:90df:664314c5:c3543, subordinatenodename=null, eis_name=0 > (io.agroal.narayana.LocalXAResource@151ac0eb) failed with exception code XAException.XAER_RMERR: javax.transaction.xa.XAException: Error trying to transactionRollback local transaction: This connection has been closed.
	at io.agroal.narayana.XAExceptionUtils.xaException(XAExceptionUtils.java:20)
	at io.agroal.narayana.XAExceptionUtils.xaException(XAExceptionUtils.java:8)
	at io.agroal.narayana.LocalXAResource.rollback(LocalXAResource.java:89)
	at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelAbort(XAResourceRecord.java:338)
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3109)
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3088)
	at com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1706)
	at com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1641)
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:74)
	at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:135)
	at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1271)
	at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:104)
	at io.quarkus.narayana.jta.runtime.NotifyingTransactionManager.commit(NotifyingTransactionManager.java:70)
	at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.endTransaction(TransactionalInterceptorBase.java:406)
	at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:175)
	at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:107)
	at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.doIntercept(TransactionalInterceptorRequired.java:38)
	at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.intercept(TransactionalInterceptorBase.java:61)
	at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.intercept(TransactionalInterceptorRequired.java:32)
	at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired_Bean.intercept(Unknown Source)
	at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
	at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
	at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
	at org.acme.PersistenceTestBean_Subclass.createEntity(Unknown Source)
	at org.acme.PersistenceTestBean_ClientProxy.createEntity(Unknown Source)
	at org.acme.PersistenceTest.testPersistence(PersistenceTest.java:39)
	at org.acme.PersistenceTest_ClientProxy.testPersistence(Unknown Source)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at io.quarkus.test.junit.QuarkusTestExtension.runExtensionMethod(QuarkusTestExtension.java:1018)
	at io.quarkus.test.junit.QuarkusTestExtension.interceptTestMethod(QuarkusTestExtension.java:832)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:198)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:169)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:93)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:58)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:141)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:57)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:103)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:85)
	at org.junit.platform.launcher.core.DelegatingLauncher.execute(DelegatingLauncher.java:47)
	at org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:63)
	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:57)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
	at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:1009)
	at org.postgresql.jdbc.PgConnection.rollback(PgConnection.java:1016)
	at io.agroal.pool.ConnectionHandler.transactionRollback(ConnectionHandler.java:354)
	at io.agroal.narayana.LocalXAResource.rollback(LocalXAResource.java:86)
	... 96 more

What I find strange about the stacktrace is that you can see LocalXAResource. But we do not use XA transactions. Maybe this has nothing to do with XA transactions, but as I said, I'm not familiar with this topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/hibernate-orm Hibernate ORM area/narayana Transactions / Narayana kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants