fix: abort transaction in all cases #881

emasab · 2022-02-18T13:26:29Z

catching RuntimeException all other subclasses of Exception are not caught and don't abort the transaction

esabellico-facephi · 2022-02-19T15:53:39Z

a bit of context: maybe this can happen only in Kotlin you can actually compile code that throws checked exceptions or calling java code that throws checked exceptions without declaring them with throws in the TransactionBody interface

emasab · 2022-03-12T16:06:11Z

here is some code to reproduce the error https://github.com/emasab/mongo-client-fix-881

stIncMale · 2022-04-12T17:29:21Z

driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java

@@ -209,7 +209,7 @@ public <T> T withTransaction(final TransactionBody<T> transactionBody, final Tra
            try {
                startTransaction(options);
                retVal = transactionBody.execute();
-            } catch (RuntimeException e) {
+            } catch (Throwable e) {


Such a tiny change, and so many things come to mind when thinking about it.

Errors in Java the way they are, make no sense

My view on this is expressed here. A consequence of this view is that we should not catch Errors, which means we should catch Exceptions instead of catching Throwables each time we want to catch "everything" that makes sense catching (surely, there are exceptions from this rule, like catching assertion errors in test frameworks, or catching specific Errors that should have been Exceptions, but were made Errors instead).

Unlike Java, Rust uses two drastically different mechanisms to represent recoverable and unrecoverable errors:

Recoverable errors are represented via return values. Unlike in C, the compiler forces a caller to either handle them or to explicitly propagate as the caller's return value via the ? operator.

Unrecoverable errors are called panics, which can be implemented via unwinding or aborting. Unwinding is similar to throwing an exception in Java, and an unwinding panic can even be caught.

It is worth pointing out that the ability to catch unwinding panics in Rust should not be interpreted as "see, even in Rust catching unrecoverable errors is possible, which suggests that trying to handle unrecoverable errors (whether in Java or in Rust) may make sense!". The reason Rust allows catching unwinding panics is that "Rust's unwinding strategy is not specified to be fundamentally compatible with any other language's unwinding. As such, unwinding into Rust from another language, or unwinding into another language from Rust is Undefined Behavior. You must absolutely catch any panics at the FFI boundary!" This point leads us directly to the issue that this PR is trying to address.

Kotlin's exception handling is not compatible with Java's exception handling

Unlike Java, Kotlin does not have the concept of checked exceptions (good for Kotlin). However, this means that Kotlin's exception mechanism is not compatible with Java's exception mechanism, and just like Rust code must catch any unwinding panics at the FFI boundary, Kotlin generally should catch all exceptions that are considered checked in Java (the check is simple: they are of the Exception type and not of the RuntimeException type) at the Kotlin-Java boundary. Unfortunately, the Kotlin's interop guide erroneously does not require anything like this. It partially addresses the incompatibility for a subset of cases, but that subset does not include the case addressed in this PR.

Conclusion

Even though I consider catching Throwables instead of Exceptions an undesirable practice, the code of the driver contains many places where this is done, which is why introducing one more such catch changes nothing.

Even though having unchecked exception thrown from Kotlin code that implements a Java API method that does not declare checked exceptions is a problem in the Kotlin code, we better work around it as suggested in the PR. Also, from now on we should remember to consider any method as being able to throw checked exception even it is not declared¹ (despite this not being true for all methods, for simplicity and robustness it is better to treat all of them this way).

¹ One may argue that even without calling Kotlin code, sneaky throws are possible in Java due to the bug-feature it introduced. However, in this case it is easy to argue that whoever uses the technique is fully responsible for the consequences. The same is much harder to say about Kotlin code because it is more abundant and Kotlin's documentation fails to require the right thing to be done for interoperability.

Nice reasoning. I add that if you decompile a Kotlin class file you see that a checked exception is wrapped inside a Throwable and then thrown, I've not tested it but I'd guess that it's done in Scala and Groovy too.

Interesting, it is not obvious to me why Kotlin compiler wraps checked exceptions in Throwable given that an exception of the Throwable type is considered checked by Java compiler.

stIncMale · 2022-04-13T15:13:35Z

Thanks, @emasab, @esabellico-facephi, for informing the driver team about the problem. JAVA-4575 Driver code should catch Exception instead of RuntimeException even if no checked exception is declared will address the problem in other places in the code.

emasab · 2022-04-13T17:01:11Z

Not at all @stIncMale. Correction about the Kotlin bytecode: it's not wrapped, it's casted to Throwable. And as it's checked in Java, it should be the Kotlin compiler that, not doing the checks, allows to compile to bytecode. If you decompile the Kotlin bytecode to Java it gives the error: you have to declare it with throws.

emasab · 2022-04-27T15:57:46Z

@stIncMale may I ask another question, given that we're still experiencing a case where the same sessionId is used with two different transaction ids even if the session is started and closed with the transaction.

In the current version: https://github.com/emasab/mongo-java-driver/blob/1294de8a3daf0bcc0403d127f8c932511a51ea18/driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java

at line 238 shouln't it abort the transaction before calling start transaction again?
} else if (e.hasErrorLabel(TRANSIENT_TRANSACTION_ERROR_LABEL)) { continue outer; }
at line 241 shouln't it abort the transaction before giving up after 2 minutes have passed?
} throw e; }

Thanks!

stIncMale · 2022-04-29T21:24:45Z

@emasab I have not been familiar with this code, but it appears that the code is written with the following idea behind it: a client may try to either commit or rollback a transaction, but cannot try both:

Both commitTransaction and abortTransaction change transactionState to COMMITTED/ABORTED respectively regardless of whether they succeed or fail.
commitTransaction fails if transactionState is ABORTED. abortTransaction fails if transactionState is COMMITTED.

clearTransactionContextOnError called when commitTransaction fails must be doing the client-side cleanup needed to either retry commitTransaction or abandon the transaction; and the server must cleanup resources related to an abandoned transaction eventually.

we're still experiencing a case where the same sessionId is used with two different transaction ids even if the session is started and closed with the transaction

startTransaction may be executed multiple times per a withTransaction execution. This method advances the transaction number. That is, it appears to be normal to see more than one txnNumber for a session even if there is only one withTransaction for that session.

If you still think you are observing a bug, it may be worth reporting it via Jira. And if you have a reproducer, it would be very helpful.

emasab · 2022-05-04T17:13:08Z

@stIncMale thanks for the explanation it's much clearer now, about the txnId it's giving this error Caused by: com.mongodb.MongoQueryException: Query failed with error code 251 and error message 'cannot continue txnId 28 for session f1949373-d84d-468e-8667-98389bc480b9 - W/Ao6W6j9bCrE7yhAkO0wEm49NrHTolA71DjF3jfPsw= with txnId 29 but it's using a version of withTransaction that is copied from this one and adapted to Kotlin's coroutines so if debugging further I find something that applies to this code too, I'm sending some test to reproduce it.

See JAVA-4575 for more details.

fix: abort transaction in all cases

e517676

catching RuntimeException all other subclasses of Exception are not caught and don't abort the transaction

jyemin requested a review from stIncMale April 5, 2022 20:09

stIncMale approved these changes Apr 12, 2022

View reviewed changes

stIncMale merged commit ebada8e into mongodb:master Apr 13, 2022

ispringer pushed a commit to evergage/mongo-java-driver that referenced this pull request May 16, 2023

Abort transaction in all cases (mongodb#881)

60bebcb

See JAVA-4575 for more details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: abort transaction in all cases #881

fix: abort transaction in all cases #881

emasab commented Feb 18, 2022

esabellico-facephi commented Feb 19, 2022

emasab commented Mar 12, 2022

stIncMale Apr 12, 2022 •

edited

emasab Apr 12, 2022

stIncMale Apr 12, 2022

stIncMale commented Apr 13, 2022

emasab commented Apr 13, 2022

emasab commented Apr 27, 2022 •

edited

stIncMale commented Apr 29, 2022 •

edited

emasab commented May 4, 2022 •

edited

fix: abort transaction in all cases #881

fix: abort transaction in all cases #881

Conversation

emasab commented Feb 18, 2022

esabellico-facephi commented Feb 19, 2022

emasab commented Mar 12, 2022

stIncMale Apr 12, 2022 • edited

Choose a reason for hiding this comment

emasab Apr 12, 2022

Choose a reason for hiding this comment

stIncMale Apr 12, 2022

Choose a reason for hiding this comment

stIncMale commented Apr 13, 2022

emasab commented Apr 13, 2022

emasab commented Apr 27, 2022 • edited

stIncMale commented Apr 29, 2022 • edited

emasab commented May 4, 2022 • edited

stIncMale Apr 12, 2022 •

edited

emasab commented Apr 27, 2022 •

edited

stIncMale commented Apr 29, 2022 •

edited

emasab commented May 4, 2022 •

edited