fix JAVA-5855 by Codex#1972
Conversation
rozza
left a comment
There was a problem hiding this comment.
LGTM!
Solid review and write up A+
Especially liked the gotchas section.
Key point of this PR fix is:
"The safer interpretation is that the failure path is still wrong because it opens a channel before the resolver can fail, and that channel is not closed."
| closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, e); | ||
| handler.failed(new MongoSocketOpenException("Exception opening socket", getServerAddress(), e)); | ||
| } catch (Throwable t) { | ||
| closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, t); | ||
| handler.failed(t); |
There was a problem hiding this comment.
Before the PR, handler was guaranteed to be called if anything goes wrong (if Throwable happens), but that is not so anymore. Not completing a handler in asynchronous callback-based code is equivalent of a method never returning control in synchronous code. It's a serious bug, which causes any application code that was supposed to be executed, to never be executed. The latter may lead to, for example, resource leaks, dead locks caused by locks not being released.
Writing
try {
closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, t);
} finally {
handler.failed(t);
}would have solved the problem.
There was a problem hiding this comment.
I have fixed in a reimplementation #1981 (after this change was reverted). Technically a regression from the previous unconditional guarantee. But the practical risk is near-zero because:
- tryCancelPendingConnection() is AtomicReference.getAndSet(null) — cannot throw
- socketChannel.close() IOException is already caught
- failure.addSuppressed(e) can't self-suppress (different exception instances)
The risk would be an unchecked exception from SocketChannel.close() on a non-standard implementation, which doesn't apply to JDK's SocketChannelImpl.
| InetAddressResolver inetAddressResolver = new InetAddressResolver() { | ||
| @Override | ||
| public List<InetAddress> lookupByName(final String host) { | ||
| throw exception; | ||
| } | ||
| }; |
There was a problem hiding this comment.
There is no reason to declare and instantiate an anonymous class here, and this is against the code style we currently use. Instead, a lambda expression should have been used:
InetAddressResolver inetAddressResolver = host -> {
throw exception;
};There was a problem hiding this comment.
This is just a nit that in reality is the same thing. I would make that an optional fix for a PR reviewer. I'll look into hardening our AGENTS.md to help steer AI agents.
There was a problem hiding this comment.
In the "Problem" section of the PR description, the AI says:
The current implementation already has a broad catch block...So the literal "not passed to the handler" behavior is not obvious on current main or the 5.5.0 tag.
The reality is not that the bug ("behavior") is "not obvious", but that it does not exist (at least, the way it is described in the ticket JAVA-5855.
The AI should have clearly stated that the bug reported in the ticket does not exist, instead of claiming that the bug is not obvious, and then fixing a completely different bug.
| SocketChannel openedSocketChannel = SocketChannel.open(); | ||
| socketChannel = openedSocketChannel; |
There was a problem hiding this comment.
socketChannel = openedSocketChannel looks redundant.
Right now the channel is assigned to openedSocketChannel and then immediately re-assigned to socketChannel, but there’s no semantic distinction: socketChannel only ever comes from SocketChannel.open(), and the code doesn’t use the two variables to represent different states (partially-opened, closed, etc.) or different sources.
Suggestion: keep a single variable (SocketChannel socketChannel) and remove openedSocketChannel to reduce cognitive load and avoid implying there’s a meaningful difference when there isn’t.
There was a problem hiding this comment.
Its used in lambdas so is needed to be final. Hence the reassignment.
| } | ||
| selectorMonitor.register(socketRegistration); | ||
| registered = true; | ||
| } catch (IOException e) { | ||
| closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, e); | ||
| handler.failed(new MongoSocketOpenException("Exception opening socket", getServerAddress(), e)); | ||
| } catch (Throwable t) { | ||
| closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, t); | ||
| handler.failed(t); | ||
| } |
There was a problem hiding this comment.
The registered flag is redundant here and adds complexity in both this method and closeUnregisteredSocketChannel. Given the current control flow, registered can’t be true in any of the catch blocks, so the parameter doesn’t appear to affect behavior.
Suggestion: remove the registered parameter entirely.
There was a problem hiding this comment.
Pull request overview
This PR addresses JAVA-5855 by fixing TlsChannelStream.openAsync resource ownership/ordering so that hostname resolution happens before opening a SocketChannel, and by ensuring an opened-but-not-yet-registered channel is closed on failure. This aligns TLSChannel stream behavior with other async stream implementations and prevents leaks when name resolution or pre-registration connection steps fail.
Changes:
- Resolve
InetSocketAddressprior toSocketChannel.open()inTlsChannelStream.openAsync. - Add pre-registration cleanup to close the
SocketChanneland cancel pending registration on failure. - Add functional regression tests covering resolver failures (no socket opened) and connect failures (socket closed before failure is reported).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| driver-core/src/main/com/mongodb/internal/connection/TlsChannelStreamFactoryFactory.java | Reorders resolution vs socket open; adds cleanup for failures before selector registration. |
| driver-core/src/test/functional/com/mongodb/internal/connection/TlsChannelStreamFunctionalTest.java | Adds focused regression tests for resolver failure and pre-registration connect failure cleanup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try { | ||
| SocketChannel socketChannel = SocketChannel.open(); | ||
| socketChannel.configureBlocking(false); | ||
| //getConnectTimeoutMs MUST be called before connection attempt, as it might throw MongoOperationTimeout exception. |
| openedSocketChannel.connect(socketAddress); | ||
| socketRegistration = new SelectorMonitor.SocketRegistration( | ||
| openedSocketChannel, () -> initializeTslChannel(handler, openedSocketChannel)); | ||
|
|
JAVA-5855 TLS Channel Stream Analysis
Context
Ticket: JAVA-5855
Project: mongodb/mongo-java-driver
Component: Reactive Streams / driver-core TLS stream handling, primarily
com.mongodb.internal.connectionThe requested fix is for a TLS channel stream bug involving failures from
getSocketAddresses. The ticket describes a case where, when using TLSChannel, a failure fromresolver.lookupByNameis not passed to the async handler.The engineering context matters because the Java driver has several stream implementations that should behave consistently during connection establishment. The non-TLS async stream and Netty stream both resolve addresses before opening transport resources.
TlsChannelStreamwas different: it opened aSocketChannelbefore resolving the address.The narrowest safe fix is therefore to align TLSChannel stream ordering with the other stream implementations: resolve the address before opening the socket channel, preserve the existing async failure path, and make ownership of the socket channel explicit until the selector monitor accepts it.
User Request Context
The general ask was to help fix a public JIRA ticket for the MongoDB Java driver, beginning with analysis before implementation.
The user-provided sources of information were:
JAVA-5855: https://jira.mongodb.org/browse/JAVA-5855mongo-java-driver: https://github.com/mongodb/mongo-java-driverThe user specifically asked to check the driver specifications for TLS-related items in auth, connection handling, and any other relevant places.
The implementation plan was driven by these constraints:
Problem
JAVA-5855reports that whenTlsChannelStreamuses TLSChannel andresolver.lookupByNamefails insidegetSocketAddresses, the thrown exception is not passed to the async handler.The current implementation already has a broad catch block:
So the literal "not passed to the handler" behavior is not obvious on current
mainor the 5.5.0 tag. However, the ordering of the code exposes a concrete resource leak:If
getSocketAddressesthrows afterSocketChannel.open(), the exception is sent tohandler.failed(t), but the opened channel is not registered with the stream and is not closed by the failure path.That makes the practical bug:
The same ownership issue also applies to failures after the channel is opened but before it is registered with the selector monitor, such as socket-option or initial
connectfailures. Until registration succeeds,openAsyncowns the channel and is responsible for closing it on failure.This also makes
TlsChannelStreaminconsistent with the other async stream implementations, which resolve addresses before opening transport resources.Likely Code Area
The affected implementation is in:
The relevant method is:
The address resolution helper is:
Specifically:
That helper calls the configured
InetAddressResolverand maps resolvedInetAddressvalues intoInetSocketAddressinstances.UnknownHostExceptionis wrapped inMongoSocketException; runtime failures from a custom resolver propagate as thrown.Related stream implementations reviewed for expected behavior:
Both resolve names before opening their underlying channel resources.
Related test area:
The async socket-channel and Netty stream specs already had regression tests for name-resolution failure reaching the async handler.
TlsChannelStreamFunctionalTestdid not.Proposed Scope
Fix only the connection-establishment ordering and pre-registration cleanup in
TlsChannelStream.openAsync.The patch should resolve the socket address before opening a
SocketChannel:This preserves the existing requirement that
getConnectTimeoutMsis called before the connection attempt, while ensuring no socket channel is opened if DNS resolution fails.The patch should also close the opened channel if any later setup step fails before registration succeeds:
If failure occurs before the selector monitor accepts the registration,
openAsyncshould cancel the pending registration action, close the channel, attach any close failure as a suppressed exception, and then report the original failure to the async handler.The fix should avoid changing:
InetAddressResolverbehavior,ServerAddressHelper.getSocketAddresses,This keeps the patch small and focused on the reported TLSChannel failure mode.
Test Plan
Add focused regression coverage rather than broad connection-establishment tests.
Suggested tests:
A
TlsChannelStreamtest with a customInetAddressResolverthat throws a knownMongoSocketException.An async handler assertion proving:
is called.
is not called.
A static
SocketChannel.open()verification proving no socket channel is opened when name resolution fails.A pre-registration failure test where
SocketChannel.open()succeeds butconnect(...)throws, proving the channel is closed before the failure is reported.The implemented regression tests are:
with:
Recommended local verification:
In this environment, the targeted Gradle command could not be run because no Java runtime is available.
git diff --checkpassed.Gotchas
The ticket wording says the resolver exception is not passed to the async handler, but the current code already has
catch (Throwable)callinghandler.failed(t). The safer interpretation is that the failure path is still wrong because it opens a channel before the resolver can fail, and that channel is not closed.Do not fix this by adding another catch block around
getSocketAddressesafterSocketChannel.open(). That would preserve the resource-ordering problem and would require extra cleanup logic. Resolving before opening the channel is simpler and matches the other stream implementations.Do not stop at DNS resolution ordering alone. Once
SocketChannel.open()succeeds,openAsyncowns that channel untilselectorMonitor.register(...)succeeds. Any failure in that window should close the channel to avoid moving the same leak to socket configuration or initial connect failures.Do not wrap resolver failures in
MongoSocketOpenExceptionunless existing behavior requires it.ServerAddressHelperalready wrapsUnknownHostExceptioninMongoSocketException, and the existing broad catch forwards non-IOExceptionfailures as-is.Do not add backpressure labels or retry labels for this path. The CMAP specification explicitly distinguishes DNS lookup failures from server-overload cases.
Be careful with timeout ordering. The existing comment says
getConnectTimeoutMsmust be called before the connection attempt because it may throwMongoOperationTimeoutException. The fix keeps that behavior intact.This change does not add multi-address fallback behavior to
TlsChannelStream. The existing TLSChannel implementation connects only to the first resolved address, and adding retry across all resolved addresses would be a larger behavior change outside the scope ofJAVA-5855.PR Positioning
Recommended PR framing:
Recommended PR summary:
Recommended tests section:
Recommended verification note:
Sources Reviewed
mongodb/mongo-java-driver, especiallydriver-core/src/main/com/mongodb/internal/connection