fix JAVA-5855 by Codex #1972

Merged
rozza merged 1 commit into mongodb:main from techbelle:Java-5855
May 20, 2026

@techbelle
Contributor

JAVA-5855 TLS Channel Stream Analysis

Context

Ticket: JAVA-5855
Project: mongodb/mongo-java-driver
Component: Reactive Streams / driver-core TLS stream handling, primarily com.mongodb.internal.connection

The requested fix is for a TLS channel stream bug involving failures from getSocketAddresses. The ticket describes a case where, when using TLSChannel, a failure from resolver.lookupByName is not passed to the async handler.

The engineering context matters because the Java driver has several stream implementations that should behave consistently during connection establishment. The non-TLS async stream and Netty stream both resolve addresses before opening transport resources. TlsChannelStream was different: it opened a SocketChannel before resolving the address.

The narrowest safe fix is therefore to align TLSChannel stream ordering with the other stream implementations: resolve the address before opening the socket channel, preserve the existing async failure path, and make ownership of the socket channel explicit until the selector monitor accepts it.

User Request Context

The general ask was to help fix a public JIRA ticket for the MongoDB Java driver, beginning with analysis before implementation.

The user-provided sources of information were:

The user specifically asked to check the driver specifications for TLS-related items in auth, connection handling, and any other relevant places.

The implementation plan was driven by these constraints:

  • keep the fix narrow and easy to review,
  • check the nearby stream implementations instead of updating only the obvious file,
  • preserve existing timeout and async handler behavior,
  • add focused regression coverage,
  • verify against the relevant public driver specifications,
  • save the analysis as Markdown for the engineering team.

Problem

JAVA-5855 reports that when TlsChannelStream uses TLSChannel and resolver.lookupByName fails inside getSocketAddresses, the thrown exception is not passed to the async handler.

The current implementation already has a broad catch block:

} catch (IOException e) {
    handler.failed(new MongoSocketOpenException("Exception opening socket", getServerAddress(), e));
} catch (Throwable t) {
    handler.failed(t);
}

So the literal "not passed to the handler" behavior is not obvious on current main or the 5.5.0 tag. However, the ordering of the code exposes a concrete resource leak:

SocketChannel.open()
configure socket options
getConnectTimeoutMs()
getSocketAddresses(...).get(0)
socketChannel.connect(...)

If getSocketAddresses throws after SocketChannel.open(), the exception is sent to handler.failed(t), but the opened channel is not registered with the stream and is not closed by the failure path.

That makes the practical bug:

name resolution failure during TLS stream open can leave an opened SocketChannel behind

The same ownership issue also applies to failures after the channel is opened but before it is registered with the selector monitor, such as socket-option or initial connect failures. Until registration succeeds, openAsync owns the channel and is responsible for closing it on failure.

This also makes TlsChannelStream inconsistent with the other async stream implementations, which resolve addresses before opening transport resources.
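The ordering problem can be reproduced in isolation. The following is a minimal sketch, not the driver's code: `OpenBeforeResolveSketch` and its `resolve` method are hypothetical stand-ins for `openAsync` and `getSocketAddresses`, and the empty catch block stands in for `handler.failed(t)`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SocketChannel;

// Illustrative stand-in, not the driver's code.
final class OpenBeforeResolveSketch {

    // Hypothetical resolver that always fails, playing the role of a failing lookupByName.
    static void resolve(String host) {
        throw new IllegalStateException("lookup failed for " + host);
    }

    // Mimics the old ordering and reports whether the channel is still open
    // after the resolver failure has been handled.
    static boolean channelLeftOpenAfterResolverFailure() {
        try {
            SocketChannel channel = SocketChannel.open(); // opened first
            try {
                channel.configureBlocking(false);
                resolve("example.invalid");               // fails second
            } catch (IllegalStateException reported) {
                // The driver would call handler.failed(t) here; nothing closes the channel.
            }
            boolean stillOpen = channel.isOpen();
            channel.close(); // the sketch cleans up after itself
            return stillOpen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("left open: " + channelLeftOpenAfterResolverFailure());
    }
}
```

The failure is reported, yet the channel remains open at the point where the real code would have returned, which is the leak described above.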

Likely Code Area

The affected implementation is in:

driver-core/src/main/com/mongodb/internal/connection/TlsChannelStreamFactoryFactory.java

The relevant method is:

public void openAsync(final OperationContext operationContext, final AsyncCompletionHandler<Void> handler)

The address resolution helper is:

driver-core/src/main/com/mongodb/internal/connection/ServerAddressHelper.java

Specifically:

public static List<InetSocketAddress> getSocketAddresses(
        final ServerAddress serverAddress,
        final InetAddressResolver resolver)

That helper calls the configured InetAddressResolver and maps resolved InetAddress values into InetSocketAddress instances. UnknownHostException is wrapped in MongoSocketException; runtime failures from a custom resolver propagate as thrown.

Related stream implementations reviewed for expected behavior:

driver-core/src/main/com/mongodb/internal/connection/AsynchronousSocketChannelStream.java
driver-core/src/main/com/mongodb/internal/connection/netty/NettyStream.java

Both resolve names before opening their underlying channel resources.

Related test area:

driver-core/src/test/functional/com/mongodb/internal/connection/TlsChannelStreamFunctionalTest.java
driver-core/src/test/functional/com/mongodb/internal/connection/AsyncSocketChannelStreamSpecification.groovy
driver-core/src/test/functional/com/mongodb/connection/netty/NettyStreamSpecification.groovy

The async socket-channel and Netty stream specs already had regression tests for name-resolution failure reaching the async handler. TlsChannelStreamFunctionalTest did not.

Proposed Scope

Fix only the connection-establishment ordering and pre-registration cleanup in TlsChannelStream.openAsync.

The patch should resolve the socket address before opening a SocketChannel:

getConnectTimeoutMs()
getSocketAddresses(...).get(0)
SocketChannel.open()
configure socket options
socketChannel.connect(resolvedAddress)
register with SelectorMonitor

This preserves the existing requirement that getConnectTimeoutMs is called before the connection attempt, while ensuring no socket channel is opened if DNS resolution fails.

The patch should also close the opened channel if any later setup step fails before registration succeeds:

SocketChannel.open()
configure socket options
socketChannel.connect(resolvedAddress)
schedule timeout / register with SelectorMonitor

If failure occurs before the selector monitor accepts the registration, openAsync should cancel the pending registration action, close the channel, attach any close failure as a suppressed exception, and then report the original failure to the async handler.
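The intended ordering and pre-registration cleanup might look roughly like the sketch below. It is an illustration under stated assumptions rather than the merged patch: `ResolveBeforeOpenSketch`, `tryOpen`, `closeUnregistered`, and the one-method `Handler` are hypothetical names, the resolver is modeled as a `Supplier`, selector registration and timeout scheduling are omitted, and the driver's wrapping of `IOException` in `MongoSocketOpenException` is left out.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.util.function.Supplier;

// Illustrative stand-in, not the driver's code.
final class ResolveBeforeOpenSketch {

    // Hypothetical, simplified failure callback.
    interface Handler {
        void failed(Throwable t);
    }

    // Returns the channel that was opened (closed again on failure), or null if none was opened.
    static SocketChannel tryOpen(Supplier<InetSocketAddress> resolver, Handler handler) {
        SocketChannel channel = null;
        try {
            InetSocketAddress address = resolver.get(); // 1. resolve before any resource exists
            channel = SocketChannel.open();             // 2. open only after resolution succeeded
            channel.configureBlocking(false);
            channel.connect(address);                   // 3. may throw before registration
            // 4. registration with the selector monitor would transfer ownership here
        } catch (Throwable t) {
            try {
                closeUnregistered(channel, t);
            } finally {
                handler.failed(t);                      // always report the original failure
            }
        }
        return channel;
    }

    // Closes a channel nobody else owns yet; a close failure is attached as suppressed.
    static void closeUnregistered(SocketChannel channel, Throwable failure) {
        if (channel == null) {
            return;
        }
        try {
            channel.close();
        } catch (IOException e) {
            failure.addSuppressed(e);
        }
    }

    public static void main(String[] args) {
        SocketChannel none = tryOpen(() -> {
            throw new IllegalStateException("lookup failed");
        }, t -> System.out.println("failed: " + t));
        System.out.println("opened a channel: " + (none != null));

        // An unresolved address makes connect(...) throw UnresolvedAddressException.
        SocketChannel closed = tryOpen(
                () -> InetSocketAddress.createUnresolved("example.invalid", 27017),
                t -> System.out.println("failed: " + t));
        System.out.println("channel closed: " + !closed.isOpen());
    }
}
```

In the first call no channel is ever opened; in the second the channel is opened, the connect step fails, and the channel is closed before the handler is notified.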

The fix should avoid changing:

  • public driver APIs,
  • InetAddressResolver behavior,
  • ServerAddressHelper.getSocketAddresses,
  • TLS handshake behavior,
  • selector monitor behavior,
  • timeout scheduling behavior,
  • connection-pool behavior,
  • backpressure or retryable error labels.

This keeps the patch small and focused on the reported TLSChannel failure mode.

Test Plan

Add focused regression coverage rather than broad connection-establishment tests.

Suggested tests:

  1. A TlsChannelStream test with a custom InetAddressResolver that throws a known MongoSocketException.

  2. An async handler assertion proving that handler.failed(exception) is called.

  3. A negative completion assertion proving that handler.completed(null) is not called.

  4. A static SocketChannel.open() verification proving no socket channel is opened when name resolution fails.

  5. A pre-registration failure test where SocketChannel.open() succeeds but connect(...) throws, proving the channel is closed before the failure is reported.

The implemented regression tests are:

driver-core/src/test/functional/com/mongodb/internal/connection/TlsChannelStreamFunctionalTest.java

with:

shouldFailAsyncCompletionHandlerWithoutOpeningSocketChannelIfNameResolutionFails
shouldCloseSocketChannelIfConnectFailsBeforeRegistration

Recommended local verification:

./gradlew :driver-core:test --tests com.mongodb.internal.connection.TlsChannelStreamFunctionalTest

In this environment, the targeted Gradle command could not be run because no Java runtime is available. git diff --check passed.

Gotchas

The ticket wording says the resolver exception is not passed to the async handler, but the current code already has catch (Throwable) calling handler.failed(t). The safer interpretation is that the failure path is still wrong because it opens a channel before the resolver can fail, and that channel is not closed.

Do not fix this by adding another catch block around getSocketAddresses after SocketChannel.open(). That would preserve the resource-ordering problem and would require extra cleanup logic. Resolving before opening the channel is simpler and matches the other stream implementations.

Do not stop at DNS resolution ordering alone. Once SocketChannel.open() succeeds, openAsync owns that channel until selectorMonitor.register(...) succeeds. Any failure in that window should close the channel to avoid moving the same leak to socket configuration or initial connect failures.

Do not wrap resolver failures in MongoSocketOpenException unless existing behavior requires it. ServerAddressHelper already wraps UnknownHostException in MongoSocketException, and the existing broad catch forwards non-IOException failures as-is.

Do not add backpressure labels or retry labels for this path. The CMAP specification explicitly distinguishes DNS lookup failures from server-overload cases.

Be careful with timeout ordering. The existing comment says getConnectTimeoutMs must be called before the connection attempt because it may throw MongoOperationTimeoutException. The fix keeps that behavior intact.

This change does not add multi-address fallback behavior to TlsChannelStream. The existing TLSChannel implementation connects only to the first resolved address, and adding retry across all resolved addresses would be a larger behavior change outside the scope of JAVA-5855.

PR Positioning

Recommended PR framing:

JAVA-5855 Resolve TLS channel address before opening socket

Recommended PR summary:

This change fixes the TLS channel stream connection-establishment path so address resolution happens before opening a SocketChannel. Previously, TlsChannelStream opened and configured a SocketChannel before calling getSocketAddresses. If the configured resolver failed, the exception was reported to the async handler, but the already-opened channel was not closed.

Resolving the address before opening the channel avoids the resolver-failure leak and aligns TlsChannelStream with the existing async socket-channel and Netty stream implementations. The setup path now also closes the channel if any pre-registration step fails after the channel has been opened.

Recommended tests section:

Added regression coverage for TlsChannelStream.openAsync with a failing InetAddressResolver. The test verifies that the async handler receives the resolver exception, completion is not signaled, and SocketChannel.open is not called.

Added regression coverage for a connect failure after SocketChannel.open succeeds but before selector registration. The test verifies that the original IOException is wrapped in MongoSocketOpenException and that the opened SocketChannel is closed.

Recommended verification note:

git diff --check
./gradlew :driver-core:test --tests com.mongodb.internal.connection.TlsChannelStreamFunctionalTest

Sources Reviewed

@techbelle techbelle requested a review from a team as a code owner May 15, 2026 17:48
@techbelle techbelle requested a review from strogiyotec May 15, 2026 17:48
@rozza rozza self-requested a review May 20, 2026 16:00
Member

@rozza rozza left a comment

LGTM!

Solid review and write up A+
Especially liked the gotchas section.
Key point of this PR fix is:

"The safer interpretation is that the failure path is still wrong because it opens a channel before the resolver can fail, and that channel is not closed."

@rozza rozza merged commit 13d4aef into mongodb:main May 20, 2026
53 checks passed
Member

@stIncMale stIncMale left a comment

Additional PR feedback.

Comment on lines +242 to 246
closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, e);
handler.failed(new MongoSocketOpenException("Exception opening socket", getServerAddress(), e));
} catch (Throwable t) {
closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, t);
handler.failed(t);
Member

@stIncMale stIncMale May 20, 2026

Before the PR, handler was guaranteed to be called if anything went wrong (that is, if any Throwable was thrown), but that is no longer the case. Not completing a handler in asynchronous callback-based code is the equivalent of a method never returning control in synchronous code. It is a serious bug: any application code that was supposed to run after completion never runs, which may lead to, for example, resource leaks or deadlocks caused by locks not being released.

Writing

try {
    closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, t);
} finally {
    handler.failed(t);
}

would have solved the problem.
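The difference between the two forms can be shown with a small sketch. Everything here is hypothetical scaffolding for illustration (`AlwaysCompleteHandlerSketch`, `failingCleanup`, and a `Consumer<Throwable>` standing in for the handler); it only demonstrates the control-flow guarantee, not the driver's cleanup logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative stand-in, not the driver's code.
final class AlwaysCompleteHandlerSketch {

    // Hypothetical cleanup that misbehaves by throwing an unchecked exception.
    static void failingCleanup() {
        throw new IllegalStateException("cleanup failed");
    }

    // Cleanup first, handler second: a throwing cleanup skips the handler entirely.
    static void reportWithoutFinally(Throwable failure, Runnable cleanup, Consumer<Throwable> handler) {
        cleanup.run();
        handler.accept(failure);
    }

    // try/finally: the handler is completed even if cleanup throws.
    static void reportWithFinally(Throwable failure, Runnable cleanup, Consumer<Throwable> handler) {
        try {
            cleanup.run();
        } finally {
            handler.accept(failure);
        }
    }

    public static void main(String[] args) {
        List<Throwable> seen = new ArrayList<>();
        Throwable original = new RuntimeException("connect failed");
        try {
            reportWithoutFinally(original, AlwaysCompleteHandlerSketch::failingCleanup, seen::add);
        } catch (IllegalStateException expected) {
            System.out.println("without finally, handler calls: " + seen.size());
        }
        try {
            reportWithFinally(original, AlwaysCompleteHandlerSketch::failingCleanup, seen::add);
        } catch (IllegalStateException expected) {
            System.out.println("with finally, handler calls: " + seen.size());
        }
    }
}
```

In both variants the cleanup exception still propagates to the caller; the only difference is whether the handler is completed first.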

Member

I have fixed this in a reimplementation, #1981 (after this change was reverted). Technically, it is a regression from the previous unconditional guarantee, but the practical risk is near zero because:

  1. tryCancelPendingConnection() is AtomicReference.getAndSet(null) — cannot throw
  2. socketChannel.close() IOException is already caught
  3. failure.addSuppressed(e) can't self-suppress (different exception instances)

The risk would be an unchecked exception from SocketChannel.close() on a non-standard implementation, which doesn't apply to JDK's SocketChannelImpl.

Comment on lines +84 to +89
InetAddressResolver inetAddressResolver = new InetAddressResolver() {
@Override
public List<InetAddress> lookupByName(final String host) {
throw exception;
}
};
Member

There is no reason to declare and instantiate an anonymous class here, and this is against the code style we currently use. Instead, a lambda expression should have been used:

InetAddressResolver inetAddressResolver = host -> {
    throw exception;
};
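As a quick check that the two forms behave the same, here is a self-contained sketch. The `Resolver` interface is a hypothetical stand-in with the same single-method shape as the resolver in the snippet above, and the exception is assumed to be unchecked.

```java
import java.net.InetAddress;
import java.util.List;

// Illustrative stand-in, not the driver's test code.
final class ResolverLambdaSketch {

    // Hypothetical single-method interface mirroring the shape of lookupByName.
    interface Resolver {
        List<InetAddress> lookupByName(String host);
    }

    // Anonymous-class form, as written in the test under review.
    static Resolver anonymous(RuntimeException exception) {
        return new Resolver() {
            @Override
            public List<InetAddress> lookupByName(final String host) {
                throw exception;
            }
        };
    }

    // Lambda form, as suggested in the review comment.
    static Resolver lambda(RuntimeException exception) {
        return host -> {
            throw exception;
        };
    }

    public static void main(String[] args) {
        RuntimeException boom = new RuntimeException("lookup failed");
        for (Resolver resolver : new Resolver[] {anonymous(boom), lambda(boom)}) {
            try {
                resolver.lookupByName("example.invalid");
            } catch (RuntimeException e) {
                System.out.println("same exception instance: " + (e == boom));
            }
        }
    }
}
```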

Member

This is just a nit that in reality is the same thing. I would make that an optional fix for a PR reviewer. I'll look into hardening our AGENTS.md to help steer AI agents.

Member

In the "Problem" section of the PR description, the AI says:

The current implementation already has a broad catch block...So the literal "not passed to the handler" behavior is not obvious on current main or the 5.5.0 tag.

The reality is not that the bug ("behavior") is "not obvious", but that it does not exist (at least, not in the way it is described in the ticket JAVA-5855).

The AI should have clearly stated that the bug reported in the ticket does not exist, instead of claiming that the bug is not obvious, and then fixing a completely different bug.

Member

@vbabanin vbabanin left a comment

To provide additional feedback about the agentic execution:

The agent introduced redundant code (openedSocketChannel, registered flag) that adds noise.

We should create a follow-up cleanup PR if there is agreement on the posted comments.

Comment on lines +220 to +221
SocketChannel openedSocketChannel = SocketChannel.open();
socketChannel = openedSocketChannel;
Member

@vbabanin vbabanin May 15, 2026

socketChannel = openedSocketChannel looks redundant.

Right now the channel is assigned to openedSocketChannel and then immediately re-assigned to socketChannel, but there’s no semantic distinction: socketChannel only ever comes from SocketChannel.open(), and the code doesn’t use the two variables to represent different states (partially-opened, closed, etc.) or different sources.

Suggestion: keep a single variable (SocketChannel socketChannel) and remove openedSocketChannel to reduce cognitive load and avoid implying there’s a meaningful difference when there isn’t.

Member

It's used in lambdas, so it needs to be final. Hence the reassignment.
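The language rule behind this reply can be sketched as follows. This is a hypothetical illustration with a `String` standing in for the channel and invented names (`EffectivelyFinalSketch`, `describeLater`); it assumes, as the reviewed snippet suggests, that the outer variable is declared before the try block so the catch blocks can see it.

```java
import java.util.function.Supplier;

// Illustrative stand-in, not the driver's code.
final class EffectivelyFinalSketch {

    static Supplier<String> describeLater() {
        String channel = null;               // declared outside try so a catch block can use it
        try {
            String opened = "channel-1";     // assigned exactly once: effectively final
            channel = opened;                // reassignment makes `channel` not effectively final
            // return () -> channel;         // would not compile: lambdas capture only (effectively) final locals
            return () -> opened;             // capturing the single-assignment local is allowed
        } catch (RuntimeException e) {
            throw new IllegalStateException("cleanup would use " + channel, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLater().get());
    }
}
```

Whether the second local is the clearest way to satisfy the rule is a style question for the reviewers; the sketch only shows why a single reassigned variable cannot be captured directly.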

Comment on lines 238 to 247
}
selectorMonitor.register(socketRegistration);
registered = true;
} catch (IOException e) {
closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, e);
handler.failed(new MongoSocketOpenException("Exception opening socket", getServerAddress(), e));
} catch (Throwable t) {
closeUnregisteredSocketChannel(socketChannel, socketRegistration, registered, t);
handler.failed(t);
}
Member

The registered flag is redundant here and adds complexity in both this method and closeUnregisteredSocketChannel. Given the current control flow, registered can’t be true in any of the catch blocks, so the parameter doesn’t appear to affect behavior.

Suggestion: remove the registered parameter entirely.

@vbabanin vbabanin requested a review from Copilot May 20, 2026 19:44
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses JAVA-5855 by fixing TlsChannelStream.openAsync resource ownership/ordering so that hostname resolution happens before opening a SocketChannel, and by ensuring an opened-but-not-yet-registered channel is closed on failure. This aligns TLSChannel stream behavior with other async stream implementations and prevents leaks when name resolution or pre-registration connection steps fail.

Changes:

  • Resolve InetSocketAddress prior to SocketChannel.open() in TlsChannelStream.openAsync.
  • Add pre-registration cleanup to close the SocketChannel and cancel pending registration on failure.
  • Add functional regression tests covering resolver failures (no socket opened) and connect failures (socket closed before failure is reported).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
driver-core/src/main/com/mongodb/internal/connection/TlsChannelStreamFactoryFactory.java | Reorders resolution vs socket open; adds cleanup for failures before selector registration.
driver-core/src/test/functional/com/mongodb/internal/connection/TlsChannelStreamFunctionalTest.java | Adds focused regression tests for resolver failure and pre-registration connect failure cleanup.

try {
SocketChannel socketChannel = SocketChannel.open();
socketChannel.configureBlocking(false);
//getConnectTimeoutMs MUST be called before connection attempt, as it might throw MongoOperationTimeout exception.
Comment on lines +232 to 235
openedSocketChannel.connect(socketAddress);
socketRegistration = new SelectorMonitor.SocketRegistration(
openedSocketChannel, () -> initializeTslChannel(handler, openedSocketChannel));
