
JAVA-6144 increase wait time for thread pool shutdown #1920

Merged
strogiyotec merged 2 commits into mongodb:main from strogiyotec:JAVA-6144
Mar 23, 2026

Conversation

@strogiyotec
Contributor

@strogiyotec strogiyotec commented Mar 23, 2026

JAVA-6144
The test case was flaky.
Here is one example of a failing test.
The test fails only for the NoTls variant.
The test calls close on the mongo client and expects the underlying thread pool to terminate in less than 100 ms.
After looking at the code, I found that the main cause is in AsynchronousSocketChannelStreamFactoryFactory, specifically in how close works there:

    @Override
    public void close() {
        if (group != null) {
            group.shutdown();
        }
    }

The group is AsynchronousChannelGroup and it's created in StreamFactoryHelper here

                    group = AsynchronousChannelGroup.withThreadPool(executorService);

and from the java doc https://docs.oracle.com/javase/8/docs/api/java/nio/channels/AsynchronousChannelGroup.html

The shutdown method is used to initiate an orderly shutdown of a group. An orderly shutdown marks the group as shutdown; further attempts to construct a channel that binds to the group will throw ShutdownChannelGroupException. Whether or not a group is shutdown can be tested using the isShutdown method. Once shutdown, the group terminates when all asynchronous channels that are bound to the group are closed, all actively executing completion handlers have run to completion, and resources used by the group are released. No attempt is made to stop or interrupt threads that are executing completion handlers

Basically, calling shutdown on the channel group is non-blocking and only marks the group as shut down. The gap between shutdown and awaitTermination in the test case is small, so the test occasionally fails.
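To illustrate the non-blocking behaviour outside the driver, here is a minimal standalone sketch. The class and method names (GroupShutdownSketch, shutdownAndAwait) are made up for this example and are not part of the driver; it only uses the JDK calls quoted above and assumes a group with no open channels.

```java
import java.io.IOException;
import java.nio.channels.AsynchronousChannelGroup;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only.
class GroupShutdownSketch {
    /**
     * Shuts down a group backed by an external executor and reports
     * {groupMarkedShutdown, executorTerminatedWithinTimeout}.
     */
    static boolean[] shutdownAndAwait(long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(2);
        AsynchronousChannelGroup group = AsynchronousChannelGroup.withThreadPool(executorService);

        // shutdown() only initiates an orderly shutdown and returns right away.
        group.shutdown();
        boolean markedShutdown = group.isShutdown();

        // The executor is shut down as part of group termination,
        // so the caller has to wait for it with some timeout.
        boolean terminated = executorService.awaitTermination(timeout, unit);
        return new boolean[] {markedShutdown, terminated};
    }
}
```

How long the second step takes depends on the machine and on what the group is still doing, so any timeout passed here is a guess rather than a computed bound.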

To test this using a patch

  1. I increased the timeout to 2 seconds
  2. Changed the test case to only test tlsEnabled=false
  3. Changed the test to a RepeatedTest that runs 100 times (that's why tlsEnabled was hardcoded to false; RepeatedTest doesn't support ValueSource)
  4. Submitted a patch
  5. The test case passed 100 times without flakiness (Evergreen patch)
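The repetition idea from the steps above can also be sketched without JUnit, as a plain loop that runs a check many times and counts failures. RepeatSketch and countFailures are hypothetical names for this example; the actual patch used JUnit's RepeatedTest.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only.
class RepeatSketch {
    /** Runs the check the given number of times and returns how many runs returned false. */
    static int countFailures(int runs, Callable<Boolean> check) throws Exception {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            if (!check.call()) {
                failures++;
            }
        }
        return failures;
    }
}
```

A result of zero failures over 100 runs suggests the timeout is long enough in that environment, but it does not prove the check can never fail.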

Another solution I could think of is to use shutdownNow; however, that shutdown will be blocking:

In addition to the actions performed by the shutdown method, this method invokes the close method on all open channels in the group. This method does not attempt to stop or interrupt threads that are executing completion handlers. The group terminates when all actively executing completion handlers have run to completion and all resources have been released. This method may be invoked at any time. If some other thread has already invoked it, then another invocation will block until the first invocation is complete, after which it will return without effect.
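For comparison, a small sketch of the difference between the two methods when a channel is still open. ShutdownNowSketch is a made-up name, and the unconnected AsynchronousSocketChannel is only an assumption used to keep the group non-empty; this is not how the driver manages its channels.

```java
import java.io.IOException;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousSocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only.
class ShutdownNowSketch {
    /**
     * Returns {terminatedAfterShutdown, channelOpenAfterShutdownNow, terminatedAfterShutdownNow}.
     */
    static boolean[] run(long timeout, TimeUnit unit) throws IOException, InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(2);
        AsynchronousChannelGroup group = AsynchronousChannelGroup.withThreadPool(executorService);
        // An open (even unconnected) channel bound to the group keeps it from terminating.
        AsynchronousSocketChannel channel = AsynchronousSocketChannel.open(group);

        // Orderly shutdown: the group is marked as shut down but waits for its channels.
        group.shutdown();
        boolean terminatedAfterShutdown = group.isTerminated();

        // shutdownNow additionally closes every open channel in the group.
        group.shutdownNow();
        boolean channelOpen = channel.isOpen();
        boolean terminated = group.awaitTermination(timeout, unit);

        return new boolean[] {terminatedAfterShutdown, channelOpen, terminated};
    }
}
```

Even with shutdownNow the caller still waits for termination with a timeout, so it does not remove the need to pick a wait duration in the test.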

@strogiyotec strogiyotec requested a review from a team as a code owner March 23, 2026 16:04
@strogiyotec strogiyotec requested a review from atandon2024 March 23, 2026 16:04
@stIncMale stIncMale requested review from stIncMale and removed request for atandon2024 March 23, 2026 16:30
Comment on lines +57 to +83
@@ -71,6 +80,6 @@ void testExternalExecutorWasShutDown(final boolean tlsEnabled) throws Interrupte
// ignored
}

- assertTrue(executorService.awaitTermination(100, TimeUnit.MILLISECONDS));
+ assertTrue(executorService.awaitTermination(2, TimeUnit.SECONDS));
Member


Let's remove the code comment. My thoughts are below.

You observed that most of the time was spent on a specific activity, which you described. That observation depends on the circumstances; it is not a general truth.

MongoClient.close is asynchronous, and in this PR you increased the time the test waits for a subset of the close work to complete. The previous duration of 100 ms was arbitrary, and the new duration is also arbitrary (there is no fixed number you can compute that is guaranteed to be enough in all circumstances). The only criterion is that it should be long enough for the test to seemingly always pass (thank you for experimenting and confirming this!).

I think your investigation was useful: it satisfied your curiosity, and a PR-level comment may satisfy the curiosity of a PR reviewer, but the code-level comment seems strictly harmful to me.

Contributor Author


Hey Valentin, thanks for the feedback.

  1. I removed the Javadoc from the test method
  2. Called close directly instead of using try-with-resources
  3. Added the JIRA link to the PR description

Comment on lines 79 to 81
@@ -71,6 +80,6 @@ void testExternalExecutorWasShutDown(final boolean tlsEnabled) throws Interrupte
// ignored
}
Member

@stIncMale stIncMale Mar 23, 2026


[unrelated to this PR]

Let's make this code more expressive, straightforward, and shorter:

new SyncMongoClient(mongoClientSettings).close();

As far as I can tell, there is no reason to use the try-with-resources statement in this case.
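A tiny sketch of why the two forms behave the same here, using a made-up CountingResource class in place of SyncMongoClient (an assumption for illustration; the real client does far more in close):

```java
// Hypothetical stand-in for a closeable client, for illustration only.
class CountingResource implements AutoCloseable {
    int closeCalls = 0;

    @Override
    public void close() {
        closeCalls++;
    }

    /** Closes a fresh resource via try-with-resources with an empty body. */
    static int closeViaTryWithResources() {
        CountingResource resource = new CountingResource();
        try (CountingResource r = resource) {
            // nothing happens between construction and close
        }
        return resource.closeCalls;
    }

    /** Closes a fresh resource by calling close directly. */
    static int closeDirectly() {
        CountingResource resource = new CountingResource();
        resource.close();
        return resource.closeCalls;
    }
}
```

With an empty body, try-with-resources only adds the implicit close call, so the direct call says the same thing in one line.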

@stIncMale
Member

@strogiyotec, could you please specify JAVA-6144 in the PR description? This way a reader can click on it to navigate to the ticket, and Jira will automatically link the PR to the ticket.

@strogiyotec strogiyotec merged commit c956d94 into mongodb:main Mar 23, 2026
53 checks passed