New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock and couple more problems in DefaultConnectionPool
#699
Conversation
This commit fixes two problems in the `OpenConcurrencyLimiter`: - deadlock in `DefaultConnectionPool`; - potential removal from `OpenConcurrencyLimiter.desiredConnectionSlots` when its empty. Deadlock a) Before introducing OpenConcurrencyLimiter, "AsyncGetter" thread was used only to do blocking `ConcurrentPool.get`. b) After, I started to additionally use "AsyncGetter" to do blocking `OpenConcurrencyLimiter.waitUntilOpenPermitAvailable`. As a result, we may have a thread that gets the last connection (`maxSize` is reached) from `ConcurrentPool` and submits `waitUntilOpenPermitAvailable` to "AsyncGetter". Concurrently with this happening, the "AsyncGetter" tries to get from `ConcurrentPool` and is blocked because there are no more connections available. In such an execution, `waitUntilOpenPermitAvailable` cannot be completed by "AsyncGetter" because "AsyncGetter" is blocked doing a different task, which itself cannot be completed. a solution is to do `ConcurrentPool.get` and `waitUntilOpenPermitAvailable` in different threads. This way these two different kinds of tasks will not block each other by waiting in the same queue to be done by a single thread. Potential removal from `OpenConcurrencyLimiter.desiredConnectionSlots` when its empty. If `acquirePermitOrGetAvailableOpenedConnection` is called with `true` as `tryGetAvailable`, and `getPooledConnectionImmediately` throws an exception, then `expressDesireToGetAvailableConnection` is not called but `giveUpOnTryingToGetAvailableConnection` is still called, which is incorrect. JAVA-3928
private static Stream<Arguments> concurrentUsageArguments() { | ||
return Stream.of( | ||
Arguments.of(0, 1, 8, true, false, 0.02f), | ||
Arguments.of(0, 1, 8, false, true, 0.02f), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of all 5 parameter sets, only this one reproduced the deadlock.
@@ -710,7 +698,7 @@ public Prune shouldPrune(final UsageTrackingInternalConnection usageTrackingConn | |||
} | |||
|
|||
/** | |||
* Package-private methods are thread-safe, | |||
* Package-access methods are thread-safe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"package-private" is not a term the JLS uses, while "package-access" is.
@@ -874,7 +864,7 @@ private PooledConnection acquirePermitOrGetAvailableOpenedConnection(final boole | |||
return availableConnection; | |||
} finally { | |||
try { | |||
if (tryGetAvailable && availableConnection == null) {//the desired connection slot has not yet been removed | |||
if (expressedDesireToGetAvailableConnection && availableConnection == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the second problem I discovered while implementing pausable pool. getPooledConnectionImmediately
there may throw an exception, which in turn may result in this code calling giveUpOnTryingToGetAvailableConnection
despite expressDesireToGetAvailableConnection
not being called.
This is an example of how one should try not to rely if possible on methods not throwing exceptions.
JAVA-3927
*/ | ||
private long awaitNanos(final long timeoutNanos) { | ||
private long awaitNanos(final Condition condition, final long timeoutNanos) throws MongoInterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes the behavior of this method consistent with Timeout
and allows to see explicitly what condition it waits for on the calling side.
DefaultConnectionPool
DefaultConnectionPool
On one hand `ConcurrentPool.ensureMinSize` tries to not require the caller to do what can be done by the method itself, e.g., it releases the connection to the pool itself. On the other hand, always releasing in `ensureMinSize` may lead to releasing a permit for the same connection twice, thus not respecting the `maxSize`. It is not easy (requires an additional knob and logic) to prevent the caller (`DefaultConnectionPool`) from releasing a connection when initialization fails. It is, therefore, seems better to mandate that the caller releases a connection if initialization fails. JAVA-3927
DefaultConnectionPool
DefaultConnectionPool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question but LGTM
driver-core/src/main/com/mongodb/internal/connection/ConcurrentPool.java
Show resolved
Hide resolved
driver-core/src/main/com/mongodb/internal/connection/DefaultConnectionPool.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…godb#699) This commit fixes three problems in `DefaultConnectionPool`/`ConcurrentPool`: - deadlock in `DefaultConnectionPool`; - unnecessary and harmful removal from `OpenConcurrencyLimiter.desiredConnectionSlots`; - double release in `DefaultConnectionPool`/`ConcurrentPool`. Deadlock in `DefaultConnectionPool` a) Before introducing OpenConcurrencyLimiter, "AsyncGetter" thread was used only to do blocking `ConcurrentPool.get`. b) After, I started to additionally use "AsyncGetter" to do blocking `OpenConcurrencyLimiter.waitUntilOpenPermitAvailable`. As a result, we may have a thread that gets the last connection (`maxSize` is reached) from `ConcurrentPool` and submits `waitUntilOpenPermitAvailable` to "AsyncGetter". Concurrently with this happening, the "AsyncGetter" tries to get from `ConcurrentPool` and is blocked because there are no more connections available. In such an execution, `waitUntilOpenPermitAvailable` cannot be completed by "AsyncGetter" because "AsyncGetter" is blocked doing a different task, which itself cannot be completed. A solution is to do `ConcurrentPool.get` and `waitUntilOpenPermitAvailable` in different threads. This way these two different kinds of tasks will not block each other by waiting in the same queue to be done by a single thread. Potential unnecessary and harmful removal from `OpenConcurrencyLimiter.desiredConnectionSlots` If `acquirePermitOrGetAvailableOpenedConnection` is called with `true` as `tryGetAvailable`, and `getPooledConnectionImmediately` throws an exception, then `expressDesireToGetAvailableConnection` is not called but `giveUpOnTryingToGetAvailableConnection` is still called, which is incorrect. Double release in `DefaultConnectionPool`/`ConcurrentPool` On one hand `ConcurrentPool.ensureMinSize` tries to not require the caller to do what can be done by the method itself, e.g., it releases the connection to the pool itself. On the other hand, always releasing in `ensureMinSize` may lead to releasing a permit for the same connection twice, thus not respecting the `maxSize`. It is not easy (requires an additional knob and logic) to prevent the caller (`DefaultConnectionPool`) from releasing a connection when initialization fails. It is, therefore, seems better to mandate that the caller releases a connection if initialization fails. JAVA-3927
I discovered 3 problems in
DefaultConnectionPool
andConcurrentPool
introduced by me in #685.OpenConcurrencyLimiter.desiredConnectionSlots
despite not putting anything there previously. See this commit message for the details.ConcurrentPool.ensureMinSize
explicitly requires a caller to release resources ifinitialize
throws an exception, but then releases the permit fornewItem
itself, which leads to double release. This was by far the hardest problem to investigate and find the cause. See this commit message for more details.Commits in this PR can be reviewed one by one.
JAVA-3927