You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
ConnectionPool code contains race condition around the RecoverLeakedSessionsAsync functionality. This has two main consequences:
The m_sessions linked list might end up containing the same session twice. Which then leads to that session being contained in both m_sessions and m_leasedSessions. Which in turn leads to the GetSessionAsync call failing with a wierd-looking exception
The m_leasedSessions dictionary might contain more sessions than the value of CurrentCount property of the m_sessionSemaphore, thus exceeding the configured MaximumPoolSize
This race condition occurs if two different threads concurrently enter RecoverLeakedSessionsAsync.
This can be encountered, for instance, when:
Both threads evaluate the condition unchecked(((uint) Environment.TickCount) - m_lastRecoveryTime) >= 1000u as true, both enter the method RecoverLeakedSessionsAsync. First one to enter the critical section finds all the leaked sessions wihthout OwningConnection, and after leaving the critical section starts to recover them one-by-one. When the second thread enters the critical section, some the leaked sessions found by first thread are very likely to still be in the m_leasedSessions, thus they are also found by the second thread, which means that now both threads will attempt to recover the same session
Thread that entered RecoverLeakedSessionsAsync has paused (due to long GC-pause/network latency/thread-pool starvation/you name it) for more than 1 second somewhere between leaving the critical section and recovering the last leaked session. In this case, after 1 second passes, another thread will be able to also enter the RecoverLeakedSessionsAsync method and see some of 1st thread's leaked sessions in m_leasedSessions
In both cases, ReturnAsync is called twice for the same session. This means that:
m_leasedSessions.Remove(session.Id) is invoked twice, which is fine, because the second invocation will just be a no-op
m_sessions.AddFirst(session) is invoked twice, which is our first issue
m_sessionSemaphore.Release() is invoked twice, which is our second issue
This might be related to #841. #305 might also be related, though I'm not sure
We encountered this in our production environment - I first thought that I was out of my mind when I saw MySqlConnector reporting 120 connections in the pool, given we use the default value of 100 for MaximumPoolSize, but after collecting some more diagnostics, this issue became apparent. It happens only when there is a connection leak (which we had). And though it does not lead to any significant issues, besides having a couple excess connections and some unclear exceptions every now and then, I still think that it is worth fixing.
As for the fix, I see three options. Ranged from the sipmlest to the most complex they are:
In ReturnAsync we check if m_leasedSessions.Remove(session.Id) actually deleted something. If it didn't delete anything, we just return. The downside of this option is that ReturnAsync going to still be invoked twice, the second invocation being a noop. In fact, I have prepared a PR Resolve race condition during the RecoverLeakedSessionsAsync #1262, implementing this option
In RecoverLeakedSessionsAsync we remove the leaked sessions from m_leasedSessions without leaving the critical section, so any successive calls to that method won't observe any of them. The downside of this option is that if anything goes wrong when disposing those connections, the dispose won't be retried, and they're gone for good
In RecoverLeakedSessionsAsync method, we put leaked sessions into some separate collection, designated specifically to be holding sessions to be recovered
ConnectionPool code contains race condition around the
RecoverLeakedSessionsAsyncfunctionality. This has two main consequences:m_sessionslinked list might end up containing the same session twice. Which then leads to that session being contained in bothm_sessionsandm_leasedSessions. Which in turn leads to theGetSessionAsynccall failing with a wierd-looking exceptionm_leasedSessionsdictionary might contain more sessions than the value ofCurrentCountproperty of them_sessionSemaphore, thus exceeding the configuredMaximumPoolSizeThis race condition occurs if two different threads concurrently enter
RecoverLeakedSessionsAsync.This can be encountered, for instance, when:
unchecked(((uint) Environment.TickCount) - m_lastRecoveryTime) >= 1000uastrue, both enter the methodRecoverLeakedSessionsAsync. First one to enter the critical section finds all the leaked sessions wihthoutOwningConnection, and after leaving the critical section starts to recover them one-by-one. When the second thread enters the critical section, some the leaked sessions found by first thread are very likely to still be in them_leasedSessions, thus they are also found by the second thread, which means that now both threads will attempt to recover the same sessionRecoverLeakedSessionsAsynchas paused (due to long GC-pause/network latency/thread-pool starvation/you name it) for more than 1 second somewhere between leaving the critical section and recovering the last leaked session. In this case, after 1 second passes, another thread will be able to also enter theRecoverLeakedSessionsAsyncmethod and see some of 1st thread's leaked sessions inm_leasedSessionsIn both cases,
ReturnAsyncis called twice for the same session. This means that:m_leasedSessions.Remove(session.Id)is invoked twice, which is fine, because the second invocation will just be a no-opm_sessions.AddFirst(session)is invoked twice, which is our first issuem_sessionSemaphore.Release()is invoked twice, which is our second issueThis might be related to #841. #305 might also be related, though I'm not sure
We encountered this in our production environment - I first thought that I was out of my mind when I saw MySqlConnector reporting 120 connections in the pool, given we use the default value of 100 for
MaximumPoolSize, but after collecting some more diagnostics, this issue became apparent. It happens only when there is a connection leak (which we had). And though it does not lead to any significant issues, besides having a couple excess connections and some unclear exceptions every now and then, I still think that it is worth fixing.As for the fix, I see three options. Ranged from the sipmlest to the most complex they are:
ReturnAsyncwe check ifm_leasedSessions.Remove(session.Id)actually deleted something. If it didn't delete anything, we just return. The downside of this option is thatReturnAsyncgoing to still be invoked twice, the second invocation being a noop. In fact, I have prepared a PR Resolve race condition during the RecoverLeakedSessionsAsync #1262, implementing this optionRecoverLeakedSessionsAsyncwe remove the leaked sessions fromm_leasedSessionswithout leaving the critical section, so any successive calls to that method won't observe any of them. The downside of this option is that if anything goes wrong when disposing those connections, the dispose won't be retried, and they're gone for goodRecoverLeakedSessionsAsyncmethod, we put leaked sessions into some separate collection, designated specifically to be holding sessions to be recovered