Skip to content

ConnectionPool code contains race condition #1263

@eranikid

Description

@eranikid

ConnectionPool code contains race condition around the RecoverLeakedSessionsAsync functionality. This has two main consequences:

  1. The m_sessions linked list might end up containing the same session twice. Which then leads to that session being contained in both m_sessions and m_leasedSessions. Which in turn leads to the GetSessionAsync call failing with a wierd-looking exception
  2. The m_leasedSessions dictionary might contain more sessions than the value of CurrentCount property of the m_sessionSemaphore, thus exceeding the configured MaximumPoolSize

This race condition occurs if two different threads concurrently enter RecoverLeakedSessionsAsync.

This can be encountered, for instance, when:

  1. Both threads evaluate the condition unchecked(((uint) Environment.TickCount) - m_lastRecoveryTime) >= 1000u as true, both enter the method RecoverLeakedSessionsAsync. First one to enter the critical section finds all the leaked sessions wihthout OwningConnection, and after leaving the critical section starts to recover them one-by-one. When the second thread enters the critical section, some the leaked sessions found by first thread are very likely to still be in the m_leasedSessions, thus they are also found by the second thread, which means that now both threads will attempt to recover the same session
  2. Thread that entered RecoverLeakedSessionsAsync has paused (due to long GC-pause/network latency/thread-pool starvation/you name it) for more than 1 second somewhere between leaving the critical section and recovering the last leaked session. In this case, after 1 second passes, another thread will be able to also enter the RecoverLeakedSessionsAsync method and see some of 1st thread's leaked sessions in m_leasedSessions

In both cases, ReturnAsync is called twice for the same session. This means that:

  • m_leasedSessions.Remove(session.Id) is invoked twice, which is fine, because the second invocation will just be a no-op
  • m_sessions.AddFirst(session) is invoked twice, which is our first issue
  • m_sessionSemaphore.Release() is invoked twice, which is our second issue

This might be related to #841. #305 might also be related, though I'm not sure

We encountered this in our production environment - I first thought that I was out of my mind when I saw MySqlConnector reporting 120 connections in the pool, given we use the default value of 100 for MaximumPoolSize, but after collecting some more diagnostics, this issue became apparent. It happens only when there is a connection leak (which we had). And though it does not lead to any significant issues, besides having a couple excess connections and some unclear exceptions every now and then, I still think that it is worth fixing.

As for the fix, I see three options. Ranged from the sipmlest to the most complex they are:

  • In ReturnAsync we check if m_leasedSessions.Remove(session.Id) actually deleted something. If it didn't delete anything, we just return. The downside of this option is that ReturnAsync going to still be invoked twice, the second invocation being a noop. In fact, I have prepared a PR Resolve race condition during the RecoverLeakedSessionsAsync #1262, implementing this option
  • In RecoverLeakedSessionsAsync we remove the leaked sessions from m_leasedSessions without leaving the critical section, so any successive calls to that method won't observe any of them. The downside of this option is that if anything goes wrong when disposing those connections, the dispose won't be retried, and they're gone for good
  • In RecoverLeakedSessionsAsync method, we put leaked sessions into some separate collection, designated specifically to be holding sessions to be recovered

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions