Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock when command is being cancelled from two threads #1131

Merged
merged 4 commits into from
Feb 5, 2022

Conversation

ahydrax
Copy link
Contributor

@ahydrax ahydrax commented Feb 5, 2022

Hi there,
A couple of days ago we encountered a deadlock inside mysql connector which lead our system to downtime.

What happened?
Our monitoring detected that there are no new writes are happened to database in about 10 minutes. We dumped process memory and started looking for a problem. It turned out that deadlock happened in TimerQueue and connection Session.

The problem
The problem is that timer callback is being called under TimerQueue lock. Since callback can be almost any user code no one restricts it from acquiring other locks inside which exactly happened in our case.
ServerSession.TryStartCancel and ServerSession.DoCancel already uses a lock inside to protect internal session state and when it's called from TimerQueue it also implicitly acquires the lock on TimerQueue.
I attached screenshots which clearly illustrates the problem.
1
2
4

Reproduction
I was able to create a pretty stable repro. All you need is to make cancellation happenned simultaneously both from external cancellation source (CancellationTokenSource with timeout/HttpContext.RequestAborted etc.) and command timeout watchdog.
You can examine the code here https://github.com/ahydrax/StableMysqlConnectorDeadlockRepro

The solution
From my perspective solution should be pretty simple: timer queue calls pending callbacks outside of TimerQueue internal lock.

Closing notes
After fixing deadlock I've encountered a small bug inside Session.TryStartCancel which didn't respect all command states inside during state verifying.

Signed-off-by: Viktor Svyatokha <ahydrax@gmail.com>
Signed-off-by: Viktor Svyatokha <ahydrax@gmail.com>
Signed-off-by: Viktor Svyatokha <ahydrax@gmail.com>
Signed-off-by: Viktor Svyatokha <ahydrax@gmail.com>
@bgrainger
Copy link
Member

Thanks for the diagnosis and fix!

@bgrainger bgrainger merged commit 69e8353 into mysql-net:master Feb 5, 2022
@ahydrax ahydrax deleted the timer-queue-deadlock-fix branch February 6, 2022 11:25
@bgrainger
Copy link
Member

Fixed in 2.1.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants