MemoryLocker deadlock #1603
Comments
Good job!
@xingfudeshi
Hey, thanks for your work. I'd like to merge this PR, but we need at least two approvals from committers.
Thank you for your review request.
@BeiKeJieDeLiuLangMao It must obtain a database lock before getting a global lock. So BT1 needs to get all the DB locks on table1 (5 row locks) and then try to get the global lock. Meanwhile, BT2 can't get any of those 5 DB locks, so it can't proceed to the global-lock acquisition in step 2.
@slievrly If BT1 sends the branchRegister request successfully and then crashes, its DB locks are released. After that, BT2 sends a branchRegister request with the same locks in a different order. The deadlock can still happen.
Ⅰ. Issue Description
Deadlock happened in `io.seata.server.lock.memory.MemoryLocker` when two branch transactions of the same TM execute concurrently: branch transaction 1 (BT1) wants to acquire the lock with lockKeys `table1:1,2,3,4,5`, while BT2 wants to acquire the lock with lockKeys `table1:5,4,3,2,1`.
Ⅱ. Describe what happened
My analysis result is:

In `io.seata.server.lock.memory.MemoryLocker#acquireLock`, the locker holds the bucketLockMap's monitor lock while it performs the acquisition. If a PK is already locked by another transaction, a lock conflict has happened, and the locker then releases all locks the branch transaction has acquired so far. The problem is that it releases those locks while still holding the current bucket's monitor lock, so if two branch transactions acquire locks in different orders, a deadlock can happen.

Going back to the scenario above, BT1 wants to acquire PK locks `1,2,3,4,5` and BT2 wants to acquire PK locks `5,4,3,2,1`. They execute concurrently (see the sketch after this list):

1. BT1 acquires `1,2`; BT2 acquires `5,4`.
2. Both try to acquire `3`, and BT1 actually finishes it.
3. BT1 tries to acquire `4` but finds it locked by BT2, so BT1 tries to release PK locks `1,2,3` while still holding PK lock `4`'s bucket monitor lock.
4. BT2 tries to acquire `3` but finds it locked by BT1, so BT2 tries to release PK locks `5,4` while still holding PK lock `3`'s bucket monitor lock.
5. BT1 holds `4`'s bucket lock and wants to release `3` (which needs `3`'s bucket monitor lock); BT2 holds `3`'s bucket lock and wants to release `4` (which needs `4`'s bucket monitor lock). Deadlock happened.
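For illustration, here is a minimal, self-contained Java sketch of that locking pattern (not Seata's actual code; `BucketDeadlockDemo`, `acquireAll`, and the `nap` delays are hypothetical names used only to make the interleaving above reproduce reliably):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BucketDeadlockDemo {

    // one monitor object per PK (stands in for MemoryLocker's per-bucket lock)
    static final Map<Integer, Object> buckets = new ConcurrentHashMap<>();
    // which transaction currently owns each PK lock
    static final Map<Integer, String> owners = new ConcurrentHashMap<>();

    static Object bucketOf(int pk) {
        return buckets.computeIfAbsent(pk, k -> new Object());
    }

    static void nap(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    // Flawed flow: on conflict, already-acquired PKs are released while the
    // current PK's bucket monitor is still held.
    static boolean acquireAll(String tx, int[] pks) {
        for (int i = 0; i < pks.length; i++) {
            synchronized (bucketOf(pks[i])) {
                String owner = owners.putIfAbsent(pks[i], tx);
                if (owner != null && !owner.equals(tx)) {
                    nap(50); // widen the race window so the interleaving above occurs
                    for (int j = 0; j < i; j++) {
                        synchronized (bucketOf(pks[j])) { // may be held by the other transaction
                            owners.remove(pks[j], tx);
                        }
                    }
                    return false;
                }
            }
            nap(10); // let the two transactions interleave as in steps 1-2
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread bt1 = new Thread(() -> acquireAll("BT1", new int[]{1, 2, 3, 4, 5}), "BT1");
        Thread bt2 = new Thread(() -> acquireAll("BT2", new int[]{5, 4, 3, 2, 1}), "BT2");
        bt1.start();
        bt2.start();
        bt1.join(3000);
        bt2.join(3000);
        // with the interleaving above, both threads are typically still blocked here
        System.out.println("BT1 alive: " + bt1.isAlive() + ", BT2 alive: " + bt2.isAlive());
    }
}
```

Since the scenario is a race, the demo is probabilistic, but with the delays shown it usually ends with both threads parked forever in the release loop, exactly as in step 5.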
Ⅲ. Describe what you expected to happen
MemoryLocker could work without deadlock.
Ⅳ. How to reproduce it (as minimally and precisely as possible)
I added a data provider `deadlockBranchSessionsProvider` and a function `deadlockTest` to `io.seata.server.lock.LockManagerTest`; running `deadlockTest` reproduces the hang.
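The real test lives in the PR; as a sketch of its shape only, the hypothetical JUnit 5 test below drives the `BucketDeadlockDemo` from the earlier sketch with two opposite-order PK lists, using a preemptive timeout to turn the hang into a test failure (all names here are illustrative, not Seata's actual test code):

```java
import static org.junit.jupiter.api.Assertions.assertTimeoutPreemptively;

import java.time.Duration;
import org.junit.jupiter.api.Test;

class DeadlockShapeTest {

    @Test
    void deadlockTest() {
        // fails (times out) while the bug is present; passes once it is fixed
        assertTimeoutPreemptively(Duration.ofSeconds(3), () -> {
            Thread bt1 = new Thread(() -> BucketDeadlockDemo.acquireAll("BT1", new int[]{1, 2, 3, 4, 5}));
            Thread bt2 = new Thread(() -> BucketDeadlockDemo.acquireAll("BT2", new int[]{5, 4, 3, 2, 1}));
            bt1.start();
            bt2.start();
            bt1.join();
            bt2.join();
        });
    }
}
```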
I also added some logs in `MemoryLocker`, and these logs confirm my analysis.
Ⅴ. Anything else we need to know?
This bug can be resolved by moving the lock-release process out of the bucket monitor lock's scope:
There is another way to fix the deadlock: sort the PK list in alphabetical order before acquiring the locks. This not only resolves the problem but also brings another advantage, which I will talk about later. Going back to the lock-release process, I think it is necessary to move it out of the bucket lock's scope regardless, because that is not where it belongs: doing so reduces the code's risk of deadlock and shortens the time the bucket monitor lock is held.
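Applied to the earlier sketch (again, a hypothetical illustration, not Seata's actual code), the fix amounts to detecting the conflict inside the bucket monitor but deferring the rollback until after every monitor has been released:

```java
// Fixed flow: on conflict, remember how many PKs were acquired, leave the
// synchronized block, and only then roll them back, taking one bucket
// monitor at a time. No thread ever waits for a monitor while holding
// another, so no lock-ordering cycle can form.
static boolean acquireAllFixed(String tx, int[] pks) {
    boolean conflict = false;
    int acquired = 0;
    for (int i = 0; i < pks.length && !conflict; i++) {
        synchronized (bucketOf(pks[i])) {
            String owner = owners.putIfAbsent(pks[i], tx);
            conflict = owner != null && !owner.equals(tx);
            if (!conflict) acquired = i + 1;
        }
    }
    if (conflict) {
        // rollback runs with no bucket monitor held
        for (int j = 0; j < acquired; j++) {
            synchronized (bucketOf(pks[j])) {
                owners.remove(pks[j], tx);
            }
        }
        return false;
    }
    return true;
}
```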
Now let's talk about the advantage of sorting the PK lock list. If two branch transactions execute concurrently with differently ordered PK lists, both BTs can fail; the RM will retry acquiring the PK locks, but it may fail again, and in extreme cases this scenario can repeat indefinitely (a livelock). Once the two branch transactions acquire locks in the same PK order, one BT can always complete normally.
Acquire locks in the same order:
Acquire locks in a different order:
In summary, I think sorting the PK list before acquiring locks is very helpful. If you care strongly about the TC's performance, this work could be done in the RM instead, as long as the PK list order is preserved until the locks are acquired.
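In terms of the earlier sketch, the ordering idea is a one-line normalization before acquisition (hypothetical helper, not Seata code; it reuses `acquireAll` from above):

```java
import java.util.Arrays;

// Normalize the PK order before acquisition so every branch transaction
// takes bucket monitors in the same global order.
static boolean acquireSorted(String tx, int[] pks) {
    int[] ordered = pks.clone();
    Arrays.sort(ordered); // the issue suggests alphabetical order for string PKs
    return acquireAll(tx, ordered);
}
```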
Ⅵ. Environment: