Fix error from transactiondb layer in stress test #13950
xingbowang wants to merge 3 commits into facebook:main
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D82373959.
I'm not sure I understand this. Are you suggesting that many threads can write to the same key at the same time? As far as I know, there should be only one thread writing while other threads read (with some degree of error allowed). Can you walk me through how a deadlock can be produced? And at what layer is the deadlock: ExpectedValue in the stress test, or something else?
Discussed offline: MaybeAddKeyToTxnForRYW is called by TestMultiGetEntity, which can be executed by multiple threads concurrently. When these threads perform writes, they take an X (exclusive) lock on each key. There is no client-side synchronization, so two threads can lock the same set of keys in different orders and run into a deadlock, or into a timeout (when deadlock detection is disabled and a transaction timeout is configured).
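To make the timeout case concrete, here is a minimal C++ sketch that models each key's X lock as a `std::timed_mutex`, with no deadlock detection at all. It is an illustration under stated assumptions, not RocksDB code: `RunOppositeOrderInterleaving` and `SpinUntil` are hypothetical names, and `try_lock_for` stands in for a transaction's lock wait bounded by a timeout. Two transactions lock {M, N} in opposite orders; because each keeps holding its first lock until both have attempted their second, both waits time out.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper: busy-wait until `counter` reaches `target`
// (a tiny stand-in for a barrier between the two transactions).
static void SpinUntil(const std::atomic<int>& counter, int target) {
  while (counter.load() < target) std::this_thread::yield();
}

// Hypothetical model: key M and key N are each guarded by one exclusive lock.
// Returns how many of the two "second lock" attempts timed out.
int RunOppositeOrderInterleaving(std::chrono::milliseconds lock_timeout) {
  std::timed_mutex key_m;
  std::timed_mutex key_n;
  std::atomic<int> first_locks_held{0};
  std::atomic<int> attempts_done{0};
  std::atomic<int> timeouts{0};

  auto txn = [&](std::timed_mutex& first, std::timed_mutex& second) {
    first.lock();                  // Txn A locks M / Txn B locks N
    first_locks_held.fetch_add(1);
    SpinUntil(first_locks_held, 2);  // both transactions now hold one key
    if (second.try_lock_for(lock_timeout)) {
      second.unlock();
    } else {
      timeouts.fetch_add(1);       // lock wait exceeded the timeout
    }
    attempts_done.fetch_add(1);
    SpinUntil(attempts_done, 2);   // keep holding `first` until both tried
    first.unlock();                // "roll back": release the held lock
  };

  // Txn A: M then N. Txn B: N then M (the opposite order).
  std::thread a(txn, std::ref(key_m), std::ref(key_n));
  std::thread b(txn, std::ref(key_n), std::ref(key_m));
  a.join();
  b.join();
  return timeouts.load();
}
```

Calling `RunOppositeOrderInterleaving(std::chrono::milliseconds(20))` returns 2: neither transaction can make progress, so without deadlock detection both only find out when their lock waits time out.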
// It is possible that multiple threads concurrently try to write to the
// same key, which could cause a lock timeout or deadlock, even before the
// transaction is rolled back. E.g.
// Timestamp 1: Transaction A: lock key M for write
// Timestamp 2: Transaction B: lock key N for write
// Timestamp 3: Transaction B: try to lock key M for write -> wait
// Timestamp 4: Transaction A: try to lock key N for write -> deadlock
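The timeline above forms a cycle in the wait-for graph: B waits on A (for M), and A then waits on B (for N). A deadlock detector can flag the request at Timestamp 4 by following wait-for edges from the current lock holder back to the requester. The sketch below is a simplified, hypothetical model (`WaitForGraph` and `LockResult` are illustrative names), not RocksDB's actual detector, and it assumes each transaction waits on at most one lock at a time.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class LockResult { kGranted, kWait, kDeadlock };

// Hypothetical, simplified lock table with deadlock detection.
class WaitForGraph {
 public:
  // Transaction `txn` requests the X lock on `key`.
  LockResult Lock(const std::string& txn, const std::string& key) {
    auto it = holder_.find(key);
    if (it == holder_.end()) {      // key is free: grant immediately
      holder_[key] = txn;
      return LockResult::kGranted;
    }
    if (it->second == txn) return LockResult::kGranted;  // already held
    const std::string owner = it->second;
    // Waiting on `owner` closes a cycle if `owner` (transitively) waits on us.
    if (Reaches(owner, txn)) return LockResult::kDeadlock;
    waits_for_[txn] = owner;        // record the wait edge txn -> owner
    return LockResult::kWait;
  }

 private:
  // Follows wait-for edges starting at `from`; true if `to` is reached.
  bool Reaches(std::string from, const std::string& to) const {
    std::set<std::string> seen;
    while (true) {
      if (from == to) return true;
      if (!seen.insert(from).second) return false;  // cycle not involving `to`
      auto it = waits_for_.find(from);
      if (it == waits_for_.end()) return false;     // `from` is not waiting
      from = it->second;
    }
  }

  std::map<std::string, std::string> holder_;     // key -> holding txn
  std::map<std::string, std::string> waits_for_;  // txn -> txn it waits on
};
```

Replaying Timestamps 1 to 4 as `Lock("A", "M")`, `Lock("B", "N")`, `Lock("B", "M")`, `Lock("A", "N")` yields kGranted, kGranted, kWait, kDeadlock.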
Great commit. Can you further specify that it's the lock/deadlock in the transaction layer?
Thanks for fixing it. I would also update the PR summary to make clear that it's a transaction-layer locking issue.
Copied from the offline conversation. With this PR, a deadlock in this case is supposed to no longer be raised as a test failure.
// In the transaction layer, the following can happen:
TS1: TXN A: lock key M for write
TS2: TXN B: lock key N for write
TS3: TXN B: try to lock key M for write
TS4: TXN A: try to lock key N for write -> deadlock
@xingbowang merged this pull request in 95813a8.
Summary:
The stress test runs concurrent transactions on many threads at the same time over a shared key space, so the transactiondb layer may detect a deadlock or a timeout. When this happens, simply return from the function and continue the test instead of failing it.
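A minimal sketch of the decision described in the summary, under stated assumptions: `SimStatus`, `StepOutcome`, and `HandleTxnWriteStatus` are hypothetical names, not the stress test's actual code. It assumes that a detected deadlock surfaces as a Busy status and a lock-wait timeout as TimedOut (check the real `rocksdb::Status` codes and subcodes before relying on this mapping); any other error still fails the step.

```cpp
#include <cassert>

// Hypothetical stand-in for the status codes a transactional write can return.
enum class SimStatus { kOk, kBusy, kTimedOut, kCorruption };

enum class StepOutcome { kContinue, kSkipped, kFailed };

// Decides what one stress-test step does with the status of a write
// performed inside a transaction.
StepOutcome HandleTxnWriteStatus(SimStatus s) {
  switch (s) {
    case SimStatus::kOk:
      return StepOutcome::kContinue;
    case SimStatus::kBusy:      // assumed: deadlock detected by the lock manager
    case SimStatus::kTimedOut:  // assumed: lock wait exceeded the lock timeout
      return StepOutcome::kSkipped;  // return early, keep the test running
    default:
      return StepOutcome::kFailed;   // anything else is a real error
  }
}
```

The design choice is deliberately narrow: only the two contention outcomes are tolerated, so genuine errors such as corruption still fail the test.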
Test Plan:
The stress test passes locally with the same random seed (14723229280871643749) from the failing stress test run.