Skip to content

Fix error from transactiondb layer in stress test#13950

Closed
xingbowang wants to merge 3 commits intofacebook:mainfrom
xingbowang:plm_stress_fix
Closed

Fix error from transactiondb layer in stress test#13950
xingbowang wants to merge 3 commits intofacebook:mainfrom
xingbowang:plm_stress_fix

Conversation

@xingbowang
Copy link
Contributor

@xingbowang xingbowang commented Sep 12, 2025

Summary:

The stress test runs concurrent transactions through many threads at the same time on a shared key space. It is possible that a dead lock or a timeout is detected from the transactiondb layer. When this happens, simply return from the function and continue the test, instead of fail the test.

Test Plan:

Stress test pass locally with the same random seed from stress test 14723229280871643749.

Reviewers:

Subscribers:

Tasks:

Tags:

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@meta-cla meta-cla bot added the CLA Signed label Sep 12, 2025
@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D82373959.

@xingbowang xingbowang marked this pull request as ready for review September 13, 2025 11:48
@hx235
Copy link
Contributor

hx235 commented Sep 15, 2025

The stress test runs concurrent transactions through many threads at the same time on a shared key space. It is possible that a dead lock is detected. When dead lock handle, handle them gracefully, instead of fail the test.

Not sure if I understand this. Are you suggesting that many threads at the same time can write to the same key? As far as I know, there should be only 1 thread writing and other threads can read (with some degree of error allowed). Can you walk me through how a deadlock can be produced? And what layer of deadlock is it - ExpectedValue in stress test or something else?

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@xingbowang
Copy link
Contributor Author

Discussed offline, MaybeAddKeyToTxnForRYW is called by TestMultiGetEntity, which could be executed by multiple threads concurrently. When they perform write, they take X lock on the key. There is no client side synchronization, so it is possible to lock same set of keys in different order, run into deadlock or timeout(when deadlock detection is disabled, and a transaction timeout is configured).

Comment on lines +3200 to +3206
// It is possible that multiple thread concurrently try to write to the
// same key, which could cause lock timeout or deadlock, even before
// transaction is rolled back. E.g.
// Timestamp 1: Transaction A: lock key M for write
// Timestamp 2: Transaction B: lock key N for write
// Timestamp 3: Transaction B: try to lock key M for write -> wait
// Timestamp 4: Transaction A: try to lock key N for write -> deadlock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great commit - can you further specifying it's the lock/deadlock in transaction layer.

Copy link
Contributor

@hx235 hx235 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing it. I would also update the PR summary to be clear on it's the transaction layer locking issue.

Copied from offline conversation. This PR is supposed to not raise deadlock in this case.

// In transaction layer the following can happen
TS1: TXN A: lock key M for write
TS2: TXN B: lock key N for write
TS3: TXN B: try to lock key M for write
TS4: TXN A: try to lock key N for write -> deadlock

@xingbowang xingbowang changed the title Fix deadlock handling in stress test Fix deadlock timeout handling in stress test Sep 16, 2025
@xingbowang xingbowang changed the title Fix deadlock timeout handling in stress test Fix error from transactiondb layer in stress test Sep 16, 2025
@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D82373959.

@facebook-github-bot
Copy link
Contributor

@xingbowang merged this pull request in 95813a8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants