Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(db): race condition mdbx abort tx #6798

Closed
wants to merge 31 commits into from
Closed

Conversation

emhane
Copy link
Member

@emhane emhane commented Feb 26, 2024

Closes #6441.

  • Adds a list for aborted rw transactions, which follows the pattern for read transactions. Checks if a transaction is already aborted before calling ffi::mdbx_txn_abort(tx_ptr).
  • Changes method signature for adding aborted read transaction so that it returns whether or not the transaction was already present in the aborted list.
  • Sets read transactions for TxnManager before starting message listener. Before message listener would always run with read transactions set to None. (cherry-picked to main already)
  • Adds metrics for observing length of aborted list, from which entries are only removed if the "user tries to use" (or mdbx assigns when user opens new tx) this pointer again.

@emhane emhane added C-bug An unexpected or incorrect behavior A-db Related to the database labels Feb 26, 2024
Copy link
Collaborator

@shekhirin shekhirin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_with_max_read_transaction_duration change is legit, I like it and believe it's fully implemented in #6809 now.

Regarding write transactions: we don't actually need to track them in any way, because we don't enforce any timeouts on them, and they can be running for however long they want. See bullet point 2 in #5228 for the reason why long-running read transactions are bad (and this is why we implemented a hard timeout on them).

@emhane emhane requested a review from gakonst as a code owner February 27, 2024 20:20
Copy link
Collaborator

@shekhirin shekhirin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two main things:

  1. ReadTransactions.aborted should only keep track of the transactions that were aborted due to a timeout, so we can check against it on MDBX_EBADSIGN and report a nice error.
  2. We still don't know if MDBX re-uses transaction pointers, and it's important because my previous assumption was that it doesn't.

Comment on lines 229 to 230
txn_manager.remove_aborted_read_transaction(txn).is_some() &&
err_code == ffi::MDBX_EBADSIGN
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please explain this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it allows a txn ptr to be reused if mdbx deems it fit to reuse, i.e. error code is 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if err_code == 0, but the txn stays in aborted txns, then aborting it will lead to Error::ReadTransactionAborted

Copy link
Collaborator

@shekhirin shekhirin Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the error code is not MDBX_EBADSIGN (maybe even 0 which is success), why do we want to remove it from the list of aborted transactions? MDBX_EBADSIGN is what usually returned by MDBX in case when we close the transaction and someone tries to re-use it later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no wait got that mixed up.

txn is aborted because time out, makes mdbx say "hey this ptr can be assigned"
open "new" txn and mdbx uses that same ptr, so err code is 0, so that check fails already at err_code == ffi::MDBX_EBADSIGN
txn stays in aborted list
user tries to abort the "new" txn, gets Error::TransactionAborted, very confused user

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes more sense to change order of that guard I think, then to remove from the aborted list when user tries to abort by drop or by TxnManagerMessage

Copy link
Collaborator

@shekhirin shekhirin Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user tries to abort the "new" txn, gets Error::TransactionAborted, very confused user

why would that happen? If a user tries to abort a new transaction, it will go through the mdbx_txn_abort just fine and return a code 0, I think?

Copy link
Member Author

@emhane emhane Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this to its own scope #6844

crates/storage/libmdbx-rs/Cargo.toml Outdated Show resolved Hide resolved
crates/storage/libmdbx-rs/src/txn_manager.rs Outdated Show resolved Hide resolved
crates/storage/libmdbx-rs/src/txn_manager.rs Outdated Show resolved Hide resolved
crates/storage/libmdbx-rs/src/txn_manager.rs Outdated Show resolved Hide resolved
crates/storage/libmdbx-rs/src/txn_manager.rs Outdated Show resolved Hide resolved
@emhane
Copy link
Member Author

emhane commented Feb 29, 2024

closed in favour of #6850 (review).

@emhane emhane closed this Feb 29, 2024
@emhane emhane deleted the emhane/race-condition-mdbx branch February 29, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-db Related to the database C-bug An unexpected or incorrect behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition in MDBX bindings
2 participants