Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database: Potential data race for mutual exclusion #9193

Closed
hinto-janai opened this issue Feb 21, 2024 · 15 comments
Closed

database: Potential data race for mutual exclusion #9193

hinto-janai opened this issue Feb 21, 2024 · 15 comments
Labels

Comments

@hinto-janai
Copy link

What

There may be a data race when needing to acquire mutual exclusive access to monerod's database, e.g. when resizing.

Invariant

When resizing LMDB's memory map, the caller must ensure they have mutual exclusive access to it.

As per mdb_env_set_mapsize() docs:

This function [...] may be called at later times if no transactions are active in this process. Note that the library does not check for this condition, the caller must ensure it explicitly.

An error is returned if there are other write transactions and presumably UB will occur if read transaction(s) exist.

Implementation

The solution to this was implemented in #289, and is still used today. My understanding of this code is:

There are 2 atomic values used to achieve mutual exclusive access to the database:

std::atomic<uint64_t> mdb_txn_safe::num_active_txns{0};
std::atomic_flag mdb_txn_safe::creation_gate = ATOMIC_FLAG_INIT;

The atomic bool is used to indicate "do not enter, we are resizing". Before starting a transaction, this bool will be spinned on until it is false. It is set to true when LMDB is resizing:

void mdb_txn_safe::prevent_new_txns()
{
while (creation_gate.test_and_set());
}

When resizing, this prevents new transactions from starting.

To handle currently active transactions, each transaction will num_active_txns++ after successfully passing the atomic bool, and will num_active_txns-- when done. The thread resizing will spin until num_active_txns is 0, indicating there are no more transactions:

void mdb_txn_safe::wait_no_active_txns()
{
while (num_active_txns > 0);
}

Now, with no new transactions allowed, and all current ones gone, we should have mutual exclusive access to the database and resizing should be OK.

Problem

2 atomic operations back-to-back are not atomic. Other threads are free to execute in-between 2 atomic operations. What may occur in the above implementation is such:

  Thread 1 (resizing)                             Thread 2 (normal read tx)
          |                                                 |
          |                                                 |
          |                                           lmdb_txn_begin()
          |                                     "not resizing, OK to start tx"
  prevent_new_txns()                                        |
"no new tx's can start"                                     |
          |                                                 |
 wait_no_active_txns()                                      |
  "spin until 0 tx's"                                       |
          |                                          num_active_txns++
          |                                                 |
 mdb_env_set_mapsize()   <--- ⚠️ unsafe ⚠️ --->   start_doing_tx_stuff()

There is space in-between Thread 2 succesfully entering a transaction and updating num_active_txns.

If Thread 2 is scheduled out by the OS after lmdb_txn_begin() but before num_active_txns++ (unlikely but non-zero chance) Thread 1 will incorrectly assume it has mutual exclusive access and start the resize.

mdb_txn_safe::wait_no_active_txns();
int result = mdb_env_set_mapsize(m_env, new_mapsize);

Some crashes occurring near a resize that could be explained by this:

@0xFFFC0000
Copy link
Collaborator

0xFFFC0000 commented Feb 22, 2024

In case you use blockchain.cpp to interact with db_lmdb.cpp, 9181 does fix this. Since no read transaction can start while there is a write transaction going on, and write transactions cannot start until acquires an exclusive lock on db.

@vtnerd
Copy link
Contributor

vtnerd commented Feb 22, 2024

@hinto-janai I believe you mis-read the code:

while (creation_gate.test_and_set());
   num_active_txns++;
   creation_gate.clear();

Thread1 is blocked from advancing until creation_gate.clear() is called. This is basically just a spin mutex protecting num_active_txns.

The patch by @0xFFFC0000 is likely overkill, and not needed (that's just my knee-jerk reaction at this point).

@selsta
Copy link
Collaborator

selsta commented Feb 22, 2024

@vtnerd this isn't related to this specific issue but sometimes when there is heavy RPC traffic it's possible that the node can't add new blocks and falls behind due to the current locking mechanism. moneromooo suggested a while ago that a rw lock would help here.

@vtnerd
Copy link
Contributor

vtnerd commented Feb 22, 2024

And to describe why the example doesn't work as @hinto-janai suggested - lmdb_txn_begin() should have creation_gate as true, so Thread1 is blocked in prevent_new_txs because the prior value is true until after num_active_txns++ is modified.

@vtnerd
Copy link
Contributor

vtnerd commented Feb 22, 2024

@vtnerd this isn't related to this specific issue but sometimes when there is heavy RPC traffic it's possible that the node can't add new blocks and falls behind due to the current locking mechanism. moneromooo suggested a while ago that a rw lock would help here.

Yes, this may have some resource starvation issues since it's a dirty spin lock instead of some queued lock. But there is no data race afaik.

I'm not certain that #9181 fixes this though, I think the original spin lock code would have to be removed as well.

@0xFFFC0000
Copy link
Collaborator

The patch by @0xFFFC0000 is likely overkill, and not needed (that's just my knee-jerk reaction at this point).

I believe if you read the PR carefully, you will realize that PR is addressing another issue. As a side-effect, it does solve the data race problem on lmdb_db too if you are accessing lmdb via blockchain.cpp APIs.

If you want to go into more technical depth, I am happy to do it :)

@vtnerd
Copy link
Contributor

vtnerd commented Feb 22, 2024

I'm not certain that #9181 fixes this though, I think the original spin lock code would have to be removed as well.

Scratch that, #9181 probably works, but we just wasting cycles on atomic operations.

@vtnerd
Copy link
Contributor

vtnerd commented Feb 22, 2024

@0xFFFC0000 well there is no data race where @hinto-janai describes, so you are saying there is yet another one? I'm doubting this as well, but sure why not enlighten me.

@0xFFFC0000
Copy link
Collaborator

@0xFFFC0000 well there is no data race where @hinto-janai describes, so you are saying there is yet another one? I'm doubting this as well, but sure why not enlighten me.

I did specifically talk about rw access from blockchain.cpp.

About the data race, directly from lmdb_db, as hinto explains in his comment, I have not tried to prove (or disprove) it either.

Keep in mind, if you are interacting with lmdb_db via blockchain.cpp, that data race cannot happen due to rwlock.

Though I am tracking you and @hinto-janai discussion, to understand the situation.

@vtnerd
Copy link
Contributor

vtnerd commented Feb 22, 2024

And sorry for being a bit rude, I'm just crabby that I have to review this DB change :/

@0xFFFC0000
Copy link
Collaborator

0xFFFC0000 commented Feb 22, 2024

And sorry for being a bit rude, I'm just crabby that I have to review this DB change :/

No no, I didn't even realize or notice anything rude from you :)

We were just discussing technical matters as always. Don't mention it at all. Looking forward to continuing our discussion. Because (possibility of) data race is a multi-level issue, we have to address it.

@hyc
Copy link
Collaborator

hyc commented Feb 22, 2024

@vtnerd is correct in his first reply #9193 (comment). Because of the first creation_gate.test_and_set() when mdb_txn_safe begins, prevent_new_txns() cannot proceed until the creation_gate.clear(). There is no race there.

@vtnerd
Copy link
Contributor

vtnerd commented Feb 22, 2024

@0xFFFC0000 is this about the potential starvation issue that @selsta mentioned or some other data race? I still don't know why #9181 was created (perhaps the discussion should move to that PR). It sounds like #9181 was not created due to the specific issue referenced by @hinto-janai in this issue.

@0xFFFC0000
Copy link
Collaborator

0xFFFC0000 commented Feb 22, 2024

It sounds like #9181 was not created due to the specific issue referenced by @hinto-janai in this issue.

Exactly, the 9181 is for a different issue. But a side-effect of that locking is, as I mentioned in my first comment, if there was a data race in lmdb_db, it would've not happened if you were accessing it via Blockchain.cpp APIs, since it does have its own locking mechanism.

I am available to discuss it on the PR page.

@hinto-janai
Copy link
Author

@vtnerd Ah you are correct. I was looking at lmdb_resized() where that constructor doesn't run so the 2 atomics being checked appear like an ABA problem. Although, every transaction (in db_lmdb.cpp at least) seems to be guarded by mdb_txn_safe at some point in the call stack.

There's BlockchainLMDB::do_resize() which doesn't use mdb_txn_safe but seems like that's guarded by a lock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants