Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

servers, util: fix deadlock caused by conflicting lock order #3340

Merged
merged 1 commit into from Jun 8, 2020

Conversation

@BurtonQin
Copy link
Contributor

@BurtonQin BurtonQin commented Jun 4, 2020

  • Description:
    This PR fixes two deadlock bugs caused by conflicting lock order in stratumserver and log.
    In servers/src/mining/stratumserver.rs
    add_worker() calls stratum_stats.write() before workers_list.write()
    pub fn add_worker(&self, tx: Tx) -> usize {
    let mut stratum_stats = self.stratum_stats.write();
    let worker_id = stratum_stats.worker_stats.len();
    let worker = Worker::new(worker_id, tx);
    let mut workers_list = self.workers_list.write();

    remove_worker() calls workers_list.read() before stratum_stats.write()
    self.stratum_stats.write().num_workers = self.workers_list.read().len();

    When add_worker() and remove_worker() are called simultaneously, the following deadlock may happen:
Thread-A Thread-B
A.lock()
B.lock()
B.lock()//deadlock!
A.lock()//deadlock!

Similarly,
init_logger() calls LOGGING_CONFIG.lock() before WAS_INIT.lock()

let mut config_ref = LOGGING_CONFIG.lock();

let mut was_init_ref = WAS_INIT.lock();

init_test_logger() calls WAS_INIT.lock() before LOGGING_CONFIG.lock()

grin/util/src/logger.rs

Lines 261 to 262 in c7c9a32

pub fn init_test_logger() {
let mut was_init_ref = WAS_INIT.lock();

let mut config_ref = LOGGING_CONFIG.lock();

  • How I fix:
    The fix is to enforce the order of the locks.

In stratumserver.rs, I use only one write lock of workers_list and remove the following read lock. This is to avoid possible atomicity violation when workers_list is written by another thread before being read.
In logger.rs, WAS_INIT is lifted before LOGGING_CONFIG in remove_worker to prevent the interleaving of init_logger() and init_test_logger().
Note that dropping the lockguard of Lock-A before calling Lock-B is also a possible solution for conflicting lock order. But I did not use it here to prevent possible atomicity violation.

  • Minor suggestion:
    When the critical sections of different locks may overlap, I suggest always locking in the order that they are declared to prevent such deadlocks.

There is only one thing that concerns me:
fn remove_worker() calls update_stats() where stratum_stats.write() is called. Then stratum_stats.write() is called again. Is it okay to have the two critical sections interleaved with remove_worker() or add_worker() from another thread?

@lehnberg
Copy link
Collaborator

@lehnberg lehnberg commented Jun 5, 2020

Ping @hashmap - you might find this up your alley

Copy link
Member

@quentinlesceller quentinlesceller left a comment

Thank you for the PR @BurtonQin !

  1. This makes sense. 👍
  2. Is likely to never happen since init_logger and init_test_logger should never been called together. But overall 👍 for safety sake.

There is only one thing that concerns me:
fn remove_worker() calls update_stats() where stratum_stats.write() is called. Then stratum_stats.write() is called again. Is it okay to have the two critical sections interleaved with remove_worker() or add_worker() from another thread?

Could you details what you think is the problem here?

Overall 👍 LGTM

@BurtonQin
Copy link
Contributor Author

@BurtonQin BurtonQin commented Jun 6, 2020

Well, after a second thought, I did not find any problem.

@quentinlesceller quentinlesceller merged commit 992d450 into mimblewimble:master Jun 8, 2020
10 checks passed
10 checks passed
@azure-pipelines
mimblewimble.grin Build #20200604.6 succeeded
Details
@azure-pipelines
mimblewimble.grin (linux api/util/store) linux api/util/store succeeded
Details
@azure-pipelines
mimblewimble.grin (linux chain/core/keychain) linux chain/core/keychain succeeded
Details
@azure-pipelines
mimblewimble.grin (linux pool/p2p/src) linux pool/p2p/src succeeded
Details
@azure-pipelines
mimblewimble.grin (linux release) linux release succeeded
Details
@azure-pipelines
mimblewimble.grin (linux servers) linux servers succeeded
Details
@azure-pipelines
mimblewimble.grin (macos release) macos release succeeded
Details
@azure-pipelines
mimblewimble.grin (macos test) macos test succeeded
Details
@azure-pipelines
mimblewimble.grin (windows release) windows release succeeded
Details
@azure-pipelines
mimblewimble.grin (windows test) windows test succeeded
Details
@quentinlesceller
Copy link
Member

@quentinlesceller quentinlesceller commented Jun 8, 2020

@BurtonQin thank you so much for the PR 🎆. This is merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants