
Using rayon under a Mutex can lead to deadlocks #592

Open
cuviper opened this issue Sep 5, 2018 · 15 comments

Comments

@cuviper
Member

cuviper commented Sep 5, 2018

https://users.rust-lang.org/t/using-rayon-in-code-locked-by-a-mutex/20119

@kornelski presented the following pseudo-code representing a possible deadlock:

fn mutex_and_par() {
   some_mutex.lock().par_iter().collect();
}

collection.par_iter().for_each(mutex_and_par);

I think the way this fails is something like:

  • Thread 1 starts processing one of the for_each(mutex_and_par) calls.
    • grabs the lock
    • splits up the par_iter() into a few jobs
    • starts the collect() on one of those jobs
  • Another thread steals one of those jobs to help out
  • Thread 1 finishes its current job
    • looks for the other jobs and sees they're stolen elsewhere
    • goes to steal more work itself while it's waiting...
    • finds another of the for_each(mutex_and_par) calls
    • tries to grab the lock (again) ... DEADLOCK

If serialized, this code would never be a problem, but a work-stealing threadpool can cause it to implicitly recurse the lock. I'm not sure what we can do about this, or if there's any better advice we can give than just "don't call rayon under a mutex"...
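
For concreteness, here is a self-contained sketch of the same shape (a minimal reproduction sketch; the names and sizes are arbitrary, and whether a given run actually deadlocks depends on how the stealing plays out):

use std::sync::Mutex;
use rayon::prelude::*;

fn mutex_and_par(data: &Mutex<Vec<i32>>) {
    // Take the lock, then run parallel work while still holding it.
    let guard = data.lock().unwrap();
    let _sum: i32 = guard.par_iter().sum();
}

fn main() {
    let data = Mutex::new((0..1_000).collect::<Vec<i32>>());
    // Many outer jobs all contend on the same lock. A worker that is
    // waiting for its inner par_iter() jobs can steal another
    // mutex_and_par call and try to re-lock the Mutex it already holds.
    (0..100).into_par_iter().for_each(|_| mutex_and_par(&data));
}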

@kornelski
Contributor

kornelski commented Sep 5, 2018

Solutions that come to my mind:

  • Allow running a job/parallel iterator on a separate "private" threadpool. I guess current ThreadPool.install() won't cut it, as it's still global, and it'd merely delay the same cross-mutex stealing. For big chunks of work under a mutex wasting another set of threads would be an acceptable cost IMHO.

  • Have RayonAwareMutex. When code run from a Rayon job tries to acquire the mutex and the mutex is already locked, execution switches to another job instead of blocking. Maybe it could be generalized as rayon::yield() or similar, e.g. loop { if try_lock() { work(); break; } else { rayon::yield(); } }; see the sketch below.
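
A sketch of that second idea, written against rayon::yield_now() from later rayon-core releases (the proposed rayon::yield() would slot into the same place); this only illustrates the pattern, it is not a drop-in fix:

use std::sync::Mutex;

fn lock_or_help<T, R>(mutex: &Mutex<T>, work: impl FnOnce(&mut T) -> R) -> R {
    loop {
        match mutex.try_lock() {
            Ok(mut guard) => return work(&mut guard),
            // Lock is busy: try to run some other pending rayon job instead
            // of blocking. This spins if nothing is available, and it treats
            // poisoning the same as contention, which a real version wouldn't.
            Err(_) => {
                let _ = rayon::yield_now();
            }
        }
    }
}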

@nikomatsakis
Member

Mostly, my expectation is that the answer here would be "don't do that", but it is sort of disappointing, since we would like you to be able to add par_iter kind of 'for free' and have it work out.

That doesn't realistically work for blocking things though, and mutexes of course fit that model.

@nikomatsakis
Member

I think that using a ThreadPool::install inside the mutex would prevent the problem in this particular case, though..?

(Of course, you may not have control of all the relevant code.)

@cuviper
Member Author

cuviper commented Sep 12, 2018

  • Allow running a job/parallel iterator on a separate "private" threadpool. I guess current ThreadPool.install() won't cut it, as it's still global, and it'd merely delay the same cross-mutex stealing.

Why do you call this global? It should install in whatever threadpool you call it on. As long as that's distinct from the threadpool that grabbed the mutex, and there's nothing in it that calls back to the mutex's threadpool, I think it would work.
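
Roughly this shape, as a sketch (other_pool is assumed to be built once with ThreadPoolBuilder and never used for the outer for_each):

use std::sync::Mutex;
use rayon::prelude::*;

fn mutex_and_par(data: &Mutex<Vec<i32>>, other_pool: &rayon::ThreadPool) {
    let guard = data.lock().unwrap();
    // The inner parallel work runs on other_pool, whose workers never call
    // mutex_and_par themselves and so never try to re-lock this mutex.
    let _sum: i32 = other_pool.install(|| guard.par_iter().sum());
}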

@cuviper
Member Author

cuviper commented Sep 12, 2018

  • Have RayonAwareMutex. When code run from a Rayon job tries to acquire the mutex and the mutex is already locked, execution switches to another job instead of blocking. Maybe it could be generalized as rayon::yield() or similar, e.g. loop { if try_lock() { work(); break; } else { rayon::yield(); } }.

I think this would only help if they had separate stacks, otherwise your try_lock loop will still be sitting on top of the context that's holding the lock. You'd basically have to switch stacks every time you steal a job -- which I think @Zoxc did in the rustc-rayon fork to solve similar cross-cutting dependencies.

@kornelski
Contributor

Why do you call this global?

Sorry, I misread it as ThreadPool::install() rather than ThreadPool::install(callback). I'll try that solution then.

@cuviper
Member Author

cuviper commented Sep 12, 2018

Oh, actually, I think ThreadPool::install won't solve it -- it might even make the deadlock more likely. #449 made it so calls into other threadpools won't block the current pool. So when your mutex-holder installs into another pool, it will immediately look for more local work while it waits.

@kornelski
Contributor

I've run into this problem again (or rather, I've never fixed it and was lucky not to get a deadlock in the meantime).

I've tried:

with_mutex(|| {
    rayon::ThreadPoolBuilder::new()
        .build()
        .unwrap()
        .install(|| parallel_processing())
})

but as you've said, it didn't work around the issue.

All threads from the global pool are waiting for the mutex (which is entirely expected in my case), but then all the threads from the second pool end up locked at:

std::sys::unix::condvar::Condvar::wait::hc2a73c745cc299a2 [inlined]
std::sys_common::condvar::Condvar::wait::h657cb2db59f167a7 [inlined]
std::sync::condvar::Condvar::wait::h53da7761c4d81005 [inlined]
rayon_core::sleep::Sleep::sleep::haa73a51ff31e5141
rayon_core::sleep::Sleep::no_work_found::hde4ba6d7ec74532c [inlined]
rayon_core::registry::WorkerThread::wait_until_cold::hfeafc7e947f56fc0

@kornelski
Contributor

Is there any hope of this being fixed in Rayon? I don't know if I should wait for a fix, or remove Rayon from the mutex-using part of the code.

@cuviper
Member Author

cuviper commented Apr 30, 2019

I don't have a sense of what the fix should be, never mind giving any ETA on it.

@kornelski
Contributor

The fixes that I suppose could solve it:

  • rayon::yield() that makes the current thread execute any pending rayon tasks. I would use it something like loop { if mutex.try_lock() { … } else { rayon::yield() } }.

  • rayon::Mutex that does the above, possibly with a lower chance of race conditions or spinning.

  • A version of ThreadPool::install that is truly separate and isolated.

@cuviper
Member Author

cuviper commented May 10, 2019

  • rayon::yield() that makes the current thread execute any pending rayon tasks. I would use it something like loop { if mutex.try_lock() { … } else { rayon::yield() } }.

That use would still be a deadlock-loop if the lock holder -- or anything the lock holder is waiting for -- is blocked by you on the call stack. They'll never complete until you return to them.

  • A version of ThreadPool::install that is truly separate and isolated.

I think this is feasible.

@cuviper
Member Author

cuviper commented Jun 27, 2019

If you're really only reading the mutexed data in the thread pool, you could also consider a RwLock, as that does allow reentrant readers.
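
A sketch of that, assuming the parallel section really is read-only (caveat: std's RwLock makes no fairness guarantee, so a recursive read can still block if a writer gets queued in between; parking_lot's read_recursive() is one way around that):

use std::sync::RwLock;
use rayon::prelude::*;

fn rwlock_and_par(data: &RwLock<Vec<i32>>) {
    // Shared (read) lock; the parallel work below only reads, so a stolen
    // sibling call re-acquiring a read lock is not the self-deadlock that
    // an exclusive Mutex would be.
    let guard = data.read().unwrap();
    let _sum: i32 = guard.par_iter().sum();
}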

@naim94a

naim94a commented Mar 29, 2020

If this isn't going to be fixed, I would suggest documenting it as a known issue and adding a warning to par_* documentation.

I just spent nearly two hours debugging a deadlock before finding this issue 😢

@nhukc

nhukc commented Jun 18, 2024

I was able to get something like this working by adding a feature for full_blocking thread pools. This prevents the #449 behavior when a pool is created with the full_blocking feature.

use std::sync::{Arc, Mutex};
use rayon::ThreadPoolBuilder;
use rayon::iter::IntoParallelRefIterator;
use rayon::iter::ParallelIterator;

fn mutex_and_par(mutex: Arc<Mutex<Vec<i32>>>, blocking_pool: &rayon::ThreadPool) {
    // Lock the mutex and collect items using the full blocking thread pool
    let vec = mutex.lock().unwrap();
    let result: Vec<i32> = blocking_pool.install(|| vec.par_iter().cloned().collect());
    println!("{:?}", result);
}

#[test]
fn test_issue592() {
    let collection = vec![1, 2, 3, 4, 5];
    let mutex = Arc::new(Mutex::new(collection));

    let blocking_pool = ThreadPoolBuilder::new().full_blocking().num_threads(4).build().unwrap();

    let dummy_collection: Vec<i32> = (1..=100).collect();
    dummy_collection.par_iter().for_each(|_| {
        mutex_and_par(mutex.clone(), &blocking_pool);
    });
}

See #1175 for more details.
