thread blocked indefinitely in an MVar operation with parMap #23

Closed
Shimuuar opened this Issue · 16 comments

7 participants

Aleksey Khudyakov Bryan O'Sullivan Simon Marlow Ken Friis Larsen Ben Gamari Carter Tazio Schonwald Ryan Newton
Aleksey Khudyakov

The bug was originally reported against criterion, but Till Berger created a reduced test case:

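import Control.Monad.Par

-- Runs a trivial parMap inside runPar in an endless loop; the failure
-- shows up after some number of iterations.
test :: [Int] -> IO [Int]
test xs = do
    let list = runPar $ parMap (\x -> x + 1) xs
    print list
    test list

main = do
    test [1]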

If compiled with -threaded, the program fails after a few (5-30) iterations with the message "thread blocked indefinitely in an MVar operation" or, occasionally, with "Impossible state in globalWorkComplete". If it's compiled without threading, it still fails, but it requires many more iterations (tens of thousands).

Bryan O'Sullivan
bos commented

Still a concern here.

Simon Marlow
Owner

This is likely the same as issue #21. Sorry about this; we've known about the problem for a while, but unfortunately the code in question was written by Daniel Winograd-Cort during his internship at Microsoft, and it is a bit inscrutable.

There are workarounds:

  1. Use the direct scheduler: import Control.Monad.Par.Scheds.Direct instead of Control.Monad.Par
  2. Use monad-par-0.1.0.3 instead of 0.3
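A sketch of workaround 1 applied to the reduced test case above. It assumes the monad-par 0.3.x module layout: `runPar` comes from `Control.Monad.Par.Scheds.Direct`, and the generic `parMap` is taken from `Control.Monad.Par.Combinator`, on the assumption that it works with the Direct scheduler's `Par` type. The loop is bounded by a hypothetical iteration count `n` so that the program terminates instead of running forever.

```haskell
-- Sketch of workaround 1: the reproducer's loop on the Direct scheduler.
-- Assumes the monad-par 0.3.x module layout.
import Control.Monad.Par.Scheds.Direct (runPar)
import Control.Monad.Par.Combinator (parMap)

-- Bounded variant of `test`: runs `n` iterations, then returns the list.
test :: Int -> [Int] -> IO [Int]
test 0 xs = return xs
test n xs = do
    let list = runPar $ parMap (\x -> x + 1) xs
    print list          -- forces `list` before the next runPar starts
    test (n - 1) list

main :: IO ()
main = do
    final <- test 1000 [1]
    print final         -- [1001]
```

Printing each intermediate list keeps the structure of the original test case. It also ensures that each `runPar` finishes before the next one starts, so the runs are sequential rather than nested.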

I propose to do one of the following (Ryan, please let me know your preference): either

  1. make the direct scheduler the default, for the time being, or
  2. go back to the original non-nested Trace scheduler from 0.1.0.3
Ken Friis Larsen

It is not even necessary to use parMap; a simple spawn will do.

For instance,

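import Data.List (foldl')
import qualified Control.Monad.Par as P

-- Each fold step runs its own runPar, which spawns one trivial
-- computation and immediately waits for its result.
psum :: [Int] -> Int
psum xs = foldl' fun 0 xs
  where fun acc i = P.runPar (P.spawn (return (i + acc)) >>= P.get)

main :: IO ()
main = do
    print $ psum [1..128]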

Compiled with -threaded, this fails with "thread blocked indefinitely in an MVar operation", even with +RTS -N1.

But as @simonmar says, using Control.Monad.Par.Scheds.Direct seems to fix it.
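Ken's example uses only `runPar`, `spawn` and `get`, so switching schedulers is an import change. The following is a sketch that assumes monad-par 0.3.x, where those three names are exported by `Control.Monad.Par.Scheds.Direct`. Apart from the import, it is Ken's program with the redundant `>>= return` dropped.

```haskell
-- Ken's psum example on the Direct scheduler (sketch, monad-par 0.3.x).
import Data.List (foldl')
import qualified Control.Monad.Par.Scheds.Direct as P

-- Each fold step runs a separate runPar that spawns one trivial
-- computation and immediately waits for its result.
psum :: [Int] -> Int
psum = foldl' step 0
  where
    step acc i = P.runPar (P.spawn (return (i + acc)) >>= P.get)

main :: IO ()
main = print (psum [1 .. 128])   -- prints 8256
```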

Simon Marlow
Owner

After looking at this a bit more, I'm not sure it has anything to do with nesting. There's no nesting going on in this particular example, unlike #21. I think it's just a flat-out bug, triggered by a particular interleaving of threads while runPar is shutting down. Here is the RTS debugging output:

2b3dcd264b40: cap 0: running thread 3 (ThreadRunGHC)
2b3dcd264b40: cap 0: created thread 11
2b3dcd264b40: cap 0: thread 3 stopped (blocked on an MVar)
        thread    3 @ 0x2b3dcda05ee0 is blocked on an MVar @ 0x2b3dcda16f50 (TSO_DIRTY)
2b3dcd264b40: giving up capability 0
2b3dcd264b40: passing capability 0 to worker 0x2b3dcdf01700
2b3dcdf01700: woken up on capability 0
2b3dcdf01700: resuming capability 0
2b3dcdf01700: cap 0: running thread 11 (ThreadRunGHC)
2b3dcdf01700: cap 0: waking up thread 3 on cap 0
2b3dcdf01700: cap 0: thread 11 stopped (yielding)
2b3dcdf01700: giving up capability 0
2b3dcdf01700: passing capability 0 to bound task 0x2b3dcd264b40
2b3dcd264b40: woken up on capability 0
2b3dcd264b40: resuming capability 0
2b3dcd264b40: cap 0: running thread 3 (ThreadRunGHC)
2b3dcd264b40: cap 0: thread 3 stopped (blocked on an MVar)
        thread    3 @ 0x2b3dcda05ee0 is blocked on an MVar @ 0x2b3dcda18828 (TSO_DIRTY)
2b3dcd264b40: giving up capability 0
2b3dcd264b40: passing capability 0 to worker 0x2b3dcdf01700
2b3dcdf01700: woken up on capability 0
2b3dcdf01700: resuming capability 0
2b3dcdf01700: cap 0: running thread 11 (ThreadRunGHC)
2b3dcdf01700: cap 0: thread 11 stopped (finished)
2b3dcdf01700: giving up capability 0
2b3dcdf01700: freeing capability 0
2b3dcdd00700: returning; I want capability 0
2b3dcdd00700: resuming capability 0
2b3dcdd00700: cap 0: running thread 2 (ThreadRunGHC)
2b3dcdd00700: cap 0: thread 2 stopped (suspended while making a foreign call)
2b3dcdd00700: passing capability 0 to worker 0x2b3dcdf01700
2b3dcdf01700: woken up on capability 0
2b3dcdf01700: resuming capability 0
2b3dcdf01700: deadlocked, forcing major GC...

Thread 11 is the Par monad thread and thread 3 is the main thread. Thread 11 wakes up thread 3 and then yields (this seems to be crucial). Then thread 3 gets blocked again and never wakes up.

I don't understand the nested trace scheduler well enough to say why, but maybe this will help Daniel.
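The following sketch shows how GHC's runtime produces this error message. It uses only `base` and is not monad-par code. A thread blocks on an MVar that no other thread can ever fill. When the RTS finds no runnable threads, it runs a major GC (compare the `deadlocked, forcing major GC...` line at the end of the log above) and throws `BlockedIndefinitelyOnMVar` to the blocked thread. In this issue, the blocked thread is presumably the main thread waiting inside `runPar` for a wakeup that has been lost.

```haskell
-- Minimal illustration (not monad-par code) of the error in this issue:
-- a thread waits on an MVar that no other thread can ever fill.
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

main :: IO ()
main = do
  box <- newEmptyMVar :: IO (MVar ())
  -- No other thread holds `box`, so once this thread blocks, the RTS
  -- detects the deadlock during GC and throws BlockedIndefinitelyOnMVar.
  r <- try (takeMVar box) :: IO (Either BlockedIndefinitelyOnMVar ())
  case r of
    Left e   -> putStrLn ("caught: " ++ show e)
    Right () -> putStrLn "unexpected: takeMVar returned"
```

Left uncaught, the same exception terminates the program with the message quoted at the top of this issue.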

Ben Gamari

Any progress here?

Simon Marlow
Owner

@rrnewton is preparing a release that will have the fix (actually a workaround). See #26.

Simon Marlow
Owner

Also, I backed off the trace scheduler to the non-nested version (18e1968), because the nested version has at least two separate bugs (this one and #21).

Simon Marlow
Owner

Released version 0.3.4, which doesn't suffer from this bug.

Simon Marlow simonmar closed this
Carter Tazio Schonwald

I seem to have hit this bug, or something very much like it, today when running criterion with the new Haskell Platform release. I'll try to see whether it's the same one.

@bos
@simonmar

Carter Tazio Schonwald

The test case at the top of this ticket doesn't trigger the problem. I'll investigate more; it might be a problem on the criterion side instead.

Simon Marlow
Owner

@cartazio: this ticket is closed, we released a version of monad-par without the bug (0.3.4). Maybe you're using an older version?

Carter Tazio Schonwald

@simonmar I'm on the Haskell Platform, the one released last week.

It might be an unrelated problem in criterion that triggers a similar error message.

The test case at the opening of the ticket doesn't seem to trigger the bug, but building my criterion test suite with -threaded does.

It might not be a monad-par bug, but if I can figure out a simple, small repro, I'll share it here and also open a suitable criterion ticket.

Ryan Newton
Collaborator
Carter Tazio Schonwald

@rrnewton bos/criterion#28 is the repro.

I'm just using the vanilla 64-bit Haskell Platform released last week, on my Mac.

Carter Tazio Schonwald

#31 is my repro with the current monad-par. I haven't had time to peel the statistics/criterion wrapper off it yet, but it seems related to this issue, since the code uses monad-par only indirectly, via parMap and runPar.

Ryan Newton rrnewton referenced this issue from a commit in rrnewton/criterion
Aleksey Khudyakov Shimuuar Workaround for the bug in the monad-par
* simonmar/monad-par#23

As suggested by Simon Marlow direct scheduler is used.
1e4dd69