
MVar deadlock exceptions cause waitCatch to throw an exception #14

Closed
snoyberg opened this issue Jul 31, 2014 · 8 comments

@snoyberg
Contributor

I originally discovered this problem when working around an issue with hlint (so pinging @ndmitchell, who may be interested). Consider the following program:

import Control.Concurrent
import Control.Concurrent.Async
import Control.Concurrent.MVar
import Control.Exception

main :: IO ()
main = do
    -- Block 1: deadlock in the main thread; try catches the exception.
    e <- try $ newEmptyMVar >>= takeMVar
    print (e :: Either SomeException Int)

    -- Block 2: the same thing in a child thread; again the exception is caught.
    _ <- forkIO $ do
        e <- try $ newEmptyMVar >>= takeMVar
        print ("thread", e :: Either SomeException Int)
    threadDelay 1000000

    -- Block 3: the same thing via async/waitCatch.
    putStrLn "calling async"
    x <- async $ newEmptyMVar >>= takeMVar
    putStrLn "calling waitCatch"
    y <- waitCatch x
    putStrLn "completed waitCatch"
    print (y :: Either SomeException Int)

    putStrLn "Exiting..."

There are three logical blocks in this program, each doing essentially the same thing: creating an MVar deadlock exception and trying to catch it. The first approach does it in the main program thread, and the second in a child thread. Both of them work as expected: the exception is caught and printed, and the program continues executing.

However, the third block behaves differently. The MVar exception does not appear to be caught by the call to try in asyncUsing, so the TMVar is never filled. As a result, the later call to waitCatch throws an exception and the program exits prematurely. I've seen the same behavior with asyncBound, so I do not believe this is related to rawForkIO.

Here's an example output from my system:

$ ghc --make -O2 -threaded deadlock.hs && ./deadlock 
[1 of 1] Compiling Main             ( deadlock.hs, deadlock.o )
Linking deadlock ...
Left thread blocked indefinitely in an MVar operation
("thread",Left thread blocked indefinitely in an MVar operation)
calling async
calling waitCatch
deadlock: thread blocked indefinitely in an STM transaction

I'm on Ubuntu 12.04 64-bit, using GHC 7.8.3, stm-2.4.3 and async-2.0.1.5.

@snoyberg
Contributor Author

snoyberg commented Aug 3, 2014

I reduced this to a bug reproducible with just base, so this does not appear to be a bug in async. I've opened a GHC trac item for this:

https://ghc.haskell.org/trac/ghc/ticket/9401

Feel free to close this issue if you feel it makes more sense to just follow this in GHC trac.

@simonmar
Owner

simonmar commented Aug 4, 2014

See the comment on https://ghc.haskell.org/trac/ghc/ticket/9401: this isn't really a bug, just a consequence of the way deadlock detection works.

If you want a particular thread to be immune to deadlock detection, you can use this trick:

  t <- myThreadId
  s <- newStablePtr t
  -- now I'm immune!
  freeStablePtr s
  -- now I'm vulnerable again!
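
A minimal packaging of that trick as a bracketed helper (a sketch, not code from this thread; the name withDeadlockImmunity is hypothetical and not from any library):

import Control.Concurrent
import Control.Exception (bracket)
import Foreign.StablePtr

-- Hypothetical helper: run an action with the calling thread's ThreadId
-- pinned behind a StablePtr, so the RTS deadlock detector ignores this
-- thread for the duration. While pinned, a blocked takeMVar or STM retry
-- simply stays blocked instead of receiving a BlockedIndefinitely* exception.
withDeadlockImmunity :: IO a -> IO a
withDeadlockImmunity act = do
    t <- myThreadId
    bracket (newStablePtr t) freeStablePtr (\_ -> act)

Whether pinning actually recovers from a given deadlock depends on which thread the detector would otherwise flag; as the follow-up below notes, in the enclosed-exceptions test suite this approach simply hung.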

@simonmar simonmar closed this as completed Aug 4, 2014
snoyberg added a commit to snoyberg/enclosed-exceptions that referenced this issue Aug 4, 2014
Pinging @simonmar. Thank you for your response in the tickets. However,
the StablePtr technique simply caused this test suite to hang
indefinitely. Instead, I'm trying to catch the BlockedIndefinitelyOnSTM
exception sent by the RTS, which seems to work reliably.

Pinging @sol. This change needs to be included in hspec as well. Without
a patch, for example, the enclosed-exceptions test suite still fails,
since hspec ends up letting the test die due to the
BlockedIndefinitelyOnSTM exception being sent to it as well. I can send
a pull request after this is reviewed/accepted into enclosed-exceptions.
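
As a sketch of the catch-and-retry approach described above, applied to the third block of the original example (this is only an illustration of the approach, not the enclosed-exceptions patch itself):

import Control.Concurrent
import Control.Concurrent.Async
import Control.Exception

main :: IO ()
main = do
    x <- async $ newEmptyMVar >>= takeMVar
    -- When the RTS flags the deadlock, the worker receives
    -- BlockedIndefinitelyOnMVar (turned into a Left by async's internal try)
    -- and this thread receives BlockedIndefinitelyOnSTM; retrying the wait
    -- then succeeds because the result TMVar has been filled.
    y <- waitCatch x `catch` \BlockedIndefinitelyOnSTM -> waitCatch x
    print (y :: Either SomeException Int)
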
@snoyberg
Contributor Author

snoyberg commented Aug 4, 2014

Thanks for the clarification. While the StablePtr trick didn't work for me, catching the BlockedIndefinitelyOnSTM exception does seem to have solved the problem. I've created pull request jcristovao/enclosed-exceptions#5.

@simonmar
Owner

simonmar commented Aug 4, 2014

Making a StablePtr will prevent the main thread from receiving deadlock exceptions, so if you need that exception to recover from a main thread deadlock then it will just hang instead. The StablePtr trick works for me on the example in https://ghc.haskell.org/trac/ghc/ticket/9401.
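
An illustrative sketch of that failure mode (not code from the ticket): if the main thread itself deadlocks while its ThreadId is pinned, no BlockedIndefinitelyOnMVar is ever delivered and the program just hangs.

import Control.Concurrent
import Foreign.StablePtr

main :: IO ()
main = do
    t <- myThreadId
    _ <- newStablePtr t   -- main thread is now invisible to deadlock detection
    v <- newEmptyMVar
    -- Without the StablePtr the RTS would deliver BlockedIndefinitelyOnMVar
    -- here; with it, this takeMVar blocks forever.
    () <- takeMVar v
    putStrLn "never reached"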

snoyberg added a commit to fpco/async that referenced this issue Aug 4, 2014
This uses the same technique I described in my enclosed-exceptions pull
request. I think it would be better to fix this in async itself, so that
other packages trying to use async for reliable exception handling get
the semantics they're looking for.
simonmar added a commit that referenced this issue Aug 4, 2014
Add workaround to waitCatch for #14
bitonic added a commit to fpco/async that referenced this issue Jul 12, 2016
`waitCatch` is used to have the fix for simonmar#14 also in `cancel`
@mitchellwrosen

Here is the fix implemented in async:

wait :: Async a -> IO a
wait = tryAgain . atomically . waitSTM
  where
    -- See: https://github.com/simonmar/async/issues/14
    tryAgain f = f `catch` \BlockedIndefinitelyOnSTM -> f

But my colleague @tstat pointed out that catch runs its handler (the RHS) inside an "implicit mask", as documented in Catching Exceptions (but curiously missing from the haddocks of catch itself):

The difference between using try and catch for recovery is that in catch the handler is inside an implicit mask (see "Asynchronous Exceptions") which is important when catching asynchronous exceptions, but when catching other kinds of exception it is unnecessary. Furthermore it is possible to accidentally stay inside the implicit mask by tail-calling rather than returning from the handler, which is why we recommend using try rather than catch for ordinary exception recovery.

For that reason, do we want to use try instead, so the RHS (the second call to f) isn't run in a MaskedInterruptible state?

wait :: Async a -> IO a
wait = tryAgain . atomically . waitSTM
  where
    -- See: https://github.com/simonmar/async/issues/14
    tryAgain f = try f >>= \case 
      Left BlockedIndefinitelyOnSTM -> f
      Right x -> pure x
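
A small, self-contained illustration of the distinction being discussed (a sketch; getMaskingState is from Control.Exception, and the IOException here just stands in for any synchronous exception):

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception

main :: IO ()
main = do
    -- The handler passed to catch runs with asynchronous exceptions
    -- implicitly masked.
    throwIO (userError "boom") `catch` \(_ :: IOException) ->
        getMaskingState >>= print            -- prints MaskedInterruptible
    -- Code that recovers after try runs in the caller's original state.
    r <- try (throwIO (userError "boom"))
    case (r :: Either IOException ()) of
        Left _  -> getMaskingState >>= print -- prints Unmasked
        Right _ -> pure ()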

@simonmar
Owner

simonmar commented Sep 7, 2021

I don't think it actually matters, because the only thing that f does is block in STM, and mask will be a no-op since blocking is interruptible. try would be slightly less efficient.

@mitchellwrosen

@simonmar Thanks for the clarification. I'm still a little bit puzzled - could it not be the case that on the second invocation of

atomically (waitSTM x)

(with async exceptions masked due to catch), the STM action might not retry, and if not, any pending asynchronous exception would not be delivered until after the transaction commits?

A related question - if the transaction gets rolled back and retried due to an inconsistent view of memory, does that also count as an "interruptible" moment for the runtime to poll for an async exception?

Overall I agree there is not a very significant distinction here, but I am curious about the nitty-gritty timings :)

Thanks!

@simonmar
Owner

simonmar commented Nov 8, 2021

(with async exceptions masked due to catch), the STM action might not retry, and if not, any pending asynchronous exception would not be delivered until after the transaction commits?

That's true, but you never have any guarantees about when an asynchronous exception will be delivered anyway.

I mean, strictly speaking you're probably right. I'll accept a pull request to fix it.

A related question - if the transaction gets rolled back and retried due to an inconsistent view of memory, does that also count as an "interruptible" moment for the runtime to poll for an async exception?

I don't believe so.
