Reducing use of F.unsafeRunSync #2735

Closed · Dennis4b opened this issue Jul 21, 2019 · 7 comments

Labels: bug (Determined to be a bug in http4s), module:blaze-server

@Dennis4b

(See also my last comment in #2695)

When running with a very low number of threads in order to provoke deadlocks that could manifest under load in production, I run into a deadlock in websocket/Http4sWSStage.scala:

138   override protected def stageShutdown(): Unit = {                                                                         
139     F.toIO(deadSignal.set(true)).unsafeRunSync()                                                                           
140     super.stageShutdown()                                                                                                  
141   } 

Here unsafeRunSync is reached from within an effect (the call chain runs from sendClose all the way to stageShutdown) and cannot synchronously run deadSignal.set because no more threads are available:


"pool-1-thread-1" #15 prio=5 os_prio=0 tid=0x00007f09e6d75800 nid=0x73b0 waiting on condition [0x00007f0943ffe000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000fa9d0920> (a cats.effect.internals.IOPlatform$OneShotLatch)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at cats.effect.internals.IOPlatform$.$anonfun$unsafeResync$2(IOPlatform.scala:50)
        at cats.effect.internals.IOPlatform$$$Lambda$2210/25411324.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
        at scala.concurrent.package$.blocking(package.scala:146)
        at cats.effect.internals.IOPlatform$.unsafeResync(IOPlatform.scala:50)
        at cats.effect.IO.unsafeRunTimed(IO.scala:325)
        at cats.effect.IO.unsafeRunSync(IO.scala:240)
        at org.http4s.blazecore.websocket.Http4sWSStage.stageShutdown(Http4sWSStage.scala:139)
        at org.http4s.blaze.pipeline.Tail.closePipeline(Stages.scala:78)
        at org.http4s.blaze.pipeline.Tail.closePipeline$(Stages.scala:77)
        at org.http4s.blazecore.websocket.Http4sWSStage.closePipeline(Http4sWSStage.scala:19)
        at org.http4s.blazecore.websocket.Http4sWSStage.$anonfun$stageStartup$1(Http4sWSStage.scala:116)
        at org.http4s.blazecore.websocket.Http4sWSStage$$Lambda$2222/996996997.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at zio.internal.FiberContext.evaluateNow(FiberContext.scala:410)
        at zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:599)
        at zio.internal.FiberContext$$Lambda$1657/745424370.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
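
For illustration, here is a minimal sketch (not http4s code, and assuming a cats-effect 2 style IO) of the failure mode in the trace: an effect occupying the pool's only thread calls unsafeRunSync on another effect whose async boundary must resume on that same thread, so neither side can make progress.

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect.IO

object UnsafeRunSyncDeadlock extends App {
  // Stand-in for an exhausted worker pool: a single thread.
  val pool = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

  val outer = IO.shift(pool).flatMap { _ =>
    IO {
      // We now occupy the pool's only thread. unsafeRunSync parks it until the
      // inner IO finishes, but the inner IO's async boundary is queued onto
      // this same single-thread pool, so it can never run.
      IO.shift(pool).flatMap(_ => IO.unit).unsafeRunSync()
    }
  }

  // Never completes; a thread dump shows the thread parked on a OneShotLatch,
  // just like the trace above.
  outer.unsafeRunAsync(_ => ())
}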

Is there a reason this is done synchronously? With the following change:

138   override protected def stageShutdown(): Unit = {                                                                         
139     F.toIO(deadSignal.set(true)).unsafeRunAsync(_ => ())                                                                   
140     super.stageShutdown()                                                                                                  
141   }

I can run my app with a single thread.

@rossabaker
Member

I thought blaze internals required at least two threads, but it's possible we overcame that. I haven't tried, so I can't certify a single thread as viable. That said, I think making this async would be reasonable. There's no guarantee that wsStream's finalizers run before the stage itself is shut down anyway.

@rossabaker
Member

Regarding #2695, I think we have to set the deadSignal in stageShutdown, which can be triggered in a lot of ways, to get the finalizers to run. I think the unsafeRunAsync would address the deadlock concerns.

I would also log any lefts in the unsafeRunAsync callback. .unsafeRunAsync(_ => ()) is roughly the same as writing an empty catch block in synchronous code.

@Dennis4b
Author

Thinking about it some more: even though using more than one thread solves the immediate deadlock, there is always a (very small but non-zero) risk of all worker threads (4 on a 4-core system?) being involved in cleaning up a connection at the same time and then deadlocking, since none of them gives up its thread until it can run its effect on another one.

I'm not sure about coding style and error logging specifics of http4s, so I hope someone else can make the PR or change.

Thank you for your quick replies, Ross.

@rossabaker
Member

Your proposal looks close to good to me, if you want to try a PR. I'd say:

  1. Target the series/0.20 branch, and we can make it a patch.
  2. For error handling, make it something like the following (a combined sketch of the patched method follows this list):
unsafeRunAsync {
  case Left(t) => logger.error(t)("Error setting dead signal")
  case Right(_) => ()
}
  3. Don't forget to run test:scalafmt.
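
Combining the error handling above with the async change proposed earlier gives roughly this (a sketch only; the logger value is assumed to exist in the stage):

override protected def stageShutdown(): Unit = {
  F.toIO(deadSignal.set(true)).unsafeRunAsync {
    case Left(t)  => logger.error(t)("Error setting dead signal")
    case Right(_) => ()
  }
  super.stageShutdown()
}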

If you don't have time, I can take it in the next day or two, but I've got a few others in flight.

@Dennis4b
Author

I don't know what it is about using ZIO instead of cats.effect.IO as the effect type, or if there is some kind of synchronization point in scheduling/networking/proxying/logging, but all 12 worker threads ended up deadlocked like this after an hour of uptime with moderate load using 0.20.6.

I've created a pull request, but it's small so feel free to just add the change into some other work you're doing. I thought I could get by with 0.20.6 but am now using a local 0.20.7-SNAPSHOT.

@rossabaker added the bug (Determined to be a bug in http4s) and module:blaze-server labels on Apr 6, 2021
@vasilmkd
Contributor

This looks like it has already been addressed?

@rossabaker
Member

Yeah, CE3 cleared that up.
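
For context, here is a minimal sketch (not the actual http4s code) of the cats-effect 3 pattern that removes this class of deadlock: a Dispatcher submits the effect to an already-running runtime instead of parking the calling thread. The object, method, and parameter names below are hypothetical.

import cats.effect.IO
import cats.effect.std.Dispatcher

object Ce3ShutdownSketch {
  // Fire-and-forget: the effect is handed to the dispatcher's runtime and no
  // thread is blocked waiting for it to complete.
  def stageShutdownHook(dispatcher: Dispatcher[IO], setDeadSignal: IO[Unit]): Unit =
    dispatcher.unsafeRunAndForget(setDeadSignal)
}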
